* [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-16 19:16 ` Greg Kroah-Hartman
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz

kdbus is a kernel-level IPC implementation that aims to resemble the
protocol layer of the existing userspace D-Bus daemon, while enabling
some features that could not be implemented in userspace before.

The documentation in the first patch in this series explains the
protocol and the API details.

Full details on what has changed from the v2 submission are at the
bottom of this email.

Reasons why this should be done in the kernel, instead of in userspace
as it is done today, include the following:

- performance: fewer process context switches, fewer copies, fewer
  syscalls, larger memory chunks via memfd.  This is really important
  for a whole class of userspace programs that were ported from other
  operating systems and run on tiny ARM systems.  They rely on hundreds
  of thousands of messages being passed at boot time, and at "critical"
  times in their user interaction loops.
- security: the peers which communicate do not have to trust each other,
  as the only trustworthy component in the game is the kernel, which
  adds metadata and ensures that all data passed as payload is either
  copied or sealed, so that the receiver can parse the data without
  having to protect against changing memory while parsing buffers.  Also,
  all the data transfer is controlled by the kernel, so that LSMs can
  track and control what is going on, without involving userspace.
  Because of the LSM issue, security people are much happier with this
  model than the current scheme of having to hook into dbus to mediate
  things.
- more metadata can be attached to messages than in userspace
- semantics for apps with heavy data payloads (media apps, for instance)
  with optional priority message dequeuing, and global message ordering.
  Some "crazy" people are playing with using kdbus for audio data in the
  system.  I'm not saying that this is the best model for this, but
  until now, there wasn't any other way to do this without having to
  create custom "busses", one for each application library.
- being in the kernel closes a lot of races which can't be fixed with
  the current userspace solutions.  For example, with kdbus, there is a
  way a client can disconnect from a bus, but do so only if no further
  messages are present in its queue, which is crucial for implementing
  race-free "exit-on-idle" services.
- eavesdropping on the kernel level, so privileged users can hook into
  the message stream without hacking support for that into their
  userspace processes
- a number of smaller benefits: for example kdbus learned a way to peek
  full messages without dequeuing them, which is really useful for
  logging metadata when handling bus-activation requests.

Of course, some of the bits above could be implemented in userspace
alone, for example with more sophisticated memory management APIs, but
this is usually done by losing out on the other details.  For example,
for many of the memory management APIs, it's hard to not require the
communicating peers to fully trust each other.  And we _really_ don't
want peers to have to trust each other.

Another benefit of having this in the kernel, rather than as a userspace
daemon, is that you can now easily use the bus from the initrd, or up to
the very end when the system shuts down.  On current userspace D-Bus,
this is not really possible, as this requires passing the bus instance
around between initrd and the "real" system.  Such a transition of all
fds also requires keeping full state of what has already been read from
the connection fds.  kdbus makes this much simpler, as we can change the
ownership of the bus, just by passing one fd over from one part to the
other.

Regarding binder: binder and kdbus follow very different design
concepts.  Binder implies the use of thread-pools to dispatch incoming
method calls.  This is a very efficient scheme, and completely natural
in programming languages like Java.  In most Linux programs, however,
there's a much stronger focus on central poll() loops that dispatch all
sources a program cares about.  kdbus is much more usable in such
environments, as it doesn't enforce a threading model, and it is happy
with serialized dispatching.  In fact, this major difference had an
effect on many of the design decisions: binder does not guarantee global
message ordering due to the parallel dispatching in the thread-pools,
but kdbus does.  Moreover, there's also a difference in the way messages
are handled.  In kdbus, every message is basically taken and dispatched
as one blob, while in binder, continuous connections to other peers are
created, which are then used to send messages on.  Hence, the models are
quite different, and they serve different needs.  I believe that the
D-Bus/kdbus model is more compatible and friendly with how Linux
programs are usually implemented.
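
For readers less familiar with the poll()-based style described above,
here is a minimal sketch of one iteration of such a single-threaded
loop.  It uses plain file descriptors and poll(2) only.  The names
struct source, dispatch_once() and count_event() are made up for this
illustration and are not part of kdbus or of this patch set.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_SOURCES 16

typedef void (*source_cb)(int fd, void *userdata);

/* One event source: an fd plus the handler to run when it is readable. */
struct source {
	int fd;
	source_cb cb;
	void *userdata;
};

/*
 * Run one iteration of the loop: wait until at least one source is ready
 * (or the timeout expires), then call the handlers one after another, in
 * array order.  Returns the number of handlers called, or -1 on error.
 */
int dispatch_once(const struct source *sources, size_t n, int timeout_ms)
{
	struct pollfd fds[MAX_SOURCES];
	int handled = 0;
	size_t i;

	assert(sources != NULL);
	if (n > MAX_SOURCES)
		return -1;

	for (i = 0; i < n; i++) {
		fds[i].fd = sources[i].fd;
		fds[i].events = POLLIN;
		fds[i].revents = 0;
	}

	if (poll(fds, (nfds_t)n, timeout_ms) < 0)
		return -1;

	/* Serialized dispatch: no threads, handlers never run concurrently. */
	for (i = 0; i < n; i++) {
		if (fds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
			sources[i].cb(sources[i].fd, sources[i].userdata);
			handled++;
		}
	}

	return handled;
}

/* Example handler: consume one byte and count the event in *userdata. */
void count_event(int fd, void *userdata)
{
	char c;

	if (read(fd, &c, 1) == 1)
		(*(int *)userdata)++;
}
```

A program written this way calls dispatch_once() in a loop and adds
every fd it cares about as one more entry in the sources array, which
is the kind of environment the paragraph above has in mind.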

This can also be found in a git tree, the kdbus branch of char-misc.git at:
        https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/

Changes since v2:

  * Add FS_USERNS_MOUNT to the file system flags, so users can mount
    their own kdbusfs instances without being root in the parent
    user-ns. Spotted by Andy Lutomirski.

  * Rewrite major parts of the metadata implementation to allow for
    per-recipient namespace translations. For this, namespaces are
    now not pinned by domains anymore. Instead, metadata is recorded
    in kernel scope, and exported into the currently active namespaces
    at the time of message installing.

  * Split PID and TID from KDBUS_ITEM_CREDS into KDBUS_ITEM_PIDS.
    The starttime is there to detect re-used PIDs, so move it to that
    new item type as well. Consequently, introduce struct kdbus_pids
    to accommodate the information. Requested by Andy Lutomirski.

  * Add {e,s,fs}{u,g}id to KDBUS_ITEM_CREDS, so users have a way to
    get more fine-grained credential information.

  * Removed KDBUS_CMD_CANCEL. The interface was not usable from
    threaded userspace implementations due to inherent races. Instead,
    add an item type CANCEL_FD which can be used to pass a file
    descriptor to the CMD_SEND ioctl. When the SEND is done
    synchronously, it will get cancelled as soon as the passed
    FD signals POLLIN.

  * Dropped starttime from KDBUS_ITEM_PIDS

  * Restrict names of custom endpoints to names with a "<uid>-" prefix,
    just like we do for buses.

  * Provide module-parameter "kdbus.attach_flags_mask" to specify a
    mask of metadata items that is applied to all exported items.

  * Monitors are now entirely invisible (IOW, there won't be any
    notification when they are created) and they don't need to install
    filters for broadcast messages anymore.

  * All information exposed via a connection's pool now also reports
    the length in addition to the offset. That way, userspace
    applications can mmap() only parts of the pool on demand.

  * Due to the metadata rework, KDBUS_ITEM_PAYLOAD_OFF items now
    describe the offset relative to the pool, where they used to be
    relative to the message header.

  * Added return_flags bitmask to all kdbus_cmd_* structs, so the
    kernel can report details of the command processing. This is
    mostly reserved for future extensions.

  * Some fixes in kdbus.txt and tests, spotted by Harald Hoyer, Andy
    Lutomirski, Michele Curti, Sergei Zviagintsev, Sheng Yong, Torstein
    Husebø and Hristo Venev.

  * Fixed compiler warnings in test-message by Michele Curti

  * Unexpected items are now rejected with -EINVAL

  * Split signal and broadcast handling. Unicast signals are now
    supported, and messages have a new KDBUS_MSG_SIGNAL flag.

  * KDBUS_CMD_MSG_SEND was renamed to KDBUS_CMD_SEND, and now takes
    a struct kdbus_cmd_send instead of a kdbus_msg.

  * KDBUS_CMD_MSG_RECV was renamed to KDBUS_CMD_RECV.

  * Test case memory leak plugged, and various other cleanups and
    fixes, by Rui Miguel Silva.

  * Build fix for s390

  * Test case fix for 32bit archs

  * The test framework now supports mount, pid and user namespaces.

  * The test framework learned a --tap command line parameter to
    format its output in the "Test Anything Protocol". This format
    is chosen by default when "make kselftest" is invoked.

  * Fixed name validation for buses and custom endpoints, reported by
    Andy Lutomirski.

  * copy_from_user() return code issue fixed, reported by
    Dan Carpenter.

  * Avoid signed int overflow on archs without atomic_sub

  * Avoid variable size stack items. Fixes a sparse warning in queue.c.

  * New test case for kernel notification quota

  * Switched back to enums for the list of ioctls. This has advantages
    for userspace code as gdb, for instance, is able to resolve the
    numbers into names. Added features can easily be detected with
    autotools, and new ioctls can get #defines as well. Having #defines
    for the initial set of ioctls is unnecessary.
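
To illustrate the last item (enums for the list of ioctls), here is a
rough sketch of the pattern.  The magic number, struct demo_cmd and all
DEMO_* names are hypothetical and are not the definitions from kdbus.h.
Only the _IO()/_IOW() macros are real.  Enumerators end up in the debug
info, while a later addition can get a #define so that it is visible to
#ifdef.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical magic and argument struct; not the values used by kdbus. */
#define DEMO_IOCTL_MAGIC 0xEE

struct demo_cmd {
	uint64_t size;
	uint64_t flags;
};

/*
 * As enumerators, the request numbers are part of the debug info, so a
 * debugger can print a value of type enum demo_ioctl by name.
 */
enum demo_ioctl {
	DEMO_CMD_MAKE = _IOW(DEMO_IOCTL_MAGIC, 0x00, struct demo_cmd),
	DEMO_CMD_SEND = _IOW(DEMO_IOCTL_MAGIC, 0x01, struct demo_cmd),
	DEMO_CMD_FREE = _IO(DEMO_IOCTL_MAGIC, 0x02),
};

/* A later addition gets a #define, so userspace can test it with #ifdef. */
#define DEMO_CMD_LATER _IO(DEMO_IOCTL_MAGIC, 0x03)

/* The command number is encoded in the low bits of the request value. */
static_assert(_IOC_NR(DEMO_CMD_FREE) == 0x02, "unexpected ioctl encoding");

/* Map a request number back to its name, e.g. for logging. */
const char *demo_ioctl_name(unsigned long request)
{
	switch (request) {
	case DEMO_CMD_MAKE:
		return "DEMO_CMD_MAKE";
	case DEMO_CMD_SEND:
		return "DEMO_CMD_SEND";
	case DEMO_CMD_FREE:
		return "DEMO_CMD_FREE";
	case DEMO_CMD_LATER:
		return "DEMO_CMD_LATER";
	default:
		return "?";
	}
}
```

The sketch sticks to _IO() and _IOW() so that, on common architectures,
every value fits into an int, which is what ISO C requires of
enumerators.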
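
The CANCEL_FD item in the list above describes cancelling a synchronous
send as soon as a passed file descriptor signals POLLIN.  The following
is only a userspace analogy of that idea, built on poll(2) and a pipe.
It does not use the kdbus ioctls, and wait_reply_or_cancel(),
request_cancel() and the choice of error codes are assumptions made for
this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Block until reply_fd is readable, cancel_fd is readable, or the timeout
 * expires.  Returns 0 if a reply is ready, -ECANCELED if the wait was
 * cancelled, -ETIMEDOUT on timeout, or another negative errno value.
 */
int wait_reply_or_cancel(int reply_fd, int cancel_fd, int timeout_ms)
{
	struct pollfd fds[2] = {
		{ .fd = reply_fd,  .events = POLLIN },
		{ .fd = cancel_fd, .events = POLLIN },
	};
	int r;

	assert(reply_fd >= 0 && cancel_fd >= 0);

	r = poll(fds, 2, timeout_ms);
	if (r < 0)
		return -errno;
	if (r == 0)
		return -ETIMEDOUT;

	/* Cancellation wins if both fds are readable at the same time. */
	if (fds[1].revents & POLLIN)
		return -ECANCELED;
	if (fds[0].revents & POLLIN)
		return 0;

	return -EIO;	/* POLLERR, POLLHUP or POLLNVAL on one of the fds */
}

/*
 * Called by another thread to cancel the wait.  Assumes cancel_write_fd is
 * the write end of a pipe whose read end was passed as cancel_fd.
 */
int request_cancel(int cancel_write_fd)
{
	const char byte = 1;

	return write(cancel_write_fd, &byte, 1) == 1 ? 0 : -errno;
}
```

Because the cancel condition is tied to an fd that belongs to one
specific wait, another thread can cancel exactly that call and no other.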
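
Regarding the item about pool offsets and lengths: mmap(2) only accepts
page-aligned file offsets, so mapping just one slice of a pool means
rounding the offset down and remembering the difference.  A minimal
sketch of that arithmetic follows.  struct map_window, slice_window()
and map_slice() are hypothetical helpers, and mapping with PROT_READ
and MAP_SHARED is an assumption, not something taken from the kdbus
documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

struct map_window {
	uint64_t map_off;	/* page-aligned offset to pass to mmap() */
	size_t map_len;		/* number of bytes to map */
	size_t delta;		/* start of the slice inside the mapping */
};

/*
 * Compute the smallest page-aligned window that covers the slice at
 * [offset, offset + length).  page_size must be a power of two.
 * Returns 0 on success, -1 on invalid arguments.  Overflow of
 * offset + length is not checked in this sketch.
 */
int slice_window(uint64_t offset, uint64_t length, uint64_t page_size,
		 struct map_window *w)
{
	assert(w != NULL);

	if (length == 0 || page_size == 0 || (page_size & (page_size - 1)))
		return -1;

	w->map_off = offset & ~(page_size - 1);
	w->delta = (size_t)(offset - w->map_off);
	w->map_len = (size_t)length + w->delta;

	return 0;
}

/*
 * Map only the window.  Returns MAP_FAILED on error.  The slice itself
 * starts w->delta bytes into the returned mapping; release the mapping
 * with munmap(base, w->map_len).
 */
void *map_slice(int fd, const struct map_window *w)
{
	return mmap(NULL, w->map_len, PROT_READ, MAP_SHARED, fd,
		    (off_t)w->map_off);
}
```

The page size would normally come from sysconf(_SC_PAGESIZE).  With
4096-byte pages, a slice at offset 5000 with length 100 gives a window
that starts at offset 4096, is 1004 bytes long, and has the data 904
bytes in.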

Daniel Mack (13):
  kdbus: add documentation
  kdbus: add header file
  kdbus: add driver skeleton, ioctl entry points and utility functions
  kdbus: add connection pool implementation
  kdbus: add connection, queue handling and message validation code
  kdbus: add node and filesystem implementation
  kdbus: add code to gather metadata
  kdbus: add code for notifications and matches
  kdbus: add code for buses, domains and endpoints
  kdbus: add name registry implementation
  kdbus: add policy database implementation
  kdbus: add Makefile, Kconfig and MAINTAINERS entry
  kdbus: add selftests

 Documentation/ioctl/ioctl-number.txt              |    1 +
 Documentation/kdbus.txt                           | 2107 +++++++++++++++++++++
 MAINTAINERS                                       |   12 +
 include/uapi/linux/Kbuild                         |    1 +
 include/uapi/linux/kdbus.h                        | 1049 ++++++++++
 include/uapi/linux/magic.h                        |    2 +
 init/Kconfig                                      |   12 +
 ipc/Makefile                                      |    2 +-
 ipc/kdbus/Makefile                                |   22 +
 ipc/kdbus/bus.c                                   |  553 ++++++
 ipc/kdbus/bus.h                                   |  103 +
 ipc/kdbus/connection.c                            | 2004 ++++++++++++++++++++
 ipc/kdbus/connection.h                            |  262 +++
 ipc/kdbus/domain.c                                |  350 ++++
 ipc/kdbus/domain.h                                |   84 +
 ipc/kdbus/endpoint.c                              |  232 +++
 ipc/kdbus/endpoint.h                              |   68 +
 ipc/kdbus/fs.c                                    |  519 +++++
 ipc/kdbus/fs.h                                    |   25 +
 ipc/kdbus/handle.c                                | 1134 +++++++++++
 ipc/kdbus/handle.h                                |   20 +
 ipc/kdbus/item.c                                  |  309 +++
 ipc/kdbus/item.h                                  |   57 +
 ipc/kdbus/limits.h                                |   95 +
 ipc/kdbus/main.c                                  |   72 +
 ipc/kdbus/match.c                                 |  535 ++++++
 ipc/kdbus/match.h                                 |   32 +
 ipc/kdbus/message.c                               |  598 ++++++
 ipc/kdbus/message.h                               |  133 ++
 ipc/kdbus/metadata.c                              | 1066 +++++++++++
 ipc/kdbus/metadata.h                              |   52 +
 ipc/kdbus/names.c                                 |  891 +++++++++
 ipc/kdbus/names.h                                 |   82 +
 ipc/kdbus/node.c                                  |  910 +++++++++
 ipc/kdbus/node.h                                  |   87 +
 ipc/kdbus/notify.c                                |  244 +++
 ipc/kdbus/notify.h                                |   30 +
 ipc/kdbus/policy.c                                |  481 +++++
 ipc/kdbus/policy.h                                |   51 +
 ipc/kdbus/pool.c                                  |  784 ++++++++
 ipc/kdbus/pool.h                                  |   47 +
 ipc/kdbus/queue.c                                 |  505 +++++
 ipc/kdbus/queue.h                                 |  108 ++
 ipc/kdbus/reply.c                                 |  262 +++
 ipc/kdbus/reply.h                                 |   68 +
 ipc/kdbus/util.c                                  |  317 ++++
 ipc/kdbus/util.h                                  |  133 ++
 tools/testing/selftests/Makefile                  |    1 +
 tools/testing/selftests/kdbus/.gitignore          |   11 +
 tools/testing/selftests/kdbus/Makefile            |   46 +
 tools/testing/selftests/kdbus/kdbus-enum.c        |   95 +
 tools/testing/selftests/kdbus/kdbus-enum.h        |   14 +
 tools/testing/selftests/kdbus/kdbus-test.c        |  920 +++++++++
 tools/testing/selftests/kdbus/kdbus-test.h        |   85 +
 tools/testing/selftests/kdbus/kdbus-util.c        | 1646 ++++++++++++++++
 tools/testing/selftests/kdbus/kdbus-util.h        |  216 +++
 tools/testing/selftests/kdbus/test-activator.c    |  319 ++++
 tools/testing/selftests/kdbus/test-attach-flags.c |  751 ++++++++
 tools/testing/selftests/kdbus/test-benchmark.c    |  427 +++++
 tools/testing/selftests/kdbus/test-bus.c          |  174 ++
 tools/testing/selftests/kdbus/test-chat.c         |  123 ++
 tools/testing/selftests/kdbus/test-connection.c   |  611 ++++++
 tools/testing/selftests/kdbus/test-daemon.c       |   66 +
 tools/testing/selftests/kdbus/test-endpoint.c     |  344 ++++
 tools/testing/selftests/kdbus/test-fd.c           |  710 +++++++
 tools/testing/selftests/kdbus/test-free.c         |   36 +
 tools/testing/selftests/kdbus/test-match.c        |  442 +++++
 tools/testing/selftests/kdbus/test-message.c      |  658 +++++++
 tools/testing/selftests/kdbus/test-metadata-ns.c  |  507 +++++
 tools/testing/selftests/kdbus/test-monitor.c      |  158 ++
 tools/testing/selftests/kdbus/test-names.c        |  184 ++
 tools/testing/selftests/kdbus/test-policy-ns.c    |  633 +++++++
 tools/testing/selftests/kdbus/test-policy-priv.c  | 1270 +++++++++++++
 tools/testing/selftests/kdbus/test-policy.c       |   81 +
 tools/testing/selftests/kdbus/test-race.c         |  313 +++
 tools/testing/selftests/kdbus/test-sync.c         |  368 ++++
 tools/testing/selftests/kdbus/test-timeout.c      |   99 +
 77 files changed, 27818 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/kdbus.txt
 create mode 100644 include/uapi/linux/kdbus.h
 create mode 100644 ipc/kdbus/Makefile
 create mode 100644 ipc/kdbus/bus.c
 create mode 100644 ipc/kdbus/bus.h
 create mode 100644 ipc/kdbus/connection.c
 create mode 100644 ipc/kdbus/connection.h
 create mode 100644 ipc/kdbus/domain.c
 create mode 100644 ipc/kdbus/domain.h
 create mode 100644 ipc/kdbus/endpoint.c
 create mode 100644 ipc/kdbus/endpoint.h
 create mode 100644 ipc/kdbus/fs.c
 create mode 100644 ipc/kdbus/fs.h
 create mode 100644 ipc/kdbus/handle.c
 create mode 100644 ipc/kdbus/handle.h
 create mode 100644 ipc/kdbus/item.c
 create mode 100644 ipc/kdbus/item.h
 create mode 100644 ipc/kdbus/limits.h
 create mode 100644 ipc/kdbus/main.c
 create mode 100644 ipc/kdbus/match.c
 create mode 100644 ipc/kdbus/match.h
 create mode 100644 ipc/kdbus/message.c
 create mode 100644 ipc/kdbus/message.h
 create mode 100644 ipc/kdbus/metadata.c
 create mode 100644 ipc/kdbus/metadata.h
 create mode 100644 ipc/kdbus/names.c
 create mode 100644 ipc/kdbus/names.h
 create mode 100644 ipc/kdbus/node.c
 create mode 100644 ipc/kdbus/node.h
 create mode 100644 ipc/kdbus/notify.c
 create mode 100644 ipc/kdbus/notify.h
 create mode 100644 ipc/kdbus/policy.c
 create mode 100644 ipc/kdbus/policy.h
 create mode 100644 ipc/kdbus/pool.c
 create mode 100644 ipc/kdbus/pool.h
 create mode 100644 ipc/kdbus/queue.c
 create mode 100644 ipc/kdbus/queue.h
 create mode 100644 ipc/kdbus/reply.c
 create mode 100644 ipc/kdbus/reply.h
 create mode 100644 ipc/kdbus/util.c
 create mode 100644 ipc/kdbus/util.h
 create mode 100644 tools/testing/selftests/kdbus/.gitignore
 create mode 100644 tools/testing/selftests/kdbus/Makefile
 create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.h
 create mode 100644 tools/testing/selftests/kdbus/kdbus-test.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-test.h
 create mode 100644 tools/testing/selftests/kdbus/kdbus-util.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-util.h
 create mode 100644 tools/testing/selftests/kdbus/test-activator.c
 create mode 100644 tools/testing/selftests/kdbus/test-attach-flags.c
 create mode 100644 tools/testing/selftests/kdbus/test-benchmark.c
 create mode 100644 tools/testing/selftests/kdbus/test-bus.c
 create mode 100644 tools/testing/selftests/kdbus/test-chat.c
 create mode 100644 tools/testing/selftests/kdbus/test-connection.c
 create mode 100644 tools/testing/selftests/kdbus/test-daemon.c
 create mode 100644 tools/testing/selftests/kdbus/test-endpoint.c
 create mode 100644 tools/testing/selftests/kdbus/test-fd.c
 create mode 100644 tools/testing/selftests/kdbus/test-free.c
 create mode 100644 tools/testing/selftests/kdbus/test-match.c
 create mode 100644 tools/testing/selftests/kdbus/test-message.c
 create mode 100644 tools/testing/selftests/kdbus/test-metadata-ns.c
 create mode 100644 tools/testing/selftests/kdbus/test-monitor.c
 create mode 100644 tools/testing/selftests/kdbus/test-names.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy-ns.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy-priv.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy.c
 create mode 100644 tools/testing/selftests/kdbus/test-race.c
 create mode 100644 tools/testing/selftests/kdbus/test-sync.c
 create mode 100644 tools/testing/selftests/kdbus/test-timeout.c



 create mode 100644 ipc/kdbus/match.h
 create mode 100644 ipc/kdbus/message.c
 create mode 100644 ipc/kdbus/message.h
 create mode 100644 ipc/kdbus/metadata.c
 create mode 100644 ipc/kdbus/metadata.h
 create mode 100644 ipc/kdbus/names.c
 create mode 100644 ipc/kdbus/names.h
 create mode 100644 ipc/kdbus/node.c
 create mode 100644 ipc/kdbus/node.h
 create mode 100644 ipc/kdbus/notify.c
 create mode 100644 ipc/kdbus/notify.h
 create mode 100644 ipc/kdbus/policy.c
 create mode 100644 ipc/kdbus/policy.h
 create mode 100644 ipc/kdbus/pool.c
 create mode 100644 ipc/kdbus/pool.h
 create mode 100644 ipc/kdbus/queue.c
 create mode 100644 ipc/kdbus/queue.h
 create mode 100644 ipc/kdbus/reply.c
 create mode 100644 ipc/kdbus/reply.h
 create mode 100644 ipc/kdbus/util.c
 create mode 100644 ipc/kdbus/util.h
 create mode 100644 tools/testing/selftests/kdbus/.gitignore
 create mode 100644 tools/testing/selftests/kdbus/Makefile
 create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.h
 create mode 100644 tools/testing/selftests/kdbus/kdbus-test.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-test.h
 create mode 100644 tools/testing/selftests/kdbus/kdbus-util.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-util.h
 create mode 100644 tools/testing/selftests/kdbus/test-activator.c
 create mode 100644 tools/testing/selftests/kdbus/test-attach-flags.c
 create mode 100644 tools/testing/selftests/kdbus/test-benchmark.c
 create mode 100644 tools/testing/selftests/kdbus/test-bus.c
 create mode 100644 tools/testing/selftests/kdbus/test-chat.c
 create mode 100644 tools/testing/selftests/kdbus/test-connection.c
 create mode 100644 tools/testing/selftests/kdbus/test-daemon.c
 create mode 100644 tools/testing/selftests/kdbus/test-endpoint.c
 create mode 100644 tools/testing/selftests/kdbus/test-fd.c
 create mode 100644 tools/testing/selftests/kdbus/test-free.c
 create mode 100644 tools/testing/selftests/kdbus/test-match.c
 create mode 100644 tools/testing/selftests/kdbus/test-message.c
 create mode 100644 tools/testing/selftests/kdbus/test-metadata-ns.c
 create mode 100644 tools/testing/selftests/kdbus/test-monitor.c
 create mode 100644 tools/testing/selftests/kdbus/test-names.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy-ns.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy-priv.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy.c
 create mode 100644 tools/testing/selftests/kdbus/test-race.c
 create mode 100644 tools/testing/selftests/kdbus/test-sync.c
 create mode 100644 tools/testing/selftests/kdbus/test-timeout.c


* [PATCH 01/13] kdbus: add documentation
  2015-01-16 19:16 ` Greg Kroah-Hartman
  (?)
@ 2015-01-16 19:16 ` Greg Kroah-Hartman
  2015-01-20 13:53     ` Michael Kerrisk (man-pages)
                     ` (2 more replies)
  -1 siblings, 3 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

kdbus is a system for low-latency, low-overhead, easy to use
interprocess communication (IPC).

The interface to all functions in this driver is implemented via ioctls
on files exposed through a filesystem called 'kdbusfs'. The default
mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
documentation about the kernel level API design.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/kdbus.txt | 2107 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2107 insertions(+)
 create mode 100644 Documentation/kdbus.txt

diff --git a/Documentation/kdbus.txt b/Documentation/kdbus.txt
new file mode 100644
index 000000000000..2592a7e37079
--- /dev/null
+++ b/Documentation/kdbus.txt
@@ -0,0 +1,2107 @@
+D-Bus is a system for powerful, easy to use interprocess communication (IPC).
+
+The focus of this document is an overview of the low-level, native kernel D-Bus
+transport called kdbus. Kdbus exposes its functionality via files in a
+filesystem called 'kdbusfs'. All communication between processes takes place
+via ioctls on files exposed through the mount point of a kdbusfs. The default
+mount point of kdbusfs is /sys/fs/kdbus.
+
+Please note that kdbus was designed as a transport layer for D-Bus, but is in
+no way limited to, nor controlled by, the D-Bus protocol specification. The
+D-Bus protocol is one possible application layer on top of kdbus.
+
+For the general D-Bus protocol specification, the payload format, the
+marshaling, and the communication semantics, please refer to:
+  http://dbus.freedesktop.org/doc/dbus-specification.html
+
+For a kdbus specific userspace library implementation please refer to:
+  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
+
+Articles about D-Bus and kdbus:
+  http://lwn.net/Articles/580194/
+
+
+1. Terminology
+===============================================================================
+
+  Domain:
+    A domain is created each time a kdbusfs is mounted. Each process that is
+    capable of mounting a new instance of a kdbusfs will have its own kdbus
+    hierarchy. Each domain (i.e., each mount point) offers its own "control"
+    file to create new buses. Domains have no connection to each other and
+    cannot see nor talk to each other. See section 5 for more details.
+
+  Bus:
+    A bus is a named object inside a domain. Clients exchange messages
+    over a bus. Multiple buses themselves have no connection to each other;
+    messages can only be exchanged on the same bus. The default endpoint of
+    a bus, to which clients establish their connection, is the "bus" file
+    /sys/fs/kdbus/<bus name>/bus.
+    Common operating system setups create one "system bus" per system, and one
+    "user bus" for every logged-in user. Applications or services may create
+    their own private buses.  See section 5 for more details.
+
+  Endpoint:
+    An endpoint provides a file to talk to a bus. Opening an endpoint
+    creates a new connection to the bus to which the endpoint belongs. All
+    endpoints have unique names and are accessible as files underneath the
+    directory of a bus, e.g., /sys/fs/kdbus/<bus>/<endpoint>
+    Every bus has a default endpoint called "bus". A bus can optionally offer
+    additional endpoints with custom names to provide restricted access to the
+    bus. Custom endpoints carry additional policy which can be used to create
+    sandboxes with locked-down, limited, filtered access to a bus.  See
+    section 5 for more details.
+
+  Connection:
+    A connection to a bus is created by opening an endpoint file of a bus and
+    becoming an active client with the HELLO exchange. Every ordinary client
+    connection has a unique identifier on the bus and can address messages to
+    every other connection on the same bus by using the peer's connection id
+    as the destination.  See section 6 for more details.
+
+  Pool:
+    Each connection allocates a piece of shmem-backed memory that is used
+    to receive messages and answers to ioctl commands from the kernel. It is
+    never used to send anything to the kernel. In order to access that memory,
+    userspace must mmap() it into its address space.
+    See section 12 for more details.
+
+  Well-known Name:
+    A connection can, in addition to its implicit unique connection id, request
+    the ownership of a textual well-known name. Well-known names are noted in
+    reverse-domain notation, such as com.example.service1. Connections offering
+    a service on a bus are usually reached by their well-known name. The
+    analogy of connection id and well-known name is an IP address and a DNS
+    name associated with that address.
+
+  Message:
+    Connections can exchange messages with other connections by addressing
+    the peers with their connection id or well-known name. A message consists
+    of a message header with kernel-specific information on how to route the
+    message, and the message payload, which is a logical byte stream of
+    arbitrary size. Messages can carry additional file descriptors to be passed
+    from one connection to another. Every connection can specify which set of
+    metadata the kernel should attach to the message when it is delivered
+    to the receiving connection. Metadata contains information like: system
+    timestamps, uid, gid, tid, proc-starttime, well-known-names, process comm,
+    process exe, process argv, cgroup, capabilities, seclabel, audit session,
+    loginuid and the connection's human-readable name.
+    See sections 7 and 13 for more details.
+
+  Item:
+    The API of kdbus implements a notion of items, submitted through and
+    returned by most ioctls, and stored inside data structures in the
+    connection's pool. See section 4 for more details.
+
+  Broadcast and Match:
+    Broadcast messages are potentially sent to all connections of a bus. By
+    default, the connections will not actually receive any of the sent
+    broadcast messages; a connection only receives a broadcast message after
+    it has installed a match for the specific message properties.
+    See section 10 for more details.
+
+  Policy:
+    A policy is a set of rules that define which connections can see, talk to,
+    or register a well-known name on the bus. A policy is attached to buses and
+    custom endpoints, and modified by policy holder connections or owners of
+    custom endpoints.
+    See section 11 for more details.
+
+  Privileged bus users:
+    A user connecting to the bus is considered privileged if it is either the
+    creator of the bus, or if it has the CAP_IPC_OWNER capability flag set.
+
+
+2. Control Files Layout
+===============================================================================
+
+The kdbus interface is exposed through files in its kdbusfs mount point
+(defaults to /sys/fs/kdbus):
+
+  /sys/fs/kdbus                 (mount point of kdbusfs)
+  |-- control                   (domain control-file)
+  |-- 0-system                  (bus of user uid=0)
+  |   |-- bus                   (default endpoint of bus '0-system')
+  |   `-- ep.apache             (custom endpoint of bus '0-system')
+  |-- 1000-user                 (bus of user uid=1000)
+  |   `-- bus                   (default endpoint of bus '1000-user')
+  `-- 2702-user                 (bus of user uid=2702)
+      |-- bus                   (default endpoint of bus '2702-user')
+      `-- ep.app                (custom endpoint of bus '2702-user')
+
+
+3. Data Structures and flags
+===============================================================================
+
+3.1 Data structures and interconnections
+----------------------------------------
+
+  +--------------------------------------------------------------------------+
+  | Domain (Mount Point)                                                     |
+  | /sys/fs/kdbus/control                                                    |
+  | +----------------------------------------------------------------------+ |
+  | | Bus (System Bus)                                                     | |
+  | | /sys/fs/kdbus/0-system/                                              | |
+  | | +-------------------------------+ +--------------------------------+ | |
+  | | | Endpoint                      | | Endpoint                       | | |
+  | | | /sys/fs/kdbus/0-system/bus    | | /sys/fs/kdbus/0-system/ep.app  | | |
+  | | +-------------------------------+ +--------------------------------+ | |
+  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
+  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
+  | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
+  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
+  | +----------------------------------------------------------------------+ |
+  |                                                                          |
+  | +----------------------------------------------------------------------+ |
+  | | Bus (User Bus for UID 2702)                                          | |
+  | | /sys/fs/kdbus/2702-user/                                             | |
+  | | +-------------------------------+ +--------------------------------+ | |
+  | | | Endpoint                      | | Endpoint                       | | |
+  | | | /sys/fs/kdbus/2702-user/bus   | | /sys/fs/kdbus/2702-user/ep.app | | |
+  | | +-------------------------------+ +--------------------------------+ | |
+  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
+  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
+  | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
+  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
+  | +----------------------------------------------------------------------+ |
+  +--------------------------------------------------------------------------+
+
+The above description uses the D-Bus notation of unique connection names that
+adds a ":1." prefix to the connection's unique ID. kdbus itself doesn't
+use that notation, neither internally nor externally. However, libraries and
+other userspace code that aims for compatibility with D-Bus might.
+
+3.2 Flags
+---------
+
+All ioctls used in the communication with the driver contain three 64-bit
+fields: 'flags', 'kernel_flags' and 'return_flags'. All of them are specific
+to the ioctl used.
+
+In 'flags', the behavior of the command can be tweaked. All bits that are not
+recognized by the kernel in this field are rejected, and the ioctl fails with
+-EINVAL.
+
+In 'kernel_flags', the kernel driver writes back the mask of supported bits
+upon each call, and sets the KDBUS_FLAGS_KERNEL bit. This is a way to probe
+possible kernel features and make userspace code forward and backward
+compatible.
+
+In 'return_flags', the kernel can return results of the command, in addition
+to the actual return value. This is mostly to inform userspace about non-fatal
+conditions that occurred during the execution of the command.
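The flag handling described in this section can be modelled in a few lines of userspace C. This is only a sketch: struct demo_cmd, the DEMO_* constants and both functions are made up for illustration, and the bit position used for the KDBUS_FLAGS_KERNEL stand-in is an assumption, not a value taken from the kdbus headers.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits; the real values live in the kdbus UAPI header. */
#define DEMO_FLAG_A        (UINT64_C(1) << 0)
#define DEMO_FLAG_B        (UINT64_C(1) << 1)
#define DEMO_FLAGS_KERNEL  (UINT64_C(1) << 63)	/* assumed bit position */

/* Hypothetical command header carrying the three 64-bit flag fields. */
struct demo_cmd {
	uint64_t flags;
	uint64_t kernel_flags;
	uint64_t return_flags;
};

/*
 * Model of the kernel side: report the supported mask on every call,
 * then reject any bit in 'flags' that is not part of that mask.
 */
int demo_handle_flags(struct demo_cmd *cmd, uint64_t supported)
{
	cmd->kernel_flags = supported | DEMO_FLAGS_KERNEL;
	cmd->return_flags = 0;

	if (cmd->flags & ~supported)
		return -EINVAL;

	return 0;
}

/* Userspace side: did the kernel advertise the given feature bit? */
int demo_flag_supported(const struct demo_cmd *cmd, uint64_t flag)
{
	return (cmd->kernel_flags & DEMO_FLAGS_KERNEL) &&
	       (cmd->kernel_flags & flag) == flag;
}
```

A caller would issue the command once, inspect 'kernel_flags', and only then rely on optional flags, which is how the text suggests keeping userspace forward and backward compatible.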
+
+
+4. Items
+===============================================================================
+
+To flexibly augment transport structures, data blobs of type struct kdbus_item
+can be attached to the structs passed into the ioctls. Some ioctls make items
+of certain types mandatory, others are optional. Unsupported items will cause
+the ioctl to fail with -EINVAL.
+
+The total size of an item is variable and is in some cases defined by the item
+type. In other cases, they can be of arbitrary length (for instance, a string).
+
+Items are also used for information stored in a connection's pool, such as
+received messages, name lists or requested connection or bus owner information.
+
+Whenever items are used as part of the kdbus kernel API, they are embedded in
+structs that have an overall size of their own, so there can be multiple items
+per ioctl.
+
+The kernel expects all items to be aligned to 8-byte boundaries. Unaligned
+items, or items that are unsupported by the ioctl, are rejected.
+
+A simple iterator in userspace would iterate over the items until the items
+have reached the embedding structure's overall size. An example implementation
+of such an iterator can be found in tools/testing/selftests/kdbus/kdbus-util.h.
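The iteration and alignment rules above might look roughly like the following in userspace. This is a sketch under stated assumptions, not the selftest code: struct demo_item is a hypothetical stand-in for struct kdbus_item (assumed here to start with a 64-bit size and a 64-bit type), and all demo_* names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct kdbus_item: a 64-bit size (header plus
 * payload, without trailing padding), a 64-bit type, then the payload.
 */
struct demo_item {
	uint64_t size;
	uint64_t type;
	uint8_t data[];
};

#define DEMO_ITEM_HEADER_SIZE offsetof(struct demo_item, data)
#define DEMO_ALIGN8(v)        (((uint64_t)(v) + 7) & ~UINT64_C(7))

/*
 * Write one item at byte offset 'off' (which must be 8-byte aligned) and
 * return the offset at which the next item would start.
 */
size_t demo_item_append(void *buf, size_t off, uint64_t type,
			const void *payload, size_t len)
{
	struct demo_item *item = (struct demo_item *)((uint8_t *)buf + off);

	item->size = DEMO_ITEM_HEADER_SIZE + len;
	item->type = type;
	memcpy(item->data, payload, len);

	return off + (size_t)DEMO_ALIGN8(item->size);
}

/*
 * Walk the items in 'buf' until 'total' bytes are consumed. Returns the
 * number of items, or -1 if an item header is truncated or its size field
 * does not fit into the remaining space.
 */
int demo_item_count(const void *buf, size_t total)
{
	const uint8_t *p = buf;
	size_t off = 0;
	int n = 0;

	while (off < total) {
		const struct demo_item *item =
			(const struct demo_item *)(p + off);

		if (total - off < DEMO_ITEM_HEADER_SIZE)
			return -1;
		if (item->size < DEMO_ITEM_HEADER_SIZE ||
		    item->size > total - off)
			return -1;

		n++;
		off += (size_t)DEMO_ALIGN8(item->size);
	}

	return n;
}
```

The backing buffer has to be 8-byte aligned itself, for example an array of uint64_t, so that every item header written at an aligned offset is also aligned in memory.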
+
+
+5. Creation of new domains, buses and endpoints
+===============================================================================
+
+
+5.1 Domains
+-----------
+
+A domain is a container of buses. Domains themselves do not provide any IPC
+functionality. Their sole purpose is to manage buses allocated in their
+domain. Each time kdbusfs is mounted, a new kdbus domain is created, with its
+own 'control' file. The lifetime of the domain ends once the user has unmounted
+the kdbusfs. If you mount kdbusfs multiple times, each will have its own kdbus
+domain internally. Operations performed on one domain do not affect any
+other domain.
+
+The full kdbusfs hierarchy, any sub-directory, or file can be bind-mounted to
+an external mount point and will remain fully functional. The kdbus domain and
+any linked resources stay available until the original mount and all subsequent
+bind-mounts have been unmounted.
+
+During creation, domains pin the user-namespace of the creator and use
+it as controlling user-namespace for this domain. Any user accounting is done
+relative to that user-namespace.
+
+Newly created kdbus domains do not have any bus pre-created. The only resource
+available is a 'control' file, which is used to manage kdbus domains.
+Currently, 'control' files are exclusively used to create buses via
+KDBUS_CMD_BUS_MAKE, but further ioctls might be added in the future.
+
+
+5.2 Buses
+---------
+
+A bus is a shared resource between connections to transmit messages. Each bus
+is independent and operations on the bus will not have any effect on other
+buses. A bus is a management entity that controls the addresses of its
+connections, their policies and message transactions performed via this bus.
+
+Each bus is bound to the domain it was created on. It has a custom name that is
+unique across all buses of a domain. In kdbusfs, a bus is presented as a
+directory. No operations can be performed on the bus itself, instead you need
+to perform those on an endpoint associated with the bus. Endpoints are
+accessible as files underneath the bus directory. A default endpoint called
+"bus" is provided on each bus.
+
+Bus names may be chosen freely except for one restriction: the name
+must be prefixed with the numeric UID of the creator and a dash. This
+is required to avoid namespace clashes between different users. When
+creating a bus the name must be passed in properly formatted, or the
+kernel will refuse creation of the bus. Example: "1047-foobar" is an
+OK name for a bus registered by a user with UID 1047. However,
+"1024-foobar" is not, and neither is "foobar".
+The UID must be provided in the user-namespace of the parent domain.
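The naming rule above can be checked in userspace before the ioctl is attempted. The helper below is a minimal sketch with an invented name; it tests only the "<uid>-" prefix described in the text and makes no claim about any further validation the kernel may perform.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if 'name' starts with "<uid>-", 0 otherwise. Only the prefix
 * rule from the text is checked; any other constraint the kernel may
 * apply to bus names is out of scope for this sketch.
 */
int demo_bus_name_has_uid_prefix(const char *name, unsigned long uid)
{
	char prefix[32];
	int len;

	len = snprintf(prefix, sizeof(prefix), "%lu-", uid);
	if (len < 0 || (size_t)len >= sizeof(prefix))
		return 0;

	return strncmp(name, prefix, (size_t)len) == 0;
}
```

With uid 1047, "1047-foobar" is accepted, while "1024-foobar" and "foobar" are not, matching the examples given in the text.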
+
+To create a new bus, you need to open the control file of a domain and run the
+KDBUS_CMD_BUS_MAKE ioctl. The control file descriptor that was used to issue
+KDBUS_CMD_BUS_MAKE must not have been used for any other control-ioctl before
+and needs to be kept open for the entire life-time of the created bus. Closing
+it will immediately clean up the entire bus and all its associated resources
+and endpoints. Every control file descriptor can only be used to create a
+single new bus; from that point on, it is not used for any further
+communication until the final close().
+
+Each bus will generate a random, 128-bit UUID upon creation. It will be
+returned to creators of connections through kdbus_cmd_hello.id128 and can
+be used by userspace to uniquely identify buses, even across different machines
+or containers. The UUID will have its variant bits set to 'DCE', and denote
+version 4 (random).
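Userspace that wants to sanity-check the returned bus UUID could test the version and variant bits as sketched below. The function names are invented, and the sketch assumes the 16 bytes are stored in the usual RFC 4122 byte order, which the text does not state explicitly.

```c
#include <stdint.h>

/* Version is the high nibble of byte 6; 4 means "random". */
int demo_uuid_version(const uint8_t id128[16])
{
	return id128[6] >> 4;
}

/* The DCE (RFC 4122) variant has the two top bits of byte 8 set to 10. */
int demo_uuid_is_dce_variant(const uint8_t id128[16])
{
	return (id128[8] & 0xc0) == 0x80;
}
```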
+
+When creating buses, a variable list of items is expected in the items array;
+otherwise, bus creation will fail.
+
+
+5.3 Endpoints
+-------------
+
+Endpoints are entry points to a bus. By default, each bus has a default
+endpoint called 'bus'. The bus owner has the ability to create custom
+endpoints with specific names, permissions, and policy databases (see below).
+An endpoint is presented as a file underneath the directory of the parent bus.
+
+To create a custom endpoint, open the default endpoint ('bus') and use the
+KDBUS_CMD_ENDPOINT_MAKE ioctl with "struct kdbus_cmd_make". Custom endpoints
+always have a policy database that, by default, forbids any operation. You have
+to explicitly install policy entries to allow any operation on this endpoint.
+Once KDBUS_CMD_ENDPOINT_MAKE has succeeded, this file descriptor manages the
+newly created endpoint resource. It cannot be used to manage further resources.
+
+Endpoint names may be chosen freely except for one restriction: the name
+must be prefixed with the numeric UID of the creator and a dash. This
+is required to avoid namespace clashes between different users. When
+creating an endpoint the name must be passed in properly formatted, or the
+kernel will refuse creation of the endpoint. Example: "1047-foobar" is an
+OK name for an endpoint registered by a user with UID 1047. However,
+"1024-foobar" is not, and neither is "foobar".
+The UID must be provided in the user-namespace of the parent domain.
+
+To create connections to a bus, you use KDBUS_CMD_HELLO. See section 6 for
+details. Note that once KDBUS_CMD_HELLO has succeeded, this file descriptor
+manages the newly created connection resource. It cannot be used to manage
+further resources.
+
+
+5.4 Creating buses and endpoints
+--------------------------------
+
+KDBUS_CMD_BUS_MAKE and KDBUS_CMD_ENDPOINT_MAKE take a
+struct kdbus_cmd_make argument.
+
+struct kdbus_cmd_make {
+  __u64 size;
+    The overall size of the struct, including its items.
+
+  __u64 flags;
+    The flags for creation.
+
+    KDBUS_MAKE_ACCESS_GROUP
+      Make the bus or endpoint file group-accessible
+
+    KDBUS_MAKE_ACCESS_WORLD
+      Make the bus or endpoint file world-accessible
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call.
+
+  __u64 return_flags;
+    Flags returned by the kernel. Currently unused.
+
+  struct kdbus_item items[0];
+    A list of items that has specific meanings for KDBUS_CMD_BUS_MAKE
+    and KDBUS_CMD_ENDPOINT_MAKE (see above).
+
+    The following items are expected for KDBUS_CMD_BUS_MAKE:
+    KDBUS_ITEM_MAKE_NAME
+      Contains a string to identify the bus name.
+
+    KDBUS_ITEM_BLOOM_PARAMETER
+      Bus-wide bloom parameters passed in a kdbus_bloom_parameter struct
+
+    KDBUS_ITEM_ATTACH_FLAGS_RECV
+      An optional item that contains a set of required attach flags
+      that connections must allow. This item is used as a negotiation
+      measure during connection creation. If connections do not satisfy
+      the bus requirements, they are not allowed on the bus.
+      If not set, the bus does not require any metadata to be attached;
+      in this case, connections are free to set their own attach flags.
+
+    KDBUS_ITEM_ATTACH_FLAGS_SEND
+      An optional item that contains a set of attach flags that are
+      returned to connections when they query the bus creator metadata.
+      If not set, no metadata is returned.
+
+    Unrecognized items are rejected, and the ioctl will fail with -EINVAL.
+};
+
+
+6. Connections
+===============================================================================
+
+
+6.1 Connection IDs and well-known connection names
+--------------------------------------------------
+
+Connections are identified by their connection id, internally implemented as a
+uint64_t counter. The connection ids on every newly created bus start at 1,
+and every new connection increments the counter by 1. The ids are not reused.
+
+In higher level tools, the user visible representation of a connection is
+defined by the D-Bus protocol specification as ":1.<id>".
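For illustration, a higher-level tool could derive that user-visible representation from the numeric id as follows. The helper name is hypothetical; kdbus itself only deals with the plain 64-bit id.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a numeric connection id as a D-Bus unique name, e.g. ":1.22".
 * Returns 0 on success, -1 if the buffer is too small.
 */
int demo_unique_name(char *buf, size_t len, uint64_t id)
{
	int n = snprintf(buf, len, ":1.%" PRIu64, id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```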
+
+Messages with a specific uint64_t destination id are directly delivered to
+the connection with the corresponding id. Messages with the special destination
+id KDBUS_DST_ID_BROADCAST are broadcast messages and are potentially delivered
+to all known connections on the bus; clients interested in broadcast messages
+need to subscribe to the specific messages they are interested in, though,
+before any broadcast message reaches them.
+
+Messages synthesized and sent directly by the kernel will carry the special
+source id KDBUS_SRC_ID_KERNEL (0).
+
+In addition to the unique uint64_t connection id, established connections can
+request the ownership of well-known names, under which they can be found and
+addressed by other bus clients. A well-known name is associated with one and
+only one connection at a time. See section 8 on name acquisition and the
+name registry, and the validity of names.
+
+Messages can specify the special destination id 0 and carry a well-known name
+in the message data. Such a message is delivered to the destination connection
+which owns that well-known name.
+
+  +-------------------------------------------------------------------------+
+  | +---------------+     +---------------------------+                     |
+  | | Connection    |     | Message                   | -----------------+  |
+  | | :1.22         | --> | src: 22                   |                  |  |
+  | |               |     | dst: 25                   |                  |  |
+  | |               |     |                           |                  |  |
+  | |               |     |                           |                  |  |
+  | |               |     +---------------------------+                  |  |
+  | |               |                                                    |  |
+  | |               | <--------------------------------------+           |  |
+  | +---------------+                                        |           |  |
+  |                                                          |           |  |
+  | +---------------+     +---------------------------+      |           |  |
+  | | Connection    |     | Message                   | -----+           |  |
+  | | :1.25         | --> | src: 25                   |                  |  |
+  | |               |     | dst: 0xffffffffffffffff   | -------------+   |  |
+  | |               |     |  (KDBUS_DST_ID_BROADCAST) |              |   |  |
+  | |               |     |                           | ---------+   |   |  |
+  | |               |     +---------------------------+          |   |   |  |
+  | |               |                                            |   |   |  |
+  | |               | <--------------------------------------------------+  |
+  | +---------------+                                            |   |      |
+  |                                                              |   |      |
+  | +---------------+     +---------------------------+          |   |      |
+  | | Connection    |     | Message                   | --+      |   |      |
+  | | :1.55         | --> | src: 55                   |   |      |   |      |
+  | |               |     | dst: 0 / org.foo.bar      |   |      |   |      |
+  | |               |     |                           |   |      |   |      |
+  | |               |     |                           |   |      |   |      |
+  | |               |     +---------------------------+   |      |   |      |
+  | |               |                                     |      |   |      |
+  | |               | <------------------------------------------+   |      |
+  | +---------------+                                     |          |      |
+  |                                                       |          |      |
+  | +---------------+                                     |          |      |
+  | | Connection    |                                     |          |      |
+  | | :1.81         |                                     |          |      |
+  | | org.foo.bar   |                                     |          |      |
+  | |               |                                     |          |      |
+  | |               |                                     |          |      |
+  | |               | <-----------------------------------+          |      |
+  | |               |                                                |      |
+  | |               | <----------------------------------------------+      |
+  | +---------------+                                                       |
+  +-------------------------------------------------------------------------+
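The three addressing modes shown in the diagram can be summarised by a small classifier. This is a sketch only: the enum, the function and the DEMO_* macro names are invented, with the two special values taken from the text and diagram above (0 for name-based addressing, all bits set for broadcast). A name passed together with a non-zero id is simply ignored here.

```c
#include <stddef.h>
#include <stdint.h>

/* Values as given in the text and diagram above. */
#define DEMO_DST_ID_NAME       UINT64_C(0)
#define DEMO_DST_ID_BROADCAST  UINT64_C(0xffffffffffffffff)

enum demo_dst_kind {
	DEMO_DST_INVALID,	/* id 0 but no well-known name supplied */
	DEMO_DST_BY_NAME,	/* id 0 plus a well-known name */
	DEMO_DST_BROADCAST,	/* offered to every connection with a match */
	DEMO_DST_UNICAST	/* delivered to the connection with that id */
};

enum demo_dst_kind demo_classify_dst(uint64_t dst_id, const char *name)
{
	if (dst_id == DEMO_DST_ID_BROADCAST)
		return DEMO_DST_BROADCAST;

	if (dst_id == DEMO_DST_ID_NAME)
		return (name && name[0]) ? DEMO_DST_BY_NAME
					 : DEMO_DST_INVALID;

	return DEMO_DST_UNICAST;
}
```

In terms of the diagram, the message from :1.22 is unicast (dst 25), the one from :1.25 is a broadcast, and the one from :1.55 is addressed by name (dst 0 plus org.foo.bar).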
+
+
+6.2 Creating connections
+------------------------
+
+A connection to a bus is created by opening an endpoint file of a bus and
+becoming an active client with the KDBUS_CMD_HELLO ioctl. Every connected client
+connection has a unique identifier on the bus and can address messages to every
+other connection on the same bus by using the peer's connection id as the
+destination.
+
+The KDBUS_CMD_HELLO ioctl takes the following struct as argument.
+
+struct kdbus_cmd_hello {
+  __u64 size;
+    The overall size of the struct, including all attached items.
+
+  __u64 flags;
+    Flags to apply to this connection:
+
+    KDBUS_HELLO_ACCEPT_FD
+      When this flag is set, the connection can be sent file descriptors
+      as message payload. If it's not set, any attempt to do so will
+      result in -ECOMM on the sender's side.
+
+    KDBUS_HELLO_ACTIVATOR
+      Make this connection an activator (see below). With this bit set,
+      an item of type KDBUS_ITEM_NAME has to be attached which describes
+      the well-known name this connection should be an activator for.
+
+    KDBUS_HELLO_POLICY_HOLDER
+      Make this connection a policy holder (see below). With this bit set,
+      an item of type KDBUS_ITEM_NAME has to be attached which describes
+      the well-known name this connection should hold a policy for.
+
+    KDBUS_HELLO_MONITOR
+      Make this connection an eaves-dropping connection. See section 6.8 for
+      more information.
+
+      To also receive broadcast messages, the connection has to upload
+      appropriate matches as well.
+      This flag is only valid for privileged bus connections.
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call.
+
+  __u64 return_flags;
+    Flags returned by the kernel. Currently unused.
+
+  __u64 attach_flags_send;
+      Set the bits for metadata this connection permits to be sent to the
+      receiving peer. Only metadata items that are both allowed to be sent by
+      the sender and that are requested by the receiver will effectively be
+      attached to the message eventually. Note, however, that the bus may
+      optionally enforce some of those bits to be set. If the match fails,
+      -ECONNREFUSED will be returned. In either case, this field will be set
+      to the mask of metadata items that are enforced by the bus. The
+      KDBUS_FLAGS_KERNEL bit will be set as well.
+
+  __u64 attach_flags_recv;
+      Request the attachment of metadata for each message received by this
+      connection. The metadata actually attached may augment the list
+      of requested items. See section 13 for more details.
+
+  __u64 bus_flags;
+      Upon successful completion of the ioctl, this member will contain the
+      flags of the bus it connected to.
+
+  __u64 id;
+      Upon successful completion of the ioctl, this member will contain the
+      id of the new connection.
+
+  __u64 pool_size;
+      The size of the communication pool, in bytes. The pool can be accessed
+      by calling mmap() on the file descriptor that was used to issue the
+      KDBUS_CMD_HELLO ioctl.
+
+  __u64 offset;
+      The kernel will return the offset in the pool where returned details
+      will be stored.
+
+  __u8 id128[16];
+      Upon successful completion of the ioctl, this member will contain the
+      128 bit wide UUID of the connected bus.
+
+  struct kdbus_item items[0];
+      Variable list of items to add optional additional information. The
+      following items are currently expected/valid:
+
+      KDBUS_ITEM_CONN_DESCRIPTION
+        Contains a string that describes this connection, so it can be
+        identified later.
+
+      KDBUS_ITEM_NAME
+      KDBUS_ITEM_POLICY_ACCESS
+        For activators and policy holders only, combinations of these two
+        items describe policy access entries (see section about policy).
+
+      KDBUS_ITEM_CREDS
+      KDBUS_ITEM_PIDS
+      KDBUS_ITEM_SECLABEL
+        Privileged bus users may submit these types in order to create
+        connections with faked credentials. This information will be returned
+        when peer information is queried by KDBUS_CMD_CONN_INFO. See section
+        13 for more information.
+
+      Items of other types are rejected, and the ioctl will fail with -EINVAL.
+};
+
+At the offset returned in the 'offset' field of struct kdbus_cmd_hello, the
+kernel will store items of the following types:
+
+  KDBUS_ITEM_BLOOM_PARAMETER
+      Bloom filter parameter as defined by the bus creator (see below).
+
+The offset in the pool has to be freed with the KDBUS_CMD_FREE ioctl.
+
+6.3 Activator and policy holder connection
+------------------------------------------
+
+An activator connection is a placeholder for a well-known name. Messages sent
+to such a connection can be used by userspace to start an implementer
+connection, which will then get all the messages from the activator copied
+over. An activator connection cannot be used to send any message.
+
+A policy holder connection only installs a policy for one or more names.
+These policy entries are kept active as long as the connection is alive, and
+are removed once it terminates. Such a policy connection type can be used to
+deploy restrictions for names that are not yet active on the bus. A policy
+holder connection cannot be used to send any message.
+
+The creation of activator, policy holder or monitor connections is an operation
+restricted to privileged users on the bus (see section "Terminology").
+
+
+6.4 Retrieving information on a connection
+------------------------------------------
+
+The KDBUS_CMD_CONN_INFO ioctl can be used to retrieve credentials and
+properties of the initial creator of a connection. This ioctl uses the
+following struct:
+
+struct kdbus_cmd_info {
+  __u64 size;
+    The overall size of the struct, including the name with its 0-byte string
+    terminator.
+
+  __u64 flags;
+    Specify which metadata items should be attached to the answer.
+    See section 13 for more details.
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call.
+
+  __u64 return_flags;
+    Flags returned by the kernel. Currently unused.
+
+  __u64 id;
+    The connection's numerical ID to retrieve information for. If set to
+    a non-zero value, the 'name' field is ignored.
+
+  __u64 offset;
+    When the ioctl returns, this value will yield the offset of the connection
+    information inside the caller's pool.
+
+  __u64 info_size;
+    The kernel will return the size of the returned information, so applications
+    can optionally mmap specific parts of the pool.
+
+  struct kdbus_item items[0];
+    The optional item list, containing the well-known name to look up as
+    a KDBUS_ITEM_OWNED_NAME. Only required if the 'id' field is set to 0.
+    Items of other types are rejected, and the ioctl will fail with -EINVAL.
+};
+
+After the ioctl returns, the following struct will be stored in the caller's
+pool at 'offset'.
+
+struct kdbus_info {
+  __u64 size;
+    The overall size of the struct, including all its items.
+
+  __u64 id;
+    The connection's unique ID.
+
+  __u64 flags;
+    The connection's flags as specified when it was created.
+
+  struct kdbus_item items[0];
+    Depending on the 'flags' field in struct kdbus_cmd_info, items of
+    types KDBUS_ITEM_OWNED_NAME and KDBUS_ITEM_CONN_DESCRIPTION follow
+    here.
+};
+
+Once the caller is finished with parsing the return buffer, it needs to call
+KDBUS_CMD_FREE for the offset.
+
+
+6.5 Getting information about a connection's bus creator
+--------------------------------------------------------
+
+The KDBUS_CMD_BUS_CREATOR_INFO ioctl takes the same struct as
+KDBUS_CMD_CONN_INFO but is used to retrieve information about the creator of
+the bus the connection is attached to. The metadata returned by this call is
+collected during the creation of the bus and is never altered afterwards, so
+it provides pristine information on the task that created the bus, at the
+moment when it did so.
+
+In response to this call, a slice in the connection's pool is allocated and
+filled with an object of type struct kdbus_info, pointed to by the ioctl's
+'offset' field.
+
+struct kdbus_info {
+  __u64 size;
+    The overall size of the struct, including all its items.
+
+  __u64 id;
+    The bus ID.
+
+  __u64 flags;
+    The bus flags as specified when it was created.
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call.
+
+  struct kdbus_item items[0];
+    Metadata information is stored in items here. The item list contains
+    a KDBUS_ITEM_MAKE_NAME item that indicates the bus name of the
+    calling connection.
+};
+
+Once the caller is finished with parsing the return buffer, it needs to call
+KDBUS_CMD_FREE for the offset.
+
+
+6.6 Updating connection details
+-------------------------------
+
+Some of a connection's details can be updated with the KDBUS_CMD_CONN_UPDATE
+ioctl, using the file descriptor that was used to create the connection.
+The update command uses the following struct.
+
+struct kdbus_cmd_update {
+  __u64 size;
+    The overall size of the struct, including all its items.
+
+  __u64 flags;
+    Currently no flags are supported. Reserved for future use.
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call.
+
+  __u64 return_flags;
+    Flags returned by the kernel. Currently unused.
+
+  struct kdbus_item items[0];
+    Items to describe the connection details to be updated. The following item
+    types are supported:
+
+    KDBUS_ITEM_ATTACH_FLAGS_SEND
+      Supply a new set of items that this connection permits to be sent along
+      with messages.
+
+    KDBUS_ITEM_ATTACH_FLAGS_RECV
+      Supply a new set of items to be attached to each message.
+
+    KDBUS_ITEM_NAME
+    KDBUS_ITEM_POLICY_ACCESS
+      Policy holder connections may supply a new set of policy information
+      with these items. For other connection types, -EOPNOTSUPP is returned.
+
+    Items of other types are rejected, and the ioctl will fail with -EINVAL.
+};
+
+
+6.7 Termination
+---------------
+
+A connection can be terminated by simply closing its file descriptor. All
+pending incoming messages will be discarded, and the memory in the pool will
+be freed.
+
+An alternative way of closing down a connection is calling the KDBUS_CMD_BYEBYE
+ioctl on it, which will only succeed if the message queue of the connection is
+empty at the time of closing; otherwise, -EBUSY is returned.
+
+When this ioctl returns successfully, the connection has been terminated and
+won't accept any new messages from remote peers. This way, a connection can
+be terminated race-free, without losing any messages.
+
+
+6.8 Monitor connections ('eavesdropper')
+----------------------------------------
+
+Eavesdropping connections are created by setting the KDBUS_HELLO_MONITOR flag
+in struct kdbus_cmd_hello.flags. Such connections have all properties of any
+other regular connection, except for the following details:
+
+  * They will get every message sent over the bus, both unicasts and broadcasts
+
+  * Installing matches for broadcast messages is neither necessary nor allowed
+
+  * They cannot send messages or be directly addressed as receiver
+
+  * Their creation and destruction will not cause KDBUS_ITEM_ID_{ADD,REMOVE}
+    notifications to be generated, so other connections cannot detect the
+    presence of an eavesdropper.
+
+
+7. Messages
+===============================================================================
+
+Messages consist of a fixed-size header followed directly by a list of
+variable-sized data 'items'. The overall message size is specified in the
+header of the message. The chain of data items can contain well-defined
+message metadata fields, raw data, references to data, or file descriptors.
+
+
+7.1 Sending messages
+--------------------
+
+Messages are passed to the kernel with the KDBUS_CMD_SEND ioctl. Depending
+on the destination address of the message, the kernel delivers the message to
+the specific destination connection or to all connections on the same bus.
+Sending messages across buses is not possible. Messages are always queued in
+the memory pool of the destination connection (see below).
+
+The KDBUS_CMD_SEND ioctl uses struct kdbus_cmd_send to describe the message
+transfer.
+
+struct kdbus_cmd_send {
+  __u64 size;
+    The overall size of the struct, including the attached items.
+
+  __u64 flags;
+    Flags for message delivery:
+
+    KDBUS_SEND_SYNC_REPLY
+      By default, all calls to kdbus are considered asynchronous,
+      non-blocking. However, as there are many use cases that need to wait
+      for a remote peer to answer a method call, there's a way to send a
+      message and wait for a reply in a synchronous fashion. This is what
+      the KDBUS_SEND_SYNC_REPLY flag controls. The KDBUS_CMD_SEND ioctl will
+      block until the reply has arrived, the timeout limit is reached, the
+      remote connection is shut down, or the call is interrupted by a signal
+      before any reply arrives; see signal(7).
+
+      The offset of the reply message in the sender's pool is stored in
+      'reply.offset' when the ioctl has returned without error. Hence, there
+      is no need for another KDBUS_CMD_RECV ioctl or anything else to receive
+      the reply.
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call of
+    KDBUS_CMD_SEND.
+
+  __u64 kernel_msg_flags;
+    Valid bits for message flags, returned by the kernel upon each call of
+    KDBUS_CMD_SEND.
+
+  __u64 return_flags;
+    Kernel-provided flags, returning non-fatal errors that occurred during
+    send. Currently unused.
+
+  __u64 msg_address;
+    Userspace has to provide a pointer to a message (struct kdbus_msg) to send.
+
+  struct kdbus_msg_info reply;
+    Only used for synchronous replies. See description of struct kdbus_cmd_recv
+    for more details.
+
+  struct kdbus_item items[0];
+    The following items are currently recognized:
+
+    KDBUS_ITEM_CANCEL_FD
+      When this optional item is passed in, and the call is executed as SYNC
+      call, the passed in file descriptor can be used as alternative
+      cancellation point. The kernel will call poll() on this file descriptor,
+      and if it reports any incoming bytes, the blocking send operation will
+      be canceled, and the call will return -ECANCELED. Any type of file
+      descriptor that implements poll() can be used as payload to this item.
+      For asynchronous message sending, this item is accepted but ignored.
+
+    All other items are rejected, and the ioctl will fail with -EINVAL.
+};
+
+The message referenced by 'msg_address' above has the following layout.
+
+struct kdbus_msg {
+  __u64 size;
+    The overall size of the struct, including the attached items.
+
+  __u64 flags;
+    KDBUS_MSG_EXPECT_REPLY
+      Expect a reply from the remote peer to this message. With this bit set,
+      the timeout_ns field must be set to a non-zero number of nanoseconds in
+      which the receiving peer is expected to reply. If such a reply is not
+      received in time, the sender will be notified with a timeout message
+      (see below). The value must be an absolute value, in nanoseconds and
+      based on CLOCK_MONOTONIC.
+
+      For a message to be accepted as reply, it must be a direct message to
+      the original sender (not a broadcast), and its kdbus_msg.cookie_reply
+      must match the previous message's kdbus_msg.cookie.
+
+      Expected replies also temporarily open the policy of the sending
+      connection, so the other peer is allowed to respond within the given
+      time window.
+
+    KDBUS_MSG_NO_AUTO_START
+      By default, when a message is sent to an activator connection, the
+      activator is notified and will start an implementer. This flag inhibits
+      that behavior. With this bit set, and the remote being an activator,
+      -EADDRNOTAVAIL is returned from the ioctl.
+
+  __s64 priority;
+    The priority of this message. Receiving messages (see below) may
+    optionally be constrained to messages of a minimal priority. This
+    allows for use cases where timing critical data is interleaved with
+    control data on the same connection. If unused, the priority should be
+    set to zero.
+
+  __u64 dst_id;
+    The numeric ID of the destination connection, or KDBUS_DST_ID_BROADCAST
+    (~0ULL) to address every peer on the bus, or KDBUS_DST_ID_NAME (0) to look
+    it up dynamically from the bus' name registry. In the latter case, an item
+    of type KDBUS_ITEM_DST_NAME is mandatory.
+
+  __u64 src_id;
+    Upon return of the ioctl, this member will contain the sending
+    connection's numerical ID. Should be 0 at send time.
+
+  __u64 payload_type;
+    Type of the payload in the actual data records. Currently, only
+    KDBUS_PAYLOAD_DBUS is accepted as input value of this field. When
+    receiving messages that are generated by the kernel (notifications),
+    this field will yield KDBUS_PAYLOAD_KERNEL.
+
+  __u64 cookie;
+    Cookie of this message, for later recognition. Also, when replying
+    to a message (see above), the cookie_reply field must match this value.
+
+  __u64 timeout_ns;
+    If the message sent requires a reply from the remote peer (see above),
+    this field contains the timeout in absolute nanoseconds based on
+    CLOCK_MONOTONIC.
+
+  __u64 cookie_reply;
+    If the message sent is a reply to another message, this field must
+    match the cookie of the formerly received message.
+
+  struct kdbus_item items[0];
+    A dynamically sized list of items to contain additional information.
+    The following items are expected/valid:
+
+    KDBUS_ITEM_PAYLOAD_VEC
+    KDBUS_ITEM_PAYLOAD_MEMFD
+    KDBUS_ITEM_FDS
+      Actual data records containing the payload. See section "Passing of
+      Payload Data".
+
+    KDBUS_ITEM_BLOOM_FILTER
+      Bloom filter for matches (see below).
+
+    KDBUS_ITEM_DST_NAME
+      Well-known name to send this message to. Required if dst_id is set
+      to KDBUS_DST_ID_NAME. If a connection holding the given name can't
+      be found, -ESRCH is returned.
+      For messages to a unique name (ID), this item is optional. If present,
+      the kernel will make sure the name owner matches the given unique name.
+      This allows userspace to tie the message sending to the condition that a
+      name is currently owned by a certain unique name.
+};
+
+The message will be augmented by the requested metadata items when queued into
+the receiver's pool. See also section 13.2 ("Metadata and namespaces").
+
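The timeout_ns field above is an absolute CLOCK_MONOTONIC value, not a duration. A minimal sketch of turning a relative timeout into such a deadline could look as follows; the helper name abs_deadline_ns() is hypothetical and not part of kdbus.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a relative timeout into an absolute CLOCK_MONOTONIC deadline in
 * nanoseconds, the form described for kdbus_msg.timeout_ns.
 * abs_deadline_ns() is an illustrative helper, not a kdbus function. */
static uint64_t abs_deadline_ns(uint64_t relative_ns)
{
	struct timespec now;
	int r = clock_gettime(CLOCK_MONOTONIC, &now);

	assert(r == 0); /* not expected to fail for CLOCK_MONOTONIC on Linux */
	(void)r;

	return (uint64_t)now.tv_sec * 1000000000ULL +
	       (uint64_t)now.tv_nsec + relative_ns;
}
```

A caller that wants to wait five seconds for a reply would store abs_deadline_ns(5000000000ULL) in timeout_ns before issuing KDBUS_CMD_SEND.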
+
+7.2 Message layout
+------------------
+
+The layout of a message is shown below.
+
+  +-------------------------------------------------------------------------+
+  | Message                                                                 |
+  | +---------------------------------------------------------------------+ |
+  | | Header                                                              | |
+  | | size:          overall message size, including the data records     | |
+  | | destination:   connection id of the receiver                        | |
+  | | source:        connection id of the sender (set by kernel)          | |
+  | | payload_type:  "DBusDBus" textual identifier stored as uint64_t     | |
+  | +---------------------------------------------------------------------+ |
+  | +---------------------------------------------------------------------+ |
+  | | Data Record                                                         | |
+  | | size:  overall record size (without padding)                        | |
+  | | type:  type of data                                                 | |
+  | | data:  reference to data (address or file descriptor)               | |
+  | +---------------------------------------------------------------------+ |
+  | +---------------------------------------------------------------------+ |
+  | | padding bytes to the next 8 byte alignment                          | |
+  | +---------------------------------------------------------------------+ |
+  | +---------------------------------------------------------------------+ |
+  | | Data Record                                                         | |
+  | | size:  overall record size (without padding)                        | |
+  | | ...                                                                 | |
+  | +---------------------------------------------------------------------+ |
+  | +---------------------------------------------------------------------+ |
+  | | padding bytes to the next 8 byte alignment                          | |
+  | +---------------------------------------------------------------------+ |
+  | +---------------------------------------------------------------------+ |
+  | | Data Record                                                         | |
+  | | size:  overall record size                                          | |
+  | | ...                                                                 | |
+  | +---------------------------------------------------------------------+ |
+  |   ... further data records ...                                          |
+  +-------------------------------------------------------------------------+
+
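The alignment rule in the diagram above (each record's size excludes padding, and the next record starts at the next 8-byte boundary) can be sketched as a small walker. The struct record_hdr type and both function names are assumptions made for illustration, as is the reading that 'size' includes the record's own header; the authoritative layout is the one in the kdbus UAPI header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header mirroring the "size" and "type" fields of the
 * diagram. Assumption: size covers this header plus the data, no padding. */
struct record_hdr {
	uint64_t size;
	uint64_t type;
};

/* Round n up to the next multiple of 8. */
static size_t align8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/* Count the records stored in buf[0..len).
 * Returns -1 if a record is truncated or reports an impossible size. */
static int count_records(const unsigned char *buf, size_t len)
{
	size_t pos = 0;
	int n = 0;

	while (pos < len) {
		struct record_hdr hdr;

		if (len - pos < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + pos, sizeof(hdr)); /* no alignment assumed */
		if (hdr.size < sizeof(hdr) || hdr.size > len - pos)
			return -1;
		pos += align8((size_t)hdr.size); /* skip data and padding */
		n++;
	}
	return n;
}
```

For example, a record of size 17 occupies 24 bytes including padding, so a following record starts at offset 24.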
+
+7.3 Passing of Payload Data
+---------------------------
+
+When connecting to the bus, receivers request a memory pool of a given size,
+large enough to carry all backlog of data enqueued for the connection. The
+pool is internally backed by a shared memory file which can be mmap()ed by
+the receiver.
+
+KDBUS_ITEM_PAYLOAD_VEC:
+  Messages are directly copied by the sending process into the receiver's pool.
+  That way, two peers can exchange data by effectively doing a single copy from
+  one process to another; the kernel will not buffer the data anywhere else.
+
+KDBUS_ITEM_PAYLOAD_MEMFD:
+  Messages can reference memfd files which contain the data.
+  memfd files are tmpfs-backed files that allow sealing of the content of the
+  file, which prevents all writable access to the file content.
+  Only memfds that have (F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_SEAL) set
+  are accepted as payload data, which enforces reliable passing of data.
+  The receiver can assume that neither the sender nor anyone else can alter the
+  content after the message is sent.
+  Apart from the sender filling-in the content into memfd files, the data will
+  be passed as zero-copy from one process to another, read-only, shared between
+  the peers.
+
+The sender must not make any assumptions about how data is received by the
+remote peer. The kernel is free to re-pack multiple VEC and MEMFD payloads. For
+instance, the kernel may decide to merge multiple VECs into a single VEC, inline
+MEMFD payloads into memory or merge all passed VECs into a single MEMFD.
+However, the kernel preserves the order of passed data. This means, the order of
+all VEC and MEMFD items is not changed in respect to each other.
+
+In other words: All passed VEC and MEMFD data payloads are treated as a single
+stream of data that may be received by the remote peer in a different set of
+chunks than it was sent as.
+
+
+7.4 Receiving messages
+----------------------
+
+Messages are received by the client with the KDBUS_CMD_RECV ioctl. The endpoint
+file of the bus supports poll() to wake up the receiving process when new
+messages are queued up to be received.
+
+With the KDBUS_CMD_RECV ioctl, a struct kdbus_cmd_recv is used.
+
+struct kdbus_cmd_recv {
+  __u64 size;
+    The overall size of the struct, including the attached items.
+
+  __u64 flags;
+    Flags to control the receive command.
+
+    KDBUS_RECV_PEEK
+      Just return the location of the next message. Do not install file
+      descriptors or anything else. This is usually used to determine the
+      sender of the next queued message.
+
+    KDBUS_RECV_DROP
+      Drop the next message without doing anything else with it, and free the
+      pool slice. This is a shortcut for KDBUS_RECV_PEEK and KDBUS_CMD_FREE.
+
+    KDBUS_RECV_USE_PRIORITY
+      Use the priority field (see below).
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call.
+
+  __u64 return_flags;
+    Kernel-provided flags, returning non-fatal errors that occurred during
+    receive. Currently unused.
+
+  __s64 priority;
+    With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in
+    the queue with at least the given priority. If no such message is waiting
+    in the queue, -ENOMSG is returned.
+
+  __u64 dropped_msgs;
+    If the KDBUS_CMD_RECV ioctl fails with -EOVERFLOW, this field is filled by
+    the kernel with the number of messages that couldn't be transmitted to
+    this connection. In that case, the @offset member must not be accessed.
+
+  struct kdbus_msg_info msg;
+    Embedded struct to be filled when the command succeeded (see below).
+
+  struct kdbus_item items[0];
+    Items to specify further details for the receive command. Currently unused.
+};
+
+Both 'struct kdbus_cmd_recv' and 'struct kdbus_cmd_send' embed 'struct
+kdbus_msg_info'. For the SEND ioctl, it is used to catch synchronous replies,
+if one was requested, and is unused otherwise.
+
+struct kdbus_msg_info {
+  __u64 offset;
+    Upon return of the ioctl, this field contains the offset in the receiver's
+    memory pool. The memory must be freed with KDBUS_CMD_FREE.
+
+  __u64 msg_size;
+    Upon successful return of the ioctl, this field contains the size of the
+    allocated slice at offset @offset. It is the combination of the size of
+    the stored kdbus_msg object plus all appended VECs. You can use it in
+    combination with @offset to map a single message, instead of mapping the
+    whole pool.
+
+  __u64 return_flags;
+    Kernel-provided return flags. Currently, the following flags are defined.
+
+      KDBUS_RECV_RETURN_INCOMPLETE_FDS
+        The message contained file descriptors which couldn't be installed
+        into the receiver's task. Most probably that happened because the
+        maximum number of file descriptors for that task was exceeded.
+        The message is still delivered, so this is not a fatal condition.
+        File descriptors inside the KDBUS_ITEM_FDS item that could not be
+        installed will be set to -1.
+};
+
+Unless KDBUS_RECV_DROP was passed, and given that the ioctl succeeded, the
+offset field contains the location of the new message inside the receiver's
+pool. The message is stored as struct kdbus_msg at this offset, and can be
+interpreted with the semantics described above.
+
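Since the message lives at 'offset' inside the mmap()ed pool and spans 'msg_size' bytes, a receiver may want to check that the slice lies inside the mapping before touching it. The following is only a sketch under that assumption; pool_slice() is a hypothetical helper, not a kdbus API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return a pointer to the slice [offset, offset + size) of a mapped pool,
 * or NULL if the slice does not lie completely inside the pool.
 * The comparison is written so that it cannot overflow. */
static const void *pool_slice(const void *pool, uint64_t pool_size,
			      uint64_t offset, uint64_t size)
{
	if (offset > pool_size || size > pool_size - offset)
		return NULL;
	return (const unsigned char *)pool + offset;
}
```

The returned pointer would then be interpreted as a struct kdbus_msg, and the slice released with KDBUS_CMD_FREE once it is no longer needed.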
+Also, if the connection allowed for file descriptors to be passed
+(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be
+installed into the receiving process after the KDBUS_CMD_RECV ioctl returns.
+The receiving task is obliged to close all of them appropriately. If
+KDBUS_RECV_PEEK is set, no file descriptors are installed. This allows for
+peeking at a message and dropping it via KDBUS_RECV_DROP, without installing
+the passed file descriptors into the receiving process.
+
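Because the receiver has to close installed descriptors, and entries that could not be installed are reported as -1, cleanup code needs to skip those entries. A minimal sketch, assuming the descriptors from the KDBUS_ITEM_FDS item have already been copied into a plain int array; close_received_fds() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Close every descriptor in fds[0..n) that was actually installed.
 * Entries set to -1 were not installed and are skipped.
 * Returns the number of descriptors that were closed successfully. */
static size_t close_received_fds(int *fds, size_t n)
{
	size_t closed = 0;

	for (size_t i = 0; i < n; i++) {
		if (fds[i] < 0)
			continue;
		if (close(fds[i]) == 0)
			closed++;
		fds[i] = -1; /* guard against closing the same fd twice */
	}
	return closed;
}
```

Marking each entry as -1 after closing makes the helper safe to call again on the same array.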
+The caller is obliged to call KDBUS_CMD_FREE with the returned offset when
+the memory is no longer needed.
+
+
+8. Name registry
+===============================================================================
+
+Each bus instantiates a name registry to resolve well-known names into unique
+connection IDs for message delivery. The registry will be queried when a
+message is sent with kdbus_msg.dst_id set to KDBUS_DST_ID_NAME, or when a
+registry dump is requested.
+
+All of the below is subject to policy rules for SEE and OWN permissions.
+
+
+8.1 Name validity
+-----------------
+
+A name has to comply with the following rules to be considered valid:
+
+ - The name has two or more elements separated by a period ('.') character
+ - All elements must contain at least one character
+ - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_"
+   and must not begin with a digit
+ - The name must contain at least one '.' (period) character
+   (and thus at least two elements)
+ - The name must not begin with a '.' (period) character
+ - The name must not exceed KDBUS_NAME_MAX_LEN (255)
+
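The rules above can be expressed as a short userspace check. This is a sketch of the listed rules only, not the kernel's implementation; name_is_valid() is a hypothetical function, and it assumes the 255 limit applies to the string length without the terminating 0-byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* KDBUS_NAME_MAX_LEN as stated in the rules; assumed to exclude the NUL. */
#define BUS_NAME_MAX_LEN 255

static bool is_element_char(char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
	       (c >= '0' && c <= '9') || c == '_';
}

/* Check a well-known name against the validity rules listed above. */
static bool name_is_valid(const char *name)
{
	size_t len = strlen(name);
	size_t elements = 0;
	bool at_start = true; /* positioned at the first char of an element */

	if (len == 0 || len > BUS_NAME_MAX_LEN)
		return false;

	for (size_t i = 0; i < len; i++) {
		char c = name[i];

		if (c == '.') {
			if (at_start) /* leading period or empty element */
				return false;
			at_start = true;
			continue;
		}
		if (!is_element_char(c))
			return false;
		if (at_start) {
			if (c >= '0' && c <= '9') /* element starts with digit */
				return false;
			elements++;
			at_start = false;
		}
	}
	/* A trailing period would leave an empty final element. */
	return !at_start && elements >= 2;
}
```

With this check, "org.freedesktop.DBus" is accepted, while "org", ".org.foo", "org..foo" and "org.1foo" are rejected.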
+
+8.2 Acquiring a name
+--------------------
+
+To acquire a name, a client uses the KDBUS_CMD_NAME_ACQUIRE ioctl with the
+following data structure.
+
+struct kdbus_cmd_name {
+  __u64 size;
+    The overall size of this struct, including the name with its 0-byte string
+    terminator.
+
+  __u64 flags;
+    Flags to control details in the name acquisition.
+
+    KDBUS_NAME_REPLACE_EXISTING
+      Acquiring a name that is already present usually fails, unless this flag
+      is set in the call, and KDBUS_NAME_ALLOW_REPLACEMENT (see below) was
+      set when the current owner of the name acquired it, or if the current
+      owner is an activator connection (see below).
+
+    KDBUS_NAME_ALLOW_REPLACEMENT
+      Allow other connections to take over this name. When this happens, the
+      former owner of the name will be notified of the name loss.
+
+    KDBUS_NAME_QUEUE (acquire)
+      A name that is already acquired by a connection, and which wasn't
+      requested with the KDBUS_NAME_ALLOW_REPLACEMENT flag set cannot be
+      acquired again. However, a connection can put itself in a queue of
+      connections waiting for the name to be released. Once that happens, the
+      first connection in that queue becomes the new owner and is notified
+      accordingly.
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call.
+
+  __u64 return_flags;
+    Flags returned by the kernel. Currently unused.
+
+  struct kdbus_item items[0];
+    Items to submit the name. Currently, one item of type KDBUS_ITEM_NAME is
+    expected and allowed, and the contained string must be a valid bus name.
+    Items of other types are rejected, and the ioctl will fail with -EINVAL.
+};
+
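One possible reading of how the three acquisition flags interact is sketched below as a pure decision function. The bit values are placeholders (the real ones come from the kdbus UAPI header), the enum and function names are hypothetical, and the sketch assumes that a name held by an activator can be taken over without KDBUS_NAME_REPLACE_EXISTING; the text above does not state that unambiguously.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define NAME_REPLACE_EXISTING  (UINT64_C(1) << 0)
#define NAME_ALLOW_REPLACEMENT (UINT64_C(1) << 1)
#define NAME_QUEUE             (UINT64_C(1) << 2)

enum acquire_result { ACQUIRE_OWNER, ACQUIRE_QUEUED, ACQUIRE_REFUSED };

/* Decide the outcome of an acquisition request.
 * owner_flags are the flags the current owner used when it acquired the
 * name; request_flags are those of the new request. */
static enum acquire_result
acquire_outcome(bool name_taken, uint64_t owner_flags,
		bool owner_is_activator, uint64_t request_flags)
{
	if (!name_taken)
		return ACQUIRE_OWNER;
	if (owner_is_activator) /* assumption: no flag needed in this case */
		return ACQUIRE_OWNER;
	if ((request_flags & NAME_REPLACE_EXISTING) &&
	    (owner_flags & NAME_ALLOW_REPLACEMENT))
		return ACQUIRE_OWNER;
	if (request_flags & NAME_QUEUE)
		return ACQUIRE_QUEUED; /* wait until the name is released */
	return ACQUIRE_REFUSED;
}
```

The function only models the decision; it does not model notifications to the former owner or the ordering of the wait queue.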
+
+8.3 Releasing a name
+--------------------
+
+A connection may release a name explicitly with the KDBUS_CMD_NAME_RELEASE
+ioctl. If the connection was an implementer of an activatable name, its
+pending messages are moved back to the activator. If there are any connections
+queued up as waiters for the name, the oldest one of them will become the new
+owner. The same happens implicitly for all names once a connection terminates.
+
+The KDBUS_CMD_NAME_RELEASE ioctl uses the same data structure as the
+acquisition call, but with slightly different field usage.
+
+struct kdbus_cmd_name {
+  __u64 size;
+    The overall size of this struct, including the name with its 0-byte string
+    terminator.
+
+  __u64 flags;
+    Flags to the command. Currently unused.
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call.
+
+  __u64 return_flags;
+    Flags returned by the kernel. Currently unused.
+
+  struct kdbus_item items[0];
+    Items to submit the name. Currently, one item of type KDBUS_ITEM_NAME is
+    expected and allowed, and the contained string must be a valid bus name.
+};
+
+
+8.4 Dumping the name registry
+-----------------------------
+
+A connection may request a complete or filtered dump of currently active bus
+names with the KDBUS_CMD_NAME_LIST ioctl, which takes a struct
+kdbus_cmd_name_list as argument.
+
+struct kdbus_cmd_name_list {
+  __u64 flags;
+    Any combination of flags to specify which names should be dumped.
+
+    KDBUS_NAME_LIST_UNIQUE
+      List the unique (numeric) IDs of the connection, whether it owns a name
+      or not.
+
+    KDBUS_NAME_LIST_NAMES
+      List well-known names stored in the database which are actively owned by
+      a real connection (not an activator).
+
+    KDBUS_NAME_LIST_ACTIVATORS
+      List names that are owned by an activator.
+
+    KDBUS_NAME_LIST_QUEUED
+      List connections that are not yet owning a name but are waiting for it
+      to become available.
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call.
+
+  __u64 return_flags;
+    Flags returned by the kernel. Currently unused.
+
+  __u64 offset;
+    When the ioctl returns successfully, the offset to the name registry dump
+    inside the connection's pool will be stored in this field.
+};
+
+The returned list of names is stored in a struct kdbus_name_list that in turn
+contains a dynamic number of struct kdbus_name_info that carry the actual
+information. The fields inside struct kdbus_name_info are described next.
+
+struct kdbus_name_info {
+  __u64 size;
+    The overall size of this struct, including the name with its 0-byte string
+    terminator.
+
+  __u64 owner_id;
+    The owning connection's unique ID.
+
+  __u64 conn_flags;
+    The flags of the owning connection.
+
+  struct kdbus_item items[0];
+    Items containing the actual name. Currently, one item of type
+    KDBUS_ITEM_OWNED_NAME will be attached, including the name's flags. In that
+    item, the flags field of the name may carry the following bits:
+
+    KDBUS_NAME_ALLOW_REPLACEMENT
+      Other connections are allowed to take over this name from the
+      connection that owns it.
+
+    KDBUS_NAME_IN_QUEUE (list)
+      When retrieving a list of currently acquired names in the registry, this
+      flag indicates whether the connection actually owns the name or is
+      currently waiting for it to become available.
+
+    KDBUS_NAME_ACTIVATOR (list)
+      An activator connection owns a name as a placeholder for an implementer,
+      which is started on demand as soon as the first message arrives. There's
+      some more information on this topic below. In contrast to
+      KDBUS_NAME_REPLACE_EXISTING, when a name is taken over from an activator
+      connection, all the messages that have been queued in the activator
+      connection will be moved over to the new owner. The activator connection
+      will still be tracked for the name and will take control again if the
+      implementer connection terminates.
+      This flag can not be used when acquiring a name, but is implicitly set
+      through KDBUS_CMD_HELLO with KDBUS_HELLO_ACTIVATOR set in
+      kdbus_cmd_hello.conn_flags.
+};
+
+The returned buffer must be freed with the KDBUS_CMD_FREE ioctl when the user
+is finished with it.
+
+
+9. Notifications
+===============================================================================
+
+The kernel will notify its users of the following events.
+
+  * When connection A is terminated while connection B is waiting for a reply
+    from it, connection B is notified with a message with an item of type
+    KDBUS_ITEM_REPLY_DEAD.
+
+  * When connection A does not receive a reply from connection B within the
+    specified timeout window, connection A will receive a message with an item
+    of type KDBUS_ITEM_REPLY_TIMEOUT.
+
+  * When an ordinary connection (not a monitor) is created on or removed from
+    a bus, messages with an item of type KDBUS_ITEM_ID_ADD or
+    KDBUS_ITEM_ID_REMOVE, respectively, are sent to all bus members that match
+    these messages through their match database. Eavesdroppers (monitor
+    connections) do not cause such notifications to be sent. They are invisible
+    on the bus.
+
+  * When a connection gains or loses ownership of a name, messages with an item
+    of type KDBUS_ITEM_NAME_ADD, KDBUS_ITEM_NAME_REMOVE or
+    KDBUS_ITEM_NAME_CHANGE are sent to all bus members that match these
+    messages through their match database.
+
+A kernel notification is a regular kdbus message with the following details.
+
+  * kdbus_msg.src_id == KDBUS_SRC_ID_KERNEL
+  * kdbus_msg.dst_id == KDBUS_DST_ID_BROADCAST
+  * kdbus_msg.payload_type == KDBUS_PAYLOAD_KERNEL
+  * Has exactly one of the aforementioned items attached
+
+Kernel notifications have an item of type KDBUS_ITEM_TIMESTAMP attached.
+
+
+10. Message Matching, Bloom filters
+===============================================================================
+
+10.1 Matches for broadcast messages from other connections
+----------------------------------------------------------
+
+A message addressed at the connection ID KDBUS_DST_ID_BROADCAST (~0ULL) is a
+broadcast message, delivered to all connected peers which installed a rule to
+match certain properties of the message. Without any rules installed in the
+connection, no broadcast message or kernel-side notifications will be delivered
+to the connection. Broadcast messages are subject to policy rules and TALK
+access checks.
+
+See section 11 for details on policies, and section 11.5 for more
+details on implicit policies.
+
+Matches for messages from other connections (not kernel notifications) are
+implemented as bloom filters. The sender adds certain properties of the message
+as elements to a bloom filter bit field, and sends that along with the
+broadcast message.
+
+The connection adds the message properties it is interested in as elements to
+a bloom mask bit field, and uploads the mask to the match rules of the
+connection.
+
+The kernel will match the broadcast message's bloom filter against the
+connection's bloom mask (simply by &-ing it), and decide whether the message
+should be delivered to the connection.
+
+The kernel has no notion of any specific properties of the message; all it
+sees are the bit fields of the bloom filter and mask to match against. The
+use of bloom filters allows simple and efficient matching, without exposing
+any message properties or internals to the kernel side. Clients need to deal
+with the fact that they might receive broadcasts which they did not subscribe
+to, as the bloom filter might allow false-positives to pass the filter.
+
+To allow the future extension of the set of elements in the bloom filter, the
+filter specifies a "generation" number. A later generation must always contain
+all elements of the set of the previous generation, but can add new elements
+to the set. The match rules mask can carry an array with all previous
+generations of masks individually stored. When the filter and mask are matched
+by the kernel, the mask with the closest matching "generation" is selected
+as the index into the mask array.
+
+
+10.2 Matches for kernel notifications
+------------------------------------
+
+To receive kernel generated notifications (see section 9), a connection must
+install special match rules that are different from the bloom filter matches
+described in the section above. They can be filtered by a sender connection's
+ID, by one of the names the sender connection owns at the time of sending the
+message, or by type of the notification (id/name add/remove/change).
+
+10.3 Adding a match
+-------------------
+
+To add a match, the KDBUS_CMD_MATCH_ADD ioctl is used, which takes a struct
+kdbus_cmd_match as described below.
+
+Note that each of the items attached to this command will internally create
+one match 'rule', and the collection of them, which is submitted as one block
+via the ioctl is called a 'match'. To allow a message to pass, all rules of a
+match have to be satisfied. Hence, adding more items to the command will only
+narrow the set of messages that pass the match, and will cause the
+connection's user space process to wake up less often.
+
+Multiple matches can be installed per connection. As long as one of them has
+a set of rules which allows the message to pass, this one will be decisive.
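+
+The AND/OR semantics described above can be sketched in C as follows. This is
+only an illustration, not the kernel implementation; all example_* types and
+functions are hypothetical, and a rule is reduced to comparing one field.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule kinds; real rules are created from kdbus items. */
enum example_rule_kind {
	EXAMPLE_RULE_SENDER_ID,
	EXAMPLE_RULE_MSG_TYPE,
};

struct example_rule {
	enum example_rule_kind kind;
	uint64_t value;
};

/* Hypothetical view of a message; only the fields the rules inspect. */
struct example_msg {
	uint64_t sender_id;
	uint64_t type;
};

/* One 'match' is the block of rules submitted with a single ioctl. */
struct example_match {
	const struct example_rule *rules;
	size_t n_rules;
};

static bool example_rule_ok(const struct example_rule *r,
			    const struct example_msg *msg)
{
	switch (r->kind) {
	case EXAMPLE_RULE_SENDER_ID:
		return msg->sender_id == r->value;
	case EXAMPLE_RULE_MSG_TYPE:
		return msg->type == r->value;
	}
	return false;
}

/* All rules of one match have to be satisfied (logical AND). */
static bool example_match_ok(const struct example_match *m,
			     const struct example_msg *msg)
{
	size_t i;

	assert(m != NULL && msg != NULL);
	for (i = 0; i < m->n_rules; i++)
		if (!example_rule_ok(&m->rules[i], msg))
			return false;
	return true;
}

/* One passing match among the installed ones is sufficient (logical OR). */
static bool example_deliver(const struct example_match *matches, size_t n,
			    const struct example_msg *msg)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (example_match_ok(&matches[i], msg))
			return true;
	return false; /* no match installed, or none passed */
}
```
+
+With no matches installed, example_deliver() returns false, which mirrors the
+statement in section 10.1 that a connection without rules receives nothing.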
+
+struct kdbus_cmd_match {
+  __u64 size;
+    The overall size of the struct, including its items.
+
+  __u64 cookie;
+    A cookie which identifies the match, so it can be referred to at removal
+    time.
+
+  __u64 flags;
+    Flags to control the behavior of the ioctl.
+
+    KDBUS_MATCH_REPLACE:
+      Remove all entries with the given cookie before installing the new one.
+      This allows for race-free replacement of matches.
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call.
+
+  __u64 return_flags;
+    Flags returned by the kernel. Currently unused.
+
+  struct kdbus_item items[0];
+    Items to define the actual rules of the matches. The following item types
+    are expected. Each item will cause one new match rule to be created.
+
+    KDBUS_ITEM_BLOOM_MASK
+      An item that carries the bloom filter mask to match against in its
+      data field. The payload size must match the bloom filter size that
+      was specified when the bus was created.
+      See section 10.4 for more information.
+
+    KDBUS_ITEM_NAME
+      Specify a name that a sending connection must own at the time of
+      sending a broadcast message in order to match this rule.
+
+    KDBUS_ITEM_ID
+      Specify a sender connection's ID that will match this rule.
+
+    KDBUS_ITEM_NAME_ADD
+    KDBUS_ITEM_NAME_REMOVE
+    KDBUS_ITEM_NAME_CHANGE
+      These items request delivery of broadcast messages that describe a name
+      acquisition, loss, or change. The details are stored in the item's
+      kdbus_notify_name_change member. All information specified must be
+      matched in order to make the message pass. Use KDBUS_MATCH_ID_ANY to
+      match against any unique connection ID.
+
+    KDBUS_ITEM_ID_ADD
+    KDBUS_ITEM_ID_REMOVE
+      These items request delivery of broadcast messages that are generated
+      when a connection is created or terminated. struct kdbus_notify_id_change
+      is used to store the actual match information. This item can be used to
+      monitor one particular connection ID, or, when the id field is set to
+      KDBUS_MATCH_ID_ANY, all of them.
+
+    Items of other types are rejected, and the ioctl will fail with -EINVAL.
+};
+
+
+10.4 Bloom filters
+------------------
+
+Bloom filters allow checking whether a given word is present in a dictionary.
+This allows a connection to set up a mask for the information it is interested
+in, and to be delivered signal messages that have a matching filter.
+
+For general information on bloom filters, see
+
+  https://en.wikipedia.org/wiki/Bloom_filter
+
+The size of the bloom filter is defined per bus when it is created, in
+kdbus_bloom_parameter.size. All bloom filters attached to signals on the bus
+must match this size, and all bloom filter matches uploaded by connections must
+also match the size, or a multiple thereof (see below).
+
+The calculation of the mask has to be done on the userspace side. The kernel
+just checks the bitmasks to decide whether or not to let the message pass. All
+bits set in the filter must also be set in the mask (bit-wise AND logic), but
+the mask may have more bits set than the filter. Consequently, false positive
+matches are expected to happen, and userspace must deal with that fact.
+
+Masks are entities that are always passed to the kernel as part of a match
+(with an item of type KDBUS_ITEM_BLOOM_MASK), and filters can be attached to
+signals, with an item of type KDBUS_ITEM_BLOOM_FILTER.
+
+For a filter to match, all its bits have to be set in the match mask as well.
+For example, consider a bus has a bloom size of 8 bytes, and the following
+mask/filter combinations:
+
+    filter  0x0101010101010101
+    mask    0x0101010101010101
+            -> matches
+
+    filter  0x0303030303030303
+    mask    0x0101010101010101
+            -> doesn't match
+
+    filter  0x0101010101010101
+    mask    0x0303030303030303
+            -> matches
+
+Hence, in order to catch all messages, a mask filled with 0xff bytes can be
+installed as a wildcard match rule.
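+
+The comparison in the examples above can be written as a short C sketch. It
+follows the rule as stated in this section (every bit set in the filter must
+also be set in the mask); example_bloom_match() is a hypothetical helper, not
+a kdbus function, and the bloom data is assumed to be handled in 64-bit words.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if every bit set in 'filter' is also set in 'mask'.
 * 'n_words' is the bloom size of the bus in units of 64-bit words.
 */
static bool example_bloom_match(const uint64_t *filter, const uint64_t *mask,
				size_t n_words)
{
	size_t i;

	assert(filter != NULL && mask != NULL);
	for (i = 0; i < n_words; i++)
		if ((filter[i] & mask[i]) != filter[i])
			return false;
	return true;
}
```
+
+A mask with all bits set passes any filter under this rule, which is why it
+acts as a wildcard.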
+
+Uploaded matches may contain multiple masks, each of which has the bloom size
+defined by the bus. Each block of a mask is called a 'generation',
+starting at index 0.
+
+At match time, when a signal is about to be delivered, a bloom mask generation
+is passed, which denotes which of the bloom masks the filter should be matched
+against. This allows userspace to provide backward compatible masks at upload
+time, while older clients can still match against older versions of filters.
+
+
+10.5 Removing a match
+--------------------
+
+Matches can be removed through the KDBUS_CMD_MATCH_REMOVE ioctl, which again
+takes struct kdbus_cmd_match as argument, but its fields are used slightly
+differently.
+
+struct kdbus_cmd_match {
+  __u64 size;
+    The overall size of the struct. As it has no items in this use case, the
+    value should yield 40 (five __u64 fields).
+
+  __u64 cookie;
+    The cookie of the match, as it was passed when the match was added.
+    All matches that have this cookie will be removed.
+
+  __u64 flags;
+    Unused for this use case.
+
+  __u64 kernel_flags;
+    Valid flags for this command, returned by the kernel upon each call.
+
+  __u64 return_flags;
+    Flags returned by the kernel. Currently unused.
+
+  struct kdbus_item items[0];
+    Unused and not allowed for this use case.
+};
+
+
+11. Policy
+===============================================================================
+
+A policy database restricts the possibilities of connections to own, see and
+talk to well-known names. It can be associated with a bus (through a policy
+holder connection) or a custom endpoint.
+
+See section 8.1 for more details on the validity of well-known names.
+
+Default endpoints of buses always have a policy database. The default
+policy is to deny all operations except for operations that are covered by
+implicit policies. Custom endpoints always have a policy, and by default,
+a policy database is empty. Therefore, unless policy rules are added, all
+operations will also be denied by default.
+
+See section 11.5 for more details on implicit policies.
+
+A set of policy rules is described by a name and multiple access rules, defined
+by the following struct.
+
+struct kdbus_policy_access {
+  __u64 type;	/* USER, GROUP, WORLD */
+    One of the following.
+
+    KDBUS_POLICY_ACCESS_USER
+      Grant access to a user with the uid stored in the 'id' field.
+
+    KDBUS_POLICY_ACCESS_GROUP
+      Grant access to members of the group with the gid stored in the 'id'
+      field.
+
+    KDBUS_POLICY_ACCESS_WORLD
+      Grant access to everyone. The 'id' field is ignored.
+
+  __u64 access;	/* OWN, TALK, SEE */
+    The access to grant.
+
+    KDBUS_POLICY_SEE
+      Allow the name to be seen.
+
+    KDBUS_POLICY_TALK
+      Allow the name to be talked to.
+
+    KDBUS_POLICY_OWN
+      Allow the name to be owned.
+
+  __u64 id;
+    For KDBUS_POLICY_ACCESS_USER, stores the uid.
+    For KDBUS_POLICY_ACCESS_GROUP, stores the gid.
+};
+
+Policies are set through KDBUS_CMD_HELLO (when creating a policy holder
+connection), KDBUS_CMD_CONN_UPDATE (when updating a policy holder connection),
+KDBUS_CMD_ENDPOINT_MAKE (creating a custom endpoint) or
+KDBUS_CMD_ENDPOINT_UPDATE (updating a custom endpoint). In all cases, the name
+and policy access information is stored in items of type KDBUS_ITEM_NAME and
+KDBUS_ITEM_POLICY_ACCESS. For this transport, the following rules apply.
+
+  * An item of type KDBUS_ITEM_NAME must be followed by at least one
+    KDBUS_ITEM_POLICY_ACCESS item
+  * An item of type KDBUS_ITEM_NAME can be followed by an arbitrary number of
+    KDBUS_ITEM_POLICY_ACCESS items
+  * An arbitrary number of groups of names and access levels can be passed
+
+uids and gids are internally always stored in the kernel's view of global ids,
+and are translated back and forth on the ioctl level accordingly.
+
+
+11.2 Wildcard names
+-------------------
+
+Policy holder connections may upload names that contain the wildcard suffix
+(".*"). That way, a policy can be uploaded that is effective for every
+well-known name that extends the provided name by exactly one more level.
+
+For example, if an item of a set of uploaded policy rules contains the name
+"foo.bar.*", both "foo.bar.baz" and "foo.bar.bazbaz" are valid, but
+"foo.bar.baz.baz" is not.
+
+This allows connections to take control over multiple names that the policy
+holder doesn't need to know about when uploading the policy.
+
+Such wildcard entries are not allowed for custom endpoints.
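+
+The suffix rule of this section can be sketched in C as follows. The helper
+example_wildcard_match() is hypothetical and only illustrates the "exactly one
+more level" rule; it does not validate the names themselves.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if 'name' is covered by 'pattern'. A pattern ending in ".*"
 * covers names that extend its prefix by exactly one more level; any other
 * pattern only covers the identical name.
 */
static bool example_wildcard_match(const char *pattern, const char *name)
{
	size_t plen;
	const char *rest;

	assert(pattern != NULL && name != NULL);
	plen = strlen(pattern);

	/* Without the ".*" suffix, only an exact match counts. */
	if (plen < 2 || strcmp(pattern + plen - 2, ".*") != 0)
		return strcmp(pattern, name) == 0;

	/* Compare everything up to and including the final dot. */
	if (strncmp(pattern, name, plen - 1) != 0)
		return false;

	/* The remainder must be one non-empty level without further dots. */
	rest = name + plen - 1;
	return *rest != '\0' && strchr(rest, '.') == NULL;
}
```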
+
+
+11.3 Policy example
+-------------------
+
+For example, a set of policy rules may look like this:
+
+  KDBUS_ITEM_NAME: str='org.foo.bar'
+  KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, id=1000
+  KDBUS_ITEM_POLICY_ACCESS: type=USER, access=TALK, id=1001
+  KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=SEE
+  KDBUS_ITEM_NAME: str='org.blah.baz'
+  KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, id=0
+  KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=TALK
+
+That means that 'org.foo.bar' may only be owned by uid 1000, but every user on
+the bus is allowed to see the name. However, only uid 1001 may actually send
+a message to the connection and receive a reply from it.
+
+The second rule allows 'org.blah.baz' to be owned by uid 0 only, but every user
+may talk to it.
+
+
+11.4 TALK access and multiple well-known names per connection
+-------------------------------------------------------------
+
+Note that TALK access is checked against all names of a connection.
+For example, if a connection owns both 'org.foo.bar' and 'org.blah.baz', and
+the policy database allows 'org.blah.baz' to be talked to by WORLD, then this
+permission is also granted to 'org.foo.bar'. That might sound illogical, but
+after all, we allow messages to be directed to either the ID or a well-known
+name, and policy is applied to the connection, not the name. In other words,
+the effective TALK policy for a connection is the most permissive of all names
+the connection owns.
+
+For broadcast messages, the receiver needs TALK permissions to the sender to
+receive the broadcast.
+
+If a policy database exists for a bus (because a policy holder created one on
+demand) or for a custom endpoint (which always has one), each one is consulted
+during name registry listing, name owning or message delivery. If either one
+fails, the operation is failed with -EPERM.
+
+For best practices, connections that own names with a restricted TALK
+access should not install matches. This avoids cases where the sent
+message may pass the bloom filter due to false-positives and may also
+satisfy the policy rules.
+
+
+11.5 Implicit policies
+----------------------
+
+Depending on the type of the endpoint, a set of implicit rules that
+override installed policies might be enforced.
+
+On default endpoints, the following set is enforced and checked before
+any user-supplied policy is checked.
+
+  * Privileged connections always override any installed policy. Those
+    connections could easily install their own policies, so there is no
+    reason to enforce installed policies.
+  * Connections can always talk to connections of the same user. This
+    includes broadcast messages.
+
+Custom endpoints have stricter policies. The following rules apply:
+
+  * Policy rules are always enforced, even if the connection is a privileged
+    connection.
+  * Policy rules are always enforced for TALK access, even if both ends are
+    running under the same user. This includes broadcast messages.
+  * To restrict the set of names that can be seen, endpoint policies can
+    install "SEE" policies.
+
+
+12. Pool
+===============================================================================
+
+A pool for data received from the kernel is installed for every connection of
+the bus, and is sized according to the information stored in the
+KDBUS_ITEM_BLOOM_PARAMETER item that is returned by KDBUS_CMD_HELLO.
+
+The pool is written to by the kernel when one of the following ioctls is issued:
+
+  * KDBUS_CMD_HELLO, to receive details about the bus the connection was made to
+  * KDBUS_CMD_RECV, to receive a message
+  * KDBUS_CMD_NAME_LIST, to dump the name registry
+  * KDBUS_CMD_CONN_INFO, to retrieve information on a connection
+
+The offsets returned by any of the aforementioned ioctls describe offsets
+inside the pool. In order to make the slice available for subsequent calls,
+KDBUS_CMD_FREE has to be called on the offset.
+
+To access the memory, the caller is expected to mmap() it to its task, like
+this:
+
+  /*
+   * POOL_SIZE has to be a multiple of PAGE_SIZE, and it must match the
+   * value that was previously returned through the KDBUS_ITEM_BLOOM_PARAMETER
+   * item field when the KDBUS_CMD_HELLO ioctl returned.
+   */
+
+  buf = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, conn_fd, 0);
+
+Alternatively, instead of mapping the entire pool buffer, only parts of it can
+be mapped. The length of the response is returned by the kernel along with the
+offset for each of the ioctls listed above.
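+
+Once the pool is mapped, a returned offset is turned into a pointer by plain
+pointer arithmetic. The following is a minimal sketch; example_pool_slice() is
+a hypothetical helper that additionally checks that the slice lies inside the
+mapping before it is dereferenced.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return a pointer to the slice [offset, offset + len) inside a mapped pool
 * of 'pool_size' bytes, or NULL if the slice does not fit into the mapping.
 */
static const void *example_pool_slice(const void *pool, size_t pool_size,
				      uint64_t offset, uint64_t len)
{
	assert(pool != NULL);

	/* Written this way to avoid overflow in offset + len. */
	if (offset > pool_size || len > pool_size - offset)
		return NULL;

	return (const uint8_t *)pool + (size_t)offset;
}
```
+
+After the data has been consumed, the slice still has to be released with
+KDBUS_CMD_FREE as described above.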
+
+
+13. Metadata
+===============================================================================
+
+kdbus records data about the system in certain situations. Such metadata can
+refer to the currently active process (creds, PIDs, current user groups, process
+names and its executable path, cgroup membership, capabilities, security label
+and audit information), connection information (description string, currently
+owned names) and the timestamp.
+
+Metadata is collected in the following circumstances:
+
+  * When a bus is created (KDBUS_CMD_BUS_MAKE), information about the calling
+    task is collected. This data is returned by the kernel via the
+    KDBUS_CMD_BUS_CREATOR_INFO call.
+
+  * When a connection is created (KDBUS_CMD_HELLO), information about the
+    calling task is collected. Alternatively, a privileged connection may
+    provide 'faked' information about credentials, PIDs and a security label
+    which will be taken instead. This data is returned by the kernel as
+    information on a connection (KDBUS_CMD_CONN_INFO).
+
+  * When a message is sent (KDBUS_CMD_SEND), information about the sending task
+    and the sending connection are collected. This metadata will be attached
+    to the message when it arrives in the receiver's pool. If the connection
+    sending the message installed faked credentials (see above), the message
+    will not be augmented by any information about the currently sending task.
+
+Which metadata items are actually delivered depends on the following sets and
+masks:
+
+    (a) the system-wide kmod creds mask (module parameter 'attach_flags_mask')
+    (b) the per-connection send creds mask, set by the connecting client
+    (c) the per-connection receive creds mask, set by the connecting client
+    (d) the per-bus minimal creds mask, set by the bus creator
+    (e) the per-bus owner creds mask, set by the bus creator
+    (f) the mask specified when querying creds of a bus peer
+    (g) the mask specified when querying creds of a bus owner
+
+With the following rules:
+
+    [1] The creds attached to messages are determined as (a & b & c).
+    [2] When connecting to a bus (KDBUS_CMD_HELLO), and (~b & d) != 0, the call
+        will fail, the connection is refused.
+    [3] When querying creds of a bus peer, the creds returned are  (a & b & f)
+    [4] When querying creds of a bus owner, the creds returned are (a & e & g)
+    [5] When creating a new bus, and (d & ~a) != 0, then bus creation will fail
+
+Hence, userspace might not always get all the metadata items that it
+requested. Code must be written so that it can cope with this fact.
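+
+Rules [1] to [5] are plain bit operations on the masks (a) to (g). The sketch
+below restates them in C; struct example_masks and the example_* functions
+are hypothetical names used only for this illustration.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the masks (a)-(g) listed above. */
struct example_masks {
	uint64_t a; /* system-wide mask (module parameter) */
	uint64_t b; /* per-connection send mask */
	uint64_t c; /* per-connection receive mask */
	uint64_t d; /* per-bus minimal mask */
	uint64_t e; /* per-bus owner mask */
	uint64_t f; /* mask given when querying a bus peer */
	uint64_t g; /* mask given when querying a bus owner */
};

/* [1] creds attached to messages */
static uint64_t example_msg_creds(const struct example_masks *m)
{
	assert(m != NULL);
	return m->a & m->b & m->c;
}

/* [2] HELLO is refused if the bus requires bits the sender does not offer */
static bool example_hello_refused(const struct example_masks *m)
{
	assert(m != NULL);
	return (~m->b & m->d) != 0;
}

/* [3] creds returned when querying a bus peer */
static uint64_t example_peer_creds(const struct example_masks *m)
{
	assert(m != NULL);
	return m->a & m->b & m->f;
}

/* [4] creds returned when querying a bus owner */
static uint64_t example_owner_creds(const struct example_masks *m)
{
	assert(m != NULL);
	return m->a & m->e & m->g;
}

/* [5] bus creation fails if it requires bits the system mask forbids */
static bool example_bus_make_fails(const struct example_masks *m)
{
	assert(m != NULL);
	return (m->d & ~m->a) != 0;
}
```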
+
+
+13.1 Known item types
+---------------------
+
+The following attach flags are currently supported.
+
+  KDBUS_ATTACH_TIMESTAMP
+    Attaches an item of type KDBUS_ITEM_TIMESTAMP which contains both the
+    monotonic and the realtime timestamp, taken when the message was
+    processed on the kernel side.
+
+  KDBUS_ATTACH_CREDS
+    Attaches an item of type KDBUS_ITEM_CREDS, containing credentials as
+    described in struct kdbus_creds: the user and group IDs in the usual four
+    flavors: real, effective, saved and file-system related.
+
+  KDBUS_ATTACH_PIDS
+    Attaches an item of type KDBUS_ITEM_PIDS, containing information on the
+    process. In particular, the PID (process ID), TID (thread ID), and PPID
+    (PID of the parent process).
+
+  KDBUS_ATTACH_AUXGROUPS
+    Attaches an item of type KDBUS_ITEM_AUXGROUPS, containing a dynamic
+    number of auxiliary groups the sending task was a member of.
+
+  KDBUS_ATTACH_NAMES
+    Attaches items of type KDBUS_ITEM_OWNED_NAME, one for each name the sending
+    connection currently owns. The name and flags are stored in kdbus_item.name
+    for each of them.
+
+  KDBUS_ATTACH_TID_COMM [*]
+    Attaches an item of type KDBUS_ITEM_TID_COMM, transporting the sending
+    task's 'comm', for the tid.  The string is stored in kdbus_item.str.
+
+  KDBUS_ATTACH_PID_COMM [*]
+    Attaches an item of type KDBUS_ITEM_PID_COMM, transporting the sending
+    task's 'comm', for the pid.  The string is stored in kdbus_item.str.
+
+  KDBUS_ATTACH_EXE [*]
+    Attaches an item of type KDBUS_ITEM_EXE, containing the path to the
+    executable of the sending task, stored in kdbus_item.str.
+
+  KDBUS_ATTACH_CMDLINE [*]
+    Attaches an item of type KDBUS_ITEM_CMDLINE, containing the command line
+    arguments of the sending task, as an array of strings, stored in
+    kdbus_item.str.
+
+  KDBUS_ATTACH_CGROUP
+    Attaches an item of type KDBUS_ITEM_CGROUP with the task's cgroup path.
+
+  KDBUS_ATTACH_CAPS
+    Attaches an item of type KDBUS_ITEM_CAPS, carrying sets of capabilities
+    that should be accessed via kdbus_item.caps.caps. Also, userspace should
+    be written in a way that it takes kdbus_item.caps.last_cap into account,
+    and derive the number of sets and rows from the item size and the reported
+    number of valid capability bits.
+
+  KDBUS_ATTACH_SECLABEL
+    Attaches an item of type KDBUS_ITEM_SECLABEL, which contains the SELinux
+    security label of the sending task. SELinux and other MACs might want to
+    do additional per-service security checks. For example, a service manager
+    might want to check the security label of a service file against the
+    security label of the client process checking the SELinux database before
+    allowing access.  The label can be accessed via kdbus_item.str.
+
+  KDBUS_ATTACH_AUDIT
+    Attaches an item of type KDBUS_ITEM_AUDIT, which contains the audit
+    sessionid and loginuid of the sending task. Access via kdbus_item.audit.
+
+  KDBUS_ATTACH_CONN_DESCRIPTION
+    Attaches an item of type KDBUS_ITEM_CONN_DESCRIPTION that contains a
+    descriptive string of the sending peer. That string can be supplied
+    during HELLO by attaching an item of type KDBUS_ITEM_CONN_DESCRIPTION.
+
+
+[*] Note that the content stored in these items can easily be tampered with
+    by the sending tasks. Therefore, they should NOT be used for any sort of
+    security relevant assumptions. The only reason why they are transmitted is
+    to let receivers know about details that were set when metadata was
+    collected, even though the task they were collected from is not active any
+    longer when the items are received.
+
+
+13.2 Metadata and namespaces
+----------------------------
+
+Metadata such as PIDs, UIDs or GIDs are automatically translated to the
+namespaces of the task that receives them.
+
+
+14. Error codes
+===============================================================================
+
+Below is a list of error codes that might be returned by the individual
+ioctl commands. The list focuses on the return values from kdbus code itself,
+and might not cover those of all kernel internal functions.
+
+For all ioctls:
+
+  -ENOMEM	The kernel memory is exhausted
+  -ENOTTY	Illegal ioctl command issued for the file descriptor
+  -ENOSYS	The requested functionality is not available
+  -EINVAL	Unsupported item attached to command
+
+For all ioctls that carry a struct as payload:
+
+  -EFAULT	The supplied data pointer was not 64-bit aligned, or was
+		inaccessible from the kernel side.
+  -EINVAL	The size inside the supplied struct was smaller than expected
+  -EMSGSIZE	The size inside the supplied struct was bigger than expected
+  -ENAMETOOLONG	A supplied name is larger than the allowed maximum size
+
+For KDBUS_CMD_BUS_MAKE:
+
+  -EINVAL	The flags supplied in the kdbus_cmd_make struct are invalid or
+		the supplied name does not start with the current uid and a '-'
+  -EEXIST	A bus of that name already exists
+  -ESHUTDOWN	The domain for the bus is already shut down
+  -EMFILE	The maximum number of buses for the current user is exhausted
+
+For KDBUS_CMD_ENDPOINT_MAKE:
+
+  -EPERM	The calling user is not privileged (see Terminology)
+  -EINVAL	The flags supplied in the kdbus_cmd_make struct are invalid
+  -EEXIST	An endpoint of that name already exists
+
+For KDBUS_CMD_HELLO:
+
+  -EFAULT	The supplied pool size was 0 or not a multiple of the page size
+  -EINVAL	The flags supplied in the kdbus_cmd_hello struct are invalid, or
+		an illegal combination of KDBUS_HELLO_MONITOR,
+		KDBUS_HELLO_ACTIVATOR and KDBUS_HELLO_POLICY_HOLDER was passed
+		in the flags, or an invalid set of items was supplied
+  -ECONNREFUSED	The attach_flags_send field did not satisfy the requirements of
+		the bus
+  -EPERM	A KDBUS_ITEM_CREDS item was supplied, but the current user is
+		not privileged
+  -ESHUTDOWN	The bus has already been shut down
+  -EMFILE	The maximum number of connections on the bus has been reached
+  -EOPNOTSUPP	The endpoint does not support the connection flags
+		supplied in the kdbus_cmd_hello struct
+
+For KDBUS_CMD_BYEBYE:
+
+  -EALREADY	The connection has already been shut down
+  -EBUSY	There are still messages queued up in the connection's pool
+
+For KDBUS_CMD_SEND:
+
+  -EOPNOTSUPP	The connection is not an ordinary connection, or the passed
+		file descriptors are either kdbus handles or unix domain
+		sockets. Both are currently unsupported
+  -EINVAL	The submitted payload type is KDBUS_PAYLOAD_KERNEL,
+		KDBUS_MSG_EXPECT_REPLY was set without timeout or cookie
+		values, KDBUS_MSG_SYNC_REPLY was set without
+		KDBUS_MSG_EXPECT_REPLY, an invalid item was supplied,
+		src_id was != 0 and different from the current connection's ID,
+		a supplied memfd had a size of 0, a string was not properly
+		null-terminated
+  -ENOTUNIQ	The supplied destination is KDBUS_DST_ID_BROADCAST, a file
+		descriptor was passed, KDBUS_MSG_EXPECT_REPLY was set,
+		or a timeout was given for a broadcast message
+  -E2BIG	Too many items
+  -EMSGSIZE	The size of the message header and items or the payload vector
+		is too big.
+  -EEXIST	Multiple KDBUS_ITEM_FDS, KDBUS_ITEM_BLOOM_FILTER or
+		KDBUS_ITEM_DST_NAME items were supplied
+  -EBADF	The supplied KDBUS_ITEM_FDS or KDBUS_MSG_PAYLOAD_MEMFD items
+		contained an illegal file descriptor
+  -EMEDIUMTYPE	The supplied memfd is not a sealed kdbus memfd
+  -EMFILE	Too many file descriptors inside a KDBUS_ITEM_FDS
+  -EBADMSG	An item had illegal size, both a dst_id and a
+		KDBUS_ITEM_DST_NAME were given, or both a name and a bloom
+		filter were given
+  -ETXTBSY	The supplied kdbus memfd file cannot be sealed or the seal
+		was removed, because it is shared with other processes or
+		still mmap()ed
+  -ECOMM	A peer does not accept the file descriptors addressed to it
+  -EFAULT	The supplied bloom filter size was not 64-bit aligned
+  -EDOM		The supplied bloom filter size did not match the bloom filter
+		size of the bus
+  -EDESTADDRREQ	dst_id was set to KDBUS_DST_ID_NAME, but no KDBUS_ITEM_DST_NAME
+		was attached
+  -ESRCH	The name to look up was not found in the name registry
+  -EADDRNOTAVAIL KDBUS_MSG_NO_AUTO_START was given but the destination
+		 connection is an activator.
+  -ENXIO	The passed numeric destination connection ID couldn't be found,
+		or is not connected
+  -ECONNRESET	The destination connection is no longer active
+  -ETIMEDOUT	Timeout while synchronously waiting for a reply
+  -EINTR	System call interrupted while synchronously waiting for a reply
+  -EPIPE	When sending a message, a synchronous reply from the receiving
+		connection was expected but the connection died before
+		answering
+  -ENOBUFS	Too many pending messages on the receiver side
+  -EREMCHG	Both a well-known name and a unique name (ID) were given, but
+		the name is not currently owned by that connection.
+  -EXFULL	The memory pool of the receiver is full
+  -EREMOTEIO	While synchronously waiting for a reply, the remote peer
+		failed with an I/O error.
+
+For KDBUS_CMD_RECV:
+
+  -EINVAL	Invalid flags or offset
+  -EAGAIN	No message found in the queue
+  -ENOMSG	No message of the requested priority found
+  -EOVERFLOW	Broadcast messages have been lost
+
+For KDBUS_CMD_FREE:
+
+  -ENXIO	No pool slice found at given offset
+  -EINVAL	Invalid flags provided, or the offset is valid but the user is
+		not allowed to free the slice. This happens, for example, if
+		the offset was retrieved with KDBUS_RECV_PEEK.
+
+For KDBUS_CMD_NAME_ACQUIRE:
+
+  -EINVAL	Illegal command flags, illegal name provided, or an activator
+		tried to acquire a second name
+  -EPERM	Policy prohibited name ownership
+  -EALREADY	Connection already owns that name
+  -EEXIST	The name already exists and can not be taken over
+  -E2BIG	The maximum number of well-known names per connection
+		is exhausted
+  -ECONNRESET	The connection was reset during the call
+
+For KDBUS_CMD_NAME_RELEASE:
+
+  -EINVAL	Invalid command flags, or invalid name provided
+  -ESRCH	Name is not found in the registry
+  -EADDRINUSE	Name is owned by a different connection and can't be released
+
+For KDBUS_CMD_NAME_LIST:
+
+  -EINVAL	Invalid flags
+  -ENOBUFS	No available memory in the connection's pool.
+
+For KDBUS_CMD_CONN_INFO:
+
+  -EINVAL	Invalid flags, or neither an ID nor a name was provided,
+		or the name is invalid.
+  -ESRCH	Connection lookup by name failed
+  -ENXIO	No connection with the provided connection ID found
+
+For KDBUS_CMD_CONN_UPDATE:
+
+  -EINVAL	Illegal flags or items
+  -EOPNOTSUPP	Operation not supported by connection.
+  -E2BIG	Too many policy items attached
+  -EINVAL	Wildcards submitted in policy entries, or illegal sequence
+		of policy items
+
+For KDBUS_CMD_ENDPOINT_UPDATE:
+
+  -E2BIG	Too many policy items attached
+  -EINVAL	Invalid flags, or wildcards submitted in policy entries,
+		or illegal sequence of policy items
+
+For KDBUS_CMD_MATCH_ADD:
+
+  -EINVAL	Illegal flags or items
+  -EDOM		Illegal bloom filter size
+  -EMFILE	Too many matches for this connection
+
+For KDBUS_CMD_MATCH_REMOVE:
+
+  -EINVAL	Illegal flags
+  -ENOENT	A match entry with the given cookie could not be found.
+
+
+15. Internal object relations
+===============================================================================
+
+This is a simplified outline of the internal kdbus object relations, for
+those interested in the inner life of the driver implementation.
+
+From a mount point's (domain's) perspective:
+
+struct kdbus_domain
+  |» struct kdbus_domain_user *user (many, owned)
+  '» struct kdbus_node node (embedded)
+      |» struct kdbus_node children (many, referenced)
+      |» struct kdbus_node *parent (pinned)
+      '» struct kdbus_bus (many, pinned)
+          |» struct kdbus_node node (embedded)
+          '» struct kdbus_ep (many, pinned)
+              |» struct kdbus_node node (embedded)
+              |» struct kdbus_bus *bus (pinned)
+              |» struct kdbus_conn conn_list (many, pinned)
+              |   |» struct kdbus_ep *ep (pinned)
+              |   |» struct kdbus_name_entry *activator_of (owned)
+              |   |» struct kdbus_meta *meta (owned)
+              |   |» struct kdbus_match_db *match_db (owned)
+              |   |    '» struct kdbus_match_entry (many, owned)
+              |   |
+              |   |» struct kdbus_pool *pool (owned)
+              |   |    '» struct kdbus_pool_slice *slices (many, owned)
+              |   |       '» struct kdbus_pool *pool (pinned)
+              |   |
+              |   |» struct kdbus_domain_user *user (pinned)
+              |   '» struct kdbus_queue_entry entries (many, embedded)
+              |        |» struct kdbus_pool_slice *slice (pinned)
+              |        |» struct kdbus_conn_reply *reply (owned)
+              |        '» struct kdbus_domain_user *user (pinned)
+              |
+              '» struct kdbus_domain_user *user (pinned)
+                  '» struct kdbus_policy_db policy_db (embedded)
+                       |» struct kdbus_policy_db_entry (many, owned)
+                       |   |» struct kdbus_conn (pinned)
+                       |   '» struct kdbus_ep (pinned)
+                       |
+                       '» struct kdbus_policy_db_cache_entry (many, owned)
+                           '» struct kdbus_conn (pinned)
+
+
+For the life-time of a file descriptor derived from calling open() on a file
+inside the mount point:
+
+struct kdbus_handle
+  |» struct kdbus_meta *meta (owned)
+  |» struct kdbus_ep *ep (pinned)
+  |» struct kdbus_conn *conn (owned)
+  '» struct kdbus_ep *ep (owned)
-- 
2.2.1
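
A note for readers of the KDBUS_CMD_RECV error list in the documentation above: the sketch below shows one way a receive loop might react to each documented error value. The enum and the function name (recv_action, classify_recv_errno) are hypothetical and not part of this series; only the meaning of each errno value is taken from the documentation. The function expects the kernel-style negative value, so a caller of ioctl() would pass -errno.

```c
#include <errno.h>

/* Hypothetical reactions of a receive loop; not part of kdbus. */
enum recv_action {
	RECV_OK,		/* a message was returned */
	RECV_WAIT,		/* queue is empty: wait, then retry */
	RECV_NO_PRIORITY_MATCH,	/* nothing queued at the requested priority */
	RECV_LOST_BROADCASTS,	/* broadcasts were lost: check dropped_msgs */
	RECV_CALLER_BUG,	/* invalid flags or offset were passed */
	RECV_FATAL,		/* value not listed for KDBUS_CMD_RECV */
};

/* Map a negative errno-style return value to an action. */
static enum recv_action classify_recv_errno(int ret)
{
	switch (ret) {
	case 0:
		return RECV_OK;
	case -EAGAIN:
		return RECV_WAIT;
	case -ENOMSG:
		return RECV_NO_PRIORITY_MATCH;
	case -EOVERFLOW:
		return RECV_LOST_BROADCASTS;
	case -EINVAL:
		return RECV_CALLER_BUG;
	default:
		return RECV_FATAL;
	}
}
```

Keeping the mapping in one switch makes it easy to audit against the list in the documentation whenever error codes are added or changed.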


* [PATCH 02/13] kdbus: add header file
  2015-01-16 19:16 ` Greg Kroah-Hartman
  (?)
  (?)
@ 2015-01-16 19:16 ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

This patch adds the header file which describes the low-level
transport protocol used by various ioctls. The header file is located
in include/uapi/linux/ as it is shared between kernel and userspace,
and it only contains data structure definitions, enums and #defines
for constants.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
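
Reader's note on the item type numbering in the header below: enum kdbus_item_type is split into ranges (user items from 1, metadata attach items from 0x1000, policy items from 0x2000, kernel-generated items from 0x8000), and the attach items are kept in sync with the KDBUS_ATTACH_* bits. The sketch below illustrates both properties. The helper names are hypothetical, and the rule "attach bit n maps to item type 0x1000 + n" is read off the two enums rather than stated by the header.

```c
#include <stdint.h>

/* Range bases copied from enum kdbus_item_type. */
#define ITEM_USER_BASE		0x0001ULL
#define ITEM_ATTACH_BASE	0x1000ULL
#define ITEM_POLICY_BASE	0x2000ULL
#define ITEM_KERNEL_BASE	0x8000ULL

/* Hypothetical coarse classification of an item type by range. */
enum item_range {
	RANGE_NULL,
	RANGE_USER,
	RANGE_ATTACH,
	RANGE_POLICY,
	RANGE_KERNEL,
};

/* Only checks the range; it does not verify the type is defined. */
static enum item_range item_range_of(uint64_t type)
{
	if (type >= ITEM_KERNEL_BASE)
		return RANGE_KERNEL;
	if (type >= ITEM_POLICY_BASE)
		return RANGE_POLICY;
	if (type >= ITEM_ATTACH_BASE)
		return RANGE_ATTACH;
	if (type >= ITEM_USER_BASE)
		return RANGE_USER;
	return RANGE_NULL;
}

/*
 * Convert a single KDBUS_ATTACH_* bit to the matching item type.
 * Returns 0 (the NULL item type) unless exactly one of the 14
 * defined bits is set.
 */
static uint64_t attach_flag_to_item_type(uint64_t flag)
{
	const uint64_t attach_all = (1ULL << 14) - 1;
	unsigned int bit = 0;

	if (flag == 0 || (flag & (flag - 1)) || (flag & ~attach_all))
		return 0;
	while (!(flag & 1)) {
		flag >>= 1;
		bit++;
	}
	return ITEM_ATTACH_BASE + bit;
}
```

For example, KDBUS_ATTACH_PIDS (bit 2) maps to 0x1002, which is the value KDBUS_ITEM_PIDS takes in the enum.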
 include/uapi/linux/Kbuild  |    1 +
 include/uapi/linux/kdbus.h | 1049 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1050 insertions(+)
 create mode 100644 include/uapi/linux/kdbus.h
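
Reader's note on struct kdbus_item in the header below: each item starts with a 64-bit size ("overall data record size") and a 64-bit type, and items are chained back to back. The sketch below walks such a chain in a plain byte buffer. It assumes that every item is padded so the next one starts on an 8-byte boundary; that rule is not stated in this header and is treated here as an assumption. The helper name and macros are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ITEM_HEADER_SIZE 16ULL			/* __u64 size + __u64 type */
#define ALIGN8(v) (((v) + 7ULL) & ~7ULL)	/* assumed padding rule */

/*
 * Hypothetical helper: count the items of type `wanted` in a buffer
 * of chained items. Returns -1 if an item header is malformed.
 */
static int count_items(const unsigned char *buf, uint64_t len, uint64_t wanted)
{
	uint64_t off = 0;
	int found = 0;

	while (off < len) {
		uint64_t remaining = len - off;
		uint64_t size, type, step;

		if (remaining < ITEM_HEADER_SIZE)
			return -1;
		/* memcpy avoids unaligned or aliasing reads */
		memcpy(&size, buf + off, sizeof(size));
		memcpy(&type, buf + off + 8, sizeof(type));
		if (size < ITEM_HEADER_SIZE || size > remaining)
			return -1;
		if (type == wanted)
			found++;

		step = ALIGN8(size);
		if (step < size || step >= remaining)
			break;	/* wrapped, or this was the last item */
		off += step;
	}
	return found;
}
```

The size checks come before any use of the payload, so a truncated or corrupted chain is rejected instead of being read past the end of the buffer.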

diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 00b100023c47..7c6d0cfe28f0 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -213,6 +213,7 @@ header-y += ixjuser.h
 header-y += jffs2.h
 header-y += joystick.h
 header-y += kcmp.h
+header-y += kdbus.h
 header-y += kdev_t.h
 header-y += kd.h
 header-y += kernelcapi.h
diff --git a/include/uapi/linux/kdbus.h b/include/uapi/linux/kdbus.h
new file mode 100644
index 000000000000..bcf6b639a664
--- /dev/null
+++ b/include/uapi/linux/kdbus.h
@@ -0,0 +1,1049 @@
+/*
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef _KDBUS_UAPI_H_
+#define _KDBUS_UAPI_H_
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#define KDBUS_IOCTL_MAGIC		0x95
+#define KDBUS_SRC_ID_KERNEL		(0)
+#define KDBUS_DST_ID_NAME		(0)
+#define KDBUS_MATCH_ID_ANY		(~0ULL)
+#define KDBUS_DST_ID_BROADCAST		(~0ULL)
+#define KDBUS_FLAG_KERNEL		(1ULL << 63)
+
+/**
+ * struct kdbus_notify_id_change - name registry change message
+ * @id:			New or former owner of the name
+ * @flags:		flags field from KDBUS_HELLO_*
+ *
+ * Sent from kernel to userspace when the owner or activator of
+ * a well-known name changes.
+ *
+ * Attached to:
+ *   KDBUS_ITEM_ID_ADD
+ *   KDBUS_ITEM_ID_REMOVE
+ */
+struct kdbus_notify_id_change {
+	__u64 id;
+	__u64 flags;
+};
+
+/**
+ * struct kdbus_notify_name_change - name registry change message
+ * @old_id:		ID and flags of former owner of a name
+ * @new_id:		ID and flags of new owner of a name
+ * @name:		Well-known name
+ *
+ * Sent from kernel to userspace when the owner or activator of
+ * a well-known name changes.
+ *
+ * Attached to:
+ *   KDBUS_ITEM_NAME_ADD
+ *   KDBUS_ITEM_NAME_REMOVE
+ *   KDBUS_ITEM_NAME_CHANGE
+ */
+struct kdbus_notify_name_change {
+	struct kdbus_notify_id_change old_id;
+	struct kdbus_notify_id_change new_id;
+	char name[0];
+};
+
+/**
+ * struct kdbus_creds - process credentials
+ * @uid:		User ID
+ * @euid:		Effective UID
+ * @suid:		Saved UID
+ * @fsuid:		Filesystem UID
+ * @gid:		Group ID
+ * @egid:		Effective GID
+ * @sgid:		Saved GID
+ * @fsgid:		Filesystem GID
+ *
+ * Attached to:
+ *   KDBUS_ITEM_CREDS
+ */
+struct kdbus_creds {
+	__u32 uid;
+	__u32 euid;
+	__u32 suid;
+	__u32 fsuid;
+	__u32 gid;
+	__u32 egid;
+	__u32 sgid;
+	__u32 fsgid;
+};
+
+/**
+ * struct kdbus_pids - process identifiers
+ * @pid:		Process ID
+ * @tid:		Thread ID
+ * @ppid:		Parent process ID
+ *
+ * The PID, TID and parent PID of a process.
+ *
+ * Attached to:
+ *   KDBUS_ITEM_PIDS
+ */
+struct kdbus_pids {
+	__u64 pid;
+	__u64 tid;
+	__u64 ppid;
+};
+
+/**
+ * struct kdbus_caps - process capabilities
+ * @last_cap:	Highest currently known capability bit
+ * @caps:	Variable number of 32-bit capabilities flags
+ *
+ * Contains a variable number of 32-bit capabilities flags.
+ *
+ * Attached to:
+ *   KDBUS_ITEM_CAPS
+ */
+struct kdbus_caps {
+	__u32 last_cap;
+	__u32 caps[0];
+};
+
+/**
+ * struct kdbus_audit - audit information
+ * @sessionid:		The audit session ID
+ * @loginuid:		The audit login uid
+ *
+ * Attached to:
+ *   KDBUS_ITEM_AUDIT
+ */
+struct kdbus_audit {
+	__u32 sessionid;
+	__u32 loginuid;
+};
+
+/**
+ * struct kdbus_timestamp
+ * @seqnum:		Global per-domain message sequence number
+ * @monotonic_ns:	Monotonic timestamp, in nanoseconds
+ * @realtime_ns:	Realtime timestamp, in nanoseconds
+ *
+ * Attached to:
+ *   KDBUS_ITEM_TIMESTAMP
+ */
+struct kdbus_timestamp {
+	__u64 seqnum;
+	__u64 monotonic_ns;
+	__u64 realtime_ns;
+};
+
+/**
+ * struct kdbus_vec - I/O vector for kdbus payload items
+ * @size:		The size of the vector
+ * @address:		Memory address of data buffer
+ * @offset:		Offset in the in-message payload memory,
+ *			relative to the message head
+ *
+ * Attached to:
+ *   KDBUS_ITEM_PAYLOAD_VEC, KDBUS_ITEM_PAYLOAD_OFF
+ */
+struct kdbus_vec {
+	__u64 size;
+	union {
+		__u64 address;
+		__u64 offset;
+	};
+};
+
+/**
+ * struct kdbus_bloom_parameter - bus-wide bloom parameters
+ * @size:		Size of the bit field in bytes (m / 8)
+ * @n_hash:		Number of hash functions used (k)
+ */
+struct kdbus_bloom_parameter {
+	__u64 size;
+	__u64 n_hash;
+};
+
+/**
+ * struct kdbus_bloom_filter - bloom filter containing n elements
+ * @generation:		Generation of the element set in the filter
+ * @data:		Bit field, multiple of 8 bytes
+ */
+struct kdbus_bloom_filter {
+	__u64 generation;
+	__u64 data[0];
+};
+
+/**
+ * struct kdbus_memfd - a kdbus memfd
+ * @start:		The offset into the memfd where the segment starts
+ * @size:		The size of the memfd segment
+ * @fd:			The file descriptor number
+ * @__pad:		Padding to ensure proper alignment and size
+ *
+ * Attached to:
+ *   KDBUS_ITEM_PAYLOAD_MEMFD
+ */
+struct kdbus_memfd {
+	__u64 start;
+	__u64 size;
+	int fd;
+	__u32 __pad;
+};
+
+/**
+ * struct kdbus_name - a registered well-known name with its flags
+ * @flags:		Flags from KDBUS_NAME_*
+ * @name:		Well-known name
+ *
+ * Attached to:
+ *   KDBUS_ITEM_OWNED_NAME
+ */
+struct kdbus_name {
+	__u64 flags;
+	char name[0];
+};
+
+/**
+ * struct kdbus_policy_access - policy access item
+ * @type:		One of KDBUS_POLICY_ACCESS_* types
+ * @access:		Access to grant
+ * @id:			For KDBUS_POLICY_ACCESS_USER, the uid
+ *			For KDBUS_POLICY_ACCESS_GROUP, the gid
+ */
+struct kdbus_policy_access {
+	__u64 type;	/* USER, GROUP, WORLD */
+	__u64 access;	/* OWN, TALK, SEE */
+	__u64 id;	/* uid, gid, 0 */
+};
+
+/**
+ * enum kdbus_item_type - item types to chain data in a list
+ * @_KDBUS_ITEM_NULL:			Uninitialized/invalid
+ * @_KDBUS_ITEM_USER_BASE:		Start of user items
+ * @KDBUS_ITEM_PAYLOAD_VEC:		Vector to data
+ * @KDBUS_ITEM_PAYLOAD_OFF:		Data at returned offset to message head
+ * @KDBUS_ITEM_PAYLOAD_MEMFD:		Data as sealed memfd
+ * @KDBUS_ITEM_FDS:			Attached file descriptors
+ * @KDBUS_ITEM_CANCEL_FD:		FD used to cancel a synchronous
+ *					operation by writing to it from
+ *					userspace
+ * @KDBUS_ITEM_BLOOM_PARAMETER:		Bus-wide bloom parameters, used with
+ *					KDBUS_CMD_BUS_MAKE, carries a
+ *					struct kdbus_bloom_parameter
+ * @KDBUS_ITEM_BLOOM_FILTER:		Bloom filter carried with a message,
+ *					used to match against a bloom mask of a
+ *					connection, carries a struct
+ *					kdbus_bloom_filter
+ * @KDBUS_ITEM_BLOOM_MASK:		Bloom mask used to match against a
+ *					message's bloom filter
+ * @KDBUS_ITEM_DST_NAME:		Destination's well-known name
+ * @KDBUS_ITEM_MAKE_NAME:		Name of domain, bus, endpoint
+ * @KDBUS_ITEM_ATTACH_FLAGS_SEND:	Attach-flags, used for updating which
+ *					metadata a connection opts in to send
+ * @KDBUS_ITEM_ATTACH_FLAGS_RECV:	Attach-flags, used for updating which
+ *					metadata a connection requests to
+ *					receive for each received message
+ * @KDBUS_ITEM_ID:			Connection ID
+ * @KDBUS_ITEM_NAME:			Well-known name with flags
+ * @_KDBUS_ITEM_ATTACH_BASE:		Start of metadata attach items
+ * @KDBUS_ITEM_TIMESTAMP:		Timestamp
+ * @KDBUS_ITEM_CREDS:			Process credentials
+ * @KDBUS_ITEM_PIDS:			Process identifiers
+ * @KDBUS_ITEM_AUXGROUPS:		Auxiliary process groups
+ * @KDBUS_ITEM_OWNED_NAME:		A name owned by the associated
+ *					connection
+ * @KDBUS_ITEM_TID_COMM:		Thread ID "comm" identifier
+ *					(Don't trust this, see below.)
+ * @KDBUS_ITEM_PID_COMM:		Process ID "comm" identifier
+ *					(Don't trust this, see below.)
+ * @KDBUS_ITEM_EXE:			The path of the executable
+ *					(Don't trust this, see below.)
+ * @KDBUS_ITEM_CMDLINE:			The process command line
+ *					(Don't trust this, see below.)
+ * @KDBUS_ITEM_CGROUP:			The cgroup membership
+ * @KDBUS_ITEM_CAPS:			The process capabilities
+ * @KDBUS_ITEM_SECLABEL:		The security label
+ * @KDBUS_ITEM_AUDIT:			The audit IDs
+ * @KDBUS_ITEM_CONN_DESCRIPTION:	The connection's human-readable name
+ *					(debugging)
+ * @_KDBUS_ITEM_POLICY_BASE:		Start of policy items
+ * @KDBUS_ITEM_POLICY_ACCESS:		Policy access block
+ * @_KDBUS_ITEM_KERNEL_BASE:		Start of kernel-generated message items
+ * @KDBUS_ITEM_NAME_ADD:		Notification in kdbus_notify_name_change
+ * @KDBUS_ITEM_NAME_REMOVE:		Notification in kdbus_notify_name_change
+ * @KDBUS_ITEM_NAME_CHANGE:		Notification in kdbus_notify_name_change
+ * @KDBUS_ITEM_ID_ADD:			Notification in kdbus_notify_id_change
+ * @KDBUS_ITEM_ID_REMOVE:		Notification in kdbus_notify_id_change
+ * @KDBUS_ITEM_REPLY_TIMEOUT:		Timeout has been reached
+ * @KDBUS_ITEM_REPLY_DEAD:		Destination died
+ *
+ * N.B.: The process and thread COMM fields, as well as the CMDLINE and
+ * EXE fields may be altered by unprivileged processes and should
+ * hence *not* be used for security decisions. Peers should make use of
+ * these items only for informational purposes, such as generating log
+ * records.
+ */
+enum kdbus_item_type {
+	_KDBUS_ITEM_NULL,
+	_KDBUS_ITEM_USER_BASE,
+	KDBUS_ITEM_PAYLOAD_VEC	= _KDBUS_ITEM_USER_BASE,
+	KDBUS_ITEM_PAYLOAD_OFF,
+	KDBUS_ITEM_PAYLOAD_MEMFD,
+	KDBUS_ITEM_FDS,
+	KDBUS_ITEM_CANCEL_FD,
+	KDBUS_ITEM_BLOOM_PARAMETER,
+	KDBUS_ITEM_BLOOM_FILTER,
+	KDBUS_ITEM_BLOOM_MASK,
+	KDBUS_ITEM_DST_NAME,
+	KDBUS_ITEM_MAKE_NAME,
+	KDBUS_ITEM_ATTACH_FLAGS_SEND,
+	KDBUS_ITEM_ATTACH_FLAGS_RECV,
+	KDBUS_ITEM_ID,
+	KDBUS_ITEM_NAME,
+
+	/* keep these item types in sync with KDBUS_ATTACH_* flags */
+	_KDBUS_ITEM_ATTACH_BASE	= 0x1000,
+	KDBUS_ITEM_TIMESTAMP	= _KDBUS_ITEM_ATTACH_BASE,
+	KDBUS_ITEM_CREDS,
+	KDBUS_ITEM_PIDS,
+	KDBUS_ITEM_AUXGROUPS,
+	KDBUS_ITEM_OWNED_NAME,
+	KDBUS_ITEM_TID_COMM,
+	KDBUS_ITEM_PID_COMM,
+	KDBUS_ITEM_EXE,
+	KDBUS_ITEM_CMDLINE,
+	KDBUS_ITEM_CGROUP,
+	KDBUS_ITEM_CAPS,
+	KDBUS_ITEM_SECLABEL,
+	KDBUS_ITEM_AUDIT,
+	KDBUS_ITEM_CONN_DESCRIPTION,
+
+	_KDBUS_ITEM_POLICY_BASE	= 0x2000,
+	KDBUS_ITEM_POLICY_ACCESS = _KDBUS_ITEM_POLICY_BASE,
+
+	_KDBUS_ITEM_KERNEL_BASE	= 0x8000,
+	KDBUS_ITEM_NAME_ADD	= _KDBUS_ITEM_KERNEL_BASE,
+	KDBUS_ITEM_NAME_REMOVE,
+	KDBUS_ITEM_NAME_CHANGE,
+	KDBUS_ITEM_ID_ADD,
+	KDBUS_ITEM_ID_REMOVE,
+	KDBUS_ITEM_REPLY_TIMEOUT,
+	KDBUS_ITEM_REPLY_DEAD,
+};
+
+/**
+ * struct kdbus_item - chain of data blocks
+ * @size:		Overall data record size
+ * @type:		Kdbus_item type of data
+ * @data:		Generic bytes
+ * @data32:		Generic 32 bit array
+ * @data64:		Generic 64 bit array
+ * @str:		Generic string
+ * @id:			Connection ID
+ * @vec:		KDBUS_ITEM_PAYLOAD_VEC
+ * @creds:		KDBUS_ITEM_CREDS
+ * @audit:		KDBUS_ITEM_AUDIT
+ * @timestamp:		KDBUS_ITEM_TIMESTAMP
+ * @name:		KDBUS_ITEM_NAME
+ * @bloom_parameter:	KDBUS_ITEM_BLOOM_PARAMETER
+ * @bloom_filter:	KDBUS_ITEM_BLOOM_FILTER
+ * @memfd:		KDBUS_ITEM_PAYLOAD_MEMFD
+ * @name_change:	KDBUS_ITEM_NAME_ADD
+ *			KDBUS_ITEM_NAME_REMOVE
+ *			KDBUS_ITEM_NAME_CHANGE
+ * @id_change:		KDBUS_ITEM_ID_ADD
+ *			KDBUS_ITEM_ID_REMOVE
+ * @policy:		KDBUS_ITEM_POLICY_ACCESS
+ */
+struct kdbus_item {
+	__u64 size;
+	__u64 type;
+	union {
+		__u8 data[0];
+		__u32 data32[0];
+		__u64 data64[0];
+		char str[0];
+
+		__u64 id;
+		struct kdbus_vec vec;
+		struct kdbus_creds creds;
+		struct kdbus_pids pids;
+		struct kdbus_audit audit;
+		struct kdbus_caps caps;
+		struct kdbus_timestamp timestamp;
+		struct kdbus_name name;
+		struct kdbus_bloom_parameter bloom_parameter;
+		struct kdbus_bloom_filter bloom_filter;
+		struct kdbus_memfd memfd;
+		int fds[0];
+		struct kdbus_notify_name_change name_change;
+		struct kdbus_notify_id_change id_change;
+		struct kdbus_policy_access policy_access;
+	};
+};
+
+/**
+ * struct kdbus_item_list - A list of items
+ * @size:		The total size of the structure
+ * @items:		Array of items
+ */
+struct kdbus_item_list {
+	__u64 size;
+	struct kdbus_item items[0];
+};
+
+/**
+ * enum kdbus_msg_flags - type of message
+ * @KDBUS_MSG_EXPECT_REPLY:	Expect a reply message, used for
+ *				method calls. The userspace-supplied
+ *				cookie identifies the message and the
+ *				respective reply carries the cookie
+ *				in cookie_reply
+ * @KDBUS_MSG_NO_AUTO_START:	Do not start a service, if the addressed
+ *				name is not currently active
+ * @KDBUS_MSG_SIGNAL:		Treat this message as signal
+ */
+enum kdbus_msg_flags {
+	KDBUS_MSG_EXPECT_REPLY	= 1ULL << 0,
+	KDBUS_MSG_NO_AUTO_START	= 1ULL << 1,
+	KDBUS_MSG_SIGNAL	= 1ULL << 2,
+};
+
+/**
+ * enum kdbus_payload_type - type of payload carried by message
+ * @KDBUS_PAYLOAD_KERNEL:	Kernel-generated simple message
+ * @KDBUS_PAYLOAD_DBUS:		D-Bus marshalling "DBusDBus"
+ *
+ * Any payload-type is accepted. Common types will get added here once
+ * established.
+ */
+enum kdbus_payload_type {
+	KDBUS_PAYLOAD_KERNEL,
+	KDBUS_PAYLOAD_DBUS	= 0x4442757344427573ULL,
+};
+
+/**
+ * struct kdbus_msg - the representation of a kdbus message
+ * @size:		Total size of the message
+ * @flags:		Message flags (KDBUS_MSG_*), userspace → kernel
+ * @priority:		Message queue priority value
+ * @dst_id:		64-bit ID of the destination connection
+ * @src_id:		64-bit ID of the source connection
+ * @payload_type:	Payload type (KDBUS_PAYLOAD_*)
+ * @cookie:		Userspace-supplied cookie, for the connection
+ *			to identify its messages
+ * @timeout_ns:		The time to wait for a message reply from the peer.
+ *			If there is no reply, a kernel-generated message
+ *			with an attached KDBUS_ITEM_REPLY_TIMEOUT item
+ *			is sent to @src_id. The timeout is expected in
+ *			nanoseconds and as absolute CLOCK_MONOTONIC value.
+ * @cookie_reply:	A reply to the requesting message with the same
+ *			cookie. The requesting connection can match its
+ *			request and the reply with this value
+ * @items:		A list of kdbus_items containing the message payload
+ */
+struct kdbus_msg {
+	__u64 size;
+	__u64 flags;
+	__s64 priority;
+	__u64 dst_id;
+	__u64 src_id;
+	__u64 payload_type;
+	__u64 cookie;
+	union {
+		__u64 timeout_ns;
+		__u64 cookie_reply;
+	};
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * struct kdbus_msg_info - returned message container
+ * @offset:		Offset of kdbus_msg slice in pool
+ * @msg_size:		Copy of the kdbus_msg.size field
+ * @return_flags:	Command return flags, kernel → userspace
+ */
+struct kdbus_msg_info {
+	__u64 offset;
+	__u64 msg_size;
+	__u64 return_flags;
+};
+
+/**
+ * enum kdbus_send_flags - flags for sending messages
+ * @KDBUS_SEND_SYNC_REPLY:	Wait for destination connection to
+ *				reply to this message. The
+ *				KDBUS_CMD_SEND ioctl() will block
+ *				until the reply is received, and
+ *				@reply in struct kdbus_cmd_send will
+ *				yield the offset in the sender's pool
+ *				where the reply can be found.
+ *				This flag is only valid if
+ *				@KDBUS_MSG_EXPECT_REPLY is set as well.
+ */
+enum kdbus_send_flags {
+	KDBUS_SEND_SYNC_REPLY		= 1ULL << 0,
+};
+
+/**
+ * struct kdbus_cmd_send - send message
+ * @size:		Overall size of this structure
+ * @flags:		Flags to change send behavior (KDBUS_SEND_*)
+ * @kernel_flags:	Supported send flags, kernel → userspace
+ * @kernel_msg_flags:	Supported message flags, kernel → userspace
+ * @return_flags:	Command return flags, kernel → userspace
+ * @msg_address:	Storage address of the kdbus_msg to send
+ * @reply:		Storage for message reply if KDBUS_SEND_SYNC_REPLY
+ *			was given
+ * @items:		Additional items for this command
+ */
+struct kdbus_cmd_send {
+	__u64 size;
+	__u64 flags;
+	__u64 kernel_flags;
+	__u64 kernel_msg_flags;
+	__u64 return_flags;
+	__u64 msg_address;
+	struct kdbus_msg_info reply;
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_recv_flags - flags for de-queuing messages
+ * @KDBUS_RECV_PEEK:		Return the next queued message without
+ *				actually de-queuing it, and without installing
+ *				any file descriptors or other resources. It is
+ *				usually used to determine the activating
+ *				connection of a bus name.
+ * @KDBUS_RECV_DROP:		Drop and free the next queued message and all
+ *				its resources without actually receiving it.
+ * @KDBUS_RECV_USE_PRIORITY:	Only de-queue messages with the specified or
+ *				higher priority (lowest values); if not set,
+ *				the priority value is ignored.
+ */
+enum kdbus_recv_flags {
+	KDBUS_RECV_PEEK		= 1ULL <<  0,
+	KDBUS_RECV_DROP		= 1ULL <<  1,
+	KDBUS_RECV_USE_PRIORITY	= 1ULL <<  2,
+};
+
+/**
+ * enum kdbus_recv_return_flags - return flags for message receive commands
+ * @KDBUS_RECV_RETURN_INCOMPLETE_FDS:	One or more file descriptors could not
+ *					be installed. These descriptors in
+ *					KDBUS_ITEM_FDS will carry the value -1.
+ */
+enum kdbus_recv_return_flags {
+	KDBUS_RECV_RETURN_INCOMPLETE_FDS	= 1ULL <<  0,
+};
+
+/**
+ * struct kdbus_cmd_recv - struct to de-queue a buffered message
+ * @size:		Overall size of this object
+ * @flags:		KDBUS_RECV_* flags, userspace → kernel
+ * @kernel_flags:	Supported KDBUS_RECV_* flags, kernel → userspace
+ * @return_flags:	Command return flags, kernel → userspace
+ * @priority:		Minimum priority of the messages to de-queue. Lowest
+ *			values have the highest priority.
+ * @dropped_msgs:	In case the KDBUS_CMD_RECV ioctl returns
+ *			-EOVERFLOW, this field will contain the number of
+ *			broadcast messages that have been lost since the
+ *			last call.
+ * @msg:		Return storage for received message.
+ * @items:		Additional items for this command.
+ *
+ * This struct is used with the KDBUS_CMD_RECV ioctl.
+ */
+struct kdbus_cmd_recv {
+	__u64 size;
+	__u64 flags;
+	__u64 kernel_flags;
+	__u64 return_flags;
+	__s64 priority;
+	__u64 dropped_msgs;
+	struct kdbus_msg_info msg;
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * struct kdbus_cmd_free - struct to free a slice of memory in the pool
+ * @size:		Overall size of this structure
+ * @offset:		The offset of the memory slice, as returned by other
+ *			ioctls
+ * @flags:		Flags for the free command, userspace → kernel
+ * @return_flags:	Command return flags, kernel → userspace
+ * @kernel_flags:	Supported flags of the free command, kernel → userspace
+ * @items:		Additional items to modify the behavior
+ *
+ * This struct is used with the KDBUS_CMD_FREE ioctl.
+ */
+struct kdbus_cmd_free {
+	__u64 size;
+	__u64 offset;
+	__u64 flags;
+	__u64 kernel_flags;
+	__u64 return_flags;
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_policy_access_type - permissions of a policy record
+ * @_KDBUS_POLICY_ACCESS_NULL:	Uninitialized/invalid
+ * @KDBUS_POLICY_ACCESS_USER:	Grant access to a uid
+ * @KDBUS_POLICY_ACCESS_GROUP:	Grant access to gid
+ * @KDBUS_POLICY_ACCESS_WORLD:	World-accessible
+ */
+enum kdbus_policy_access_type {
+	_KDBUS_POLICY_ACCESS_NULL,
+	KDBUS_POLICY_ACCESS_USER,
+	KDBUS_POLICY_ACCESS_GROUP,
+	KDBUS_POLICY_ACCESS_WORLD,
+};
+
+/**
+ * enum kdbus_policy_type - mode flags
+ * @KDBUS_POLICY_OWN:		Allow to own a well-known name
+ *				Implies KDBUS_POLICY_TALK and KDBUS_POLICY_SEE
+ * @KDBUS_POLICY_TALK:		Allow communication to a well-known name
+ *				Implies KDBUS_POLICY_SEE
+ * @KDBUS_POLICY_SEE:		Allow to see a well-known name
+ */
+enum kdbus_policy_type {
+	KDBUS_POLICY_SEE	= 0,
+	KDBUS_POLICY_TALK,
+	KDBUS_POLICY_OWN,
+};
+
+/**
+ * enum kdbus_hello_flags - flags for struct kdbus_cmd_hello
+ * @KDBUS_HELLO_ACCEPT_FD:	The connection allows the reception of
+ *				any passed file descriptors
+ * @KDBUS_HELLO_ACTIVATOR:	Special-purpose connection which registers
+ *				a well-known name for a process to be started
+ *				when traffic arrives
+ * @KDBUS_HELLO_POLICY_HOLDER:	Special-purpose connection which registers
+ *				policy entries for a name. The provided name
+ *				is not activated and not registered with the
+ *				name database, it only allows unprivileged
+ *				connections to acquire a name, talk or discover
+ *				a service
+ * @KDBUS_HELLO_MONITOR:	Special-purpose connection to monitor
+ *				bus traffic
+ */
+enum kdbus_hello_flags {
+	KDBUS_HELLO_ACCEPT_FD		=  1ULL <<  0,
+	KDBUS_HELLO_ACTIVATOR		=  1ULL <<  1,
+	KDBUS_HELLO_POLICY_HOLDER	=  1ULL <<  2,
+	KDBUS_HELLO_MONITOR		=  1ULL <<  3,
+};
+
+/**
+ * enum kdbus_attach_flags - flags for metadata attachments
+ * @KDBUS_ATTACH_TIMESTAMP:		Timestamp
+ * @KDBUS_ATTACH_CREDS:			Credentials
+ * @KDBUS_ATTACH_PIDS:			PIDs
+ * @KDBUS_ATTACH_AUXGROUPS:		Auxiliary groups
+ * @KDBUS_ATTACH_NAMES:			Well-known names
+ * @KDBUS_ATTACH_TID_COMM:		The "comm" process identifier of the TID
+ * @KDBUS_ATTACH_PID_COMM:		The "comm" process identifier of the PID
+ * @KDBUS_ATTACH_EXE:			The path of the executable
+ * @KDBUS_ATTACH_CMDLINE:		The process command line
+ * @KDBUS_ATTACH_CGROUP:		The cgroup membership
+ * @KDBUS_ATTACH_CAPS:			The process capabilities
+ * @KDBUS_ATTACH_SECLABEL:		The security label
+ * @KDBUS_ATTACH_AUDIT:			The audit IDs
+ * @KDBUS_ATTACH_CONN_DESCRIPTION:	The human-readable connection name
+ * @_KDBUS_ATTACH_ALL:			All of the above
+ * @_KDBUS_ATTACH_ANY:			Wildcard match to enable any kind of
+ *					metadata.
+ */
+enum kdbus_attach_flags {
+	KDBUS_ATTACH_TIMESTAMP		=  1ULL <<  0,
+	KDBUS_ATTACH_CREDS		=  1ULL <<  1,
+	KDBUS_ATTACH_PIDS		=  1ULL <<  2,
+	KDBUS_ATTACH_AUXGROUPS		=  1ULL <<  3,
+	KDBUS_ATTACH_NAMES		=  1ULL <<  4,
+	KDBUS_ATTACH_TID_COMM		=  1ULL <<  5,
+	KDBUS_ATTACH_PID_COMM		=  1ULL <<  6,
+	KDBUS_ATTACH_EXE		=  1ULL <<  7,
+	KDBUS_ATTACH_CMDLINE		=  1ULL <<  8,
+	KDBUS_ATTACH_CGROUP		=  1ULL <<  9,
+	KDBUS_ATTACH_CAPS		=  1ULL << 10,
+	KDBUS_ATTACH_SECLABEL		=  1ULL << 11,
+	KDBUS_ATTACH_AUDIT		=  1ULL << 12,
+	KDBUS_ATTACH_CONN_DESCRIPTION	=  1ULL << 13,
+	_KDBUS_ATTACH_ALL		=  (1ULL << 14) - 1,
+	_KDBUS_ATTACH_ANY		=  ~0ULL
+};
+
+/**
+ * struct kdbus_cmd_hello - struct to say hello to kdbus
+ * @size:		The total size of the structure
+ * @flags:		Connection flags (KDBUS_HELLO_*), userspace → kernel
+ * @kernel_flags:	Supported connection flags, kernel → userspace
+ * @return_flags:	Command return flags, kernel → userspace
+ * @attach_flags_send:	Mask of metadata to attach to each message sent
+ *			off by this connection (KDBUS_ATTACH_*)
+ * @attach_flags_recv:	Mask of metadata to attach to each message received
+ *			by the new connection (KDBUS_ATTACH_*)
+ * @bus_flags:		The flags field copied verbatim from the original
+ *			KDBUS_CMD_BUS_MAKE ioctl. It's intended to be useful
+ *			to do negotiation of features of the payload that is
+ *			transferred (kernel → userspace)
+ * @id:			The ID of this connection (kernel → userspace)
+ * @pool_size:		Size of the connection's buffer where the received
+ *			messages are placed
+ * @offset:		Pool offset where additional items of type
+ *			kdbus_item_list are stored. They contain information
+ *			about the bus and the newly created connection.
+ * @items_size:		Copy of item_list.size stored in @offset.
+ * @id128:		Unique 128-bit ID of the bus (kernel → userspace)
+ * @items:		A list of items
+ *
+ * This struct is used with the KDBUS_CMD_HELLO ioctl.
+ */
+struct kdbus_cmd_hello {
+	__u64 size;
+	__u64 flags;
+	__u64 kernel_flags;
+	__u64 return_flags;
+	__u64 attach_flags_send;
+	__u64 attach_flags_recv;
+	__u64 bus_flags;
+	__u64 id;
+	__u64 pool_size;
+	__u64 offset;
+	__u64 items_size;
+	__u8 id128[16];
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_make_flags - Flags for KDBUS_CMD_{BUS,EP,NS}_MAKE
+ * @KDBUS_MAKE_ACCESS_GROUP:	Make the bus or endpoint node group-accessible
+ * @KDBUS_MAKE_ACCESS_WORLD:	Make the bus or endpoint node world-accessible
+ */
+enum kdbus_make_flags {
+	KDBUS_MAKE_ACCESS_GROUP		= 1ULL <<  0,
+	KDBUS_MAKE_ACCESS_WORLD		= 1ULL <<  1,
+};
+
+/**
+ * struct kdbus_cmd_make - struct to make a bus, an endpoint or a domain
+ * @size:		The total size of the struct
+ * @flags:		Properties for the bus/ep/domain to create,
+ *			userspace → kernel
+ * @kernel_flags:	Supported flags for the used command, kernel → userspace
+ * @return_flags:	Command return flags, kernel → userspace
+ * @items:		Items describing details
+ *
+ * This structure is used with the KDBUS_CMD_BUS_MAKE and
+ * KDBUS_CMD_ENDPOINT_MAKE ioctls.
+ */
+struct kdbus_cmd_make {
+	__u64 size;
+	__u64 flags;
+	__u64 kernel_flags;
+	__u64 return_flags;
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_name_flags - properties of a well-known name
+ * @KDBUS_NAME_REPLACE_EXISTING:	Try to replace name of other connections
+ * @KDBUS_NAME_ALLOW_REPLACEMENT:	Allow the replacement of the name
+ * @KDBUS_NAME_QUEUE:			Name should be queued if busy
+ * @KDBUS_NAME_IN_QUEUE:		Name is queued
+ * @KDBUS_NAME_ACTIVATOR:		Name is owned by an activator connection
+ */
+enum kdbus_name_flags {
+	KDBUS_NAME_REPLACE_EXISTING	= 1ULL <<  0,
+	KDBUS_NAME_ALLOW_REPLACEMENT	= 1ULL <<  1,
+	KDBUS_NAME_QUEUE		= 1ULL <<  2,
+	KDBUS_NAME_IN_QUEUE		= 1ULL <<  3,
+	KDBUS_NAME_ACTIVATOR		= 1ULL <<  4,
+};
+
+/**
+ * struct kdbus_cmd_name - struct to describe a well-known name
+ * @size:		The total size of the struct
+ * @flags:		Flags for a name entry (KDBUS_NAME_*),
+ *			userspace → kernel, kernel → userspace
+ * @kernel_flags:	Supported flags for a name entry, kernel → userspace
+ * @return_flags:	Command return flags, kernel → userspace
+ * @items:		Item list, containing the well-known name as
+ *			KDBUS_ITEM_NAME
+ *
+ * This structure is used with the KDBUS_CMD_NAME_ACQUIRE ioctl.
+ */
+struct kdbus_cmd_name {
+	__u64 size;
+	__u64 flags;
+	__u64 kernel_flags;
+	__u64 return_flags;
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * struct kdbus_name_info - struct to describe a well-known name
+ * @size:		The total size of the struct
+ * @conn_flags:		The flags of the owning connection (KDBUS_HELLO_*)
+ * @owner_id:		The current owner of the name
+ * @items:		Item list, containing the well-known name as
+ *			KDBUS_ITEM_OWNED_NAME
+ *
+ * This structure is used as return struct for the KDBUS_CMD_NAME_LIST ioctl.
+ */
+struct kdbus_name_info {
+	__u64 size;
+	__u64 conn_flags;
+	__u64 owner_id;
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * struct kdbus_name_list - information returned by KDBUS_CMD_NAME_LIST
+ * @size:		The total size of the structure
+ * @names:		A list of names
+ *
+ * Note that the user is responsible for freeing the allocated memory with
+ * the KDBUS_CMD_FREE ioctl.
+ */
+struct kdbus_name_list {
+	__u64 size;
+	struct kdbus_name_info names[0];
+};
+
+/**
+ * enum kdbus_name_list_flags - what to include into the returned list
+ * @KDBUS_NAME_LIST_UNIQUE:	All active connections
+ * @KDBUS_NAME_LIST_NAMES:	All known well-known names
+ * @KDBUS_NAME_LIST_ACTIVATORS:	All activator connections
+ * @KDBUS_NAME_LIST_QUEUED:	All queued-up names
+ */
+enum kdbus_name_list_flags {
+	KDBUS_NAME_LIST_UNIQUE		= 1ULL <<  0,
+	KDBUS_NAME_LIST_NAMES		= 1ULL <<  1,
+	KDBUS_NAME_LIST_ACTIVATORS	= 1ULL <<  2,
+	KDBUS_NAME_LIST_QUEUED		= 1ULL <<  3,
+};
+
+/**
+ * struct kdbus_cmd_name_list - request a list of name entries
+ * @size:		The total size of the struct
+ * @flags:		Flags for the query (KDBUS_NAME_LIST_*),
+ *			userspace → kernel
+ * @kernel_flags:	Supported flags for queries, kernel → userspace
+ * @return_flags:	Command return flags, kernel → userspace
+ * @offset:		The returned offset in the caller's pool buffer.
+ *			The user must use KDBUS_CMD_FREE to free the
+ *			allocated memory.
+ * @list_size:		Returned size of the list at @offset, in bytes
+ * @items:		Items for the command. Reserved for future use.
+ *
+ * This structure is used with the KDBUS_CMD_NAME_LIST ioctl.
+ */
+struct kdbus_cmd_name_list {
+	__u64 size;
+	__u64 flags;
+	__u64 kernel_flags;
+	__u64 return_flags;
+	__u64 offset;
+	__u64 list_size;
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * struct kdbus_info - information returned by KDBUS_CMD_*_INFO
+ * @size:		The total size of the struct
+ * @id:			The connection's or bus' 64-bit ID
+ * @flags:		The connection's or bus' flags
+ * @items:		A list of struct kdbus_item
+ *
+ * Note that the user is responsible for freeing the allocated memory with
+ * the KDBUS_CMD_FREE ioctl.
+ */
+struct kdbus_info {
+	__u64 size;
+	__u64 id;
+	__u64 flags;
+	struct kdbus_item items[0];
+};
+
+/**
+ * struct kdbus_cmd_info - struct used for the KDBUS_CMD_*_INFO ioctls
+ * @size:		The total size of the struct
+ * @flags:		KDBUS_ATTACH_* flags, userspace → kernel
+ * @kernel_flags:	Supported KDBUS_ATTACH_* flags, kernel → userspace
+ * @return_flags:	Command return flags, kernel → userspace
+ * @id:			The 64-bit ID of the connection. If set to zero, passing
+ *			@name is required. kdbus will look up the name to
+ *			determine the ID in this case.
+ * @offset:		Returned offset in the caller's pool buffer where the
+ *			kdbus_info struct result is stored. The user must
+ *			use KDBUS_CMD_FREE to free the allocated memory.
+ * @info_size:		Output buffer to report size of data at @offset.
+ * @items:		The optional item list, containing the
+ *			well-known name to look up as a KDBUS_ITEM_NAME.
+ *			Only needed in case @id is zero.
+ *
+ * On success, the KDBUS_CMD_CONN_INFO ioctl will return 0 and @offset will
+ * tell the user the offset in the connection pool buffer at which to find the
+ * result in a struct kdbus_info.
+ */
+struct kdbus_cmd_info {
+	__u64 size;
+	__u64 flags;
+	__u64 kernel_flags;
+	__u64 return_flags;
+	__u64 id;
+	__u64 offset;
+	__u64 info_size;
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * struct kdbus_cmd_update - update flags of a connection
+ * @size:		The total size of the struct
+ * @flags:		Flags for the update command, userspace → kernel
+ * @kernel_flags:	Supported flags for this command, kernel → userspace
+ * @return_flags:	Command return flags, kernel → userspace
+ * @items:		A list of struct kdbus_item
+ *
+ * This struct is used with the KDBUS_CMD_CONN_UPDATE ioctl.
+ */
+struct kdbus_cmd_update {
+	__u64 size;
+	__u64 flags;
+	__u64 kernel_flags;
+	__u64 return_flags;
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_cmd_match_flags - flags to control the KDBUS_CMD_MATCH_ADD ioctl
+ * @KDBUS_MATCH_REPLACE:	If entries with the supplied cookie already
+ *				exist, remove them before installing the new
+ *				matches.
+ */
+enum kdbus_cmd_match_flags {
+	KDBUS_MATCH_REPLACE	= 1ULL <<  0,
+};
+
+/**
+ * struct kdbus_cmd_match - struct to add or remove matches
+ * @size:		The total size of the struct
+ * @cookie:		Userspace supplied cookie. When removing, the cookie
+ *			identifies the match to remove
+ * @flags:		Flags for match command (KDBUS_MATCH_*),
+ *			userspace → kernel
+ * @kernel_flags:	Supported flags of the used command, kernel → userspace
+ * @return_flags:	Command return flags, kernel → userspace
+ * @items:		A list of items for additional information
+ *
+ * This structure is used with the KDBUS_CMD_MATCH_ADD and
+ * KDBUS_CMD_MATCH_REMOVE ioctl.
+ */
+struct kdbus_cmd_match {
+	__u64 size;
+	__u64 cookie;
+	__u64 flags;
+	__u64 kernel_flags;
+	__u64 return_flags;
+	struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * Ioctl API
+ * KDBUS_CMD_BUS_MAKE:		After opening the "control" node, this command
+ *				creates a new bus with the specified
+ *				name. The bus is immediately shut down and
+ *				cleaned up when the opened file descriptor is
+ *				closed.
+ * KDBUS_CMD_ENDPOINT_MAKE:	Creates a new named special endpoint to talk to
+ *				the bus. Such endpoints usually carry a more
+ *				restrictive policy and grant restricted access
+ *				to specific applications.
+ * KDBUS_CMD_HELLO:		By opening the bus node, a connection is
+ *				created. After a HELLO the opened connection
+ *				becomes an active peer on the bus.
+ * KDBUS_CMD_BYEBYE:		Disconnect a connection. If there are no
+ *				messages queued up in the connection's pool,
+ *				the call succeeds, and the handle is rendered
+ *				unusable. Otherwise, -EBUSY is returned without
+ *				any further side-effects.
+ * KDBUS_CMD_SEND:		Send a message and pass data from userspace to
+ *				the kernel.
+ * KDBUS_CMD_RECV:		Receive a message from the kernel which is
+ *				placed in the receiver's pool.
+ * KDBUS_CMD_FREE:		Release the allocated memory in the receiver's
+ *				pool.
+ * KDBUS_CMD_NAME_ACQUIRE:	Request a well-known bus name to associate with
+ *				the connection. Well-known names are used to
+ *				address a peer on the bus.
+ * KDBUS_CMD_NAME_RELEASE:	Release a well-known name the connection
+ *				currently owns.
+ * KDBUS_CMD_NAME_LIST:		Retrieve the list of all currently registered
+ *				well-known and unique names.
+ * KDBUS_CMD_CONN_INFO:		Retrieve credentials and properties of the
+ *				initial creator of the connection. The data was
+ *				stored at registration time and does not
+ *				necessarily represent the connected process or
+ *				the actual state of the process.
+ * KDBUS_CMD_CONN_UPDATE:	Update the properties of a connection. Used to
+ *				update the metadata subscription mask and
+ *				policy.
+ * KDBUS_CMD_BUS_CREATOR_INFO:	Retrieve information of the creator of the bus
+ *				a connection is attached to.
+ * KDBUS_CMD_ENDPOINT_UPDATE:	Update the properties of a custom endpoint. Used
+ *				to update the policy.
+ * KDBUS_CMD_MATCH_ADD:		Install a match that selects which broadcast
+ *				messages are delivered to the connection.
+ * KDBUS_CMD_MATCH_REMOVE:	Remove a current match for broadcast messages.
+ */
+enum kdbus_ioctl_type {
+	KDBUS_CMD_BUS_MAKE =		_IOW(KDBUS_IOCTL_MAGIC, 0x00,
+					     struct kdbus_cmd_make),
+	KDBUS_CMD_ENDPOINT_MAKE =	_IOW(KDBUS_IOCTL_MAGIC, 0x10,
+					     struct kdbus_cmd_make),
+
+	KDBUS_CMD_HELLO =		_IOWR(KDBUS_IOCTL_MAGIC, 0x20,
+					      struct kdbus_cmd_hello),
+	KDBUS_CMD_BYEBYE =		_IO(KDBUS_IOCTL_MAGIC, 0x21),
+
+	KDBUS_CMD_SEND =		_IOWR(KDBUS_IOCTL_MAGIC, 0x30,
+					      struct kdbus_cmd_send),
+	KDBUS_CMD_RECV =		_IOWR(KDBUS_IOCTL_MAGIC, 0x31,
+					      struct kdbus_cmd_recv),
+	KDBUS_CMD_FREE =		_IOW(KDBUS_IOCTL_MAGIC, 0x32,
+					     struct kdbus_cmd_free),
+
+	KDBUS_CMD_NAME_ACQUIRE =	_IOWR(KDBUS_IOCTL_MAGIC, 0x40,
+					      struct kdbus_cmd_name),
+	KDBUS_CMD_NAME_RELEASE =	_IOW(KDBUS_IOCTL_MAGIC, 0x41,
+					     struct kdbus_cmd_name),
+	KDBUS_CMD_NAME_LIST =		_IOWR(KDBUS_IOCTL_MAGIC, 0x42,
+					      struct kdbus_cmd_name_list),
+
+	KDBUS_CMD_CONN_INFO =		_IOWR(KDBUS_IOCTL_MAGIC, 0x50,
+					      struct kdbus_cmd_info),
+	KDBUS_CMD_CONN_UPDATE =		_IOW(KDBUS_IOCTL_MAGIC, 0x51,
+					     struct kdbus_cmd_update),
+	KDBUS_CMD_BUS_CREATOR_INFO =	_IOWR(KDBUS_IOCTL_MAGIC, 0x52,
+					      struct kdbus_cmd_info),
+
+	KDBUS_CMD_ENDPOINT_UPDATE =	_IOW(KDBUS_IOCTL_MAGIC, 0x61,
+					     struct kdbus_cmd_update),
+
+	KDBUS_CMD_MATCH_ADD =		_IOW(KDBUS_IOCTL_MAGIC, 0x70,
+					     struct kdbus_cmd_match),
+	KDBUS_CMD_MATCH_REMOVE =	_IOW(KDBUS_IOCTL_MAGIC, 0x71,
+					     struct kdbus_cmd_match),
+};
+
+#endif /* _KDBUS_UAPI_H_ */
-- 
2.2.1



* [PATCH 03/13] kdbus: add driver skeleton, ioctl entry points and utility functions
@ 2015-01-16 19:16   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

Add the basic driver structure.

handle.c is the main ioctl command dispatcher that calls into other parts
of the driver.
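
As a rough illustration of the dispatch idea, the userspace sketch below
routes a command to a different handler depending on the state of the
handle, the way handle_ep_ioctl() switches on handle->type. All names in
it (handle_type, cmd, ioctl_none and so on) are hypothetical stand-ins,
not driver symbols, and the handlers only model which commands each
state accepts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three handle states and a few commands. */
enum handle_type { HANDLE_NONE, HANDLE_CONNECTED, HANDLE_OWNER };
enum cmd { CMD_ENDPOINT_MAKE, CMD_HELLO, CMD_SEND, CMD_ENDPOINT_UPDATE };

struct handle {
	enum handle_type type;
};

/* A fresh handle may only turn into an endpoint owner or a connection. */
static long ioctl_none(struct handle *h, enum cmd cmd)
{
	switch (cmd) {
	case CMD_ENDPOINT_MAKE:
		h->type = HANDLE_OWNER;
		return 0;
	case CMD_HELLO:
		h->type = HANDLE_CONNECTED;
		return 0;
	default:
		return -ENOTTY;
	}
}

/* A connected handle accepts messaging commands (only SEND is modelled). */
static long ioctl_connected(struct handle *h, enum cmd cmd)
{
	(void)h;
	return cmd == CMD_SEND ? 0 : -ENOTTY;
}

/* An endpoint owner may only update its endpoint. */
static long ioctl_owner(struct handle *h, enum cmd cmd)
{
	(void)h;
	return cmd == CMD_ENDPOINT_UPDATE ? 0 : -ENOTTY;
}

/* Top-level dispatcher: pick the handler set from the handle state. */
static long handle_ioctl(struct handle *h, enum cmd cmd)
{
	assert(h != NULL);

	switch (h->type) {
	case HANDLE_NONE:
		return ioctl_none(h, cmd);
	case HANDLE_CONNECTED:
		return ioctl_connected(h, cmd);
	case HANDLE_OWNER:
		return ioctl_owner(h, cmd);
	}

	/* The driver reports -EBADFD here; EINVAL keeps the sketch portable. */
	return -EINVAL;
}
```

Unlike this sketch, the driver reads and updates the handle type under a
mutex, so that two ioctls racing on the same file descriptor cannot both
move it out of the initial state.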

main.c contains the code that creates the initial domain at startup, and
util.c has utility functions such as item iterators that are shared with
other files.
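
To give an idea of what an item iterator does, here is a minimal
userspace sketch that walks a buffer of variable-sized items, each
starting with a 64-bit size and a 64-bit type and padded to an 8-byte
boundary. The struct layout and the names (struct item, ITEM_ALIGN8,
count_items) are assumptions made for illustration only; they are not
the helpers from util.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed item layout: size covers header plus payload, without padding. */
struct item {
	uint64_t size;
	uint64_t type;
	uint8_t data[];
};

#define ITEM_HEADER_SIZE	offsetof(struct item, data)
#define ITEM_ALIGN8(v)		(((v) + 7) & ~(uint64_t)7)

/*
 * Count the items of a given type in buf. Returns -1 if an item header
 * is truncated or an item claims more bytes than the buffer holds.
 */
static int count_items(const void *buf, size_t len, uint64_t type)
{
	const uint8_t *p = buf;
	size_t off = 0;
	int n = 0;

	/* items are expected to start on an 8-byte boundary */
	assert(((uintptr_t)buf & 7) == 0);

	while (off < len) {
		const struct item *it;

		if (len - off < ITEM_HEADER_SIZE)
			return -1;

		it = (const struct item *)(p + off);
		if (it->size < ITEM_HEADER_SIZE || it->size > len - off)
			return -1;

		if (it->type == type)
			n++;

		/* the next item starts at the next 8-byte boundary */
		off += ITEM_ALIGN8(it->size);
	}

	return n;
}
```

Checking every size field against the remaining buffer length before
dereferencing the item is what makes such a walk safe on data that was
copied in from userspace.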

limits.h describes limits on things like maximum data structure sizes,
number of messages per user and suchlike. Some of the numbers currently
picked are rough estimates of what might be sufficient and are probably
rather conservative.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/ioctl/ioctl-number.txt |    1 +
 ipc/kdbus/handle.c                   | 1134 ++++++++++++++++++++++++++++++++++
 ipc/kdbus/handle.h                   |   20 +
 ipc/kdbus/limits.h                   |   95 +++
 ipc/kdbus/main.c                     |   72 +++
 ipc/kdbus/util.c                     |  317 ++++++++++
 ipc/kdbus/util.h                     |  133 ++++
 7 files changed, 1772 insertions(+)
 create mode 100644 ipc/kdbus/handle.c
 create mode 100644 ipc/kdbus/handle.h
 create mode 100644 ipc/kdbus/limits.h
 create mode 100644 ipc/kdbus/main.c
 create mode 100644 ipc/kdbus/util.c
 create mode 100644 ipc/kdbus/util.h

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 8136e1fd30fd..54e091ebb862 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -292,6 +292,7 @@ Code  Seq#(hex)	Include File		Comments
 0x92	00-0F	drivers/usb/mon/mon_bin.c
 0x93	60-7F	linux/auto_fs.h
 0x94	all	fs/btrfs/ioctl.h
+0x95	all	uapi/linux/kdbus.h	kdbus IPC driver
 0x97	00-7F	fs/ceph/ioctl.h		Ceph file system
 0x99	00-0F				537-Addinboard driver
 					<mailto:buk@buks.ipn.de>
diff --git a/ipc/kdbus/handle.c b/ipc/kdbus/handle.c
new file mode 100644
index 000000000000..92e73e26ac5f
--- /dev/null
+++ b/ipc/kdbus/handle.c
@@ -0,0 +1,1134 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/kdev_t.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/syscalls.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+#include "names.h"
+#include "domain.h"
+#include "policy.h"
+
+/**
+ * enum kdbus_handle_ep_type - type an endpoint handle can be of
+ * @KDBUS_HANDLE_EP_NONE:	New file descriptor on an endpoint
+ * @KDBUS_HANDLE_EP_CONNECTED:	An endpoint connection after HELLO
+ * @KDBUS_HANDLE_EP_OWNER:	File descriptor to hold an endpoint
+ */
+enum kdbus_handle_ep_type {
+	KDBUS_HANDLE_EP_NONE,
+	KDBUS_HANDLE_EP_CONNECTED,
+	KDBUS_HANDLE_EP_OWNER,
+};
+
+/**
+ * struct kdbus_handle_ep - an endpoint handle to the kdbus system
+ * @lock:		Handle lock
+ * @ep:			The endpoint for this handle
+ * @type:		Type of this handle (KDBUS_HANDLE_EP_*)
+ * @conn:		The connection this handle owns, in case @type
+ *			is KDBUS_HANDLE_EP_CONNECTED
+ * @ep_owner:		The endpoint this handle owns, in case @type
+ *			is KDBUS_HANDLE_EP_OWNER
+ * @privileged:		Flag to mark a handle as privileged
+ */
+struct kdbus_handle_ep {
+	struct mutex lock;
+	struct kdbus_ep *ep;
+
+	enum kdbus_handle_ep_type type;
+	union {
+		struct kdbus_conn *conn;
+		struct kdbus_ep *ep_owner;
+	};
+
+	bool privileged:1;
+};
+
+static int handle_ep_open(struct inode *inode, struct file *file)
+{
+	struct kdbus_handle_ep *handle;
+	struct kdbus_domain *domain;
+	struct kdbus_node *node;
+	struct kdbus_bus *bus;
+	int ret;
+
+	/* kdbusfs stores the kdbus_node in i_private */
+	node = inode->i_private;
+	if (!kdbus_node_acquire(node))
+		return -ESHUTDOWN;
+
+	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+	if (!handle) {
+		ret = -ENOMEM;
+		goto exit_node;
+	}
+
+	mutex_init(&handle->lock);
+	handle->ep = kdbus_ep_ref(kdbus_ep_from_node(node));
+	handle->type = KDBUS_HANDLE_EP_NONE;
+
+	domain = handle->ep->bus->domain;
+	bus = handle->ep->bus;
+
+	/*
+	 * A connection is privileged if it is opened on an endpoint without
+	 * custom policy and either:
+	 *   * the user has CAP_IPC_OWNER in the domain user namespace
+	 * or
+	 *   * the caller's euid matches the uid of the bus creator
+	 */
+	if (!handle->ep->has_policy &&
+	    (ns_capable(domain->user_namespace, CAP_IPC_OWNER) ||
+	     uid_eq(file->f_cred->euid, bus->node.uid)))
+		handle->privileged = true;
+
+	file->private_data = handle;
+	kdbus_node_release(node);
+
+	return 0;
+
+exit_node:
+	kdbus_node_release(node);
+	return ret;
+}
+
+static int handle_ep_release(struct inode *inode, struct file *file)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+
+	switch (handle->type) {
+	case KDBUS_HANDLE_EP_OWNER:
+		kdbus_ep_deactivate(handle->ep_owner);
+		kdbus_ep_unref(handle->ep_owner);
+		break;
+
+	case KDBUS_HANDLE_EP_CONNECTED:
+		kdbus_conn_disconnect(handle->conn, false);
+		kdbus_conn_unref(handle->conn);
+		break;
+
+	case KDBUS_HANDLE_EP_NONE:
+		/* nothing to clean up */
+		break;
+	}
+
+	kdbus_ep_unref(handle->ep);
+	kfree(handle);
+
+	return 0;
+}
+
+static int handle_ep_ioctl_endpoint_make(struct kdbus_handle_ep *handle,
+					 void __user *buf)
+{
+	struct kdbus_cmd_make *make;
+	struct kdbus_ep *ep;
+	const char *name;
+	int ret;
+
+	/* creating custom endpoints is a privileged operation */
+	if (!handle->privileged)
+		return -EPERM;
+
+	make = kdbus_memdup_user(buf, sizeof(*make), KDBUS_MAKE_MAX_SIZE);
+	if (IS_ERR(make))
+		return PTR_ERR(make);
+
+	make->return_flags = 0;
+	if (kdbus_member_set_user(&make->return_flags, buf,
+				  struct kdbus_cmd_make, return_flags)) {
+		ret = -EFAULT;
+		goto exit;
+	}
+
+	ret = kdbus_negotiate_flags(make, buf, struct kdbus_cmd_make,
+				    KDBUS_MAKE_ACCESS_GROUP |
+				    KDBUS_MAKE_ACCESS_WORLD);
+	if (ret < 0)
+		goto exit;
+
+	ret = kdbus_items_validate(make->items, KDBUS_ITEMS_SIZE(make, items));
+	if (ret < 0)
+		goto exit;
+
+	name = kdbus_items_get_str(make->items, KDBUS_ITEMS_SIZE(make, items),
+				   KDBUS_ITEM_MAKE_NAME);
+	if (IS_ERR(name)) {
+		ret = PTR_ERR(name);
+		goto exit;
+	}
+
+	ep = kdbus_ep_new(handle->ep->bus, name,
+			  make->flags & (KDBUS_MAKE_ACCESS_WORLD |
+					 KDBUS_MAKE_ACCESS_GROUP),
+			  current_euid(), current_egid(), true);
+	if (IS_ERR(ep)) {
+		ret = PTR_ERR(ep);
+		goto exit;
+	}
+
+	ret = kdbus_ep_activate(ep);
+	if (ret < 0)
+		goto exit_ep_unref;
+
+	ret = kdbus_ep_policy_set(ep, make->items,
+				  KDBUS_ITEMS_SIZE(make, items));
+	if (ret < 0)
+		goto exit_ep_unref;
+
+	/* protect against parallel ioctls */
+	mutex_lock(&handle->lock);
+	if (handle->type != KDBUS_HANDLE_EP_NONE) {
+		ret = -EBADFD;
+	} else {
+		handle->type = KDBUS_HANDLE_EP_OWNER;
+		handle->ep_owner = ep;
+	}
+	mutex_unlock(&handle->lock);
+
+	if (ret < 0)
+		goto exit_ep_unref;
+
+	goto exit;
+
+exit_ep_unref:
+	kdbus_ep_deactivate(ep);
+	kdbus_ep_unref(ep);
+exit:
+	kfree(make);
+	return ret;
+}
+
+static int handle_ep_ioctl_hello(struct kdbus_handle_ep *handle,
+				 void __user *buf)
+{
+	struct kdbus_conn *conn;
+	struct kdbus_cmd_hello *hello;
+	int ret;
+
+	hello = kdbus_memdup_user(buf, sizeof(*hello), KDBUS_HELLO_MAX_SIZE);
+	if (IS_ERR(hello))
+		return PTR_ERR(hello);
+
+	ret = kdbus_negotiate_flags(hello, buf, typeof(*hello),
+				    KDBUS_HELLO_ACCEPT_FD |
+				    KDBUS_HELLO_ACTIVATOR |
+				    KDBUS_HELLO_POLICY_HOLDER |
+				    KDBUS_HELLO_MONITOR);
+	if (ret < 0)
+		goto exit;
+
+	hello->return_flags = 0;
+
+	ret = kdbus_items_validate(hello->items,
+				   KDBUS_ITEMS_SIZE(hello, items));
+	if (ret < 0)
+		goto exit;
+
+	if (!hello->pool_size || !IS_ALIGNED(hello->pool_size, PAGE_SIZE)) {
+		ret = -EFAULT;
+		goto exit;
+	}
+
+	conn = kdbus_conn_new(handle->ep, hello, handle->privileged);
+	if (IS_ERR(conn)) {
+		ret = PTR_ERR(conn);
+		goto exit;
+	}
+
+	ret = kdbus_conn_connect(conn, hello);
+	if (ret < 0)
+		goto exit_conn;
+
+	ret = kdbus_conn_acquire(conn);
+	if (ret < 0)
+		goto exit_conn;
+
+	if (kdbus_conn_is_activator(conn) || kdbus_conn_is_policy_holder(conn))
+		ret = kdbus_policy_set(&conn->ep->bus->policy_db, hello->items,
+				       KDBUS_ITEMS_SIZE(hello, items),
+				       1, kdbus_conn_is_policy_holder(conn),
+				       conn);
+
+	kdbus_conn_release(conn);
+
+	if (ret < 0)
+		goto exit_conn;
+
+	if (copy_to_user(buf, hello, sizeof(*hello))) {
+		ret = -EFAULT;
+		goto exit_conn;
+	}
+
+	/* protect against parallel ioctls */
+	mutex_lock(&handle->lock);
+	if (handle->type != KDBUS_HANDLE_EP_NONE) {
+		ret = -EBADFD;
+	} else {
+		handle->type = KDBUS_HANDLE_EP_CONNECTED;
+		handle->conn = conn;
+	}
+	mutex_unlock(&handle->lock);
+
+	if (ret < 0)
+		goto exit_conn;
+
+	goto exit;
+
+exit_conn:
+	kdbus_conn_disconnect(conn, false);
+	kdbus_conn_unref(conn);
+exit:
+	kfree(hello);
+	return ret;
+}
+
+/* kdbus endpoint make commands */
+static long handle_ep_ioctl_none(struct file *file, unsigned int cmd,
+				 void __user *buf)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+	long ret;
+
+	switch (cmd) {
+	case KDBUS_CMD_ENDPOINT_MAKE:
+		ret = handle_ep_ioctl_endpoint_make(handle, buf);
+		break;
+
+	case KDBUS_CMD_HELLO:
+		ret = handle_ep_ioctl_hello(handle, buf);
+		break;
+
+	default:
+		ret = -ENOTTY;
+		break;
+	}
+
+	return ret;
+}
+
+/* kdbus endpoint commands for connected peers */
+static long handle_ep_ioctl_connected(struct file *file, unsigned int cmd,
+				      void __user *buf)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+	struct kdbus_conn *conn = handle->conn;
+	void *free_ptr = NULL;
+	long ret = 0;
+
+	/*
+	 * BYEBYE is special; we must not acquire a connection when
+	 * calling into kdbus_conn_disconnect() or we will deadlock,
+	 * because kdbus_conn_disconnect() will wait for all acquired
+	 * references to be dropped.
+	 */
+	if (cmd == KDBUS_CMD_BYEBYE) {
+		if (!kdbus_conn_is_ordinary(conn))
+			return -EOPNOTSUPP;
+
+		return kdbus_conn_disconnect(conn, true);
+	}
+
+	ret = kdbus_conn_acquire(conn);
+	if (ret < 0)
+		return ret;
+
+	switch (cmd) {
+	case KDBUS_CMD_NAME_ACQUIRE: {
+		/* acquire a well-known name */
+		struct kdbus_cmd_name *cmd_name;
+
+		if (!kdbus_conn_is_ordinary(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_name = kdbus_memdup_user(buf, sizeof(*cmd_name),
+					     sizeof(*cmd_name) +
+						KDBUS_ITEM_HEADER_SIZE +
+						KDBUS_NAME_MAX_LEN + 1);
+		if (IS_ERR(cmd_name)) {
+			ret = PTR_ERR(cmd_name);
+			break;
+		}
+
+		free_ptr = cmd_name;
+
+		cmd_name->return_flags = 0;
+		if (kdbus_member_set_user(&cmd_name->return_flags, buf,
+					  struct kdbus_cmd_name,
+					  return_flags)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		ret = kdbus_negotiate_flags(cmd_name, buf, typeof(*cmd_name),
+					    KDBUS_NAME_REPLACE_EXISTING |
+					    KDBUS_NAME_ALLOW_REPLACEMENT |
+					    KDBUS_NAME_QUEUE);
+		if (ret < 0)
+			break;
+
+		ret = kdbus_items_validate(cmd_name->items,
+					   KDBUS_ITEMS_SIZE(cmd_name, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_cmd_name_acquire(conn->ep->bus->name_registry,
+					     conn, cmd_name);
+		if (ret < 0)
+			break;
+
+		/* return flags to the caller */
+		if (copy_to_user(buf, cmd_name, cmd_name->size))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_NAME_RELEASE: {
+		/* release a well-known name */
+		struct kdbus_cmd_name *cmd_name;
+
+		if (!kdbus_conn_is_ordinary(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_name = kdbus_memdup_user(buf, sizeof(*cmd_name),
+					     sizeof(*cmd_name) +
+						KDBUS_ITEM_HEADER_SIZE +
+						KDBUS_NAME_MAX_LEN + 1);
+		if (IS_ERR(cmd_name)) {
+			ret = PTR_ERR(cmd_name);
+			break;
+		}
+
+		free_ptr = cmd_name;
+
+		cmd_name->return_flags = 0;
+		if (kdbus_member_set_user(&cmd_name->return_flags, buf,
+					  struct kdbus_cmd_name,
+					  return_flags)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		ret = kdbus_negotiate_flags(cmd_name, buf, typeof(*cmd_name),
+					    0);
+		if (ret < 0)
+			break;
+
+		ret = kdbus_items_validate(cmd_name->items,
+					   KDBUS_ITEMS_SIZE(cmd_name, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_cmd_name_release(conn->ep->bus->name_registry,
+					     conn, cmd_name);
+		break;
+	}
+
+	case KDBUS_CMD_NAME_LIST: {
+		struct kdbus_cmd_name_list *cmd_list;
+
+		cmd_list = kdbus_memdup_user(buf, sizeof(*cmd_list),
+					     KDBUS_CMD_MAX_SIZE);
+		if (IS_ERR(cmd_list)) {
+			ret = PTR_ERR(cmd_list);
+			break;
+		}
+
+		free_ptr = cmd_list;
+
+		ret = kdbus_negotiate_flags(cmd_list, buf, typeof(*cmd_list),
+					    KDBUS_NAME_LIST_UNIQUE |
+					    KDBUS_NAME_LIST_NAMES |
+					    KDBUS_NAME_LIST_ACTIVATORS |
+					    KDBUS_NAME_LIST_QUEUED);
+		if (ret < 0)
+			break;
+
+		ret = kdbus_items_validate(cmd_list->items,
+					   KDBUS_ITEMS_SIZE(cmd_list, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_cmd_name_list(conn->ep->bus->name_registry,
+					  conn, cmd_list);
+		if (ret < 0)
+			break;
+
+		cmd_list->return_flags = 0;
+
+		/* return allocated data */
+		if (kdbus_member_set_user(&cmd_list->offset, buf,
+					  struct kdbus_cmd_name_list, offset) ||
+		    kdbus_member_set_user(&cmd_list->list_size, buf,
+					  struct kdbus_cmd_name_list,
+					  list_size) ||
+		    kdbus_member_set_user(&cmd_list->return_flags, buf,
+					  struct kdbus_cmd_name_list,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_CONN_INFO:
+	case KDBUS_CMD_BUS_CREATOR_INFO: {
+		struct kdbus_cmd_info *cmd_info;
+
+		/* return the properties of a connection */
+		cmd_info = kdbus_memdup_user(buf, sizeof(*cmd_info),
+					     sizeof(*cmd_info) +
+						KDBUS_NAME_MAX_LEN + 1);
+		if (IS_ERR(cmd_info)) {
+			ret = PTR_ERR(cmd_info);
+			break;
+		}
+
+		free_ptr = cmd_info;
+
+		ret = kdbus_negotiate_flags(cmd_info, buf, typeof(*cmd_info),
+					    _KDBUS_ATTACH_ALL);
+		if (ret < 0)
+			break;
+
+		cmd_info->return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_info->items,
+					   KDBUS_ITEMS_SIZE(cmd_info, items));
+		if (ret < 0)
+			break;
+
+		if (cmd == KDBUS_CMD_CONN_INFO)
+			ret = kdbus_cmd_conn_info(conn, cmd_info);
+		else
+			ret = kdbus_cmd_bus_creator_info(conn, cmd_info);
+
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_info->offset, buf,
+					  struct kdbus_cmd_info, offset) ||
+		    kdbus_member_set_user(&cmd_info->info_size, buf,
+					  struct kdbus_cmd_info, info_size) ||
+		    kdbus_member_set_user(&cmd_info->return_flags, buf,
+					  struct kdbus_cmd_info,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_CONN_UPDATE: {
+		/* update the properties of a connection */
+		struct kdbus_cmd_update *cmd_update;
+
+		if (!kdbus_conn_is_ordinary(conn) &&
+		    !kdbus_conn_is_policy_holder(conn) &&
+		    !kdbus_conn_is_monitor(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_update = kdbus_memdup_user(buf, sizeof(*cmd_update),
+					       KDBUS_UPDATE_MAX_SIZE);
+		if (IS_ERR(cmd_update)) {
+			ret = PTR_ERR(cmd_update);
+			break;
+		}
+
+		free_ptr = cmd_update;
+
+		ret = kdbus_negotiate_flags(cmd_update, buf,
+					    typeof(*cmd_update), 0);
+		if (ret < 0)
+			break;
+
+		cmd_update->return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_update->items,
+					   KDBUS_ITEMS_SIZE(cmd_update, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_cmd_conn_update(conn, cmd_update);
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_update->return_flags, buf,
+					  struct kdbus_cmd_update,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_MATCH_ADD: {
+		/* subscribe to/filter for broadcast messages */
+		struct kdbus_cmd_match *cmd_match;
+
+		if (!kdbus_conn_is_ordinary(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_match = kdbus_memdup_user(buf, sizeof(*cmd_match),
+					      KDBUS_MATCH_MAX_SIZE);
+		if (IS_ERR(cmd_match)) {
+			ret = PTR_ERR(cmd_match);
+			break;
+		}
+
+		free_ptr = cmd_match;
+
+		ret = kdbus_negotiate_flags(cmd_match, buf, typeof(*cmd_match),
+					    KDBUS_MATCH_REPLACE);
+		if (ret < 0)
+			break;
+
+		cmd_match->return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_match->items,
+					   KDBUS_ITEMS_SIZE(cmd_match, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_match_db_add(conn, cmd_match);
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_match->return_flags, buf,
+					  struct kdbus_cmd_match,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_MATCH_REMOVE: {
+		/* unsubscribe from broadcast messages */
+		struct kdbus_cmd_match *cmd_match;
+
+		if (!kdbus_conn_is_ordinary(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_match = kdbus_memdup_user(buf, sizeof(*cmd_match),
+					      sizeof(*cmd_match));
+		if (IS_ERR(cmd_match)) {
+			ret = PTR_ERR(cmd_match);
+			break;
+		}
+
+		free_ptr = cmd_match;
+
+		ret = kdbus_negotiate_flags(cmd_match, buf, typeof(*cmd_match),
+					    0);
+		if (ret < 0)
+			break;
+
+		cmd_match->return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_match->items,
+					   KDBUS_ITEMS_SIZE(cmd_match, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_match_db_remove(conn, cmd_match);
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_match->return_flags, buf,
+					  struct kdbus_cmd_match,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_SEND: {
+		/* submit a message which will be queued in the receiver */
+		struct kdbus_cmd_send *cmd_send;
+		struct kdbus_kmsg *kmsg = NULL;
+
+		if (!kdbus_conn_is_ordinary(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_send = kdbus_memdup_user(buf, sizeof(*cmd_send),
+					     KDBUS_SEND_MAX_SIZE);
+		if (IS_ERR(cmd_send)) {
+			ret = PTR_ERR(cmd_send);
+			break;
+		}
+
+		free_ptr = cmd_send;
+
+		ret = kdbus_negotiate_flags(cmd_send, buf, typeof(*cmd_send),
+					    KDBUS_SEND_SYNC_REPLY);
+		if (ret < 0)
+			break;
+
+		cmd_send->return_flags = 0;
+		cmd_send->reply.offset = 0;
+		cmd_send->reply.msg_size = 0;
+		cmd_send->reply.return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_send->items,
+					   KDBUS_ITEMS_SIZE(cmd_send, items));
+		if (ret < 0)
+			break;
+
+		kmsg = kdbus_kmsg_new_from_cmd(conn, buf, cmd_send);
+		if (IS_ERR(kmsg)) {
+			ret = PTR_ERR(kmsg);
+			break;
+		}
+
+		ret = kdbus_cmd_msg_send(conn, cmd_send, file, kmsg);
+		if (ret < 0) {
+			kdbus_kmsg_free(kmsg);
+			break;
+		}
+
+		if (kdbus_member_set_user(&cmd_send->return_flags, buf,
+					  struct kdbus_cmd_send,
+					  return_flags))
+			ret = -EFAULT;
+
+		/* store the reply back to userspace */
+		if (cmd_send->flags & KDBUS_SEND_SYNC_REPLY) {
+			if (kdbus_member_set_user(&cmd_send->reply, buf,
+						  struct kdbus_cmd_send,
+						  reply))
+				ret = -EFAULT;
+		}
+
+		kdbus_kmsg_free(kmsg);
+		break;
+	}
+
+	case KDBUS_CMD_RECV: {
+		struct kdbus_cmd_recv *cmd_recv;
+
+		if (!kdbus_conn_is_ordinary(conn) &&
+		    !kdbus_conn_is_monitor(conn) &&
+		    !kdbus_conn_is_activator(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_recv = kdbus_memdup_user(buf, sizeof(*cmd_recv),
+					     KDBUS_RECV_MAX_SIZE);
+		if (IS_ERR(cmd_recv)) {
+			ret = PTR_ERR(cmd_recv);
+			break;
+		}
+
+		free_ptr = cmd_recv;
+
+		ret = kdbus_negotiate_flags(cmd_recv, buf, typeof(*cmd_recv),
+					    KDBUS_RECV_PEEK |
+					    KDBUS_RECV_DROP |
+					    KDBUS_RECV_USE_PRIORITY);
+		if (ret < 0)
+			break;
+
+		cmd_recv->return_flags = 0;
+		cmd_recv->dropped_msgs = 0;
+		cmd_recv->msg.offset = 0;
+		cmd_recv->msg.msg_size = 0;
+		cmd_recv->msg.return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_recv->items,
+					   KDBUS_ITEMS_SIZE(cmd_recv, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_cmd_msg_recv(conn, cmd_recv);
+		/*
+		 * In case of -EOVERFLOW, we still have to write back the
+		 * number of lost messages.
+		 */
+		if (ret < 0 && ret != -EOVERFLOW)
+			break;
+
+		/* return the number of dropped messages */
+		if (kdbus_member_set_user(&cmd_recv->dropped_msgs, buf,
+					  struct kdbus_cmd_recv,
+					  dropped_msgs) ||
+		    kdbus_member_set_user(&cmd_recv->msg, buf,
+					  struct kdbus_cmd_recv, msg) ||
+		    kdbus_member_set_user(&cmd_recv->return_flags, buf,
+					  struct kdbus_cmd_recv,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_FREE: {
+		struct kdbus_cmd_free *cmd_free;
+		const struct kdbus_item *item;
+
+		if (!kdbus_conn_is_ordinary(conn) &&
+		    !kdbus_conn_is_monitor(conn) &&
+		    !kdbus_conn_is_activator(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_free = kdbus_memdup_user(buf, sizeof(*cmd_free),
+					     KDBUS_CMD_MAX_SIZE);
+		if (IS_ERR(cmd_free)) {
+			ret = PTR_ERR(cmd_free);
+			break;
+		}
+
+		free_ptr = cmd_free;
+
+		ret = kdbus_negotiate_flags(cmd_free, buf, typeof(*cmd_free),
+					    0);
+		if (ret < 0)
+			break;
+
+		ret = kdbus_items_validate(cmd_free->items,
+					   KDBUS_ITEMS_SIZE(cmd_free, items));
+		if (ret < 0)
+			break;
+
+		KDBUS_ITEMS_FOREACH(item, cmd_free->items,
+				    KDBUS_ITEMS_SIZE(cmd_free, items)) {
+			/* no items supported so far */
+			switch (item->type) {
+			default:
+				ret = -EINVAL;
+				break;
+			}
+		}
+		if (ret < 0)
+			break;
+
+		cmd_free->return_flags = 0;
+
+		ret = kdbus_pool_release_offset(conn->pool, cmd_free->offset);
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_free->return_flags, buf,
+					  struct kdbus_cmd_free,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	default:
+		ret = -ENOTTY;
+		break;
+	}
+
+	kdbus_conn_release(conn);
+	kfree(free_ptr);
+	return ret;
+}
+
+/* kdbus endpoint commands for endpoint owners */
+static long handle_ep_ioctl_owner(struct file *file, unsigned int cmd,
+				  void __user *buf)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+	struct kdbus_ep *ep = handle->ep_owner;
+	void *free_ptr = NULL;
+	long ret = 0;
+
+	switch (cmd) {
+	case KDBUS_CMD_ENDPOINT_UPDATE: {
+		struct kdbus_cmd_update *cmd_update;
+
+		/* update the properties of a custom endpoint */
+		cmd_update = kdbus_memdup_user(buf, sizeof(*cmd_update),
+					       KDBUS_UPDATE_MAX_SIZE);
+		if (IS_ERR(cmd_update)) {
+			ret = PTR_ERR(cmd_update);
+			break;
+		}
+
+		free_ptr = cmd_update;
+
+		ret = kdbus_negotiate_flags(cmd_update, buf,
+					    typeof(*cmd_update), 0);
+		if (ret < 0)
+			break;
+
+		cmd_update->return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_update->items,
+					   KDBUS_ITEMS_SIZE(cmd_update, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_ep_policy_set(ep, cmd_update->items,
+					  KDBUS_ITEMS_SIZE(cmd_update, items));
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_update->return_flags, buf,
+					  struct kdbus_cmd_update,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	default:
+		ret = -ENOTTY;
+		break;
+	}
+
+	kfree(free_ptr);
+	return ret;
+}
+
+static long handle_ep_ioctl(struct file *file, unsigned int cmd,
+			    unsigned long arg)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+	void __user *argp = (void __user *)arg;
+	enum kdbus_handle_ep_type type;
+
+	/* lock while accessing handle->type to enforce barriers */
+	mutex_lock(&handle->lock);
+	type = handle->type;
+	mutex_unlock(&handle->lock);
+
+	switch (type) {
+	case KDBUS_HANDLE_EP_NONE:
+		return handle_ep_ioctl_none(file, cmd, argp);
+
+	case KDBUS_HANDLE_EP_CONNECTED:
+		return handle_ep_ioctl_connected(file, cmd, argp);
+
+	case KDBUS_HANDLE_EP_OWNER:
+		return handle_ep_ioctl_owner(file, cmd, argp);
+
+	default:
+		return -EBADFD;
+	}
+}
+
+static unsigned int handle_ep_poll(struct file *file,
+				   struct poll_table_struct *wait)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+	unsigned int mask = POLLOUT | POLLWRNORM;
+	int ret;
+
+	/* Only a connected endpoint can read/write data */
+	mutex_lock(&handle->lock);
+	if (handle->type != KDBUS_HANDLE_EP_CONNECTED) {
+		mutex_unlock(&handle->lock);
+		return POLLERR | POLLHUP;
+	}
+	mutex_unlock(&handle->lock);
+
+	ret = kdbus_conn_acquire(handle->conn);
+	if (ret < 0)
+		return POLLERR | POLLHUP;
+
+	poll_wait(file, &handle->conn->wait, wait);
+
+	if (!list_empty(&handle->conn->queue.msg_list))
+		mask |= POLLIN | POLLRDNORM;
+
+	kdbus_conn_release(handle->conn);
+
+	return mask;
+}
+
+static int handle_ep_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+
+	mutex_lock(&handle->lock);
+	if (handle->type != KDBUS_HANDLE_EP_CONNECTED) {
+		mutex_unlock(&handle->lock);
+		return -EPERM;
+	}
+	mutex_unlock(&handle->lock);
+
+	return kdbus_pool_mmap(handle->conn->pool, vma);
+}
+
+const struct file_operations kdbus_handle_ep_ops = {
+	.owner =		THIS_MODULE,
+	.open =			handle_ep_open,
+	.release =		handle_ep_release,
+	.poll =			handle_ep_poll,
+	.llseek =		noop_llseek,
+	.unlocked_ioctl =	handle_ep_ioctl,
+	.mmap =			handle_ep_mmap,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl =		handle_ep_ioctl,
+#endif
+};
+
+static int handle_control_open(struct inode *inode, struct file *file)
+{
+	if (!kdbus_node_is_active(inode->i_private))
+		return -ESHUTDOWN;
+
+	/* private_data is used by BUS_MAKE to store the new bus */
+	file->private_data = NULL;
+
+	return 0;
+}
+
+static int handle_control_release(struct inode *inode, struct file *file)
+{
+	struct kdbus_bus *bus = file->private_data;
+
+	if (bus) {
+		kdbus_bus_deactivate(bus);
+		kdbus_bus_unref(bus);
+	}
+
+	return 0;
+}
+
+static int handle_control_ioctl_bus_make(struct file *file,
+					 struct kdbus_domain *domain,
+					 void __user *buf)
+{
+	struct kdbus_cmd_make *make;
+	struct kdbus_bus *bus;
+	int ret;
+
+	/* catch double BUS_MAKE early, locked test is below */
+	if (file->private_data)
+		return -EBADFD;
+
+	make = kdbus_memdup_user(buf, sizeof(*make), KDBUS_MAKE_MAX_SIZE);
+	if (IS_ERR(make))
+		return PTR_ERR(make);
+
+	ret = kdbus_negotiate_flags(make, buf, struct kdbus_cmd_make,
+				    KDBUS_MAKE_ACCESS_GROUP |
+				    KDBUS_MAKE_ACCESS_WORLD);
+	if (ret < 0)
+		goto exit;
+
+	ret = kdbus_items_validate(make->items, KDBUS_ITEMS_SIZE(make, items));
+	if (ret < 0)
+		goto exit;
+
+	bus = kdbus_bus_new(domain, make, current_euid(), current_egid());
+	if (IS_ERR(bus)) {
+		ret = PTR_ERR(bus);
+		goto exit;
+	}
+
+	ret = kdbus_bus_activate(bus);
+	if (ret < 0)
+		goto exit_bus_unref;
+
+	/* protect against parallel ioctls */
+	mutex_lock(&domain->lock);
+	if (file->private_data)
+		ret = -EBADFD;
+	else
+		file->private_data = bus;
+	mutex_unlock(&domain->lock);
+
+	if (ret < 0)
+		goto exit_bus_unref;
+
+	goto exit;
+
+exit_bus_unref:
+	kdbus_bus_deactivate(bus);
+	kdbus_bus_unref(bus);
+exit:
+	kfree(make);
+	return ret;
+}
+
+static long handle_control_ioctl(struct file *file, unsigned int cmd,
+				 unsigned long arg)
+{
+	struct kdbus_node *node = file_inode(file)->i_private;
+	struct kdbus_domain *domain;
+	int ret = 0;
+
+	/*
+	 * The parent of control-nodes is always a domain, make sure to pin it
+	 * so the parent is actually valid.
+	 */
+	if (!kdbus_node_acquire(node))
+		return -ESHUTDOWN;
+
+	domain = kdbus_domain_from_node(node->parent);
+	if (!kdbus_node_acquire(&domain->node)) {
+		kdbus_node_release(node);
+		return -ESHUTDOWN;
+	}
+
+	switch (cmd) {
+	case KDBUS_CMD_BUS_MAKE:
+		ret = handle_control_ioctl_bus_make(file, domain,
+						    (void __user *)arg);
+		break;
+
+	default:
+		ret = -ENOTTY;
+		break;
+	}
+
+	kdbus_node_release(&domain->node);
+	kdbus_node_release(node);
+	return ret;
+}
+
+const struct file_operations kdbus_handle_control_ops = {
+	.open =			handle_control_open,
+	.release =		handle_control_release,
+	.llseek =		noop_llseek,
+	.unlocked_ioctl =	handle_control_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl =		handle_control_ioctl,
+#endif
+};
diff --git a/ipc/kdbus/handle.h b/ipc/kdbus/handle.h
new file mode 100644
index 000000000000..32809dad3720
--- /dev/null
+++ b/ipc/kdbus/handle.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_HANDLE_H
+#define __KDBUS_HANDLE_H
+
+extern const struct file_operations kdbus_handle_ep_ops;
+extern const struct file_operations kdbus_handle_control_ops;
+
+#endif
diff --git a/ipc/kdbus/limits.h b/ipc/kdbus/limits.h
new file mode 100644
index 000000000000..b848f437e792
--- /dev/null
+++ b/ipc/kdbus/limits.h
@@ -0,0 +1,95 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_DEFAULTS_H
+#define __KDBUS_DEFAULTS_H
+
+/* maximum size of message header and items */
+#define KDBUS_MSG_MAX_SIZE		SZ_8K
+
+/* maximum number of message items */
+#define KDBUS_MSG_MAX_ITEMS		128
+
+/* max size of ioctl command data */
+#define KDBUS_CMD_MAX_SIZE		SZ_8K
+
+/*
+ * Maximum number of passed file descriptors
+ * Number taken from AF_UNIX upper limits
+ */
+#define KDBUS_MSG_MAX_FDS		253
+
+/* maximum message payload size */
+#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE		SZ_2M
+
+/* maximum size of bloom bit field in bytes */
+#define KDBUS_BUS_BLOOM_MAX_SIZE		SZ_4K
+
+/* maximum length of well-known bus name */
+#define KDBUS_NAME_MAX_LEN			255
+
+/* maximum length of bus, domain, ep name */
+#define KDBUS_SYSNAME_MAX_LEN			63
+
+/* maximum size of make data */
+#define KDBUS_MAKE_MAX_SIZE			SZ_32K
+
+/* maximum size of hello data */
+#define KDBUS_HELLO_MAX_SIZE			SZ_32K
+
+/* maximum size for update commands */
+#define KDBUS_UPDATE_MAX_SIZE			SZ_32K
+
+/* maximum number of matches per connection */
+#define KDBUS_MATCH_MAX				256
+
+/* maximum size of match data */
+#define KDBUS_MATCH_MAX_SIZE			SZ_32K
+
+/* maximum size of send data */
+#define KDBUS_SEND_MAX_SIZE			SZ_32K
+
+/* maximum size of recv data */
+#define KDBUS_RECV_MAX_SIZE			SZ_32K
+
+/* maximum size of policy data */
+#define KDBUS_POLICY_MAX_SIZE			SZ_32K
+
+/* maximum number of queued messages in a connection */
+#define KDBUS_CONN_MAX_MSGS			256
+
+/*
+ * maximum number of queued messages which will not be user accounted.
+ * after this value is reached each user will have an individual limit.
+ */
+#define KDBUS_CONN_MAX_MSGS_UNACCOUNTED		16
+
+/*
+ * maximum number of queued messages from the same individual user after the
+ * un-accounted value has been hit
+ */
+#define KDBUS_CONN_MAX_MSGS_PER_USER		16
+
+/* maximum number of well-known names per connection */
+#define KDBUS_CONN_MAX_NAMES			256
+
+/* maximum number of queued requests waiting for a reply */
+#define KDBUS_CONN_MAX_REQUESTS_PENDING		128
+
+/* maximum number of connections per user in one domain */
+#define KDBUS_USER_MAX_CONN			1024
+
+/* maximum number of buses per user in one domain */
+#define KDBUS_USER_MAX_BUSES			16
+
+#endif
diff --git a/ipc/kdbus/main.c b/ipc/kdbus/main.c
new file mode 100644
index 000000000000..5d6a84453347
--- /dev/null
+++ b/ipc/kdbus/main.c
@@ -0,0 +1,72 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#define pr_fmt(fmt)    KBUILD_MODNAME ": " fmt
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+
+#include "util.h"
+#include "fs.h"
+#include "handle.h"
+#include "metadata.h"
+#include "node.h"
+
+/* kdbus mount-point /sys/fs/kdbus */
+static struct kobject *kdbus_dir;
+
+/* global module option to apply a mask to exported metadata */
+unsigned long long kdbus_meta_attach_mask = KDBUS_ATTACH_TIMESTAMP |
+					    KDBUS_ATTACH_CREDS |
+					    KDBUS_ATTACH_PIDS |
+					    KDBUS_ATTACH_AUXGROUPS |
+					    KDBUS_ATTACH_NAMES |
+					    KDBUS_ATTACH_SECLABEL |
+					    KDBUS_ATTACH_CONN_DESCRIPTION;
+MODULE_PARM_DESC(attach_flags_mask, "Attach-flags mask for exported metadata");
+module_param_named(attach_flags_mask, kdbus_meta_attach_mask, ullong, 0644);
+
+static int __init kdbus_init(void)
+{
+	int ret;
+
+	kdbus_dir = kobject_create_and_add(KBUILD_MODNAME, fs_kobj);
+	if (!kdbus_dir)
+		return -ENOMEM;
+
+	ret = kdbus_fs_init();
+	if (ret < 0) {
+		pr_err("cannot register filesystem: %d\n", ret);
+		goto exit_dir;
+	}
+
+	pr_info("initialized\n");
+	return 0;
+
+exit_dir:
+	kobject_put(kdbus_dir);
+	return ret;
+}
+
+static void __exit kdbus_exit(void)
+{
+	kdbus_fs_exit();
+	kobject_put(kdbus_dir);
+}
+
+module_init(kdbus_init);
+module_exit(kdbus_exit);
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("D-Bus, powerful, easy to use interprocess communication");
+MODULE_ALIAS_FS(KBUILD_MODNAME "fs");
diff --git a/ipc/kdbus/util.c b/ipc/kdbus/util.c
new file mode 100644
index 000000000000..16069f5f644e
--- /dev/null
+++ b/ipc/kdbus/util.c
@@ -0,0 +1,317 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/capability.h>
+#include <linux/cred.h>
+#include <linux/ctype.h>
+#include <linux/err.h>
+#include <linux/file.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+#include <linux/user_namespace.h>
+
+#include "limits.h"
+#include "util.h"
+
+/**
+ * kdbus_sysname_is_valid() - validate names showing up in /proc, /sys and /dev
+ * @name:		Name of domain, bus, endpoint
+ *
+ * Return: 0 if the given name is valid, otherwise negative errno
+ */
+int kdbus_sysname_is_valid(const char *name)
+{
+	unsigned int i;
+	size_t len;
+
+	len = strlen(name);
+	if (len == 0)
+		return -EINVAL;
+
+	for (i = 0; i < len; i++) {
+		if (isalpha(name[i]))
+			continue;
+		if (isdigit(name[i]))
+			continue;
+		if (name[i] == '_')
+			continue;
+		if (i > 0 && i + 1 < len && (name[i] == '-' || name[i] == '.'))
+			continue;
+
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
+ * kdbus_check_and_write_flags() - check flags provided by user, and write the
+ *				   valid mask back
+ * @flags:	The flags mask provided by userspace
+ * @buf:	The buffer provided by userspace
+ * @offset_out:	Offset of the kernel_flags field inside the user-provided struct
+ * @valid:	Mask of valid bits
+ *
+ * This function writes the mask of bits the kernel accepts, with the
+ * KDBUS_FLAG_KERNEL bit set, back to userspace, and then checks that the flags
+ * provided by userspace contain no bits outside that mask.
+ *
+ * Return: 0 on success, -EFAULT if copy_to_user() failed, or -EINVAL if
+ * userspace submitted invalid bits in its mask.
+ */
+int kdbus_check_and_write_flags(u64 flags, void __user *buf,
+				off_t offset_out, u64 valid)
+{
+	u64 val = valid | KDBUS_FLAG_KERNEL;
+
+	/*
+	 * KDBUS_FLAG_KERNEL is reserved and will never be considered
+	 * valid by any user of this function.
+	 */
+	WARN_ON_ONCE(valid & KDBUS_FLAG_KERNEL);
+
+	if (copy_to_user(((u8 __user *)buf) + offset_out, &val, sizeof(val)))
+		return -EFAULT;
+
+	if (flags & ~valid)
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * kdbus_fput_files() - fput() an array of struct files
+ * @files:	The array of files to put, may be NULL
+ * @count:	The number of elements in @files
+ *
+ * Call fput() on all non-NULL elements in @files, and set the entries to
+ * NULL afterwards.
+ */
+void kdbus_fput_files(struct file **files, unsigned int count)
+{
+	int i;
+
+	if (!files || count == 0)
+		return;
+
+	for (i = count - 1; i >= 0; i--)
+		if (files[i]) {
+			fput(files[i]);
+			files[i] = NULL;
+		}
+}
+
+/**
+ * kdbus_copy_from_user() - copy aligned data from user-space
+ * @dest:	target buffer in kernel memory
+ * @user_ptr:	user-provided source buffer
+ * @size:	memory size to copy from user
+ *
+ * This copies @size bytes from @user_ptr into the kernel, just like
+ * copy_from_user() does. But we enforce an 8-byte alignment and reject any
+ * unaligned user-space pointers.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size)
+{
+	if (!KDBUS_IS_ALIGNED8((uintptr_t)user_ptr))
+		return -EFAULT;
+
+	if (copy_from_user(dest, user_ptr, size))
+		return -EFAULT;
+
+	return 0;
+}
+
+/**
+ * kdbus_memdup_user() - copy dynamically sized object from user-space
+ * @user_ptr:	user-provided source buffer
+ * @sz_min:	minimum object size
+ * @sz_max:	maximum object size
+ *
+ * This copies a dynamically sized object from user-space into kernel-space. We
+ * require the object to have a 64bit size field at offset 0. We read it out
+ * first, allocate a suitably sized buffer and then copy all data.
+ *
+ * The @sz_min and @sz_max parameters define possible min and max object sizes
+ * so user-space cannot trigger unbounded kernel-space allocations.
+ *
+ * The same alignment-restrictions as described in kdbus_copy_from_user() apply.
+ *
+ * Return: pointer to dynamically allocated copy, or ERR_PTR() on failure.
+ */
+void *kdbus_memdup_user(void __user *user_ptr, size_t sz_min, size_t sz_max)
+{
+	void *ptr;
+	u64 size;
+	int ret;
+
+	ret = kdbus_copy_from_user(&size, user_ptr, sizeof(size));
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	if (size < sz_min)
+		return ERR_PTR(-EINVAL);
+
+	if (size > sz_max)
+		return ERR_PTR(-EMSGSIZE);
+
+	ptr = memdup_user(user_ptr, size);
+	if (IS_ERR(ptr))
+		return ptr;
+
+	if (*(u64 *)ptr != size) {
+		kfree(ptr);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return ptr;
+}
+
+/**
+ * kdbus_verify_uid_prefix() - verify UID prefix of a user-supplied name
+ * @name:	user-supplied name to verify
+ * @user_ns:	user-namespace to act in
+ * @kuid:	Kernel internal uid of user
+ *
+ * This verifies that the user-supplied name @name has their UID as prefix. This
+ * is the default name-spacing policy we enforce on user-supplied names for
+ * public kdbus entities like buses and endpoints.
+ *
+ * The user must supply names prefixed with "<UID>-", where the UID is
+ * interpreted in the user-namespace of the domain. If the user fails to supply
+ * such a prefixed name, we reject it.
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
+			    kuid_t kuid)
+{
+	uid_t uid;
+	char prefix[16];
+
+	/*
+	 * The kuid must have a mapping into the userns of the domain
+	 * otherwise do not allow creation of buses nor endpoints.
+	 */
+	uid = from_kuid(user_ns, kuid);
+	if (uid == (uid_t) -1)
+		return -EINVAL;
+
+	snprintf(prefix, sizeof(prefix), "%u-", uid);
+	if (strncmp(name, prefix, strlen(prefix)) != 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * kdbus_from_kuid_keep() - Create a uid from a kuid in the current user-ns
+ * @uid:		Kernel uid to map into the current user namespace
+ *
+ * This is equivalent to from_kuid_munged(), but maps INVALID_UID to itself.
+ *
+ * Return: @uid mapped into the current user-ns, or INVALID_UID if invalid.
+ */
+u32 kdbus_from_kuid_keep(kuid_t uid)
+{
+	return uid_valid(uid) ?
+		from_kuid_munged(current_user_ns(), uid) : ((uid_t)-1);
+}
+
+/**
+ * kdbus_from_kgid_keep() - Create a gid from a kgid in the current user-ns
+ * @gid:		Kernel gid to map into the current user namespace
+ *
+ * This is equivalent to from_kgid_munged(), but maps INVALID_GID to itself.
+ *
+ * Return: @gid mapped into the current user-ns, or INVALID_GID if invalid.
+ */
+u32 kdbus_from_kgid_keep(kgid_t gid)
+{
+	return gid_valid(gid) ?
+		from_kgid_munged(current_user_ns(), gid) : ((gid_t)-1);
+}
+
+/**
+ * kdbus_sanitize_attach_flags() - Sanitize attach flags from user-space
+ * @flags:		Attach flags provided by userspace
+ * @attach_flags:	A pointer where to store the valid attach flags
+ *
+ * Convert attach-flags provided by user-space into a valid mask. If the mask
+ * is invalid, an error is returned. The sanitized attach flags are stored in
+ * the output parameter.
+ *
+ * Return: 0 on success, negative error on failure.
+ */
+int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags)
+{
+	/* 'any' degrades to 'all' for compatibility */
+	if (flags == _KDBUS_ATTACH_ANY)
+		flags = _KDBUS_ATTACH_ALL;
+
+	/* reject unknown attach flags */
+	if (flags & ~_KDBUS_ATTACH_ALL)
+		return -EINVAL;
+
+	*attach_flags = flags;
+	return 0;
+}
+
+/**
+ * kdbus_kvec_set - helper utility to assemble kvec arrays
+ * @kvec:	kvec entry to use
+ * @src:	Source address to set in @kvec
+ * @len:	Number of bytes in @src
+ * @total_len:	Pointer to total length variable
+ *
+ * Set @src and @len in @kvec, and increase @total_len by @len.
+ */
+void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len)
+{
+	kvec->iov_base = src;
+	kvec->iov_len = len;
+	*total_len += len;
+}
+
+static const char * const zeros = "\0\0\0\0\0\0\0";
+
+/**
+ * kdbus_kvec_pad - conditionally write a padding kvec
+ * @kvec:	kvec entry to use
+ * @len:	Total length used for kvec array
+ *
+ * Check if the current total byte length of the array in @len is aligned to
+ * 8 bytes. If it isn't, fill @kvec with padding information and increase @len
+ * by the number of bytes stored in @kvec.
+ *
+ * Return: the number of added padding bytes.
+ */
+size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len)
+{
+	size_t pad = KDBUS_ALIGN8(*len) - *len;
+
+	if (!pad)
+		return 0;
+
+	kvec->iov_base = (void *)zeros;
+	kvec->iov_len = pad;
+
+	*len += pad;
+
+	return pad;
+}
diff --git a/ipc/kdbus/util.h b/ipc/kdbus/util.h
new file mode 100644
index 000000000000..33d31f6274e0
--- /dev/null
+++ b/ipc/kdbus/util.h
@@ -0,0 +1,133 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_UTIL_H
+#define __KDBUS_UTIL_H
+
+#include <linux/dcache.h>
+#include <linux/ioctl.h>
+#include <linux/uidgid.h>
+
+#include "kdbus.h"
+
+/* all exported addresses are 64 bit */
+#define KDBUS_PTR(addr) ((void __user *)(uintptr_t)(addr))
+
+/* all exported sizes are 64 bit and data aligned to 64 bit */
+#define KDBUS_ALIGN8(s) ALIGN((s), 8)
+#define KDBUS_IS_ALIGNED8(s) (IS_ALIGNED(s, 8))
+
+/**
+ * kdbus_size_get_user - read the size variable from user memory
+ * @_s:			Size variable
+ * @_b:			Buffer to read from
+ * @_t:			Structure, "size" is a member of
+ *
+ * Return: the result of copy_from_user()
+ */
+#define kdbus_size_get_user(_s, _b, _t)					\
+({									\
+	u64 __user *_sz =						\
+		(void __user *)((u8 __user *)(_b) + offsetof(_t, size));\
+	copy_from_user(_s, _sz, sizeof(__u64));				\
+})
+
+/**
+ * kdbus_member_set_user - write a structure member to user memory
+ * @_s:			Variable to copy from
+ * @_b:			Buffer to write to
+ * @_t:			Structure type
+ * @_m:			Member name in the passed structure
+ *
+ * Return: the result of copy_to_user()
+ */
+#define kdbus_member_set_user(_s, _b, _t, _m)				\
+({									\
+	u64 __user *_sz =						\
+		(void __user *)((u8 __user *)(_b) + offsetof(_t, _m));	\
+	copy_to_user(_sz, _s, sizeof(((_t *)0)->_m));			\
+})
+
+/**
+ * kdbus_strhash - calculate a hash
+ * @str:		String
+ *
+ * Return: hash value
+ */
+static inline unsigned int kdbus_strhash(const char *str)
+{
+	unsigned long hash = init_name_hash();
+
+	while (*str)
+		hash = partial_name_hash(*str++, hash);
+
+	return end_name_hash(hash);
+}
+
+/**
+ * kdbus_strnhash - calculate a hash
+ * @str:		String
+ * @len:		Length of @str
+ *
+ * Return: hash value
+ */
+static inline unsigned int kdbus_strnhash(const char *str, size_t len)
+{
+	unsigned long hash = init_name_hash();
+
+	while (len--)
+		hash = partial_name_hash(*str++, hash);
+
+	return end_name_hash(hash);
+}
+
+/**
+ * kdbus_str_valid - verify a string
+ * @str:		String to verify
+ * @size:		Size of buffer of string (including 0-byte)
+ *
+ * This verifies the string at position @str with size @size is properly
+ * zero-terminated and does not contain a 0-byte except at the end.
+ *
+ * Return: true if string is valid, false if not.
+ */
+static inline bool kdbus_str_valid(const char *str, size_t size)
+{
+	return size > 0 && memchr(str, '\0', size) == str + size - 1;
+}
+
+int kdbus_sysname_is_valid(const char *name);
+void kdbus_fput_files(struct file **files, unsigned int count);
+int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
+			    kuid_t kuid);
+u32 kdbus_from_kuid_keep(kuid_t uid);
+u32 kdbus_from_kgid_keep(kgid_t gid);
+int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags);
+
+int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size);
+void *kdbus_memdup_user(void __user *user_ptr, size_t sz_min, size_t sz_max);
+
+int kdbus_check_and_write_flags(u64 flags, void __user *buf,
+				off_t offset_out, u64 valid);
+
+struct kvec;
+
+void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len);
+size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len);
+
+#define kdbus_negotiate_flags(_s, _b, _t, _v)				\
+	kdbus_check_and_write_flags((_s)->flags, _b,			\
+				    offsetof(_t, kernel_flags), _v)
+
+#endif
-- 
2.2.1
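
For readers following along: the size-prefixed copy in kdbus_memdup_user()
above can be modelled in plain userspace C. This is only a rough sketch under
stated assumptions, not the kernel code: dup_sized_object() is a hypothetical
name, memcpy() and malloc() stand in for copy_from_user() and memdup_user(),
and "src" plays the role of the user pointer. In a single-threaded userspace
program the final re-check can never fail; in the kernel it catches a caller
that rewrites the size field between the two fetches.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace model of kdbus_memdup_user(): the object starts
 * with a 64-bit size field at offset 0. Returns a malloc()ed copy, or NULL
 * with *err set to a negative errno value.
 */
static void *dup_sized_object(const void *src, size_t sz_min, size_t sz_max,
			      int *err)
{
	uint64_t size, check;
	void *copy;

	/* the minimum must at least cover the size field itself */
	assert(src && err && sz_min >= sizeof(size));

	/* like kdbus_copy_from_user(): reject pointers not aligned to 8 bytes */
	if ((uintptr_t)src % 8) {
		*err = -EFAULT;
		return NULL;
	}

	/* first fetch: read only the size field */
	memcpy(&size, src, sizeof(size));

	if (size < sz_min) {
		*err = -EINVAL;
		return NULL;
	}
	if (size > sz_max) {
		*err = -EMSGSIZE;
		return NULL;
	}

	copy = malloc((size_t)size);
	if (!copy) {
		*err = -ENOMEM;
		return NULL;
	}

	/* second fetch: copy the whole object, size field included */
	memcpy(copy, src, (size_t)size);

	/* the copied size must match the size that bounded the allocation */
	memcpy(&check, copy, sizeof(check));
	if (check != size) {
		free(copy);
		*err = -EINVAL;
		return NULL;
	}

	*err = 0;
	return copy;
}
```

The bounds are checked before anything is allocated, so the caller-supplied
size can never drive an allocation larger than sz_max.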



* [PATCH 03/13] kdbus: add driver skeleton, ioctl entry points and utility functions
@ 2015-01-16 19:16   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd-r2nGTMty4D4, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io, teg-B22kvLQNl6c,
	jkosina-AlSwsSmVLrQ, luto-kltTT9wpgjJwATOyAt5JVQ,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: daniel-cYrQPVfZooxQFI55V6+gNQ,
	dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w,
	tixxdz-Umm1ozX2/EEdnm+yROfE0A, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>

Add the basic driver structure.

handle.c is the main ioctl command dispatcher that calls into other parts
of the driver.

main.c contains the code that creates the initial domain at startup, and
util.c has utility functions such as item iterators that are shared with
other files.

limits.h describes limits on things like maximum data structure sizes,
number of messages per user and suchlike. Some of the numbers currently
picked are rough ideas of what might be sufficient and are probably
rather conservative.

Signed-off-by: Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
Signed-off-by: David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Signed-off-by: Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org>
Signed-off-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
---
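A note on the dispatcher described above: an endpoint handle starts out
without a role and is dispatched on its current type. The sketch below is a
simplified userspace model, not the driver code. The names handle_set_type()
and handle_dispatch() are hypothetical, a pthread mutex stands in for the
kernel mutex, and the returned strings merely label which command table
would be used. It assumes Linux, where EBADFD is defined.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

enum handle_type { HANDLE_NONE, HANDLE_CONNECTED, HANDLE_OWNER };

struct handle {
	pthread_mutex_t lock;
	enum handle_type type;
};

/*
 * Give a fresh handle its role. The test and the assignment happen under
 * the same lock, so of two racing callers only one can succeed; the other
 * gets -EBADFD, as in handle_ep_ioctl_endpoint_make().
 */
static int handle_set_type(struct handle *h, enum handle_type new_type)
{
	int ret = 0;

	assert(h && new_type != HANDLE_NONE);

	pthread_mutex_lock(&h->lock);
	if (h->type != HANDLE_NONE)
		ret = -EBADFD;
	else
		h->type = new_type;
	pthread_mutex_unlock(&h->lock);

	return ret;
}

/* Sample the type under the lock, then pick a command table by type. */
static const char *handle_dispatch(struct handle *h)
{
	enum handle_type type;

	assert(h);

	pthread_mutex_lock(&h->lock);
	type = h->type;
	pthread_mutex_unlock(&h->lock);

	switch (type) {
	case HANDLE_NONE:
		return "none";
	case HANDLE_CONNECTED:
		return "connected";
	case HANDLE_OWNER:
		return "owner";
	}

	return NULL;
}
```

Once a handle has left the NONE state it keeps its role, which is why the
dispatcher only needs the lock while reading the type.
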
 Documentation/ioctl/ioctl-number.txt |    1 +
 ipc/kdbus/handle.c                   | 1134 ++++++++++++++++++++++++++++++++++
 ipc/kdbus/handle.h                   |   20 +
 ipc/kdbus/limits.h                   |   95 +++
 ipc/kdbus/main.c                     |   72 +++
 ipc/kdbus/util.c                     |  317 ++++++++++
 ipc/kdbus/util.h                     |  133 ++++
 7 files changed, 1772 insertions(+)
 create mode 100644 ipc/kdbus/handle.c
 create mode 100644 ipc/kdbus/handle.h
 create mode 100644 ipc/kdbus/limits.h
 create mode 100644 ipc/kdbus/main.c
 create mode 100644 ipc/kdbus/util.c
 create mode 100644 ipc/kdbus/util.h
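
A note on flag negotiation, which most ioctl handlers in handle.c go through
via kdbus_negotiate_flags(): the kernel first writes the set of flags it
accepts back to the caller and only then rejects unknown bits. A minimal
userspace sketch follows. It rests on assumptions: struct cmd and
negotiate_flags() are hypothetical names, a plain struct member stands in for
the copy_to_user() write-back, and FLAG_KERNEL is a placeholder value because
the real KDBUS_FLAG_KERNEL is defined in kdbus.h, which is not part of this
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* placeholder for KDBUS_FLAG_KERNEL; the real value lives in kdbus.h */
#define FLAG_KERNEL	(1ULL << 63)

struct cmd {
	uint64_t flags;		/* filled in by the caller */
	uint64_t kernel_flags;	/* written back by the "kernel" side */
};

/*
 * Model of kdbus_check_and_write_flags(): report the accepted mask first,
 * then validate. The write-back happens even when validation fails, so a
 * caller that gets -EINVAL can read kernel_flags and retry with a subset.
 */
static int negotiate_flags(struct cmd *cmd, uint64_t valid)
{
	/* the reserved bit must never be part of a valid mask */
	assert(cmd && !(valid & FLAG_KERNEL));

	cmd->kernel_flags = valid | FLAG_KERNEL;

	if (cmd->flags & ~valid)
		return -EINVAL;

	return 0;
}
```

A caller that receives -EINVAL can inspect kernel_flags to see which bits
this kernel supports and drop the rest before retrying.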

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 8136e1fd30fd..54e091ebb862 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -292,6 +292,7 @@ Code  Seq#(hex)	Include File		Comments
 0x92	00-0F	drivers/usb/mon/mon_bin.c
 0x93	60-7F	linux/auto_fs.h
 0x94	all	fs/btrfs/ioctl.h
+0x95	all	uapi/linux/kdbus.h	kdbus IPC driver
 0x97	00-7F	fs/ceph/ioctl.h		Ceph file system
 0x99	00-0F				537-Addinboard driver
 					<mailto:buk-KMFVLCTwZAcb1SvskN2V4Q@public.gmane.org>
diff --git a/ipc/kdbus/handle.c b/ipc/kdbus/handle.c
new file mode 100644
index 000000000000..92e73e26ac5f
--- /dev/null
+++ b/ipc/kdbus/handle.c
@@ -0,0 +1,1134 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/kdev_t.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/syscalls.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+#include "names.h"
+#include "domain.h"
+#include "policy.h"
+
+/**
+ * enum kdbus_handle_ep_type - type an endpoint handle can be of
+ * @KDBUS_HANDLE_EP_NONE:	New file descriptor on an endpoint
+ * @KDBUS_HANDLE_EP_CONNECTED:	An endpoint connection after HELLO
+ * @KDBUS_HANDLE_EP_OWNER:	File descriptor to hold an endpoint
+ */
+enum kdbus_handle_ep_type {
+	KDBUS_HANDLE_EP_NONE,
+	KDBUS_HANDLE_EP_CONNECTED,
+	KDBUS_HANDLE_EP_OWNER,
+};
+
+/**
+ * struct kdbus_handle_ep - an endpoint handle to the kdbus system
+ * @lock:		Handle lock
+ * @ep:			The endpoint for this handle
+ * @type:		Type of this handle (KDBUS_HANDLE_EP_*)
+ * @conn:		The connection this handle owns, in case @type
+ *			is KDBUS_HANDLE_EP_CONNECTED
+ * @ep_owner:		The endpoint this handle owns, in case @type
+ *			is KDBUS_HANDLE_EP_OWNER
+ * @privileged:		Flag to mark a handle as privileged
+ */
+struct kdbus_handle_ep {
+	struct mutex lock;
+	struct kdbus_ep *ep;
+
+	enum kdbus_handle_ep_type type;
+	union {
+		struct kdbus_conn *conn;
+		struct kdbus_ep *ep_owner;
+	};
+
+	bool privileged:1;
+};
+
+static int handle_ep_open(struct inode *inode, struct file *file)
+{
+	struct kdbus_handle_ep *handle;
+	struct kdbus_domain *domain;
+	struct kdbus_node *node;
+	struct kdbus_bus *bus;
+	int ret;
+
+	/* kdbusfs stores the kdbus_node in i_private */
+	node = inode->i_private;
+	if (!kdbus_node_acquire(node))
+		return -ESHUTDOWN;
+
+	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+	if (!handle) {
+		ret = -ENOMEM;
+		goto exit_node;
+	}
+
+	mutex_init(&handle->lock);
+	handle->ep = kdbus_ep_ref(kdbus_ep_from_node(node));
+	handle->type = KDBUS_HANDLE_EP_NONE;
+
+	domain = handle->ep->bus->domain;
+	bus = handle->ep->bus;
+
+	/*
+	 * A connection is privileged if it is opened on an endpoint without
+	 * custom policy and either:
+	 *   * the user has CAP_IPC_OWNER in the domain user namespace
+	 * or
+	 *   * the caller's euid matches the uid of the bus creator
+	 */
+	if (!handle->ep->has_policy &&
+	    (ns_capable(domain->user_namespace, CAP_IPC_OWNER) ||
+	     uid_eq(file->f_cred->euid, bus->node.uid)))
+		handle->privileged = true;
+
+	file->private_data = handle;
+	kdbus_node_release(node);
+
+	return 0;
+
+exit_node:
+	kdbus_node_release(node);
+	return ret;
+}
+
+static int handle_ep_release(struct inode *inode, struct file *file)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+
+	switch (handle->type) {
+	case KDBUS_HANDLE_EP_OWNER:
+		kdbus_ep_deactivate(handle->ep_owner);
+		kdbus_ep_unref(handle->ep_owner);
+		break;
+
+	case KDBUS_HANDLE_EP_CONNECTED:
+		kdbus_conn_disconnect(handle->conn, false);
+		kdbus_conn_unref(handle->conn);
+		break;
+
+	case KDBUS_HANDLE_EP_NONE:
+		/* nothing to clean up */
+		break;
+	}
+
+	kdbus_ep_unref(handle->ep);
+	kfree(handle);
+
+	return 0;
+}
+
+static int handle_ep_ioctl_endpoint_make(struct kdbus_handle_ep *handle,
+					 void __user *buf)
+{
+	struct kdbus_cmd_make *make;
+	struct kdbus_ep *ep;
+	const char *name;
+	int ret;
+
+	/* creating custom endpoints is a privileged operation */
+	if (!handle->privileged)
+		return -EPERM;
+
+	make = kdbus_memdup_user(buf, sizeof(*make), KDBUS_MAKE_MAX_SIZE);
+	if (IS_ERR(make))
+		return PTR_ERR(make);
+
+	make->return_flags = 0;
+	if (kdbus_member_set_user(&make->return_flags, buf,
+				  struct kdbus_cmd_make, return_flags)) {
+		ret = -EFAULT;
+		goto exit;
+	}
+
+	ret = kdbus_negotiate_flags(make, buf, struct kdbus_cmd_make,
+				    KDBUS_MAKE_ACCESS_GROUP |
+				    KDBUS_MAKE_ACCESS_WORLD);
+	if (ret < 0)
+		goto exit;
+
+	ret = kdbus_items_validate(make->items, KDBUS_ITEMS_SIZE(make, items));
+	if (ret < 0)
+		goto exit;
+
+	name = kdbus_items_get_str(make->items, KDBUS_ITEMS_SIZE(make, items),
+				   KDBUS_ITEM_MAKE_NAME);
+	if (IS_ERR(name)) {
+		ret = PTR_ERR(name);
+		goto exit;
+	}
+
+	ep = kdbus_ep_new(handle->ep->bus, name,
+			  make->flags & (KDBUS_MAKE_ACCESS_WORLD |
+					 KDBUS_MAKE_ACCESS_GROUP),
+			  current_euid(), current_egid(), true);
+	if (IS_ERR(ep)) {
+		ret = PTR_ERR(ep);
+		goto exit;
+	}
+
+	ret = kdbus_ep_activate(ep);
+	if (ret < 0)
+		goto exit_ep_unref;
+
+	ret = kdbus_ep_policy_set(ep, make->items,
+				  KDBUS_ITEMS_SIZE(make, items));
+	if (ret < 0)
+		goto exit_ep_unref;
+
+	/* protect against parallel ioctls */
+	mutex_lock(&handle->lock);
+	if (handle->type != KDBUS_HANDLE_EP_NONE) {
+		ret = -EBADFD;
+	} else {
+		handle->type = KDBUS_HANDLE_EP_OWNER;
+		handle->ep_owner = ep;
+	}
+	mutex_unlock(&handle->lock);
+
+	if (ret < 0)
+		goto exit_ep_unref;
+
+	goto exit;
+
+exit_ep_unref:
+	kdbus_ep_deactivate(ep);
+	kdbus_ep_unref(ep);
+exit:
+	kfree(make);
+	return ret;
+}
+
+static int handle_ep_ioctl_hello(struct kdbus_handle_ep *handle,
+				 void __user *buf)
+{
+	struct kdbus_conn *conn;
+	struct kdbus_cmd_hello *hello;
+	int ret;
+
+	hello = kdbus_memdup_user(buf, sizeof(*hello), KDBUS_HELLO_MAX_SIZE);
+	if (IS_ERR(hello))
+		return PTR_ERR(hello);
+
+	ret = kdbus_negotiate_flags(hello, buf, typeof(*hello),
+				    KDBUS_HELLO_ACCEPT_FD |
+				    KDBUS_HELLO_ACTIVATOR |
+				    KDBUS_HELLO_POLICY_HOLDER |
+				    KDBUS_HELLO_MONITOR);
+	if (ret < 0)
+		goto exit;
+
+	hello->return_flags = 0;
+
+	ret = kdbus_items_validate(hello->items,
+				   KDBUS_ITEMS_SIZE(hello, items));
+	if (ret < 0)
+		goto exit;
+
+	if (!hello->pool_size || !IS_ALIGNED(hello->pool_size, PAGE_SIZE)) {
+		ret = -EFAULT;
+		goto exit;
+	}
+
+	conn = kdbus_conn_new(handle->ep, hello, handle->privileged);
+	if (IS_ERR(conn)) {
+		ret = PTR_ERR(conn);
+		goto exit;
+	}
+
+	ret = kdbus_conn_connect(conn, hello);
+	if (ret < 0)
+		goto exit_conn;
+
+	ret = kdbus_conn_acquire(conn);
+	if (ret < 0)
+		goto exit_conn;
+
+	if (kdbus_conn_is_activator(conn) || kdbus_conn_is_policy_holder(conn))
+		ret = kdbus_policy_set(&conn->ep->bus->policy_db, hello->items,
+				       KDBUS_ITEMS_SIZE(hello, items),
+				       1, kdbus_conn_is_policy_holder(conn),
+				       conn);
+
+	kdbus_conn_release(conn);
+
+	if (ret < 0)
+		goto exit_conn;
+
+	if (copy_to_user(buf, hello, sizeof(*hello))) {
+		ret = -EFAULT;
+		goto exit_conn;
+	}
+
+	/* protect against parallel ioctls */
+	mutex_lock(&handle->lock);
+	if (handle->type != KDBUS_HANDLE_EP_NONE) {
+		ret = -EBADFD;
+	} else {
+		handle->type = KDBUS_HANDLE_EP_CONNECTED;
+		handle->conn = conn;
+	}
+	mutex_unlock(&handle->lock);
+
+	if (ret < 0)
+		goto exit_conn;
+
+	goto exit;
+
+exit_conn:
+	kdbus_conn_disconnect(conn, false);
+	kdbus_conn_unref(conn);
+exit:
+	kfree(hello);
+	return ret;
+}
+
+/* kdbus endpoint commands for handles not yet set up (ENDPOINT_MAKE, HELLO) */
+static long handle_ep_ioctl_none(struct file *file, unsigned int cmd,
+				 void __user *buf)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+	long ret;
+
+	switch (cmd) {
+	case KDBUS_CMD_ENDPOINT_MAKE:
+		ret = handle_ep_ioctl_endpoint_make(handle, buf);
+		break;
+
+	case KDBUS_CMD_HELLO:
+		ret = handle_ep_ioctl_hello(handle, buf);
+		break;
+
+	default:
+		ret = -ENOTTY;
+		break;
+	}
+
+	return ret;
+}
+
+/* kdbus endpoint commands for connected peers */
+static long handle_ep_ioctl_connected(struct file *file, unsigned int cmd,
+				      void __user *buf)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+	struct kdbus_conn *conn = handle->conn;
+	void *free_ptr = NULL;
+	long ret = 0;
+
+	/*
+	 * BYEBYE is special; we must not acquire a connection when
+	 * calling into kdbus_conn_disconnect() or we will deadlock,
+	 * because kdbus_conn_disconnect() will wait for all acquired
+	 * references to be dropped.
+	 */
+	if (cmd == KDBUS_CMD_BYEBYE) {
+		if (!kdbus_conn_is_ordinary(conn))
+			return -EOPNOTSUPP;
+
+		return kdbus_conn_disconnect(conn, true);
+	}
+
+	ret = kdbus_conn_acquire(conn);
+	if (ret < 0)
+		return ret;
+
+	switch (cmd) {
+	case KDBUS_CMD_NAME_ACQUIRE: {
+		/* acquire a well-known name */
+		struct kdbus_cmd_name *cmd_name;
+
+		if (!kdbus_conn_is_ordinary(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_name = kdbus_memdup_user(buf, sizeof(*cmd_name),
+					     sizeof(*cmd_name) +
+						KDBUS_ITEM_HEADER_SIZE +
+						KDBUS_NAME_MAX_LEN + 1);
+		if (IS_ERR(cmd_name)) {
+			ret = PTR_ERR(cmd_name);
+			break;
+		}
+
+		free_ptr = cmd_name;
+
+		cmd_name->return_flags = 0;
+		if (kdbus_member_set_user(&cmd_name->return_flags, buf,
+					  struct kdbus_cmd_name,
+					  return_flags)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		ret = kdbus_negotiate_flags(cmd_name, buf, typeof(*cmd_name),
+					    KDBUS_NAME_REPLACE_EXISTING |
+					    KDBUS_NAME_ALLOW_REPLACEMENT |
+					    KDBUS_NAME_QUEUE);
+		if (ret < 0)
+			break;
+
+		ret = kdbus_items_validate(cmd_name->items,
+					   KDBUS_ITEMS_SIZE(cmd_name, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_cmd_name_acquire(conn->ep->bus->name_registry,
+					     conn, cmd_name);
+		if (ret < 0)
+			break;
+
+		/* return flags to the caller */
+		if (copy_to_user(buf, cmd_name, cmd_name->size))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_NAME_RELEASE: {
+		/* release a well-known name */
+		struct kdbus_cmd_name *cmd_name;
+
+		if (!kdbus_conn_is_ordinary(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_name = kdbus_memdup_user(buf, sizeof(*cmd_name),
+					     sizeof(*cmd_name) +
+						KDBUS_ITEM_HEADER_SIZE +
+						KDBUS_NAME_MAX_LEN + 1);
+		if (IS_ERR(cmd_name)) {
+			ret = PTR_ERR(cmd_name);
+			break;
+		}
+
+		free_ptr = cmd_name;
+
+		cmd_name->return_flags = 0;
+		if (kdbus_member_set_user(&cmd_name->return_flags, buf,
+					  struct kdbus_cmd_name,
+					  return_flags)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		ret = kdbus_negotiate_flags(cmd_name, buf, typeof(*cmd_name),
+					    0);
+		if (ret < 0)
+			break;
+
+		ret = kdbus_items_validate(cmd_name->items,
+					   KDBUS_ITEMS_SIZE(cmd_name, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_cmd_name_release(conn->ep->bus->name_registry,
+					     conn, cmd_name);
+		break;
+	}
+
+	case KDBUS_CMD_NAME_LIST: {
+		struct kdbus_cmd_name_list *cmd_list;
+
+		cmd_list = kdbus_memdup_user(buf, sizeof(*cmd_list),
+					     KDBUS_CMD_MAX_SIZE);
+		if (IS_ERR(cmd_list)) {
+			ret = PTR_ERR(cmd_list);
+			break;
+		}
+
+		free_ptr = cmd_list;
+
+		ret = kdbus_negotiate_flags(cmd_list, buf, typeof(*cmd_list),
+					    KDBUS_NAME_LIST_UNIQUE |
+					    KDBUS_NAME_LIST_NAMES |
+					    KDBUS_NAME_LIST_ACTIVATORS |
+					    KDBUS_NAME_LIST_QUEUED);
+		if (ret < 0)
+			break;
+
+		ret = kdbus_items_validate(cmd_list->items,
+					   KDBUS_ITEMS_SIZE(cmd_list, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_cmd_name_list(conn->ep->bus->name_registry,
+					  conn, cmd_list);
+		if (ret < 0)
+			break;
+
+		cmd_list->return_flags = 0;
+
+		/* return allocated data */
+		if (kdbus_member_set_user(&cmd_list->offset, buf,
+					  struct kdbus_cmd_name_list, offset) ||
+		    kdbus_member_set_user(&cmd_list->list_size, buf,
+					  struct kdbus_cmd_name_list,
+					  list_size) ||
+		    kdbus_member_set_user(&cmd_list->return_flags, buf,
+					  struct kdbus_cmd_name_list,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_CONN_INFO:
+	case KDBUS_CMD_BUS_CREATOR_INFO: {
+		struct kdbus_cmd_info *cmd_info;
+
+		/* return the properties of a connection */
+		cmd_info = kdbus_memdup_user(buf, sizeof(*cmd_info),
+					     sizeof(*cmd_info) +
+						KDBUS_NAME_MAX_LEN + 1);
+		if (IS_ERR(cmd_info)) {
+			ret = PTR_ERR(cmd_info);
+			break;
+		}
+
+		free_ptr = cmd_info;
+
+		ret = kdbus_negotiate_flags(cmd_info, buf, typeof(*cmd_info),
+					    _KDBUS_ATTACH_ALL);
+		if (ret < 0)
+			break;
+
+		cmd_info->return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_info->items,
+					   KDBUS_ITEMS_SIZE(cmd_info, items));
+		if (ret < 0)
+			break;
+
+		if (cmd == KDBUS_CMD_CONN_INFO)
+			ret = kdbus_cmd_conn_info(conn, cmd_info);
+		else
+			ret = kdbus_cmd_bus_creator_info(conn, cmd_info);
+
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_info->offset, buf,
+					  struct kdbus_cmd_info, offset) ||
+		    kdbus_member_set_user(&cmd_info->info_size, buf,
+					  struct kdbus_cmd_info, info_size) ||
+		    kdbus_member_set_user(&cmd_info->return_flags, buf,
+					  struct kdbus_cmd_info,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_CONN_UPDATE: {
+		/* update the properties of a connection */
+		struct kdbus_cmd_update *cmd_update;
+
+		if (!kdbus_conn_is_ordinary(conn) &&
+		    !kdbus_conn_is_policy_holder(conn) &&
+		    !kdbus_conn_is_monitor(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_update = kdbus_memdup_user(buf, sizeof(*cmd_update),
+					       KDBUS_UPDATE_MAX_SIZE);
+		if (IS_ERR(cmd_update)) {
+			ret = PTR_ERR(cmd_update);
+			break;
+		}
+
+		free_ptr = cmd_update;
+
+		ret = kdbus_negotiate_flags(cmd_update, buf,
+					    typeof(*cmd_update), 0);
+		if (ret < 0)
+			break;
+
+		cmd_update->return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_update->items,
+					   KDBUS_ITEMS_SIZE(cmd_update, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_cmd_conn_update(conn, cmd_update);
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_update->return_flags, buf,
+					  struct kdbus_cmd_update,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_MATCH_ADD: {
+		/* subscribe to/filter for broadcast messages */
+		struct kdbus_cmd_match *cmd_match;
+
+		if (!kdbus_conn_is_ordinary(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_match = kdbus_memdup_user(buf, sizeof(*cmd_match),
+					      KDBUS_MATCH_MAX_SIZE);
+		if (IS_ERR(cmd_match)) {
+			ret = PTR_ERR(cmd_match);
+			break;
+		}
+
+		free_ptr = cmd_match;
+
+		ret = kdbus_negotiate_flags(cmd_match, buf, typeof(*cmd_match),
+					    KDBUS_MATCH_REPLACE);
+		if (ret < 0)
+			break;
+
+		cmd_match->return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_match->items,
+					   KDBUS_ITEMS_SIZE(cmd_match, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_match_db_add(conn, cmd_match);
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_match->return_flags, buf,
+					  struct kdbus_cmd_match,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_MATCH_REMOVE: {
+		/* unsubscribe from broadcast messages */
+		struct kdbus_cmd_match *cmd_match;
+
+		if (!kdbus_conn_is_ordinary(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_match = kdbus_memdup_user(buf, sizeof(*cmd_match),
+					      sizeof(*cmd_match));
+		if (IS_ERR(cmd_match)) {
+			ret = PTR_ERR(cmd_match);
+			break;
+		}
+
+		free_ptr = cmd_match;
+
+		ret = kdbus_negotiate_flags(cmd_match, buf, typeof(*cmd_match),
+					    0);
+		if (ret < 0)
+			break;
+
+		cmd_match->return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_match->items,
+					   KDBUS_ITEMS_SIZE(cmd_match, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_match_db_remove(conn, cmd_match);
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_match->return_flags, buf,
+					  struct kdbus_cmd_match,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_SEND: {
+		/* submit a message which will be queued in the receiver */
+		struct kdbus_cmd_send *cmd_send;
+		struct kdbus_kmsg *kmsg = NULL;
+
+		if (!kdbus_conn_is_ordinary(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_send = kdbus_memdup_user(buf, sizeof(*cmd_send),
+					     KDBUS_SEND_MAX_SIZE);
+		if (IS_ERR(cmd_send)) {
+			ret = PTR_ERR(cmd_send);
+			break;
+		}
+
+		free_ptr = cmd_send;
+
+		ret = kdbus_negotiate_flags(cmd_send, buf, typeof(*cmd_send),
+					    KDBUS_SEND_SYNC_REPLY);
+		if (ret < 0)
+			break;
+
+		cmd_send->return_flags = 0;
+		cmd_send->reply.offset = 0;
+		cmd_send->reply.msg_size = 0;
+		cmd_send->reply.return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_send->items,
+					   KDBUS_ITEMS_SIZE(cmd_send, items));
+		if (ret < 0)
+			break;
+
+		kmsg = kdbus_kmsg_new_from_cmd(conn, buf, cmd_send);
+		if (IS_ERR(kmsg)) {
+			ret = PTR_ERR(kmsg);
+			break;
+		}
+
+		ret = kdbus_cmd_msg_send(conn, cmd_send, file, kmsg);
+		if (ret < 0) {
+			kdbus_kmsg_free(kmsg);
+			break;
+		}
+
+		if (kdbus_member_set_user(&cmd_send->return_flags, buf,
+					  struct kdbus_cmd_send,
+					  return_flags))
+			ret = -EFAULT;
+
+		/* store the reply back to userspace */
+		if (cmd_send->flags & KDBUS_SEND_SYNC_REPLY) {
+			if (kdbus_member_set_user(&cmd_send->reply, buf,
+						  struct kdbus_cmd_send,
+						  reply))
+				ret = -EFAULT;
+		}
+
+		kdbus_kmsg_free(kmsg);
+		break;
+	}
+
+	case KDBUS_CMD_RECV: {
+		struct kdbus_cmd_recv *cmd_recv;
+
+		if (!kdbus_conn_is_ordinary(conn) &&
+		    !kdbus_conn_is_monitor(conn) &&
+		    !kdbus_conn_is_activator(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_recv = kdbus_memdup_user(buf, sizeof(*cmd_recv),
+					     KDBUS_RECV_MAX_SIZE);
+		if (IS_ERR(cmd_recv)) {
+			ret = PTR_ERR(cmd_recv);
+			break;
+		}
+
+		free_ptr = cmd_recv;
+
+		ret = kdbus_negotiate_flags(cmd_recv, buf, typeof(*cmd_recv),
+					    KDBUS_RECV_PEEK |
+					    KDBUS_RECV_DROP |
+					    KDBUS_RECV_USE_PRIORITY);
+		if (ret < 0)
+			break;
+
+		cmd_recv->return_flags = 0;
+		cmd_recv->dropped_msgs = 0;
+		cmd_recv->msg.offset = 0;
+		cmd_recv->msg.msg_size = 0;
+		cmd_recv->msg.return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_recv->items,
+					   KDBUS_ITEMS_SIZE(cmd_recv, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_cmd_msg_recv(conn, cmd_recv);
+		/*
+		 * In case of -EOVERFLOW, we still have to write back the
+		 * number of lost messages.
+		 */
+		if (ret < 0 && ret != -EOVERFLOW)
+			break;
+
+		/* return the number of dropped messages */
+		if (kdbus_member_set_user(&cmd_recv->dropped_msgs, buf,
+					  struct kdbus_cmd_recv,
+					  dropped_msgs) ||
+		    kdbus_member_set_user(&cmd_recv->msg, buf,
+					  struct kdbus_cmd_recv, msg) ||
+		    kdbus_member_set_user(&cmd_recv->return_flags, buf,
+					  struct kdbus_cmd_recv,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	case KDBUS_CMD_FREE: {
+		struct kdbus_cmd_free *cmd_free;
+		const struct kdbus_item *item;
+
+		if (!kdbus_conn_is_ordinary(conn) &&
+		    !kdbus_conn_is_monitor(conn) &&
+		    !kdbus_conn_is_activator(conn)) {
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		cmd_free = kdbus_memdup_user(buf, sizeof(*cmd_free),
+					     KDBUS_CMD_MAX_SIZE);
+		if (IS_ERR(cmd_free)) {
+			ret = PTR_ERR(cmd_free);
+			break;
+		}
+
+		free_ptr = cmd_free;
+
+		ret = kdbus_negotiate_flags(cmd_free, buf, typeof(*cmd_free),
+					    0);
+		if (ret < 0)
+			break;
+
+		ret = kdbus_items_validate(cmd_free->items,
+					   KDBUS_ITEMS_SIZE(cmd_free, items));
+		if (ret < 0)
+			break;
+
+		KDBUS_ITEMS_FOREACH(item, cmd_free->items,
+				    KDBUS_ITEMS_SIZE(cmd_free, items)) {
+			/* no items supported so far */
+			switch (item->type) {
+			default:
+				ret = -EINVAL;
+				break;
+			}
+		}
+		if (ret < 0)
+			break;
+
+		cmd_free->return_flags = 0;
+
+		ret = kdbus_pool_release_offset(conn->pool, cmd_free->offset);
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_free->return_flags, buf,
+					  struct kdbus_cmd_free,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	default:
+		ret = -ENOTTY;
+		break;
+	}
+
+	kdbus_conn_release(conn);
+	kfree(free_ptr);
+	return ret;
+}
+
+/* kdbus endpoint commands for endpoint owners */
+static long handle_ep_ioctl_owner(struct file *file, unsigned int cmd,
+				  void __user *buf)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+	struct kdbus_ep *ep = handle->ep_owner;
+	void *free_ptr = NULL;
+	long ret = 0;
+
+	switch (cmd) {
+	case KDBUS_CMD_ENDPOINT_UPDATE: {
+		struct kdbus_cmd_update *cmd_update;
+
+		/* update the properties of a custom endpoint */
+		cmd_update = kdbus_memdup_user(buf, sizeof(*cmd_update),
+					       KDBUS_UPDATE_MAX_SIZE);
+		if (IS_ERR(cmd_update)) {
+			ret = PTR_ERR(cmd_update);
+			break;
+		}
+
+		free_ptr = cmd_update;
+
+		ret = kdbus_negotiate_flags(cmd_update, buf,
+					    typeof(*cmd_update), 0);
+		if (ret < 0)
+			break;
+
+		cmd_update->return_flags = 0;
+
+		ret = kdbus_items_validate(cmd_update->items,
+					   KDBUS_ITEMS_SIZE(cmd_update, items));
+		if (ret < 0)
+			break;
+
+		ret = kdbus_ep_policy_set(ep, cmd_update->items,
+					  KDBUS_ITEMS_SIZE(cmd_update, items));
+		if (ret < 0)
+			break;
+
+		if (kdbus_member_set_user(&cmd_update->return_flags, buf,
+					  struct kdbus_cmd_update,
+					  return_flags))
+			ret = -EFAULT;
+
+		break;
+	}
+
+	default:
+		ret = -ENOTTY;
+		break;
+	}
+
+	kfree(free_ptr);
+	return ret;
+}
+
+static long handle_ep_ioctl(struct file *file, unsigned int cmd,
+			    unsigned long arg)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+	void __user *argp = (void __user *)arg;
+	enum kdbus_handle_ep_type type;
+
+	/* lock while accessing handle->type to enforce barriers */
+	mutex_lock(&handle->lock);
+	type = handle->type;
+	mutex_unlock(&handle->lock);
+
+	switch (type) {
+	case KDBUS_HANDLE_EP_NONE:
+		return handle_ep_ioctl_none(file, cmd, argp);
+
+	case KDBUS_HANDLE_EP_CONNECTED:
+		return handle_ep_ioctl_connected(file, cmd, argp);
+
+	case KDBUS_HANDLE_EP_OWNER:
+		return handle_ep_ioctl_owner(file, cmd, argp);
+
+	default:
+		return -EBADFD;
+	}
+}
+
+static unsigned int handle_ep_poll(struct file *file,
+				   struct poll_table_struct *wait)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+	unsigned int mask = POLLOUT | POLLWRNORM;
+	int ret;
+
+	/* Only a connected endpoint can read/write data */
+	mutex_lock(&handle->lock);
+	if (handle->type != KDBUS_HANDLE_EP_CONNECTED) {
+		mutex_unlock(&handle->lock);
+		return POLLERR | POLLHUP;
+	}
+	mutex_unlock(&handle->lock);
+
+	ret = kdbus_conn_acquire(handle->conn);
+	if (ret < 0)
+		return POLLERR | POLLHUP;
+
+	poll_wait(file, &handle->conn->wait, wait);
+
+	if (!list_empty(&handle->conn->queue.msg_list))
+		mask |= POLLIN | POLLRDNORM;
+
+	kdbus_conn_release(handle->conn);
+
+	return mask;
+}
+
+static int handle_ep_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct kdbus_handle_ep *handle = file->private_data;
+
+	mutex_lock(&handle->lock);
+	if (handle->type != KDBUS_HANDLE_EP_CONNECTED) {
+		mutex_unlock(&handle->lock);
+		return -EPERM;
+	}
+	mutex_unlock(&handle->lock);
+
+	return kdbus_pool_mmap(handle->conn->pool, vma);
+}
+
+const struct file_operations kdbus_handle_ep_ops = {
+	.owner =		THIS_MODULE,
+	.open =			handle_ep_open,
+	.release =		handle_ep_release,
+	.poll =			handle_ep_poll,
+	.llseek =		noop_llseek,
+	.unlocked_ioctl =	handle_ep_ioctl,
+	.mmap =			handle_ep_mmap,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl =		handle_ep_ioctl,
+#endif
+};
+
+static int handle_control_open(struct inode *inode, struct file *file)
+{
+	if (!kdbus_node_is_active(inode->i_private))
+		return -ESHUTDOWN;
+
+	/* private_data is used by BUS_MAKE to store the new bus */
+	file->private_data = NULL;
+
+	return 0;
+}
+
+static int handle_control_release(struct inode *inode, struct file *file)
+{
+	struct kdbus_bus *bus = file->private_data;
+
+	if (bus) {
+		kdbus_bus_deactivate(bus);
+		kdbus_bus_unref(bus);
+	}
+
+	return 0;
+}
+
+static int handle_control_ioctl_bus_make(struct file *file,
+					 struct kdbus_domain *domain,
+					 void __user *buf)
+{
+	struct kdbus_cmd_make *make;
+	struct kdbus_bus *bus;
+	int ret;
+
+	/* catch double BUS_MAKE early, locked test is below */
+	if (file->private_data)
+		return -EBADFD;
+
+	make = kdbus_memdup_user(buf, sizeof(*make), KDBUS_MAKE_MAX_SIZE);
+	if (IS_ERR(make))
+		return PTR_ERR(make);
+
+	ret = kdbus_negotiate_flags(make, buf, struct kdbus_cmd_make,
+				    KDBUS_MAKE_ACCESS_GROUP |
+				    KDBUS_MAKE_ACCESS_WORLD);
+	if (ret < 0)
+		goto exit;
+
+	ret = kdbus_items_validate(make->items, KDBUS_ITEMS_SIZE(make, items));
+	if (ret < 0)
+		goto exit;
+
+	bus = kdbus_bus_new(domain, make, current_euid(), current_egid());
+	if (IS_ERR(bus)) {
+		ret = PTR_ERR(bus);
+		goto exit;
+	}
+
+	ret = kdbus_bus_activate(bus);
+	if (ret < 0)
+		goto exit_bus_unref;
+
+	/* protect against parallel ioctls */
+	mutex_lock(&domain->lock);
+	if (file->private_data)
+		ret = -EBADFD;
+	else
+		file->private_data = bus;
+	mutex_unlock(&domain->lock);
+
+	if (ret < 0)
+		goto exit_bus_unref;
+
+	goto exit;
+
+exit_bus_unref:
+	kdbus_bus_deactivate(bus);
+	kdbus_bus_unref(bus);
+exit:
+	kfree(make);
+	return ret;
+}
+
+static long handle_control_ioctl(struct file *file, unsigned int cmd,
+				 unsigned long arg)
+{
+	struct kdbus_node *node = file_inode(file)->i_private;
+	struct kdbus_domain *domain;
+	int ret = 0;
+
+	/*
+	 * The parent of control-nodes is always a domain, make sure to pin it
+	 * so the parent is actually valid.
+	 */
+	if (!kdbus_node_acquire(node))
+		return -ESHUTDOWN;
+
+	domain = kdbus_domain_from_node(node->parent);
+	if (!kdbus_node_acquire(&domain->node)) {
+		kdbus_node_release(node);
+		return -ESHUTDOWN;
+	}
+
+	switch (cmd) {
+	case KDBUS_CMD_BUS_MAKE:
+		ret = handle_control_ioctl_bus_make(file, domain,
+						    (void __user *)arg);
+		break;
+
+	default:
+		ret = -ENOTTY;
+		break;
+	}
+
+	kdbus_node_release(&domain->node);
+	kdbus_node_release(node);
+	return ret;
+}
+
+const struct file_operations kdbus_handle_control_ops = {
+	.open =			handle_control_open,
+	.release =		handle_control_release,
+	.llseek =		noop_llseek,
+	.unlocked_ioctl =	handle_control_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl =		handle_control_ioctl,
+#endif
+};
diff --git a/ipc/kdbus/handle.h b/ipc/kdbus/handle.h
new file mode 100644
index 000000000000..32809dad3720
--- /dev/null
+++ b/ipc/kdbus/handle.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_HANDLE_H
+#define __KDBUS_HANDLE_H
+
+extern const struct file_operations kdbus_handle_ep_ops;
+extern const struct file_operations kdbus_handle_control_ops;
+
+#endif
diff --git a/ipc/kdbus/limits.h b/ipc/kdbus/limits.h
new file mode 100644
index 000000000000..b848f437e792
--- /dev/null
+++ b/ipc/kdbus/limits.h
@@ -0,0 +1,95 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_DEFAULTS_H
+#define __KDBUS_DEFAULTS_H
+
+/* maximum size of message header and items */
+#define KDBUS_MSG_MAX_SIZE		SZ_8K
+
+/* maximum number of message items */
+#define KDBUS_MSG_MAX_ITEMS		128
+
+/* max size of ioctl command data */
+#define KDBUS_CMD_MAX_SIZE		SZ_8K
+
+/*
+ * Maximum number of passed file descriptors
+ * Number taken from AF_UNIX upper limits
+ */
+#define KDBUS_MSG_MAX_FDS		253
+
+/* maximum message payload size */
+#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE		SZ_2M
+
+/* maximum size of bloom bit field in bytes */
+#define KDBUS_BUS_BLOOM_MAX_SIZE		SZ_4K
+
+/* maximum length of well-known bus name */
+#define KDBUS_NAME_MAX_LEN			255
+
+/* maximum length of bus, domain, ep name */
+#define KDBUS_SYSNAME_MAX_LEN			63
+
+/* maximum size of make data */
+#define KDBUS_MAKE_MAX_SIZE			SZ_32K
+
+/* maximum size of hello data */
+#define KDBUS_HELLO_MAX_SIZE			SZ_32K
+
+/* maximum size for update commands */
+#define KDBUS_UPDATE_MAX_SIZE			SZ_32K
+
+/* maximum number of matches per connection */
+#define KDBUS_MATCH_MAX				256
+
+/* maximum size of match data */
+#define KDBUS_MATCH_MAX_SIZE			SZ_32K
+
+/* maximum size of send data */
+#define KDBUS_SEND_MAX_SIZE			SZ_32K
+
+/* maximum size of recv data */
+#define KDBUS_RECV_MAX_SIZE			SZ_32K
+
+/* maximum size of policy data */
+#define KDBUS_POLICY_MAX_SIZE			SZ_32K
+
+/* maximum number of queued messages in a connection */
+#define KDBUS_CONN_MAX_MSGS			256
+
+/*
+ * maximum number of queued messages which will not be user accounted.
+ * after this value is reached each user will have an individual limit.
+ */
+#define KDBUS_CONN_MAX_MSGS_UNACCOUNTED		16
+
+/*
+ * maximum number of queued messages from the same individual user after the
+ * un-accounted value has been hit
+ */
+#define KDBUS_CONN_MAX_MSGS_PER_USER		16
+
+/* maximum number of well-known names per connection */
+#define KDBUS_CONN_MAX_NAMES			256
+
+/* maximum number of queued requests waiting for a reply */
+#define KDBUS_CONN_MAX_REQUESTS_PENDING		128
+
+/* maximum number of connections per user in one domain */
+#define KDBUS_USER_MAX_CONN			1024
+
+/* maximum number of buses per user in one domain */
+#define KDBUS_USER_MAX_BUSES			16
+
+#endif
diff --git a/ipc/kdbus/main.c b/ipc/kdbus/main.c
new file mode 100644
index 000000000000..5d6a84453347
--- /dev/null
+++ b/ipc/kdbus/main.c
@@ -0,0 +1,72 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#define pr_fmt(fmt)    KBUILD_MODNAME ": " fmt
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+
+#include "util.h"
+#include "fs.h"
+#include "handle.h"
+#include "metadata.h"
+#include "node.h"
+
+/* kdbus mount-point /sys/fs/kdbus */
+static struct kobject *kdbus_dir;
+
+/* global module option to apply a mask to exported metadata */
+unsigned long long kdbus_meta_attach_mask = KDBUS_ATTACH_TIMESTAMP |
+					    KDBUS_ATTACH_CREDS |
+					    KDBUS_ATTACH_PIDS |
+					    KDBUS_ATTACH_AUXGROUPS |
+					    KDBUS_ATTACH_NAMES |
+					    KDBUS_ATTACH_SECLABEL |
+					    KDBUS_ATTACH_CONN_DESCRIPTION;
+MODULE_PARM_DESC(attach_flags_mask, "Attach-flags mask for exported metadata");
+module_param_named(attach_flags_mask, kdbus_meta_attach_mask, ullong, 0644);
+
+static int __init kdbus_init(void)
+{
+	int ret;
+
+	kdbus_dir = kobject_create_and_add(KBUILD_MODNAME, fs_kobj);
+	if (!kdbus_dir)
+		return -ENOMEM;
+
+	ret = kdbus_fs_init();
+	if (ret < 0) {
+		pr_err("cannot register filesystem: %d\n", ret);
+		goto exit_dir;
+	}
+
+	pr_info("initialized\n");
+	return 0;
+
+exit_dir:
+	kobject_put(kdbus_dir);
+	return ret;
+}
+
+static void __exit kdbus_exit(void)
+{
+	kdbus_fs_exit();
+	kobject_put(kdbus_dir);
+}
+
+module_init(kdbus_init);
+module_exit(kdbus_exit);
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("D-Bus, powerful, easy to use interprocess communication");
+MODULE_ALIAS_FS(KBUILD_MODNAME "fs");
diff --git a/ipc/kdbus/util.c b/ipc/kdbus/util.c
new file mode 100644
index 000000000000..16069f5f644e
--- /dev/null
+++ b/ipc/kdbus/util.c
@@ -0,0 +1,317 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/capability.h>
+#include <linux/cred.h>
+#include <linux/ctype.h>
+#include <linux/err.h>
+#include <linux/file.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+#include <linux/user_namespace.h>
+
+#include "limits.h"
+#include "util.h"
+
+/**
+ * kdbus_sysname_is_valid() - validate names showing up in /proc, /sys and /dev
+ * @name:		Name of domain, bus, endpoint
+ *
+ * Return: 0 if the given name is valid, otherwise negative errno
+ */
+int kdbus_sysname_is_valid(const char *name)
+{
+	unsigned int i;
+	size_t len;
+
+	len = strlen(name);
+	if (len == 0)
+		return -EINVAL;
+
+	for (i = 0; i < len; i++) {
+		if (isalpha(name[i]))
+			continue;
+		if (isdigit(name[i]))
+			continue;
+		if (name[i] == '_')
+			continue;
+		if (i > 0 && i + 1 < len && (name[i] == '-' || name[i] == '.'))
+			continue;
+
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
+ * kdbus_check_and_write_flags() - check flags provided by user, and write the
+ *				   valid mask back
+ * @flags:	The flags mask provided by userspace
+ * @buf:	The buffer provided by userspace
+ * @offset_out:	Offset of the kernel_flags field inside the user-provided struct
+ * @valid:	Mask of valid bits
+ *
+ * This function checks whether the flags provided by userspace are within
+ * the set of bits allowed by the kernel. The mask of valid bits, with the
+ * KDBUS_FLAG_KERNEL bit set, is written back to @buf at @offset_out.
+ *
+ * Return: 0 on success, -EFAULT if copy_to_user() failed, or -EINVAL if
+ * userspace submitted invalid bits in its mask.
+ */
+int kdbus_check_and_write_flags(u64 flags, void __user *buf,
+				off_t offset_out, u64 valid)
+{
+	u64 val = valid | KDBUS_FLAG_KERNEL;
+
+	/*
+	 * KDBUS_FLAG_KERNEL is reserved and will never be considered
+	 * valid by any user of this function.
+	 */
+	WARN_ON_ONCE(valid & KDBUS_FLAG_KERNEL);
+
+	if (copy_to_user(((u8 __user *)buf) + offset_out, &val, sizeof(val)))
+		return -EFAULT;
+
+	if (flags & ~valid)
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * kdbus_fput_files() - fput() an array of struct files
+ * @files:	The array of files to put, may be NULL
+ * @count:	The number of elements in @files
+ *
+ * Call fput() on all non-NULL elements in @files, and set the entries to
+ * NULL afterwards.
+ */
+void kdbus_fput_files(struct file **files, unsigned int count)
+{
+	int i;
+
+	if (!files || count == 0)
+		return;
+
+	for (i = count - 1; i >= 0; i--)
+		if (files[i]) {
+			fput(files[i]);
+			files[i] = NULL;
+		}
+}
+
+/**
+ * kdbus_copy_from_user() - copy aligned data from user-space
+ * @dest:	target buffer in kernel memory
+ * @user_ptr:	user-provided source buffer
+ * @size:	memory size to copy from user
+ *
+ * This copies @size bytes from @user_ptr into the kernel, just like
+ * copy_from_user() does. But we enforce an 8-byte alignment and reject any
+ * unaligned user-space pointers.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size)
+{
+	if (!KDBUS_IS_ALIGNED8((uintptr_t)user_ptr))
+		return -EFAULT;
+
+	if (copy_from_user(dest, user_ptr, size))
+		return -EFAULT;
+
+	return 0;
+}
+
+/**
+ * kdbus_memdup_user() - copy dynamically sized object from user-space
+ * @user_ptr:	user-provided source buffer
+ * @sz_min:	minimum object size
+ * @sz_max:	maximum object size
+ *
+ * This copies a dynamically sized object from user-space into kernel-space. We
+ * require the object to have a 64bit size field at offset 0. We read it out
+ * first, allocate a suitably sized buffer and then copy all data.
+ *
+ * The @sz_min and @sz_max parameters define possible min and max object sizes
+ * so user-space cannot trigger unbounded kernel-space allocations.
+ *
+ * The same alignment-restrictions as described in kdbus_copy_from_user() apply.
+ *
+ * Return: pointer to dynamically allocated copy, or ERR_PTR() on failure.
+ */
+void *kdbus_memdup_user(void __user *user_ptr, size_t sz_min, size_t sz_max)
+{
+	void *ptr;
+	u64 size;
+	int ret;
+
+	ret = kdbus_copy_from_user(&size, user_ptr, sizeof(size));
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	if (size < sz_min)
+		return ERR_PTR(-EINVAL);
+
+	if (size > sz_max)
+		return ERR_PTR(-EMSGSIZE);
+
+	ptr = memdup_user(user_ptr, size);
+	if (IS_ERR(ptr))
+		return ptr;
+
+	if (*(u64 *)ptr != size) {
+		kfree(ptr);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return ptr;
+}
+
+/**
+ * kdbus_verify_uid_prefix() - verify UID prefix of a user-supplied name
+ * @name:	user-supplied name to verify
+ * @user_ns:	user-namespace to act in
+ * @kuid:	Kernel internal uid of user
+ *
+ * This verifies that the user-supplied name @name has their UID as prefix. This
+ * is the default name-spacing policy we enforce on user-supplied names for
+ * public kdbus entities like buses and endpoints.
+ *
+ * The user must supply names prefixed with "<UID>-", where the UID is
+ * interpreted in the user-namespace of the domain. If the user fails to supply
+ * such a prefixed name, we reject it.
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
+			    kuid_t kuid)
+{
+	uid_t uid;
+	char prefix[16];
+
+	/*
+	 * The kuid must have a mapping into the userns of the domain
+	 * otherwise do not allow creation of buses nor endpoints.
+	 */
+	uid = from_kuid(user_ns, kuid);
+	if (uid == (uid_t) -1)
+		return -EINVAL;
+
+	snprintf(prefix, sizeof(prefix), "%u-", uid);
+	if (strncmp(name, prefix, strlen(prefix)) != 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * kdbus_from_kuid_keep() - Create a uid from kuid/user-ns pair
+ * @uid:		Kernel uid to map into the current user namespace
+ *
+ * This is equivalent to from_kuid_munged(), but maps INVALID_UID to itself.
+ *
+ * Return: the mapped UID, or INVALID_UID if @uid==INVALID_UID.
+ */
+u32 kdbus_from_kuid_keep(kuid_t uid)
+{
+	return uid_valid(uid) ?
+		from_kuid_munged(current_user_ns(), uid) : ((uid_t)-1);
+}
+
+/**
+ * kdbus_from_kgid_keep() - Create a gid from kgid/user-ns pair
+ * @gid:		Kernel gid to map into the current user namespace
+ *
+ * This is equivalent to from_kgid_munged(), but maps INVALID_GID to itself.
+ *
+ * Return: the mapped GID, or INVALID_GID if @gid==INVALID_GID.
+ */
+u32 kdbus_from_kgid_keep(kgid_t gid)
+{
+	return gid_valid(gid) ?
+		from_kgid_munged(current_user_ns(), gid) : ((gid_t)-1);
+}
+
+/**
+ * kdbus_sanitize_attach_flags() - Sanitize attach flags from user-space
+ * @flags:		Attach flags provided by userspace
+ * @attach_flags:	A pointer where to store the valid attach flags
+ *
+ * Convert attach-flags provided by user-space into a valid mask. If the mask
+ * is invalid, an error is returned. The sanitized attach flags are stored in
+ * the output parameter.
+ *
+ * Return: 0 on success, negative error on failure.
+ */
+int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags)
+{
+	/* 'any' degrades to 'all' for compatibility */
+	if (flags == _KDBUS_ATTACH_ANY)
+		flags = _KDBUS_ATTACH_ALL;
+
+	/* reject unknown attach flags */
+	if (flags & ~_KDBUS_ATTACH_ALL)
+		return -EINVAL;
+
+	*attach_flags = flags;
+	return 0;
+}
+
+/**
+ * kdbus_kvec_set - helper utility to assemble kvec arrays
+ * @kvec:	kvec entry to use
+ * @src:	Source address to set in @kvec
+ * @len:	Number of bytes in @src
+ * @total_len:	Pointer to total length variable
+ *
+ * Set @src and @len in @kvec, and increase @total_len by @len.
+ */
+void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len)
+{
+	kvec->iov_base = src;
+	kvec->iov_len = len;
+	*total_len += len;
+}
+
+static const char * const zeros = "\0\0\0\0\0\0\0";
+
+/**
+ * kdbus_kvec_pad - conditionally write a padding kvec
+ * @kvec:	kvec entry to use
+ * @len:	Total length used for kvec array
+ *
+ * Check if the current total byte length of the array in @len is aligned to
+ * 8 bytes. If it isn't, fill @kvec with padding information and increase @len
+ * by the number of bytes stored in @kvec.
+ *
+ * Return: the number of added padding bytes.
+ */
+size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len)
+{
+	size_t pad = KDBUS_ALIGN8(*len) - *len;
+
+	if (!pad)
+		return 0;
+
+	kvec->iov_base = (void *)zeros;
+	kvec->iov_len = pad;
+
+	*len += pad;
+
+	return pad;
+}
diff --git a/ipc/kdbus/util.h b/ipc/kdbus/util.h
new file mode 100644
index 000000000000..33d31f6274e0
--- /dev/null
+++ b/ipc/kdbus/util.h
@@ -0,0 +1,133 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_UTIL_H
+#define __KDBUS_UTIL_H
+
+#include <linux/dcache.h>
+#include <linux/ioctl.h>
+#include <linux/uidgid.h>
+
+#include "kdbus.h"
+
+/* all exported addresses are 64 bit */
+#define KDBUS_PTR(addr) ((void __user *)(uintptr_t)(addr))
+
+/* all exported sizes are 64 bit and data aligned to 64 bit */
+#define KDBUS_ALIGN8(s) ALIGN((s), 8)
+#define KDBUS_IS_ALIGNED8(s) (IS_ALIGNED(s, 8))
+
+/**
+ * kdbus_size_get_user - read the size variable from user memory
+ * @_s:			Size variable
+ * @_b:			Buffer to read from
+ * @_t:			Structure, "size" is a member of
+ *
+ * Return: the result of copy_from_user()
+ */
+#define kdbus_size_get_user(_s, _b, _t)					\
+({									\
+	u64 __user *_sz =						\
+		(void __user *)((u8 __user *)(_b) + offsetof(_t, size));\
+	copy_from_user(_s, _sz, sizeof(__u64));				\
+})
+
+/**
+ * kdbus_member_set_user - write a structure member to user memory
+ * @_s:			Variable to copy from
+ * @_b:			Buffer to write to
+ * @_t:			Structure type
+ * @_m:			Member name in the passed structure
+ *
+ * Return: the result of copy_to_user()
+ */
+#define kdbus_member_set_user(_s, _b, _t, _m)				\
+({									\
+	u64 __user *_sz =						\
+		(void __user *)((u8 __user *)(_b) + offsetof(_t, _m));	\
+	copy_to_user(_sz, _s, sizeof(((_t *)0)->_m));			\
+})
+
+/**
+ * kdbus_strhash - calculate a hash
+ * @str:		String
+ *
+ * Return: hash value
+ */
+static inline unsigned int kdbus_strhash(const char *str)
+{
+	unsigned long hash = init_name_hash();
+
+	while (*str)
+		hash = partial_name_hash(*str++, hash);
+
+	return end_name_hash(hash);
+}
+
+/**
+ * kdbus_strnhash - calculate a hash
+ * @str:		String
+ * @len:		Length of @str
+ *
+ * Return: hash value
+ */
+static inline unsigned int kdbus_strnhash(const char *str, size_t len)
+{
+	unsigned long hash = init_name_hash();
+
+	while (len--)
+		hash = partial_name_hash(*str++, hash);
+
+	return end_name_hash(hash);
+}
+
+/**
+ * kdbus_str_valid - verify a string
+ * @str:		String to verify
+ * @size:		Size of buffer of string (including 0-byte)
+ *
+ * This verifies the string at position @str with size @size is properly
+ * zero-terminated and contains no 0-byte except at the end.
+ *
+ * Return: true if string is valid, false if not.
+ */
+static inline bool kdbus_str_valid(const char *str, size_t size)
+{
+	return size > 0 && memchr(str, '\0', size) == str + size - 1;
+}
+
+int kdbus_sysname_is_valid(const char *name);
+void kdbus_fput_files(struct file **files, unsigned int count);
+int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
+			    kuid_t kuid);
+u32 kdbus_from_kuid_keep(kuid_t uid);
+u32 kdbus_from_kgid_keep(kgid_t gid);
+int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags);
+
+int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size);
+void *kdbus_memdup_user(void __user *user_ptr, size_t sz_min, size_t sz_max);
+
+int kdbus_check_and_write_flags(u64 flags, void __user *buf,
+				off_t offset_out, u64 valid);
+
+struct kvec;
+
+void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len);
+size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len);
+
+#define kdbus_negotiate_flags(_s, _b, _t, _v)				\
+	kdbus_check_and_write_flags((_s)->flags, _b,			\
+				    offsetof(_t, kernel_flags), _v)
+
+#endif
-- 
2.2.1


* [PATCH 04/13] kdbus: add connection pool implementation
  2015-01-16 19:16 ` Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  (?)
@ 2015-01-16 19:16 ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

A pool for data received from the kernel is installed for every
connection of the bus, and it is used to copy data from the kernel to
userspace clients, for messages and other information.

It is accessed when one of the following ioctls is issued:

  * KDBUS_CMD_MSG_RECV, to receive a message
  * KDBUS_CMD_NAME_LIST, to dump the name registry
  * KDBUS_CMD_CONN_INFO, to retrieve information on a connection

The offsets returned by any of these ioctls are relative to the start
of the pool. Internally, the pool is organized in slices that are
allocated dynamically on demand. The overall size of the pool is chosen
by the connection when it connects to the bus with KDBUS_CMD_HELLO.
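
As a rough illustration of the slice bookkeeping described above, here is
a minimal userspace sketch of best-fit allocation with splitting. It is
not the kernel code: a plain linked list stands in for the list plus
rb-trees used in pool.c, and the names pool_new(), pool_alloc() and
ALIGN8() are made up for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of a pool; not the kernel structures. */
struct slice {
	size_t off;		/* offset of the slice inside the pool */
	size_t size;		/* size of the slice, multiple of 8 */
	int free;		/* non-zero if the slice is unused */
	struct slice *next;	/* next slice by offset */
};

struct pool {
	size_t size;
	struct slice *head;
};

#define ALIGN8(x) (((x) + 7) & ~(size_t)7)

/* A new pool starts with one free slice spanning the whole pool. */
static struct pool *pool_new(size_t size)
{
	struct pool *p = malloc(sizeof(*p));
	struct slice *s = malloc(sizeof(*s));

	if (!p || !s) {
		free(p);
		free(s);
		return NULL;
	}

	s->off = 0;
	s->size = size;
	s->free = 1;
	s->next = NULL;
	p->size = size;
	p->head = s;
	return p;
}

/*
 * Pick the smallest free slice that fits, split off the remainder into
 * a new free slice, and return the offset of the allocation (or -1).
 */
static long pool_alloc(struct pool *p, size_t size)
{
	size_t want = ALIGN8(size);
	struct slice *s, *best = NULL;

	for (s = p->head; s; s = s->next)
		if (s->free && s->size >= want &&
		    (!best || s->size < best->size))
			best = s;

	if (!best)
		return -1;

	if (best->size > want) {
		struct slice *rest = malloc(sizeof(*rest));

		if (!rest)
			return -1;

		rest->off = best->off + want;
		rest->size = best->size - want;
		rest->free = 1;
		rest->next = best->next;
		best->next = rest;
		best->size = want;
	}

	best->free = 0;
	return (long)best->off;
}
```

The linear scan is only for brevity; the patch keeps free slices in a
tree ordered by size so the closest match is found without walking
every slice.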

To make a slice available for reuse by subsequent calls,
KDBUS_CMD_FREE has to be called on its offset.
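
The release path can be sketched in the same spirit. The following
userspace model assumes, as pool.c does, that a slice carries one kernel
reference and one user reference, and that it only returns to the free
state (merging with free neighbours) once both are gone. The array-based
layout and the names pool_drop_entry(), slice_release() and
pool_release_offset() are assumptions made for this example only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLICES 8

/* Hypothetical model: slices are kept sorted by offset in an array. */
struct slice {
	size_t off, size;
	bool free, ref_kernel, ref_user;
};

struct pool {
	struct slice s[MAX_SLICES];
	size_t n;
};

/* Remove entry @i from the array, shifting later entries down. */
static void pool_drop_entry(struct pool *p, size_t i)
{
	for (; i + 1 < p->n; i++)
		p->s[i] = p->s[i + 1];
	p->n--;
}

/* Free slice @i once nobody references it, merging with free neighbours. */
static void slice_release(struct pool *p, size_t i)
{
	if (p->s[i].ref_kernel || p->s[i].ref_user)
		return;

	p->s[i].free = true;

	/* merge with the next slice if it is free */
	if (i + 1 < p->n && p->s[i + 1].free) {
		p->s[i].size += p->s[i + 1].size;
		pool_drop_entry(p, i + 1);
	}

	/* merge with the previous slice if it is free */
	if (i > 0 && p->s[i - 1].free) {
		p->s[i - 1].size += p->s[i].size;
		pool_drop_entry(p, i);
	}
}

/* Drop the user reference on the busy slice starting at @off. */
static int pool_release_offset(struct pool *p, size_t off)
{
	for (size_t i = 0; i < p->n; i++) {
		if (p->s[i].off != off || p->s[i].free)
			continue;
		if (!p->s[i].ref_user)
			return -ENXIO;

		p->s[i].ref_user = false;
		slice_release(p, i);
		return 0;
	}

	return -ENXIO;
}
```

An offset that does not start a busy slice, or one that was never
published to userspace, is rejected with -ENXIO, which mirrors what
kdbus_pool_release_offset() in this patch returns.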

To access the memory, the caller is expected to mmap() the pool into its
address space.
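
From the receiver's side, this could look roughly like the sketch below.
It only uses plain mmap() on a file descriptor; how the descriptor, the
pool size and the offsets are obtained from the kdbus ioctls is not
shown, and the helper names pool_map(), pool_at() and pool_unmap() are
invented for this example. The mapping is requested read-only because
kdbus_pool_mmap() in this patch refuses writable mappings.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map the whole pool read-only; returns NULL on failure. */
static const uint8_t *pool_map(int fd, size_t pool_size)
{
	void *p = mmap(NULL, pool_size, PROT_READ, MAP_SHARED, fd, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Translate an offset returned by the kernel into a pointer, after
 * checking that [off, off + len) lies inside the mapping.
 */
static const void *pool_at(const uint8_t *base, size_t pool_size,
			   uint64_t off, uint64_t len)
{
	if (off > pool_size || len > pool_size - off)
		return NULL;

	return base + off;
}

static void pool_unmap(const uint8_t *base, size_t pool_size)
{
	munmap((void *)base, pool_size);
}
```

The bounds check in pool_at() is a defensive choice for the sketch; a
real client would also have to release each offset again once it is done
with the data.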

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 ipc/kdbus/pool.c | 784 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 ipc/kdbus/pool.h |  47 ++++
 2 files changed, 831 insertions(+)
 create mode 100644 ipc/kdbus/pool.c
 create mode 100644 ipc/kdbus/pool.h

diff --git a/ipc/kdbus/pool.c b/ipc/kdbus/pool.c
new file mode 100644
index 000000000000..2bdb645518ea
--- /dev/null
+++ b/ipc/kdbus/pool.c
@@ -0,0 +1,784 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/aio.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/highmem.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/pagemap.h>
+#include <linux/rbtree.h>
+#include <linux/sched.h>
+#include <linux/shmem_fs.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+
+#include "pool.h"
+#include "util.h"
+
+/**
+ * struct kdbus_pool - the receiver's buffer
+ * @f:			The backing shmem file
+ * @size:		The size of the file
+ * @busy:		The currently used size
+ * @lock:		Pool data lock
+ * @slices:		All slices sorted by address
+ * @slices_busy:	Tree of allocated slices
+ * @slices_free:	Tree of free slices
+ *
+ * The receiver's buffer, managed as a pool of allocated and free
+ * slices containing the queued messages.
+ *
+ * Messages sent with KDBUS_CMD_SEND are copied directly by the
+ * sending process into the receiver's pool.
+ *
+ * Messages received with KDBUS_CMD_RECV just return the offset
+ * to the data placed in the pool.
+ *
+ * The internally allocated memory needs to be returned by the receiver
+ * with KDBUS_CMD_FREE.
+ */
+struct kdbus_pool {
+	struct file *f;
+	size_t size;
+	size_t busy;
+	struct mutex lock;
+
+	struct list_head slices;
+	struct rb_root slices_busy;
+	struct rb_root slices_free;
+};
+
+/**
+ * struct kdbus_pool_slice - allocated element in kdbus_pool
+ * @pool:		Pool this slice belongs to
+ * @off:		Offset of slice in the shmem file
+ * @size:		Size of slice
+ * @entry:		Entry in "all slices" list
+ * @rb_node:		Entry in free or busy tree
+ * @child:		Child slice
+ * @free:		Unused slice
+ * @ref_kernel:		Kernel holds a reference
+ * @ref_user:		Userspace holds a reference
+ *
+ * The pool has one or more slices, always spanning the entire size of the
+ * pool.
+ *
+ * Every slice is an element in a list sorted by the buffer address, to
+ * provide access to the next neighbor slice.
+ *
+ * Every slice is a member of either the busy or the free tree. The free
+ * tree is organized by slice size, the busy tree by buffer
+ * offset.
+ */
+struct kdbus_pool_slice {
+	struct kdbus_pool *pool;
+	size_t off;
+	size_t size;
+
+	struct list_head entry;
+	struct rb_node rb_node;
+	struct kdbus_pool_slice *child;
+
+	bool free:1;
+	bool ref_kernel:1;
+	bool ref_user:1;
+};
+
+static struct kdbus_pool_slice *kdbus_pool_slice_new(struct kdbus_pool *pool,
+						     size_t off, size_t size)
+{
+	struct kdbus_pool_slice *slice;
+
+	slice = kzalloc(sizeof(*slice), GFP_KERNEL);
+	if (!slice)
+		return NULL;
+
+	slice->pool = pool;
+	slice->off = off;
+	slice->size = size;
+	slice->free = true;
+	return slice;
+}
+
+/* insert a slice into the free tree */
+static void kdbus_pool_add_free_slice(struct kdbus_pool *pool,
+				      struct kdbus_pool_slice *slice)
+{
+	struct rb_node **n;
+	struct rb_node *pn = NULL;
+
+	n = &pool->slices_free.rb_node;
+	while (*n) {
+		struct kdbus_pool_slice *pslice;
+
+		pn = *n;
+		pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
+		if (slice->size < pslice->size)
+			n = &pn->rb_left;
+		else
+			n = &pn->rb_right;
+	}
+
+	rb_link_node(&slice->rb_node, pn, n);
+	rb_insert_color(&slice->rb_node, &pool->slices_free);
+}
+
+/* insert a slice into the busy tree */
+static void kdbus_pool_add_busy_slice(struct kdbus_pool *pool,
+				      struct kdbus_pool_slice *slice)
+{
+	struct rb_node **n;
+	struct rb_node *pn = NULL;
+
+	n = &pool->slices_busy.rb_node;
+	while (*n) {
+		struct kdbus_pool_slice *pslice;
+
+		pn = *n;
+		pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
+		if (slice->off < pslice->off)
+			n = &pn->rb_left;
+		else if (slice->off > pslice->off)
+			n = &pn->rb_right;
+		else
+			BUG();
+	}
+
+	rb_link_node(&slice->rb_node, pn, n);
+	rb_insert_color(&slice->rb_node, &pool->slices_busy);
+}
+
+static struct kdbus_pool_slice *kdbus_pool_find_slice(struct kdbus_pool *pool,
+						      size_t off)
+{
+	struct rb_node *n;
+
+	n = pool->slices_busy.rb_node;
+	while (n) {
+		struct kdbus_pool_slice *s;
+
+		s = rb_entry(n, struct kdbus_pool_slice, rb_node);
+		if (off < s->off)
+			n = n->rb_left;
+		else if (off > s->off)
+			n = n->rb_right;
+		else
+			return s;
+	}
+
+	return NULL;
+}
+
+/**
+ * kdbus_pool_slice_alloc() - allocate memory from a pool
+ * @pool:	The receiver's pool
+ * @size:	The number of bytes to allocate
+ * @kvec:	kvec to copy into the new slice, may be %NULL
+ * @iovec:	iovec to copy into the new slice, may be %NULL
+ * @vec_count:	Number of elements in @kvec or @iovec
+ *
+ * Pass the returned slice to kdbus_pool_slice_release() to free the
+ * allocated memory. If either @kvec or @iovec is non-NULL, the data
+ * will be copied from kernel or userspace memory into the new slice at
+ * offset 0.
+ *
+ * Return: the allocated slice on success, ERR_PTR on failure.
+ */
+struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
+						size_t size,
+						struct kvec *kvec,
+						struct iovec *iovec,
+						size_t vec_count)
+{
+	size_t slice_size = KDBUS_ALIGN8(size);
+	struct rb_node *n, *found = NULL;
+	struct kdbus_pool_slice *s;
+	int ret = 0;
+
+	if (WARN_ON(kvec && iovec))
+		return ERR_PTR(-EINVAL);
+
+	/* search for a free slice with the closest matching size */
+	mutex_lock(&pool->lock);
+	n = pool->slices_free.rb_node;
+	while (n) {
+		s = rb_entry(n, struct kdbus_pool_slice, rb_node);
+		if (slice_size < s->size) {
+			found = n;
+			n = n->rb_left;
+		} else if (slice_size > s->size) {
+			n = n->rb_right;
+		} else {
+			found = n;
+			break;
+		}
+	}
+
+	/* no slice with the minimum size found in the pool */
+	if (!found) {
+		ret = -ENOBUFS;
+		goto exit_unlock;
+	}
+
+	/* no exact match, use the closest one */
+	if (!n) {
+		struct kdbus_pool_slice *s_new;
+
+		s = rb_entry(found, struct kdbus_pool_slice, rb_node);
+
+		/* split-off the remainder of the size to its own slice */
+		s_new = kdbus_pool_slice_new(pool, s->off + slice_size,
+					     s->size - slice_size);
+		if (!s_new) {
+			ret = -ENOMEM;
+			goto exit_unlock;
+		}
+
+		list_add(&s_new->entry, &s->entry);
+		kdbus_pool_add_free_slice(pool, s_new);
+
+		/* adjust our size now that we split-off another slice */
+		s->size = slice_size;
+	}
+
+	/* move slice from free to the busy tree */
+	rb_erase(found, &pool->slices_free);
+	kdbus_pool_add_busy_slice(pool, s);
+
+	WARN_ON(s->ref_kernel || s->ref_user);
+	WARN_ON(s->child);
+
+	s->ref_kernel = true;
+	s->free = false;
+	pool->busy += s->size;
+	mutex_unlock(&pool->lock);
+
+	if (kvec)
+		ret = kdbus_pool_slice_copy_kvec(s, 0, kvec, vec_count, size);
+
+	if (iovec)
+		ret = kdbus_pool_slice_copy_iovec(s, 0, iovec, vec_count, size);
+
+	if (ret < 0) {
+		kdbus_pool_slice_release(s);
+		return ERR_PTR(ret);
+	}
+
+	return s;
+
+exit_unlock:
+	mutex_unlock(&pool->lock);
+	return ERR_PTR(ret);
+}
+
+static void __kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
+{
+	struct kdbus_pool *pool = slice->pool;
+	struct kdbus_pool_slice *child = slice->child;
+
+	/* don't free the slice if either has a reference */
+	if (slice->ref_kernel || slice->ref_user)
+		return;
+
+	if (WARN_ON(slice->free))
+		return;
+
+	rb_erase(&slice->rb_node, &pool->slices_busy);
+	pool->busy -= slice->size;
+
+	/* merge with the next free slice */
+	if (!list_is_last(&slice->entry, &pool->slices)) {
+		struct kdbus_pool_slice *s;
+
+		s = list_entry(slice->entry.next,
+			       struct kdbus_pool_slice, entry);
+		if (s->free) {
+			rb_erase(&s->rb_node, &pool->slices_free);
+			list_del(&s->entry);
+			slice->size += s->size;
+			kfree(s);
+		}
+	}
+
+	/* merge with previous free slice */
+	if (pool->slices.next != &slice->entry) {
+		struct kdbus_pool_slice *s;
+
+		s = list_entry(slice->entry.prev,
+			       struct kdbus_pool_slice, entry);
+		if (s->free) {
+			rb_erase(&s->rb_node, &pool->slices_free);
+			list_del(&slice->entry);
+			s->size += slice->size;
+			kfree(slice);
+			slice = s;
+		}
+	}
+
+	slice->free = true;
+	slice->child = NULL;
+	kdbus_pool_add_free_slice(pool, slice);
+
+	if (child) {
+		/* Only allow one level of recursion */
+		WARN_ON(child->child);
+		WARN_ON(!child->ref_kernel);
+		child->ref_kernel = false;
+		__kdbus_pool_slice_release(child);
+	}
+}
+
+/**
+ * kdbus_pool_slice_release() - drop kernel-reference on allocated slice
+ * @slice:		Slice allocated from the pool
+ *
+ * This releases the kernel-reference on the given slice. If the
+ * kernel-reference and the user-reference on a slice are dropped, the slice is
+ * returned to the pool.
+ *
+ * So far, we do not implement full ref-counting on slices. The kernel and
+ * user-space can each hold exactly one reference to a slice. Once both have
+ * been dropped, the slice is released.
+ */
+void kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
+{
+	struct kdbus_pool *pool;
+
+	if (!slice)
+		return;
+
+	/* @slice may be freed, so keep local ptr to @pool */
+	pool = slice->pool;
+
+	mutex_lock(&pool->lock);
+	/* kernel must own a ref to @slice to drop it */
+	WARN_ON(!slice->ref_kernel);
+	slice->ref_kernel = false;
+	__kdbus_pool_slice_release(slice);
+	mutex_unlock(&pool->lock);
+}
+
+/**
+ * kdbus_pool_release_offset() - release a public offset
+ * @pool:		pool to operate on
+ * @off:		offset to release
+ *
+ * This should be called whenever user-space frees a slice given to them. It
+ * verifies the slice is available and public, and then drops it. It ensures
+ * correct locking and barriers against queues.
+ *
+ * Return: 0 on success, -ENXIO if the offset is invalid or not public.
+ */
+int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off)
+{
+	struct kdbus_pool_slice *slice;
+	int ret = 0;
+
+	mutex_lock(&pool->lock);
+	slice = kdbus_pool_find_slice(pool, off);
+	if (slice && slice->ref_user) {
+		slice->ref_user = false;
+		__kdbus_pool_slice_release(slice);
+	} else {
+		ret = -ENXIO;
+	}
+	mutex_unlock(&pool->lock);
+
+	return ret;
+}
+
+/**
+ * kdbus_pool_slice_publish() - publish slice to user-space
+ * @slice:		The slice
+ * @out_offset:		Output storage for offset, or NULL
+ * @out_size:		Output storage for size, or NULL
+ *
+ * This prepares a slice to be published to user-space.
+ *
+ * This call combines the following operations:
+ *   * the memory region is flushed so the user's memory view is consistent
+ *   * the slice is marked as referenced by user-space, so user-space has to
+ *     call KDBUS_CMD_FREE to release it
+ *   * the offset and size of the slice are written to the given output
+ *     arguments, if non-NULL
+ */
+void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
+			      u64 *out_offset, u64 *out_size)
+{
+	mutex_lock(&slice->pool->lock);
+	/* kernel must own a ref to @slice to gain a user-space ref */
+	WARN_ON(!slice->ref_kernel);
+	slice->ref_user = true;
+	mutex_unlock(&slice->pool->lock);
+
+	if (out_offset)
+		*out_offset = slice->off;
+	if (out_size)
+		*out_size = slice->size;
+}
+
+/**
+ * kdbus_pool_slice_offset() - Get a slice's offset inside the pool
+ * @slice:	Slice to return the offset of
+ *
+ * Return: The internal offset of @slice inside the pool.
+ */
+off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice)
+{
+	return slice->off;
+}
+
+/**
+ * kdbus_pool_slice_set_child() - Set child of a slice
+ * @slice:	Slice to add a child to
+ * @child:	Child to set, may be %NULL
+ *
+ * Set @child as child of @slice, so it will be freed automatically when
+ * @slice goes away.
+ */
+void kdbus_pool_slice_set_child(struct kdbus_pool_slice *slice,
+				struct kdbus_pool_slice *child)
+{
+	if (WARN_ON(child && slice->pool != child->pool))
+		return;
+
+	WARN_ON(slice->child);
+	slice->child = child;
+}
+
+/**
+ * kdbus_pool_new() - create a new pool
+ * @name:		Name of the (deleted) file which shows up in
+ *			/proc, used for debugging
+ * @size:		Maximum size of the pool
+ *
+ * Return: a new kdbus_pool on success, ERR_PTR on failure.
+ */
+struct kdbus_pool *kdbus_pool_new(const char *name, size_t size)
+{
+	struct kdbus_pool_slice *s;
+	struct kdbus_pool *p;
+	struct file *f;
+	char *n = NULL;
+	int ret;
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return ERR_PTR(-ENOMEM);
+
+	if (name) {
+		n = kasprintf(GFP_KERNEL, KBUILD_MODNAME "-conn:%s", name);
+		if (!n) {
+			ret = -ENOMEM;
+			goto exit_free;
+		}
+	}
+
+	f = shmem_file_setup(n ?: KBUILD_MODNAME "-conn", size, VM_NORESERVE);
+	kfree(n);
+
+	if (IS_ERR(f)) {
+		ret = PTR_ERR(f);
+		goto exit_free;
+	}
+
+	ret = get_write_access(file_inode(f));
+	if (ret < 0)
+		goto exit_put_shmem;
+
+	/* allocate first slice spanning the entire pool */
+	s = kdbus_pool_slice_new(p, 0, size);
+	if (!s) {
+		ret = -ENOMEM;
+		goto exit_put_write;
+	}
+
+	p->f = f;
+	p->size = size;
+	p->busy = 0;
+	p->slices_free = RB_ROOT;
+	p->slices_busy = RB_ROOT;
+	mutex_init(&p->lock);
+
+	INIT_LIST_HEAD(&p->slices);
+	list_add(&s->entry, &p->slices);
+
+	kdbus_pool_add_free_slice(p, s);
+	return p;
+
+exit_put_write:
+	put_write_access(file_inode(f));
+exit_put_shmem:
+	fput(f);
+exit_free:
+	kfree(p);
+	return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_pool_free() - destroy pool
+ * @pool:		The receiver's pool
+ */
+void kdbus_pool_free(struct kdbus_pool *pool)
+{
+	struct kdbus_pool_slice *s, *tmp;
+
+	if (!pool)
+		return;
+
+	list_for_each_entry_safe(s, tmp, &pool->slices, entry) {
+		list_del(&s->entry);
+		kfree(s);
+	}
+
+	put_write_access(file_inode(pool->f));
+	fput(pool->f);
+	kfree(pool);
+}
+
+/**
+ * kdbus_pool_remain() - the number of free bytes in the pool
+ * @pool:		The receiver's pool
+ *
+ * Return: the number of unallocated bytes in the pool
+ */
+size_t kdbus_pool_remain(struct kdbus_pool *pool)
+{
+	size_t size;
+
+	mutex_lock(&pool->lock);
+	size = pool->size - pool->busy;
+	mutex_unlock(&pool->lock);
+
+	return size;
+}
+
+/**
+ * kdbus_pool_slice_copy_iovec() - copy user memory to a slice
+ * @slice:		The slice to write to
+ * @off:		Offset in the slice to write to
+ * @iov:		iovec array, pointing to data to copy
+ * @iov_len:		Number of elements in @iov
+ * @total_len:		Total number of bytes described in members of @iov
+ *
+ * User memory referenced by @iov will be copied into @slice at offset @off.
+ *
+ * Return: the number of bytes copied, negative errno on failure.
+ */
+ssize_t
+kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice, size_t off,
+			    struct iovec *iov, size_t iov_len, size_t total_len)
+{
+	struct file *f = slice->pool->f;
+	struct iov_iter iter;
+	struct kiocb kiocb;
+	ssize_t len;
+
+	BUG_ON(off + total_len > slice->size);
+
+	init_sync_kiocb(&kiocb, f);
+	kiocb.ki_pos = slice->off + off;
+	kiocb.ki_nbytes = total_len;
+	iov_iter_init(&iter, WRITE, iov, iov_len, total_len);
+
+	len = f->f_op->write_iter(&kiocb, &iter);
+	if (len < 0)
+		return len;
+
+	if (len != total_len)
+		return -EFAULT;
+
+	return len;
+}
+
+/**
+ * kdbus_pool_slice_copy_kvec() - copy kernel memory to a slice
+ * @slice:		The slice to write to
+ * @off:		Offset in the slice to write to
+ * @kvec:		kvec array, pointing to data to copy
+ * @kvec_len:		Number of elements in @kvec
+ * @total_len:		Total number of bytes described in members of @kvec
+ *
+ * Kernel memory referenced by @kvec will be copied into @slice at offset @off.
+ *
+ * Return: the number of bytes copied, negative errno on failure.
+ */
+ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
+				   size_t off, struct kvec *kvec,
+				   size_t kvec_len, size_t total_len)
+{
+	struct file *f = slice->pool->f;
+	struct iov_iter iter;
+	mm_segment_t old_fs;
+	struct kiocb kiocb;
+	ssize_t len;
+
+	BUG_ON(off + total_len > slice->size);
+
+	old_fs = get_fs();
+	set_fs(get_ds());
+
+	init_sync_kiocb(&kiocb, f);
+	kiocb.ki_pos = slice->off + off;
+	kiocb.ki_nbytes = total_len;
+	iov_iter_kvec(&iter, WRITE | ITER_KVEC, kvec, kvec_len, total_len);
+	len = f->f_op->write_iter(&kiocb, &iter);
+	set_fs(old_fs);
+
+	if (len < 0)
+		return len;
+
+	if (len != total_len)
+		return -EFAULT;
+
+	return len;
+}
+
+static int kdbus_pool_copy(const struct kdbus_pool_slice *slice_dst,
+			   const struct kdbus_pool_slice *slice_src)
+{
+	struct file *f_src = slice_src->pool->f;
+	struct file *f_dst = slice_dst->pool->f;
+	struct inode *i_dst = file_inode(f_dst);
+	struct address_space *mapping_dst = f_dst->f_mapping;
+	const struct address_space_operations *aops = mapping_dst->a_ops;
+	unsigned long len = slice_src->size;
+	loff_t off_src = slice_src->off;
+	loff_t off_dst = slice_dst->off;
+	int ret = 0;
+
+	BUG_ON(slice_src->size != slice_dst->size);
+	BUG_ON(slice_src->free || slice_dst->free);
+
+	mutex_lock(&i_dst->i_mutex);
+
+	while (len > 0) {
+		unsigned long page_off;
+		unsigned long copy_len;
+		char __user *kaddr;
+		struct page *page;
+		ssize_t n_read;
+		void *fsdata;
+		long status;
+
+		page_off = off_dst & (PAGE_CACHE_SIZE - 1);
+		copy_len = min_t(unsigned long,
+				 PAGE_CACHE_SIZE - page_off, len);
+
+		status = aops->write_begin(f_dst, mapping_dst, off_dst,
+					   copy_len, 0, &page, &fsdata);
+		if (unlikely(status < 0)) {
+			ret = status;
+			break;
+		}
+
+		kaddr = (char __force __user *)kmap(page) + page_off;
+		n_read = f_src->f_op->read(f_src, kaddr, copy_len, &off_src);
+		kunmap(page);
+		mark_page_accessed(page);
+		flush_dcache_page(page);
+
+		if (unlikely(n_read != copy_len)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		status = aops->write_end(f_dst, mapping_dst, off_dst,
+					 copy_len, copy_len, page, fsdata);
+		if (unlikely(status != copy_len)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		off_dst += copy_len;
+		len -= copy_len;
+	}
+
+	mutex_unlock(&i_dst->i_mutex);
+
+	return ret;
+}
+
+/**
+ * kdbus_pool_slice_move() - move memory from one pool into another one
+ * @pool_dst:		The receiver's pool to copy to
+ * @slice:		Reference to the slice to copy from the source;
+ *			updated with the newly allocated slice in the
+ *			destination
+ *
+ * Move memory from one pool to another. A slice will be allocated in the
+ * destination pool, the original memory from the existing slice is copied
+ * over, and the existing slice will be released.
+ *
+ * Return: 0 on success, negative error number on failure.
+ */
+int kdbus_pool_slice_move(struct kdbus_pool *pool_dst,
+			  struct kdbus_pool_slice **slice)
+{
+	mm_segment_t old_fs;
+	struct kdbus_pool_slice *slice_new;
+	int ret;
+
+	slice_new = kdbus_pool_slice_alloc(pool_dst, (*slice)->size,
+					   NULL, NULL, 0);
+	if (IS_ERR(slice_new))
+		return PTR_ERR(slice_new);
+
+	old_fs = get_fs();
+	set_fs(get_ds());
+	ret = kdbus_pool_copy(slice_new, *slice);
+	set_fs(old_fs);
+	if (ret < 0)
+		goto exit_free;
+
+	kdbus_pool_slice_release(*slice);
+
+	*slice = slice_new;
+	return 0;
+
+exit_free:
+	kdbus_pool_slice_release(slice_new);
+	return ret;
+}
+
+/**
+ * kdbus_pool_mmap() - map the pool into the process
+ * @pool:		The receiver's pool
+ * @vma:		passed by mmap() syscall
+ *
+ * Return: the result of the mmap() call, negative errno on failure.
+ */
+int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma)
+{
+	/* deny write access to the pool */
+	if (vma->vm_flags & VM_WRITE)
+		return -EPERM;
+	vma->vm_flags &= ~VM_MAYWRITE;
+
+	/* do not allow mapping more than the size of the file */
+	if ((vma->vm_end - vma->vm_start) > pool->size)
+		return -EFAULT;
+
+	/* replace the connection file with our shmem file */
+	if (vma->vm_file)
+		fput(vma->vm_file);
+	vma->vm_file = get_file(pool->f);
+
+	return pool->f->f_op->mmap(pool->f, vma);
+}
diff --git a/ipc/kdbus/pool.h b/ipc/kdbus/pool.h
new file mode 100644
index 000000000000..7c6fce2241de
--- /dev/null
+++ b/ipc/kdbus/pool.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_POOL_H
+#define __KDBUS_POOL_H
+
+struct kdbus_pool;
+struct kdbus_pool_slice;
+
+struct kdbus_pool *kdbus_pool_new(const char *name, size_t size);
+void kdbus_pool_free(struct kdbus_pool *pool);
+size_t kdbus_pool_remain(struct kdbus_pool *pool);
+int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma);
+int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off);
+
+struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
+						size_t size,
+						struct kvec *kvec,
+						struct iovec *iovec,
+						size_t vec_count);
+void kdbus_pool_slice_release(struct kdbus_pool_slice *slice);
+void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
+			      u64 *out_offset, u64 *out_size);
+off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice);
+void kdbus_pool_slice_set_child(struct kdbus_pool_slice *slice,
+				struct kdbus_pool_slice *child);
+int kdbus_pool_slice_move(struct kdbus_pool *pool_dst,
+			  struct kdbus_pool_slice **slice);
+ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
+				   size_t off, struct kvec *kvec,
+				   size_t kvec_count, size_t total_len);
+ssize_t kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice,
+				    size_t off, struct iovec *iov,
+				    size_t iov_count, size_t total_len);
+
+#endif
-- 
2.2.1



* [PATCH 05/13] kdbus: add connection, queue handling and message validation code
@ 2015-01-16 19:16   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

This patch adds code to create and destroy connections, to validate
incoming messages and to maintain the queue of messages that are
associated with a connection.

Note that connection and queue have a 1:1 relation; the code is only
split into two parts for cleaner separation and better readability.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
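A note for reviewers on the active-reference counter used by
kdbus_conn_acquire(), kdbus_conn_release() and kdbus_conn_disconnect() in
connection.c: the following is a minimal userspace sketch of the scheme as
the kerneldoc describes it, not the kernel code itself. It assumes C11
atomics in place of the kernel's atomic_t. The names conn_sketch,
sketch_acquire(), sketch_disable(), sketch_release() and sketch_active()
are made up for illustration, the release behaviour is inferred from the
kerneldoc, and wait-queue handling and lockdep annotations are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same value as KDBUS_CONN_ACTIVE_BIAS in the patch. */
#define SKETCH_ACTIVE_BIAS	(INT_MIN + 2)

/* Hypothetical stand-in for the 'active' field of struct kdbus_conn. */
struct conn_sketch {
	atomic_int active;
};

/* Take an active reference unless the counter is already negative. */
bool sketch_acquire(struct conn_sketch *c)
{
	int v = atomic_load(&c->active);

	while (v >= 0) {
		/* On failure, v is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&c->active, &v, v + 1))
			return true;
	}
	return false;
}

/* Disable the connection: push the counter far below zero. */
void sketch_disable(struct conn_sketch *c)
{
	atomic_fetch_add(&c->active, SKETCH_ACTIVE_BIAS);
}

/*
 * Drop an active reference (assumed behaviour). Returns true if the
 * connection was disabled and this was the last active reference,
 * i.e. the point at which a disconnect waiter could be woken.
 */
bool sketch_release(struct conn_sketch *c)
{
	return atomic_fetch_sub(&c->active, 1) - 1 == SKETCH_ACTIVE_BIAS;
}

/* Mirrors kdbus_conn_active(): non-negative means not disconnected. */
bool sketch_active(struct conn_sketch *c)
{
	return atomic_load(&c->active) >= 0;
}
```

With one reference held, sketch_disable() followed by sketch_release()
leaves the counter at exactly the bias value, which is the condition the
wait_event() in kdbus_conn_disconnect() waits for.
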
 ipc/kdbus/connection.c | 2004 ++++++++++++++++++++++++++++++++++++++++++++++++
 ipc/kdbus/connection.h |  262 +++++++
 ipc/kdbus/item.c       |  309 ++++++++
 ipc/kdbus/item.h       |   57 ++
 ipc/kdbus/message.c    |  598 +++++++++++++++
 ipc/kdbus/message.h    |  133 ++++
 ipc/kdbus/queue.c      |  505 ++++++++++++
 ipc/kdbus/queue.h      |  108 +++
 ipc/kdbus/reply.c      |  262 +++++++
 ipc/kdbus/reply.h      |   68 ++
 ipc/kdbus/util.h       |    2 +-
 11 files changed, 4307 insertions(+), 1 deletion(-)
 create mode 100644 ipc/kdbus/connection.c
 create mode 100644 ipc/kdbus/connection.h
 create mode 100644 ipc/kdbus/item.c
 create mode 100644 ipc/kdbus/item.h
 create mode 100644 ipc/kdbus/message.c
 create mode 100644 ipc/kdbus/message.h
 create mode 100644 ipc/kdbus/queue.c
 create mode 100644 ipc/kdbus/queue.h
 create mode 100644 ipc/kdbus/reply.c
 create mode 100644 ipc/kdbus/reply.h

diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
new file mode 100644
index 000000000000..75e2ea161a0e
--- /dev/null
+++ b/ipc/kdbus/connection.c
@@ -0,0 +1,2004 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/audit.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/fs_struct.h>
+#include <linux/hashtable.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/math64.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/path.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/shmem_fs.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "match.h"
+#include "message.h"
+#include "metadata.h"
+#include "names.h"
+#include "domain.h"
+#include "item.h"
+#include "notify.h"
+#include "policy.h"
+#include "pool.h"
+#include "reply.h"
+#include "util.h"
+#include "queue.h"
+
+#define KDBUS_CONN_ACTIVE_BIAS	(INT_MIN + 2)
+#define KDBUS_CONN_ACTIVE_NEW	(INT_MIN + 1)
+
+/*
+ * Check for maximum number of messages per individual user. This
+ * should prevent a single user from being able to fill the receiver's
+ * queue.
+ */
+static int kdbus_conn_queue_user_quota(const struct kdbus_conn *conn_src,
+				       struct kdbus_conn *conn_dst,
+				       struct kdbus_queue_entry *entry)
+{
+	struct kdbus_domain_user *user;
+
+	/*
+	 * When the kernel is the sender we do not do per-user
+	 * accounting. Instead, we just count how many messages have
+	 * been queued and check the quota limit when inserting the
+	 * message into the receiver queue.
+	 */
+	if (!conn_src)
+		return 0;
+
+	/*
+	 * Per-user accounting can be expensive if we have many different
+	 * users on the bus. Allow one set of messages to pass through
+	 * un-accounted. Only once we hit that limit, we start accounting.
+	 */
+	if (conn_dst->queue.msg_count < KDBUS_CONN_MAX_MSGS_UNACCOUNTED)
+		return 0;
+
+	user = conn_src->user;
+
+	/* extend array to store the user message counters */
+	if (user->idr >= conn_dst->msg_users_max) {
+		unsigned int *users;
+		unsigned int i;
+
+		i = 8 + KDBUS_ALIGN8(user->idr);
+		users = krealloc(conn_dst->msg_users, i * sizeof(unsigned int),
+				 GFP_KERNEL | __GFP_ZERO);
+		if (!users)
+			return -ENOMEM;
+
+		conn_dst->msg_users = users;
+		conn_dst->msg_users_max = i;
+	}
+
+	if (conn_dst->msg_users[user->idr] >= KDBUS_CONN_MAX_MSGS_PER_USER)
+		return -ENOBUFS;
+
+	conn_dst->msg_users[user->idr]++;
+	entry->user = kdbus_domain_user_ref(user);
+	return 0;
+}
+
+/**
+ * kdbus_cmd_msg_recv() - receive a message from the queue
+ * @conn:		Connection to work on
+ * @recv:		The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_cmd_msg_recv(struct kdbus_conn *conn,
+		       struct kdbus_cmd_recv *recv)
+{
+	bool install = !(recv->flags & KDBUS_RECV_PEEK);
+	struct kdbus_queue_entry *entry = NULL;
+	unsigned int lost_count;
+	int ret = 0;
+
+	if (recv->msg.offset > 0)
+		return -EINVAL;
+
+	mutex_lock(&conn->lock);
+	entry = kdbus_queue_entry_peek(&conn->queue, recv->priority,
+				       recv->flags & KDBUS_RECV_USE_PRIORITY);
+	if (IS_ERR(entry)) {
+		ret = PTR_ERR(entry);
+		goto exit_unlock;
+	}
+
+	/*
+	 * Make sure to never install fds into a connection that has
+	 * refused to receive any.
+	 */
+	if (WARN_ON(!(conn->flags & KDBUS_HELLO_ACCEPT_FD) &&
+	    entry->msg_res && entry->msg_res->fds_count > 0)) {
+		ret = -EINVAL;
+		goto exit_unlock;
+	}
+
+	/* just drop the message */
+	if (recv->flags & KDBUS_RECV_DROP) {
+		struct kdbus_reply *reply = kdbus_reply_ref(entry->reply);
+
+		kdbus_queue_entry_remove(conn, entry);
+		kdbus_pool_slice_release(entry->slice);
+
+		mutex_unlock(&conn->lock);
+
+		if (reply) {
+			/*
+			 * See if the reply object is still linked in
+			 * reply_dst, and kill it. Notify the waiting peer
+			 * that there won't be an answer (-EPIPE).
+			 */
+			mutex_lock(&reply->reply_dst->lock);
+			if (!list_empty(&reply->entry)) {
+				kdbus_reply_unlink(reply);
+				if (reply->sync)
+					kdbus_sync_reply_wakeup(reply, -EPIPE);
+				else
+					kdbus_notify_reply_dead(conn->ep->bus,
+							entry->msg.src_id,
+							entry->msg.cookie);
+			}
+			mutex_unlock(&reply->reply_dst->lock);
+		}
+
+		kdbus_notify_flush(conn->ep->bus);
+		kdbus_queue_entry_free(entry);
+		kdbus_reply_unref(reply);
+
+		return 0;
+	}
+
+	/*
+	 * If there have been lost broadcast messages, report the number
+	 * in the overloaded recv->dropped_msgs field and return -EOVERFLOW.
+	 */
+	lost_count = atomic_read(&conn->lost_count);
+	if (lost_count) {
+		recv->dropped_msgs = lost_count;
+		atomic_sub(lost_count, &conn->lost_count);
+		ret = -EOVERFLOW;
+		goto exit_unlock;
+	}
+
+	/*
+	 * PEEK just returns the location of the next message. Do not install
+	 * file descriptors or anything else. This is usually used to
+	 * determine the sender of the next queued message.
+	 *
+	 * File descriptor numbers referenced in the message items
+	 * are undefined with PEEK; they are only valid after a full
+	 * receive.
+	 *
+	 * Only if PEEK is not specified are the FDs installed and the
+	 * message dropped from internal queues.
+	 */
+	ret = kdbus_queue_entry_install(entry, conn, &recv->msg.return_flags,
+					install);
+	if (ret < 0)
+		goto exit_unlock;
+
+	/* Give the offset+size back to the caller. */
+	kdbus_pool_slice_publish(entry->slice, &recv->msg.offset,
+				 &recv->msg.msg_size);
+
+	if (install) {
+		kdbus_queue_entry_remove(conn, entry);
+		kdbus_pool_slice_release(entry->slice);
+		kdbus_queue_entry_free(entry);
+	}
+
+exit_unlock:
+	mutex_unlock(&conn->lock);
+	kdbus_notify_flush(conn->ep->bus);
+	return ret;
+}
+
+static int kdbus_conn_check_access(struct kdbus_conn *conn_src,
+				   const struct cred *conn_src_creds,
+				   struct kdbus_conn *conn_dst,
+				   const struct kdbus_msg *msg,
+				   struct kdbus_reply **reply_wake)
+{
+	/*
+	 * If the message is a reply, its cookie_reply field must match any
+	 * of the connection's expected replies. Otherwise, access to send the
+	 * message will be denied.
+	 */
+	if (reply_wake && msg->cookie_reply > 0) {
+		struct kdbus_reply *r;
+
+		/*
+		 * The connection that we are replying to has not
+		 * issued any request, or perhaps we have already
+		 * replied. In any case the supplied cookie_reply is
+		 * no longer valid, so fail.
+		 */
+		if (atomic_read(&conn_dst->request_count) == 0)
+			return -EPERM;
+
+		mutex_lock(&conn_dst->lock);
+		r = kdbus_reply_find(conn_src, conn_dst, msg->cookie_reply);
+		if (r) {
+			if (r->sync)
+				*reply_wake = kdbus_reply_ref(r);
+			kdbus_reply_unlink(r);
+		}
+		mutex_unlock(&conn_dst->lock);
+
+		return r ? 0 : -EPERM;
+	}
+
+	/* ... otherwise, ask the policy DBs for permission */
+	if (!kdbus_conn_policy_talk(conn_src, conn_src_creds, conn_dst))
+		return -EPERM;
+
+	return 0;
+}
+
+/* Callers should take the conn_dst lock */
+static struct kdbus_queue_entry *
+kdbus_conn_entry_make(struct kdbus_conn *conn_dst,
+		      const struct kdbus_kmsg *kmsg)
+{
+	/* The remote connection was disconnected */
+	if (!kdbus_conn_active(conn_dst))
+		return ERR_PTR(-ECONNRESET);
+
+	/* The connection does not accept file descriptors */
+	if (!(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
+	    kmsg->res && kmsg->res->fds_count > 0)
+		return ERR_PTR(-ECOMM);
+
+	return kdbus_queue_entry_alloc(conn_dst->pool, kmsg);
+}
+
+/*
+ * When synchronously responding to a message, allocate a queue entry
+ * and attach it to the reply tracking object.
+ * The connection's queue will never get to see it.
+ */
+static int kdbus_conn_entry_sync_attach(struct kdbus_conn *conn_dst,
+					const struct kdbus_kmsg *kmsg,
+					struct kdbus_reply *reply_wake)
+{
+	struct kdbus_queue_entry *entry;
+	int remote_ret;
+	int ret = 0;
+
+	mutex_lock(&reply_wake->reply_dst->lock);
+
+	/*
+	 * If we are still waiting then proceed, allocate a queue
+	 * entry and attach it to the reply object
+	 */
+	if (reply_wake->waiting) {
+		entry = kdbus_conn_entry_make(conn_dst, kmsg);
+		if (IS_ERR(entry))
+			ret = PTR_ERR(entry);
+		else
+			/* Attach the entry to the reply object */
+			reply_wake->queue_entry = entry;
+	} else {
+		ret = -ECONNRESET;
+	}
+
+	/*
+	 * Update the reply object and wake up the remote peer only
+	 * on appropriate return codes
+	 *
+	 * * -ECOMM: if the replying connection failed with -ECOMM
+	 *           then wake up the remote peer with -EREMOTEIO
+	 *
+	 *           We do this to differentiate between -ECOMM errors
+	 *           from the original sender's perspective:
+	 *           -ECOMM error during the sync send and
+	 *           -ECOMM error during the sync reply, this last
+	 *           one is rewritten to -EREMOTEIO
+	 *
+	 * * Wake up on all other return codes.
+	 */
+	remote_ret = ret;
+
+	if (ret == -ECOMM)
+		remote_ret = -EREMOTEIO;
+
+	kdbus_sync_reply_wakeup(reply_wake, remote_ret);
+	kdbus_reply_unlink(reply_wake);
+	mutex_unlock(&reply_wake->reply_dst->lock);
+
+	return ret;
+}
+
+/**
+ * kdbus_conn_entry_insert() - enqueue a message into the receiver's pool
+ * @conn_src:		The sending connection
+ * @conn_dst:		The connection to queue into
+ * @kmsg:		The kmsg to queue
+ * @reply:		The reply tracker to attach to the queue entry
+ *
+ * Return: 0 on success, negative error otherwise.
+ */
+int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
+			    struct kdbus_conn *conn_dst,
+			    const struct kdbus_kmsg *kmsg,
+			    struct kdbus_reply *reply)
+{
+	struct kdbus_queue_entry *entry;
+	int ret;
+
+	kdbus_conn_lock2(conn_src, conn_dst);
+
+	/*
+	 * Limit the maximum number of queued messages. This applies
+	 * to all messages, user messages and kernel notifications.
+	 *
+	 * The kernel sends notifications to subscribed connections
+	 * only. If the connection does not clean its queue, no further
+	 * messages are delivered.
+	 * The kernel is able to queue KDBUS_CONN_MAX_MSGS messages; this
+	 * includes all types of notifications.
+	 */
+	if (conn_dst->queue.msg_count >= KDBUS_CONN_MAX_MSGS) {
+		ret = -ENOBUFS;
+		goto exit_unlock;
+	}
+
+	entry = kdbus_conn_entry_make(conn_dst, kmsg);
+	if (IS_ERR(entry)) {
+		ret = PTR_ERR(entry);
+		goto exit_unlock;
+	}
+
+	/* limit the number of queued messages from the same individual user */
+	ret = kdbus_conn_queue_user_quota(conn_src, conn_dst, entry);
+	if (ret < 0)
+		goto exit_queue_free;
+
+	/*
+	 * Remember the reply associated with this queue entry, so we can
+	 * move the reply entry's connection when a connection moves from an
+	 * activator to an implementer.
+	 */
+	entry->reply = kdbus_reply_ref(reply);
+
+	if (reply) {
+		kdbus_reply_link(reply);
+		if (!reply->sync)
+			schedule_delayed_work(&conn_src->work, 0);
+	}
+
+	/* link the message into the receiver's queue */
+	kdbus_queue_entry_add(&conn_dst->queue, entry);
+
+	/* wake up poll() */
+	wake_up_interruptible(&conn_dst->wait);
+
+	ret = 0;
+	goto exit_unlock;
+
+exit_queue_free:
+	kdbus_queue_entry_free(entry);
+exit_unlock:
+	kdbus_conn_unlock2(conn_src, conn_dst);
+	return ret;
+}
+
+/**
+ * kdbus_conn_wait_reply() - Wait for the reply of a synchronous send
+ *			     operation
+ * @conn_src:		The sending connection (origin)
+ * @conn_dst:		The replying connection
+ * @cmd_send:		Payload of SEND command
+ * @ioctl_file:		struct file used to issue this ioctl
+ * @cancel_fd:		Pinned file that reflects KDBUS_ITEM_CANCEL_FD
+ *			item, used to cancel the blocking send call
+ * @reply_wait:		The tracked reply that we are waiting for.
+ * @expire:		Reply timeout
+ *
+ * Return: 0 on success, negative error otherwise.
+ */
+static int kdbus_conn_wait_reply(struct kdbus_conn *conn_src,
+				 struct kdbus_conn *conn_dst,
+				 struct kdbus_cmd_send *cmd_send,
+				 struct file *ioctl_file,
+				 struct file *cancel_fd,
+				 struct kdbus_reply *reply_wait,
+				 ktime_t expire)
+{
+	struct kdbus_queue_entry *entry;
+	struct poll_wqueues pwq = {};
+	int ret;
+
+	if (WARN_ON(!reply_wait))
+		return -EIO;
+
+	/*
+	 * Block until the reply arrives. reply_wait is left untouched
+	 * by the timeout scans that might be conducted for other,
+	 * asynchronous replies of conn_src.
+	 */
+
+	poll_initwait(&pwq);
+	poll_wait(ioctl_file, &conn_src->wait, &pwq.pt);
+
+	for (;;) {
+		/*
+		 * Any of the following conditions will stop our synchronously
+		 * blocking SEND command:
+		 *
+		 * a) The origin sender closed its connection
+		 * b) The remote peer answered, setting reply_wait->waiting = 0
+		 * c) The cancel FD was written to
+		 * d) A signal was received
+		 * e) The specified timeout was reached, and none of the above
+		 *    conditions kicked in.
+		 */
+
+		/*
+		 * We have already acquired an active reference when
+		 * entering here, but another thread may call
+		 * KDBUS_CMD_BYEBYE which does not acquire an active
+		 * reference, therefore kdbus_conn_disconnect() will
+		 * not wait for us.
+		 */
+		if (!kdbus_conn_active(conn_src)) {
+			ret = -ECONNRESET;
+			break;
+		}
+
+		/*
+		 * After the replying peer unsets the waiting variable,
+		 * it will wake us up.
+		 */
+		if (!reply_wait->waiting) {
+			ret = reply_wait->err;
+			break;
+		}
+
+		if (cancel_fd) {
+			unsigned int r;
+
+			r = cancel_fd->f_op->poll(cancel_fd, &pwq.pt);
+			if (r & POLLIN) {
+				ret = -ECANCELED;
+				break;
+			}
+		}
+
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+
+		if (!poll_schedule_timeout(&pwq, TASK_INTERRUPTIBLE,
+					   &expire, 0)) {
+			ret = -ETIMEDOUT;
+			break;
+		}
+
+		/*
+		 * Reset the poll worker func, so the waitqueues are not
+		 * added to the poll table again. We just reuse what we've
+		 * collected earlier for further iterations.
+		 */
+		init_poll_funcptr(&pwq.pt, NULL);
+	}
+
+	poll_freewait(&pwq);
+
+	if (ret == -EINTR) {
+		/*
+		 * Interrupted system call. Unref the reply object, and pass
+		 * the return value down the chain. Mark the reply as
+		 * interrupted, so the cleanup work can remove it, but do not
+		 * unlink it from the list. Once the syscall restarts, we'll
+		 * pick it up and wait on it again.
+		 */
+		mutex_lock(&conn_src->lock);
+		reply_wait->interrupted = true;
+		schedule_delayed_work(&conn_src->work, 0);
+		mutex_unlock(&conn_src->lock);
+
+		return -ERESTARTSYS;
+	}
+
+	mutex_lock(&conn_src->lock);
+	reply_wait->waiting = false;
+	entry = reply_wait->queue_entry;
+	if (entry) {
+		ret = kdbus_queue_entry_install(entry, conn_src,
+						&cmd_send->reply.return_flags,
+						true);
+		kdbus_pool_slice_publish(entry->slice, &cmd_send->reply.offset,
+					 &cmd_send->reply.msg_size);
+		kdbus_pool_slice_release(entry->slice);
+		kdbus_queue_entry_free(entry);
+	}
+	kdbus_reply_unlink(reply_wait);
+	mutex_unlock(&conn_src->lock);
+
+	return ret;
+}
+
+/**
+ * kdbus_cmd_msg_send() - send a message
+ * @conn_src:		Connection
+ * @cmd:		Payload of SEND command
+ * @ioctl_file:		struct file used to issue this ioctl
+ * @kmsg:		Message to send
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_cmd_msg_send(struct kdbus_conn *conn_src,
+		       struct kdbus_cmd_send *cmd,
+		       struct file *ioctl_file,
+		       struct kdbus_kmsg *kmsg)
+{
+	bool sync = cmd->flags & KDBUS_SEND_SYNC_REPLY;
+	struct kdbus_name_entry *name_entry = NULL;
+	struct kdbus_reply *reply_wait = NULL;
+	struct kdbus_reply *reply_wake = NULL;
+	struct kdbus_msg *msg = &kmsg->msg;
+	struct kdbus_conn *conn_dst = NULL;
+	struct kdbus_bus *bus = conn_src->ep->bus;
+	struct file *cancel_fd = NULL;
+	struct kdbus_item *item;
+	int ret = 0;
+
+	/* assign domain-global message sequence number */
+	if (WARN_ON(kmsg->seq > 0))
+		return -EINVAL;
+
+	KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
+		switch (item->type) {
+		case KDBUS_ITEM_CANCEL_FD:
+			/* install cancel_fd only if synchronous */
+			if (!sync)
+				break;
+
+			if (cancel_fd) {
+				ret = -EEXIST;
+				goto exit_put_cancelfd;
+			}
+
+			cancel_fd = fget(item->fds[0]);
+			if (!cancel_fd)
+				return -EBADF;
+
+			if (!cancel_fd->f_op->poll) {
+				ret = -EINVAL;
+				goto exit_put_cancelfd;
+			}
+			break;
+
+		default:
+			ret = -EINVAL;
+			goto exit_put_cancelfd;
+		}
+	}
+
+	kmsg->seq = atomic64_inc_return(&bus->domain->msg_seq_last);
+
+	if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
+		kdbus_bus_broadcast(bus, conn_src, kmsg);
+		goto exit_put_cancelfd;
+	}
+
+	if (kmsg->res && kmsg->res->dst_name) {
+		/*
+		 * Lock the destination name so it will not get dropped or
+		 * moved between activator/implementer while we try to queue a
+		 * message. We also rely on this to read-lock the entire
+		 * registry so kdbus_meta_conn_collect() will have a consistent
+		 * view of all acquired names on both connections.
+		 * If kdbus_name_lock() gets changed to a per-name lock, we
+		 * really need to read-lock the whole registry here.
+		 */
+		name_entry = kdbus_name_lock(bus->name_registry,
+					     kmsg->res->dst_name);
+		if (!name_entry) {
+			ret = -ESRCH;
+			goto exit_put_cancelfd;
+		}
+
+		/*
+		 * If both a name and a connection ID are given as destination
+		 * of a message, check that the currently owning connection of
+		 * the name matches the specified ID.
+		 * This way, we allow userspace to send the message to a
+		 * specific connection by ID only if the connection currently
+		 * owns the given name.
+		 */
+		if (msg->dst_id != KDBUS_DST_ID_NAME &&
+		    msg->dst_id != name_entry->conn->id) {
+			ret = -EREMCHG;
+			goto exit_name_unlock;
+		}
+
+		if (!name_entry->conn && name_entry->activator)
+			conn_dst = kdbus_conn_ref(name_entry->activator);
+		else
+			conn_dst = kdbus_conn_ref(name_entry->conn);
+
+		if ((msg->flags & KDBUS_MSG_NO_AUTO_START) &&
+		    kdbus_conn_is_activator(conn_dst)) {
+			ret = -EADDRNOTAVAIL;
+			goto exit_unref;
+		}
+	} else {
+		/* unicast message to unique name */
+		conn_dst = kdbus_bus_find_conn_by_id(bus, msg->dst_id);
+
+		/*
+		 * Fail via the common exit path if the ID is unknown, so
+		 * that a pinned cancel_fd is released. Special-purpose
+		 * connections must not be addressed via their unique IDs
+		 * either.
+		 */
+		if (!conn_dst || !kdbus_conn_is_ordinary(conn_dst)) {
+			ret = -ENXIO;
+			goto exit_unref;
+		}
+	}
+
+	/*
+	 * Record the sequence number of the registered name;
+	 * it will be passed on to the queue, in case messages
+	 * addressed to a name need to be moved from or to
+	 * activator connections of the same name.
+	 */
+	if (name_entry)
+		kmsg->dst_name_id = name_entry->name_id;
+
+	if (conn_src) {
+		u64 attach_flags;
+
+		/*
+		 * If we got here due to an interrupted system call, our reply
+		 * wait object is still queued on conn_dst, with the former
+		 * cookie. Look it up, and in case it exists, go dormant right
+		 * away again, and don't queue the message again.
+		 *
+		 * We also need to make sure that conn_src did really
+		 * issue a request or if the request did not get
+		 * canceled on the way before looking up any reply
+		 * object.
+		 */
+		if (sync && atomic_read(&conn_src->request_count) > 0) {
+			mutex_lock(&conn_src->lock);
+			reply_wait = kdbus_reply_find(conn_dst, conn_src,
+						      kmsg->msg.cookie);
+			if (reply_wait) {
+				if (reply_wait->interrupted) {
+					kdbus_reply_ref(reply_wait);
+					reply_wait->interrupted = false;
+				} else {
+					reply_wait = NULL;
+				}
+			}
+			mutex_unlock(&conn_src->lock);
+
+			if (reply_wait)
+				goto wait_sync;
+		}
+
+		/* Calculate attach flags of conn_src & conn_dst */
+		attach_flags = kdbus_meta_calc_attach_flags(conn_src, conn_dst);
+
+		/*
+		 * If this connection did not fake its metadata, then
+		 * let's augment its metadata with the currently valid
+		 * metadata.
+		 */
+		if (!conn_src->faked_meta) {
+			ret = kdbus_meta_proc_collect(kmsg->proc_meta,
+						      attach_flags);
+			if (ret < 0)
+				goto exit_unref;
+		}
+
+		/*
+		 * If requested, we always send the current description
+		 * and owned names of the source connection.
+		 */
+		ret = kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src,
+					      attach_flags);
+		if (ret < 0)
+			goto exit_unref;
+
+		if (msg->flags & KDBUS_MSG_EXPECT_REPLY) {
+			ret = kdbus_conn_check_access(conn_src, current_cred(),
+						      conn_dst, msg, NULL);
+			if (ret < 0)
+				goto exit_unref;
+
+			reply_wait = kdbus_reply_new(conn_dst, conn_src, msg,
+						     name_entry, sync);
+			if (IS_ERR(reply_wait)) {
+				ret = PTR_ERR(reply_wait);
+				reply_wait = NULL;
+				goto exit_unref;
+			}
+		} else if (msg->flags & KDBUS_MSG_SIGNAL) {
+			if (!kdbus_match_db_match_kmsg(conn_dst->match_db,
+						       conn_src, kmsg)) {
+				ret = -EPERM;
+				goto exit_unref;
+			}
+
+			/*
+			 * A receiver needs TALK access to the sender
+			 * in order to receive signals.
+			 */
+			ret = kdbus_conn_check_access(conn_dst, NULL, conn_src,
+						      msg, NULL);
+			if (ret < 0)
+				goto exit_unref;
+		} else {
+			ret = kdbus_conn_check_access(conn_src, current_cred(),
+						      conn_dst, msg,
+						      &reply_wake);
+			if (ret < 0)
+				goto exit_unref;
+		}
+	}
+
+	/*
+	 * Forward to monitors before queuing the message. Otherwise, the
+	 * receiver might queue a reply before the original message is queued
+	 * on the monitors.
+	 * We never guarantee consistent ordering across connections, but for
+	 * monitors we should at least make sure they get the message before
+	 * anyone else.
+	 */
+	kdbus_bus_eavesdrop(bus, conn_src, kmsg);
+
+	if (reply_wake) {
+		/*
+		 * If we're synchronously responding to a message, allocate a
+		 * queue item and attach it to the reply tracking object.
+		 * The connection's queue will never get to see it.
+		 */
+		ret = kdbus_conn_entry_sync_attach(conn_dst, kmsg, reply_wake);
+		if (ret < 0)
+			goto exit_unref;
+	} else {
+		/*
+		 * Otherwise, put it in the queue and wait for the connection
+		 * to dequeue and receive the message.
+		 */
+		ret = kdbus_conn_entry_insert(conn_src, conn_dst,
+					      kmsg, reply_wait);
+		if (ret < 0)
+			goto exit_unref;
+	}
+
+wait_sync:
+	/* no reason to keep names locked for replies */
+	name_entry = kdbus_name_unlock(bus->name_registry, name_entry);
+
+	if (sync) {
+		ktime_t now = ktime_get();
+		ktime_t expire = ns_to_ktime(msg->timeout_ns);
+
+		if (likely(ktime_compare(now, expire) < 0))
+			ret = kdbus_conn_wait_reply(conn_src, conn_dst, cmd,
+						    ioctl_file, cancel_fd,
+						    reply_wait, expire);
+		else
+			ret = -ETIMEDOUT;
+	}
+
+exit_unref:
+	kdbus_reply_unref(reply_wait);
+	kdbus_reply_unref(reply_wake);
+	kdbus_conn_unref(conn_dst);
+exit_name_unlock:
+	kdbus_name_unlock(bus->name_registry, name_entry);
+exit_put_cancelfd:
+	if (cancel_fd)
+		fput(cancel_fd);
+
+	return ret;
+}
+
+/**
+ * kdbus_conn_disconnect() - disconnect a connection
+ * @conn:		The connection to disconnect
+ * @ensure_queue_empty:	Flag to indicate if the call should fail in
+ *			case the connection's message list is not
+ *			empty
+ *
+ * If @ensure_queue_empty is true, and the connection has pending messages,
+ * -EBUSY is returned.
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty)
+{
+	struct kdbus_queue_entry *entry, *tmp;
+	struct kdbus_bus *bus = conn->ep->bus;
+	struct kdbus_reply *r, *r_tmp;
+	struct kdbus_conn *c;
+	int i, v;
+
+	mutex_lock(&conn->lock);
+	v = atomic_read(&conn->active);
+	if (v == KDBUS_CONN_ACTIVE_NEW) {
+		/* was never connected */
+		mutex_unlock(&conn->lock);
+		return 0;
+	}
+	if (v < 0) {
+		/* already dead */
+		mutex_unlock(&conn->lock);
+		return -EALREADY;
+	}
+	if (ensure_queue_empty && !list_empty(&conn->queue.msg_list)) {
+		/* still busy */
+		mutex_unlock(&conn->lock);
+		return -EBUSY;
+	}
+
+	atomic_add(KDBUS_CONN_ACTIVE_BIAS, &conn->active);
+	mutex_unlock(&conn->lock);
+
+	wake_up_interruptible(&conn->wait);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	rwsem_acquire(&conn->dep_map, 0, 0, _RET_IP_);
+	if (atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_BIAS)
+		lock_contended(&conn->dep_map, _RET_IP_);
+#endif
+
+	wait_event(conn->wait,
+		   atomic_read(&conn->active) == KDBUS_CONN_ACTIVE_BIAS);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lock_acquired(&conn->dep_map, _RET_IP_);
+	rwsem_release(&conn->dep_map, 1, _RET_IP_);
+#endif
+
+	cancel_delayed_work_sync(&conn->work);
+	kdbus_policy_remove_owner(&conn->ep->bus->policy_db, conn);
+
+	/* lock order: domain -> bus -> ep -> names -> conn */
+	mutex_lock(&conn->ep->lock);
+	down_write(&bus->conn_rwlock);
+
+	/* remove from bus and endpoint */
+	hash_del(&conn->hentry);
+	list_del(&conn->monitor_entry);
+	list_del(&conn->ep_entry);
+
+	up_write(&bus->conn_rwlock);
+	mutex_unlock(&conn->ep->lock);
+
+	/*
+	 * Remove all names associated with this connection; this possibly
+	 * moves queued messages back to the activator connection.
+	 */
+	kdbus_name_remove_by_conn(bus->name_registry, conn);
+
+	/* if we die while other connections wait for our reply, notify them */
+	mutex_lock(&conn->lock);
+	list_for_each_entry_safe(entry, tmp, &conn->queue.msg_list, entry) {
+		if (entry->reply)
+			kdbus_notify_reply_dead(bus, entry->msg.src_id,
+						entry->msg.cookie);
+
+		kdbus_queue_entry_remove(conn, entry);
+		kdbus_pool_slice_release(entry->slice);
+		kdbus_queue_entry_free(entry);
+	}
+
+	list_for_each_entry_safe(r, r_tmp, &conn->reply_list, entry)
+		kdbus_reply_unlink(r);
+	mutex_unlock(&conn->lock);
+
+	/* lock order: domain -> bus -> ep -> names -> conn */
+	down_read(&bus->conn_rwlock);
+	hash_for_each(bus->conn_hash, i, c, hentry) {
+		mutex_lock(&c->lock);
+		list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
+			if (r->reply_src == conn) {
+				if (r->sync) {
+					kdbus_sync_reply_wakeup(r, -EPIPE);
+					kdbus_reply_unlink(r);
+					continue;
+				}
+
+				/* send a 'connection dead' notification */
+				kdbus_notify_reply_dead(bus, c->id, r->cookie);
+				kdbus_reply_unlink(r);
+			}
+		}
+		mutex_unlock(&c->lock);
+	}
+	up_read(&bus->conn_rwlock);
+
+	if (!kdbus_conn_is_monitor(conn))
+		kdbus_notify_id_change(bus, KDBUS_ITEM_ID_REMOVE,
+				       conn->id, conn->flags);
+
+	kdbus_notify_flush(bus);
+
+	return 0;
+}
+
+/**
+ * kdbus_conn_active() - connection is not disconnected
+ * @conn:		Connection to check
+ *
+ * Return true if the connection was not disconnected yet. Note that a
+ * connection might be disconnected asynchronously, unless you hold the
+ * connection lock. If that's not suitable for you, see kdbus_conn_acquire() to
+ * suppress connection shutdown for a short period.
+ *
+ * Return: true if the connection is still active
+ */
+bool kdbus_conn_active(const struct kdbus_conn *conn)
+{
+	return atomic_read(&conn->active) >= 0;
+}
+
+static void __kdbus_conn_free(struct kref *kref)
+{
+	struct kdbus_conn *conn = container_of(kref, struct kdbus_conn, kref);
+
+	WARN_ON(kdbus_conn_active(conn));
+	WARN_ON(delayed_work_pending(&conn->work));
+	WARN_ON(!list_empty(&conn->queue.msg_list));
+	WARN_ON(!list_empty(&conn->names_list));
+	WARN_ON(!list_empty(&conn->names_queue_list));
+	WARN_ON(!list_empty(&conn->reply_list));
+
+	if (conn->user) {
+		atomic_dec(&conn->user->connections);
+		kdbus_domain_user_unref(conn->user);
+	}
+
+	kdbus_meta_proc_unref(conn->meta);
+	kdbus_match_db_free(conn->match_db);
+	kdbus_pool_free(conn->pool);
+	kdbus_ep_unref(conn->ep);
+	put_cred(conn->cred);
+	kfree(conn->description);
+	kfree(conn);
+}
+
+/**
+ * kdbus_conn_ref() - take a connection reference
+ * @conn:		Connection, may be %NULL
+ *
+ * Return: the connection itself
+ */
+struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn)
+{
+	if (conn)
+		kref_get(&conn->kref);
+	return conn;
+}
+
+/**
+ * kdbus_conn_unref() - drop a connection reference
+ * @conn:		Connection (may be NULL)
+ *
+ * When the last reference is dropped, the connection's internal structure
+ * is freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn)
+{
+	if (conn)
+		kref_put(&conn->kref, __kdbus_conn_free);
+	return NULL;
+}
+
+/**
+ * kdbus_conn_acquire() - acquire an active connection reference
+ * @conn:		Connection
+ *
+ * Users can close a connection via KDBUS_BYEBYE (or by destroying the
+ * endpoint/bus/...) at any time. Whenever this happens, we should deny any
+ * user-visible action on this connection and signal ECONNRESET instead.
+ * To avoid testing for connection availability every time you take the
+ * connection-lock, you can acquire a connection for short periods.
+ *
+ * By calling kdbus_conn_acquire(), you gain an "active reference" to the
+ * connection. You must also hold a regular reference at any time! As long as
+ * you hold the active-ref, the connection will not be shut down. However, if
+ * the connection was shut down, you can never acquire an active-ref again.
+ *
+ * kdbus_conn_disconnect() disables the connection and then waits for all active
+ * references to be dropped. It will also wake up any pending operation.
+ * However, you must not sleep for an indefinite period while holding an
+ * active-reference. Otherwise, kdbus_conn_disconnect() might stall. If you need
+ * to sleep for an indefinite period, either release the reference and try to
+ * acquire it again after waking up, or make kdbus_conn_disconnect() wake up
+ * your wait-queue.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_conn_acquire(struct kdbus_conn *conn)
+{
+	if (!atomic_inc_unless_negative(&conn->active))
+		return -ECONNRESET;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
+#endif
+
+	return 0;
+}
+
+/**
+ * kdbus_conn_release() - release an active connection reference
+ * @conn:		Connection
+ *
+ * This releases an active reference that has been acquired via
+ * kdbus_conn_acquire(). If the connection was already disabled and this is the
+ * last active-ref that is dropped, the disconnect-waiter will be woken up and
+ * properly close the connection.
+ */
+void kdbus_conn_release(struct kdbus_conn *conn)
+{
+	int v;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	rwsem_release(&conn->dep_map, 1, _RET_IP_);
+#endif
+
+	v = atomic_dec_return(&conn->active);
+	if (v != KDBUS_CONN_ACTIVE_BIAS)
+		return;
+
+	wake_up_all(&conn->wait);
+}
+
+/**
+ * kdbus_conn_move_messages() - move messages from one connection to another
+ * @conn_dst:		Connection to move messages to
+ * @conn_src:		Connection to move messages from
+ * @name_id:		Filter for the sequence number of the registered
+ *			name, 0 means no filtering.
+ *
+ * Move all messages from one connection to another. This is used when
+ * an implementer connection is taking over/giving back a well-known name
+ * from/to an activator connection.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
+			     struct kdbus_conn *conn_src,
+			     u64 name_id)
+{
+	struct kdbus_queue_entry *q, *q_tmp;
+	struct kdbus_reply *r, *r_tmp;
+	struct kdbus_bus *bus;
+	struct kdbus_conn *c;
+	LIST_HEAD(msg_list);
+	int i, ret = 0;
+
+	if (WARN_ON(!mutex_is_locked(&conn_dst->ep->bus->lock)))
+		return -EINVAL;
+
+	if (WARN_ON(conn_src == conn_dst))
+		return -EINVAL;
+
+	bus = conn_src->ep->bus;
+
+	/* lock order: domain -> bus -> ep -> names -> conn */
+	down_read(&bus->conn_rwlock);
+	hash_for_each(bus->conn_hash, i, c, hentry) {
+		if (c == conn_src || c == conn_dst)
+			continue;
+
+		mutex_lock(&c->lock);
+		list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
+			if (r->reply_src != conn_src)
+				continue;
+
+			/* filter messages for a specific name */
+			if (name_id > 0 && r->name_id != name_id)
+				continue;
+
+			kdbus_conn_unref(r->reply_src);
+			r->reply_src = kdbus_conn_ref(conn_dst);
+		}
+		mutex_unlock(&c->lock);
+	}
+	up_read(&bus->conn_rwlock);
+
+	kdbus_conn_lock2(conn_src, conn_dst);
+	list_for_each_entry_safe(q, q_tmp, &conn_src->queue.msg_list, entry) {
+		/* filter messages for a specific name */
+		if (name_id > 0 && q->dst_name_id != name_id)
+			continue;
+
+		kdbus_queue_entry_remove(conn_src, q);
+
+		if (!(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
+		    q->msg_res && q->msg_res->fds_count > 0) {
+			atomic_inc(&conn_dst->lost_count);
+			continue;
+		}
+
+		ret = kdbus_queue_entry_move(conn_dst, q);
+		if (ret < 0) {
+			atomic_inc(&conn_dst->lost_count);
+			kdbus_queue_entry_free(q);
+		}
+	}
+	kdbus_conn_unlock2(conn_src, conn_dst);
+
+	/* wake up poll() */
+	wake_up_interruptible(&conn_dst->wait);
+
+	return ret;
+}
+
+/**
+ * kdbus_cmd_conn_info() - retrieve info about a connection
+ * @conn:		Connection
+ * @cmd_info:		The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_conn_info(struct kdbus_conn *conn,
+			struct kdbus_cmd_info *cmd_info)
+{
+	struct kdbus_meta_conn *conn_meta = NULL;
+	struct kdbus_pool_slice *slice = NULL;
+	struct kdbus_name_entry *entry = NULL;
+	struct kdbus_conn *owner_conn = NULL;
+	struct kdbus_item *meta_items = NULL;
+	struct kdbus_info info = {};
+	struct kvec kvec[2];
+	size_t meta_size;
+	u64 attach_flags;
+	int ret = 0;
+
+	if (cmd_info->id == 0) {
+		const char *name;
+
+		name = kdbus_items_get_str(cmd_info->items,
+					   KDBUS_ITEMS_SIZE(cmd_info, items),
+					   KDBUS_ITEM_NAME);
+		if (IS_ERR(name))
+			return -EINVAL;
+
+		if (!kdbus_name_is_valid(name, false))
+			return -EINVAL;
+
+		entry = kdbus_name_lock(conn->ep->bus->name_registry, name);
+		if (!entry || !kdbus_conn_policy_see_name(conn, current_cred(),
+							  name)) {
+			/* pretend a name doesn't exist if you cannot see it */
+			ret = -ESRCH;
+			goto exit;
+		}
+
+		if (entry->conn)
+			owner_conn = kdbus_conn_ref(entry->conn);
+	} else {
+		owner_conn = kdbus_bus_find_conn_by_id(conn->ep->bus,
+						       cmd_info->id);
+		if (!owner_conn || !kdbus_conn_policy_see(conn, current_cred(),
+							  owner_conn)) {
+			/* pretend an id doesn't exist if you cannot see it */
+			ret = -ENXIO;
+			goto exit;
+		}
+	}
+
+	info.id = owner_conn->id;
+	info.flags = owner_conn->flags;
+
+	/* mask out what information the connection wants to pass us */
+	attach_flags = cmd_info->flags &
+		       atomic64_read(&owner_conn->attach_flags_send);
+
+	conn_meta = kdbus_meta_conn_new();
+	if (IS_ERR(conn_meta)) {
+		ret = PTR_ERR(conn_meta);
+		conn_meta = NULL;
+		goto exit;
+	}
+
+	ret = kdbus_meta_conn_collect(conn_meta, NULL, owner_conn,
+				      attach_flags);
+	if (ret < 0)
+		goto exit;
+
+	meta_items = kdbus_meta_export(owner_conn->meta, conn_meta,
+				       attach_flags, &meta_size);
+	if (IS_ERR(meta_items)) {
+		ret = PTR_ERR(meta_items);
+		meta_items = NULL;
+		goto exit;
+	}
+
+	kdbus_kvec_set(&kvec[0], &info, sizeof(info), &info.size);
+	kdbus_kvec_set(&kvec[1], meta_items, meta_size, &info.size);
+
+	slice = kdbus_pool_slice_alloc(conn->pool, info.size,
+				       kvec, NULL, ARRAY_SIZE(kvec));
+	if (IS_ERR(slice)) {
+		ret = PTR_ERR(slice);
+		slice = NULL;
+		goto exit;
+	}
+
+	/* write back the offset */
+	kdbus_pool_slice_publish(slice, &cmd_info->offset,
+				 &cmd_info->info_size);
+	ret = 0;
+
+	kdbus_pool_slice_release(slice);
+exit:
+	kfree(meta_items);
+	kdbus_meta_conn_unref(conn_meta);
+	kdbus_conn_unref(owner_conn);
+	kdbus_name_unlock(conn->ep->bus->name_registry, entry);
+
+	return ret;
+}
+
+/**
+ * kdbus_cmd_conn_update() - update the attach-flags of a connection or
+ *			     the policy entries of a policy holding one
+ * @conn:		Connection
+ * @cmd:		The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_conn_update(struct kdbus_conn *conn,
+			  const struct kdbus_cmd_update *cmd)
+{
+	struct kdbus_bus *bus = conn->ep->bus;
+	bool send_flags_provided = false;
+	bool recv_flags_provided = false;
+	bool policy_provided = false;
+	const struct kdbus_item *item;
+	u64 attach_send;
+	u64 attach_recv;
+	int ret;
+
+	KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
+		switch (item->type) {
+		case KDBUS_ITEM_ATTACH_FLAGS_SEND:
+			/*
+			 * Only ordinary or monitor connections may update
+			 * their attach-flags-send. attach-flags-recv can
+			 * additionally be updated by activators.
+			 */
+			if (!kdbus_conn_is_ordinary(conn) &&
+			    !kdbus_conn_is_monitor(conn))
+				return -EOPNOTSUPP;
+
+			ret = kdbus_sanitize_attach_flags(item->data64[0],
+							  &attach_send);
+			if (ret < 0)
+				return ret;
+
+			send_flags_provided = true;
+			break;
+
+		case KDBUS_ITEM_ATTACH_FLAGS_RECV:
+			if (!kdbus_conn_is_ordinary(conn) &&
+			    !kdbus_conn_is_monitor(conn) &&
+			    !kdbus_conn_is_activator(conn))
+				return -EOPNOTSUPP;
+
+			ret = kdbus_sanitize_attach_flags(item->data64[0],
+							  &attach_recv);
+			if (ret < 0)
+				return ret;
+
+			recv_flags_provided = true;
+			break;
+
+		case KDBUS_ITEM_NAME:
+		case KDBUS_ITEM_POLICY_ACCESS:
+			/*
+			 * Only policy holders may update their policy
+			 * entries. Policy holders are privileged
+			 * connections.
+			 */
+			if (!kdbus_conn_is_policy_holder(conn))
+				return -EOPNOTSUPP;
+
+			policy_provided = true;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+	}
+
+	if (policy_provided) {
+		ret = kdbus_policy_set(&conn->ep->bus->policy_db, cmd->items,
+				       KDBUS_ITEMS_SIZE(cmd, items),
+				       1, true, conn);
+		if (ret < 0)
+			return ret;
+	}
+
+	if (send_flags_provided) {
+		/*
+		 * The send attach flags must always satisfy the
+		 * bus requirements.
+		 */
+		if (bus->attach_flags_req & ~attach_send)
+			return -EINVAL;
+
+		atomic64_set(&conn->attach_flags_send, attach_send);
+	}
+
+	if (recv_flags_provided)
+		atomic64_set(&conn->attach_flags_recv, attach_recv);
+
+	return 0;
+}
+
+/**
+ * kdbus_conn_new() - create a new connection
+ * @ep:			The endpoint the connection is connected to
+ * @hello:		The kdbus_cmd_hello as passed in by the user
+ * @privileged:		Whether to create a privileged connection
+ *
+ * Return: a new kdbus_conn on success, ERR_PTR on failure
+ */
+struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep,
+				  struct kdbus_cmd_hello *hello,
+				  bool privileged)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	static struct lock_class_key __key;
+#endif
+	const struct kdbus_creds *creds = NULL;
+	struct kdbus_pool_slice *slice = NULL;
+	const struct kdbus_pids *pids = NULL;
+	struct kdbus_item_list items = {};
+	struct kdbus_bus *bus = ep->bus;
+	const struct kdbus_item *item;
+	const char *conn_description = NULL;
+	const char *seclabel = NULL;
+	const char *name = NULL;
+	struct kdbus_conn *conn;
+	u64 attach_flags_send;
+	u64 attach_flags_recv;
+	bool is_policy_holder;
+	bool is_activator;
+	bool is_monitor;
+	struct kvec kvec[2];
+	int ret;
+
+	struct {
+		/* bloom item */
+		u64 size;
+		u64 type;
+		struct kdbus_bloom_parameter bloom;
+	} bloom_item;
+
+	is_monitor = hello->flags & KDBUS_HELLO_MONITOR;
+	is_activator = hello->flags & KDBUS_HELLO_ACTIVATOR;
+	is_policy_holder = hello->flags & KDBUS_HELLO_POLICY_HOLDER;
+
+	/* can only be one of monitor/activator/policy_holder */
+	if (is_monitor + is_activator + is_policy_holder > 1)
+		return ERR_PTR(-EINVAL);
+
+	/* Monitors are disallowed on custom endpoints */
+	if (is_monitor && ep->has_policy)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/* activators, policy holders and monitors must be privileged */
+	if (!privileged && (is_activator || is_policy_holder || is_monitor))
+		return ERR_PTR(-EPERM);
+
+	KDBUS_ITEMS_FOREACH(item, hello->items,
+			    KDBUS_ITEMS_SIZE(hello, items)) {
+		switch (item->type) {
+		case KDBUS_ITEM_NAME:
+			if (!is_activator && !is_policy_holder)
+				return ERR_PTR(-EINVAL);
+
+			if (name)
+				return ERR_PTR(-EINVAL);
+
+			if (!kdbus_name_is_valid(item->str, true))
+				return ERR_PTR(-EINVAL);
+
+			name = item->str;
+			break;
+
+		case KDBUS_ITEM_CREDS:
+			/* privileged processes can impersonate somebody else */
+			if (!privileged)
+				return ERR_PTR(-EPERM);
+
+			if (item->size != KDBUS_ITEM_SIZE(sizeof(*creds)))
+				return ERR_PTR(-EINVAL);
+
+			creds = &item->creds;
+			break;
+
+		case KDBUS_ITEM_PIDS:
+			/* privileged processes can impersonate somebody else */
+			if (!privileged)
+				return ERR_PTR(-EPERM);
+
+			if (item->size != KDBUS_ITEM_SIZE(sizeof(*pids)))
+				return ERR_PTR(-EINVAL);
+
+			pids = &item->pids;
+			break;
+
+		case KDBUS_ITEM_SECLABEL:
+			/* privileged processes can impersonate somebody else */
+			if (!privileged)
+				return ERR_PTR(-EPERM);
+
+			seclabel = item->str;
+			break;
+
+		case KDBUS_ITEM_CONN_DESCRIPTION:
+			/* human-readable connection name (debugging) */
+			if (conn_description)
+				return ERR_PTR(-EINVAL);
+
+			conn_description = item->str;
+			break;
+
+		case KDBUS_ITEM_POLICY_ACCESS:
+		case KDBUS_ITEM_BLOOM_MASK:
+		case KDBUS_ITEM_ID:
+		case KDBUS_ITEM_NAME_ADD:
+		case KDBUS_ITEM_NAME_REMOVE:
+		case KDBUS_ITEM_NAME_CHANGE:
+		case KDBUS_ITEM_ID_ADD:
+		case KDBUS_ITEM_ID_REMOVE:
+			/* will be handled by policy and match code */
+			break;
+
+		default:
+			return ERR_PTR(-EINVAL);
+		}
+	}
+
+	if ((is_activator || is_policy_holder) && !name)
+		return ERR_PTR(-EINVAL);
+
+	ret = kdbus_sanitize_attach_flags(hello->attach_flags_send,
+					  &attach_flags_send);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	ret = kdbus_sanitize_attach_flags(hello->attach_flags_recv,
+					  &attach_flags_recv);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	/* Let userspace know which flags are enforced by the bus */
+	hello->attach_flags_send = bus->attach_flags_req | KDBUS_FLAG_KERNEL;
+
+	/*
+	 * The attach flags must always satisfy the bus
+	 * requirements.
+	 */
+	if (bus->attach_flags_req & ~attach_flags_send)
+		return ERR_PTR(-ECONNREFUSED);
+
+	conn = kzalloc(sizeof(*conn), GFP_KERNEL);
+	if (!conn)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&conn->kref);
+	atomic_set(&conn->active, KDBUS_CONN_ACTIVE_NEW);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lockdep_init_map(&conn->dep_map, "s_active", &__key, 0);
+#endif
+	mutex_init(&conn->lock);
+	INIT_LIST_HEAD(&conn->names_list);
+	INIT_LIST_HEAD(&conn->names_queue_list);
+	INIT_LIST_HEAD(&conn->reply_list);
+	atomic_set(&conn->name_count, 0);
+	atomic_set(&conn->request_count, 0);
+	atomic_set(&conn->lost_count, 0);
+	INIT_DELAYED_WORK(&conn->work, kdbus_reply_list_scan_work);
+	conn->cred = get_current_cred();
+	init_waitqueue_head(&conn->wait);
+	kdbus_queue_init(&conn->queue);
+	conn->privileged = privileged;
+	conn->ep = kdbus_ep_ref(ep);
+	conn->id = atomic64_inc_return(&bus->conn_seq_last);
+	conn->flags = hello->flags;
+	atomic64_set(&conn->attach_flags_send, attach_flags_send);
+	atomic64_set(&conn->attach_flags_recv, attach_flags_recv);
+	/* init entry, so we can remove it unconditionally */
+	INIT_LIST_HEAD(&conn->monitor_entry);
+
+	if (conn_description) {
+		conn->description = kstrdup(conn_description, GFP_KERNEL);
+		if (!conn->description) {
+			ret = -ENOMEM;
+			goto exit_unref;
+		}
+	}
+
+	conn->pool = kdbus_pool_new(conn->description, hello->pool_size);
+	if (IS_ERR(conn->pool)) {
+		ret = PTR_ERR(conn->pool);
+		conn->pool = NULL;
+		goto exit_unref;
+	}
+
+	conn->match_db = kdbus_match_db_new();
+	if (IS_ERR(conn->match_db)) {
+		ret = PTR_ERR(conn->match_db);
+		conn->match_db = NULL;
+		goto exit_unref;
+	}
+
+	/* return properties of this connection to the caller */
+	hello->bus_flags = bus->bus_flags;
+	hello->id = conn->id;
+
+	BUILD_BUG_ON(sizeof(bus->id128) != sizeof(hello->id128));
+	memcpy(hello->id128, bus->id128, sizeof(hello->id128));
+
+	conn->meta = kdbus_meta_proc_new();
+	if (IS_ERR(conn->meta)) {
+		ret = PTR_ERR(conn->meta);
+		conn->meta = NULL;
+		goto exit_unref;
+	}
+
+	/* privileged processes can impersonate somebody else */
+	if (creds || pids || seclabel) {
+		ret = kdbus_meta_proc_fake(conn->meta, creds, pids, seclabel);
+		if (ret < 0)
+			goto exit_unref;
+
+		conn->faked_meta = true;
+	} else {
+		ret = kdbus_meta_proc_collect(conn->meta,
+					      KDBUS_ATTACH_CREDS |
+					      KDBUS_ATTACH_PIDS |
+					      KDBUS_ATTACH_AUXGROUPS |
+					      KDBUS_ATTACH_TID_COMM |
+					      KDBUS_ATTACH_PID_COMM |
+					      KDBUS_ATTACH_EXE |
+					      KDBUS_ATTACH_CMDLINE |
+					      KDBUS_ATTACH_CGROUP |
+					      KDBUS_ATTACH_CAPS |
+					      KDBUS_ATTACH_SECLABEL |
+					      KDBUS_ATTACH_AUDIT);
+		if (ret < 0)
+			goto exit_unref;
+	}
+
+	/*
+	 * Account the connection against the current user (UID), or for
+	 * custom endpoints use the anonymous user assigned to the endpoint.
+	 * Note that limits are always accounted against the real UID, not
+	 * the effective UID (cred->user always points to the accounting of
+	 * cred->uid, not cred->euid).
+	 */
+	if (ep->user) {
+		conn->user = kdbus_domain_user_ref(ep->user);
+	} else {
+		conn->user = kdbus_domain_get_user(ep->bus->domain,
+						   current_uid());
+		if (IS_ERR(conn->user)) {
+			ret = PTR_ERR(conn->user);
+			conn->user = NULL;
+			goto exit_unref;
+		}
+	}
+
+	if (atomic_inc_return(&conn->user->connections) > KDBUS_USER_MAX_CONN) {
+		/* decremented by destructor as conn->user is valid */
+		ret = -EMFILE;
+		goto exit_unref;
+	}
+
+	bloom_item.size = sizeof(bloom_item);
+	bloom_item.type = KDBUS_ITEM_BLOOM_PARAMETER;
+	bloom_item.bloom = bus->bloom;
+	kdbus_kvec_set(&kvec[0], &items, sizeof(items), &items.size);
+	kdbus_kvec_set(&kvec[1], &bloom_item, bloom_item.size, &items.size);
+
+	slice = kdbus_pool_slice_alloc(conn->pool, items.size, kvec, NULL,
+				       ARRAY_SIZE(kvec));
+	if (IS_ERR(slice)) {
+		ret = PTR_ERR(slice);
+		slice = NULL;
+		goto exit_unref;
+	}
+
+	kdbus_pool_slice_publish(slice, &hello->offset, &hello->items_size);
+	kdbus_pool_slice_release(slice);
+
+	return conn;
+
+exit_unref:
+	kdbus_pool_slice_release(slice);
+	kdbus_conn_unref(conn);
+	return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_conn_connect() - introduce a connection to a bus
+ * @conn:		Connection
+ * @hello:		Hello parameters
+ *
+ * This puts life into a kdbus-conn object. A connection to the bus is
+ * established and the peer will be reachable via the bus (if it is an ordinary
+ * connection).
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_conn_connect(struct kdbus_conn *conn, struct kdbus_cmd_hello *hello)
+{
+	struct kdbus_ep *ep = conn->ep;
+	struct kdbus_bus *bus = ep->bus;
+	int ret;
+
+	if (WARN_ON(atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_NEW))
+		return -EALREADY;
+
+	/* make sure the ep-node is active while we add our connection */
+	if (!kdbus_node_acquire(&ep->node))
+		return -ESHUTDOWN;
+
+	/* lock order: domain -> bus -> ep -> names -> conn */
+	mutex_lock(&bus->lock);
+	mutex_lock(&ep->lock);
+	down_write(&bus->conn_rwlock);
+
+	/* link into monitor list */
+	if (kdbus_conn_is_monitor(conn))
+		list_add_tail(&conn->monitor_entry, &bus->monitors_list);
+
+	/* link into bus and endpoint */
+	list_add_tail(&conn->ep_entry, &ep->conn_list);
+	hash_add(bus->conn_hash, &conn->hentry, conn->id);
+
+	/* enable lookups and acquire active ref */
+	atomic_set(&conn->active, 1);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
+#endif
+
+	up_write(&bus->conn_rwlock);
+	mutex_unlock(&ep->lock);
+	mutex_unlock(&bus->lock);
+
+	kdbus_node_release(&ep->node);
+
+	/*
+	 * Notify subscribers about the new active connection, unless it is
+	 * a monitor. Monitors are invisible on the bus, can't be addressed
+	 * directly, and won't cause any notifications.
+	 */
+	if (!kdbus_conn_is_monitor(conn)) {
+		ret = kdbus_notify_id_change(conn->ep->bus, KDBUS_ITEM_ID_ADD,
+					     conn->id, conn->flags);
+		if (ret < 0)
+			goto exit_disconnect;
+	}
+
+	if (kdbus_conn_is_activator(conn)) {
+		u64 flags = KDBUS_NAME_ACTIVATOR;
+		const char *name;
+
+		name = kdbus_items_get_str(hello->items,
+					   KDBUS_ITEMS_SIZE(hello, items),
+					   KDBUS_ITEM_NAME);
+		if (WARN_ON(IS_ERR_OR_NULL(name))) {
+			ret = -EINVAL;
+			goto exit_disconnect;
+		}
+
+		ret = kdbus_name_acquire(bus->name_registry, conn, name,
+					 &flags);
+		if (ret < 0)
+			goto exit_disconnect;
+	}
+
+	kdbus_conn_release(conn);
+	kdbus_notify_flush(bus);
+	return 0;
+
+exit_disconnect:
+	kdbus_conn_release(conn);
+	kdbus_conn_disconnect(conn, false);
+	return ret;
+}
+
+/**
+ * kdbus_conn_has_name() - check if a connection owns a name
+ * @conn:		Connection
+ * @name:		Well-known name to check for
+ *
+ * Return: true if the name is currently owned by the connection
+ */
+bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name)
+{
+	struct kdbus_name_entry *e;
+	bool match = false;
+
+	/* No need to go further if we do not own names */
+	if (atomic_read(&conn->name_count) == 0)
+		return false;
+
+	mutex_lock(&conn->lock);
+	list_for_each_entry(e, &conn->names_list, conn_entry) {
+		if (strcmp(e->name, name) == 0) {
+			match = true;
+			break;
+		}
+	}
+	mutex_unlock(&conn->lock);
+
+	return match;
+}
+
+/* query the policy-database for all names of @whom */
+static bool kdbus_conn_policy_query_all(struct kdbus_conn *conn,
+					const struct cred *conn_creds,
+					struct kdbus_policy_db *db,
+					struct kdbus_conn *whom,
+					unsigned int access)
+{
+	struct kdbus_name_entry *ne;
+	bool pass = false;
+	int res;
+
+	down_read(&db->entries_rwlock);
+	mutex_lock(&whom->lock);
+
+	list_for_each_entry(ne, &whom->names_list, conn_entry) {
+		res = kdbus_policy_query_unlocked(db, conn_creds ? : conn->cred,
+						  ne->name,
+						  kdbus_strhash(ne->name));
+		if (res >= (int)access) {
+			pass = true;
+			break;
+		}
+	}
+
+	mutex_unlock(&whom->lock);
+	up_read(&db->entries_rwlock);
+
+	return pass;
+}
+
+/**
+ * kdbus_conn_policy_own_name() - verify a connection can own the given name
+ * @conn:		Connection
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @name:		Name
+ *
+ * This verifies that @conn is allowed to acquire the well-known name @name.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
+				const struct cred *conn_creds,
+				const char *name)
+{
+	unsigned int hash = kdbus_strhash(name);
+	int res;
+
+	if (!conn_creds)
+		conn_creds = conn->cred;
+
+	if (conn->ep->has_policy) {
+		res = kdbus_policy_query(&conn->ep->policy_db, conn_creds,
+					 name, hash);
+		if (res < KDBUS_POLICY_OWN)
+			return false;
+	}
+
+	if (conn->privileged)
+		return true;
+
+	res = kdbus_policy_query(&conn->ep->bus->policy_db, conn_creds,
+				 name, hash);
+	return res >= KDBUS_POLICY_OWN;
+}
+
+/**
+ * kdbus_conn_policy_talk() - verify a connection can talk to a given peer
+ * @conn:		Connection that tries to talk
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @to:			Connection that is talked to
+ *
+ * This verifies that @conn is allowed to talk to @to.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
+			    const struct cred *conn_creds,
+			    struct kdbus_conn *to)
+{
+	if (!conn_creds)
+		conn_creds = conn->cred;
+
+	if (conn->ep->has_policy &&
+	    !kdbus_conn_policy_query_all(conn, conn_creds, &conn->ep->policy_db,
+					 to, KDBUS_POLICY_TALK))
+		return false;
+
+	if (conn->privileged)
+		return true;
+	if (uid_eq(conn_creds->euid, to->cred->uid))
+		return true;
+
+	return kdbus_conn_policy_query_all(conn, conn_creds,
+					   &conn->ep->bus->policy_db, to,
+					   KDBUS_POLICY_TALK);
+}
+
+/**
+ * kdbus_conn_policy_see_name_unlocked() - verify a connection can see a given
+ *					   name
+ * @conn:		Connection
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @name:		Name
+ *
+ * This verifies that @conn is allowed to see the well-known name @name. Caller
+ * must hold policy-lock.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
+					 const struct cred *conn_creds,
+					 const char *name)
+{
+	int res;
+
+	/*
+	 * By default, all names are visible on a bus. SEE policies can only be
+	 * installed on custom endpoints, where by default no name is visible.
+	 */
+	if (!conn->ep->has_policy)
+		return true;
+
+	res = kdbus_policy_query_unlocked(&conn->ep->policy_db,
+					  conn_creds ? : conn->cred,
+					  name, kdbus_strhash(name));
+	return res >= KDBUS_POLICY_SEE;
+}
+
+/**
+ * kdbus_conn_policy_see_name() - verify a connection can see a given name
+ * @conn:		Connection
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @name:		Name
+ *
+ * This verifies that @conn is allowed to see the well-known name @name.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_see_name(struct kdbus_conn *conn,
+				const struct cred *conn_creds,
+				const char *name)
+{
+	bool res;
+
+	down_read(&conn->ep->policy_db.entries_rwlock);
+	res = kdbus_conn_policy_see_name_unlocked(conn, conn_creds, name);
+	up_read(&conn->ep->policy_db.entries_rwlock);
+
+	return res;
+}
+
+/**
+ * kdbus_conn_policy_see() - verify a connection can see a given peer
+ * @conn:		Connection to verify whether it sees a peer
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @whom:		Peer destination that is to be 'seen'
+ *
+ * This checks whether @conn is able to see @whom.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_see(struct kdbus_conn *conn,
+			   const struct cred *conn_creds,
+			   struct kdbus_conn *whom)
+{
+	/*
+	 * By default, all names are visible on a bus, so a connection can
+	 * always see other connections. SEE policies can only be installed on
+	 * custom endpoints, where by default no name is visible and we hide
+	 * peers from each other, unless you see at least _one_ name of the
+	 * peer.
+	 */
+	return !conn->ep->has_policy ||
+	       kdbus_conn_policy_query_all(conn, conn_creds,
+					   &conn->ep->policy_db, whom,
+					   KDBUS_POLICY_SEE);
+}
+
+/**
+ * kdbus_conn_policy_see_notification() - verify a connection is allowed to
+ *					  receive a given kernel notification
+ * @conn:		Connection
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @kmsg:		The message carrying the notification
+ *
+ * This checks whether @conn is allowed to see the kernel notification @kmsg.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
+					const struct cred *conn_creds,
+					const struct kdbus_kmsg *kmsg)
+{
+	if (WARN_ON(kmsg->msg.src_id != KDBUS_SRC_ID_KERNEL))
+		return false;
+
+	/*
+	 * Depending on the notification type, broadcast kernel notifications
+	 * have to be filtered:
+	 *
+	 * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}: This notification is forwarded
+	 *     to a peer if, and only if, that peer can see the name this
+	 *     notification is for.
+	 *
+	 * KDBUS_ITEM_ID_{ADD,REMOVE}: As new peers cannot have names, and all
+	 *     names are dropped before a peer is removed, those notifications
+	 *     cannot be seen on custom endpoints. Thus, we only pass them
+	 *     through on default endpoints.
+	 */
+
+	switch (kmsg->notify_type) {
+	case KDBUS_ITEM_NAME_ADD:
+	case KDBUS_ITEM_NAME_REMOVE:
+	case KDBUS_ITEM_NAME_CHANGE:
+		return kdbus_conn_policy_see_name(conn, conn_creds,
+						  kmsg->notify_name);
+
+	case KDBUS_ITEM_ID_ADD:
+	case KDBUS_ITEM_ID_REMOVE:
+		return !conn->ep->has_policy;
+
+	default:
+		WARN(1, "Invalid type for notification broadcast: %llu\n",
+		     (unsigned long long)kmsg->notify_type);
+		return false;
+	}
+}
diff --git a/ipc/kdbus/connection.h b/ipc/kdbus/connection.h
new file mode 100644
index 000000000000..ff25931a4dd0
--- /dev/null
+++ b/ipc/kdbus/connection.h
@@ -0,0 +1,262 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_CONNECTION_H
+#define __KDBUS_CONNECTION_H
+
+#include <linux/atomic.h>
+#include <linux/kref.h>
+#include <linux/lockdep.h>
+#include <linux/path.h>
+
+#include "limits.h"
+#include "metadata.h"
+#include "pool.h"
+#include "queue.h"
+#include "util.h"
+
+#define KDBUS_HELLO_SPECIAL_CONN	(KDBUS_HELLO_ACTIVATOR | \
+					 KDBUS_HELLO_POLICY_HOLDER | \
+					 KDBUS_HELLO_MONITOR)
+
+/**
+ * struct kdbus_conn - connection to a bus
+ * @kref:		Reference count
+ * @active:		Active references to the connection
+ * @id:			Connection ID
+ * @flags:		KDBUS_HELLO_* flags
+ * @attach_flags_send:	KDBUS_ATTACH_* flags for sending
+ * @attach_flags_recv:	KDBUS_ATTACH_* flags for receiving
+ * @description:	Human-readable connection description, used for
+ *			debugging. This field is only set when the
+ *			connection is created.
+ * @ep:			The endpoint this connection belongs to
+ * @lock:		Connection data lock
+ * @msg_users:		Array to account the number of queued messages per
+ *			individual user
+ * @msg_users_max:	Size of the users array
+ * @hentry:		Entry in ID <-> connection map
+ * @ep_entry:		Entry in endpoint
+ * @monitor_entry:	Entry in monitor, if the connection is a monitor
+ * @names_list:		List of well-known names
+ * @names_queue_list:	Well-known names this connection waits for
+ * @reply_list:		List of connections this connection should
+ *			reply to
+ * @work:		Delayed work to handle timeouts
+ * @activator_of:	Well-known name entry this connection acts as an
+ *			activator for
+ * @match_db:		Subscription filter to broadcast messages
+ * @meta:		Active connection creator's metadata/credentials,
+ *			either from the handle or from HELLO
+ * @pool:		The user's buffer to receive messages
+ * @user:		Owner of the connection
+ * @cred:		The credentials of the connection at creation time
+ * @name_count:		Number of owned well-known names
+ * @request_count:	Number of pending requests issued by this
+ *			connection that are waiting for replies from
+ *			other peers
+ * @lost_count:		Number of lost broadcast messages
+ * @wait:		Wake up this endpoint
+ * @queue:		The message queue associated with this connection
+ * @privileged:		Whether this connection is privileged on the bus
+ * @faked_meta:		Whether the metadata was faked on HELLO
+ */
+struct kdbus_conn {
+	struct kref kref;
+	atomic_t active;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map dep_map;
+#endif
+	u64 id;
+	u64 flags;
+	atomic64_t attach_flags_send;
+	atomic64_t attach_flags_recv;
+	const char *description;
+	struct kdbus_ep *ep;
+	struct mutex lock;
+	unsigned int *msg_users;
+	unsigned int msg_users_max;
+	struct hlist_node hentry;
+	struct list_head ep_entry;
+	struct list_head monitor_entry;
+	struct list_head names_list;
+	struct list_head names_queue_list;
+	struct list_head reply_list;
+	struct delayed_work work;
+	struct kdbus_name_entry *activator_of;
+	struct kdbus_match_db *match_db;
+	struct kdbus_meta_proc *meta;
+	struct kdbus_pool *pool;
+	struct kdbus_domain_user *user;
+	const struct cred *cred;
+	atomic_t name_count;
+	atomic_t request_count;
+	atomic_t lost_count;
+	wait_queue_head_t wait;
+	struct kdbus_queue queue;
+
+	bool privileged:1;
+	bool faked_meta:1;
+};
+
+struct kdbus_kmsg;
+struct kdbus_name_registry;
+
+struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep,
+				  struct kdbus_cmd_hello *hello,
+				  bool privileged);
+struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn);
+struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn);
+int kdbus_conn_acquire(struct kdbus_conn *conn);
+void kdbus_conn_release(struct kdbus_conn *conn);
+int kdbus_conn_connect(struct kdbus_conn *conn, struct kdbus_cmd_hello *hello);
+int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty);
+bool kdbus_conn_active(const struct kdbus_conn *conn);
+int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
+			    struct kdbus_conn *conn_dst,
+			    const struct kdbus_kmsg *kmsg,
+			    struct kdbus_reply *reply);
+int kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
+			     struct kdbus_conn *conn_src,
+			     u64 name_id);
+bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name);
+
+/* policy */
+bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
+				const struct cred *conn_creds,
+				const char *name);
+bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
+			    const struct cred *conn_creds,
+			    struct kdbus_conn *to);
+bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
+					 const struct cred *curr_creds,
+					 const char *name);
+bool kdbus_conn_policy_see_name(struct kdbus_conn *conn,
+				const struct cred *curr_creds,
+				const char *name);
+bool kdbus_conn_policy_see(struct kdbus_conn *conn,
+			   const struct cred *curr_creds,
+			   struct kdbus_conn *whom);
+bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
+					const struct cred *curr_creds,
+					const struct kdbus_kmsg *kmsg);
+
+/* command dispatcher */
+int kdbus_cmd_msg_send(struct kdbus_conn *conn_src,
+		       struct kdbus_cmd_send *cmd_send,
+		       struct file *ioctl_file,
+		       struct kdbus_kmsg *kmsg);
+int kdbus_cmd_msg_recv(struct kdbus_conn *conn,
+		       struct kdbus_cmd_recv *recv);
+int kdbus_cmd_conn_info(struct kdbus_conn *conn,
+			struct kdbus_cmd_info *cmd_info);
+int kdbus_cmd_conn_update(struct kdbus_conn *conn,
+			  const struct kdbus_cmd_update *cmd_update);
+
+/**
+ * kdbus_conn_is_ordinary() - Check if connection is ordinary
+ * @conn:		The connection to check
+ *
+ * Return: Non-zero if the connection is an ordinary connection
+ */
+static inline int kdbus_conn_is_ordinary(const struct kdbus_conn *conn)
+{
+	return !(conn->flags & KDBUS_HELLO_SPECIAL_CONN);
+}
+
+/**
+ * kdbus_conn_is_activator() - Check if connection is an activator
+ * @conn:		The connection to check
+ *
+ * Return: Non-zero if the connection is an activator
+ */
+static inline int kdbus_conn_is_activator(const struct kdbus_conn *conn)
+{
+	return conn->flags & KDBUS_HELLO_ACTIVATOR;
+}
+
+/**
+ * kdbus_conn_is_policy_holder() - Check if connection is a policy holder
+ * @conn:		The connection to check
+ *
+ * Return: Non-zero if the connection is a policy holder
+ */
+static inline int kdbus_conn_is_policy_holder(const struct kdbus_conn *conn)
+{
+	return conn->flags & KDBUS_HELLO_POLICY_HOLDER;
+}
+
+/**
+ * kdbus_conn_is_monitor() - Check if connection is a monitor
+ * @conn:		The connection to check
+ *
+ * Return: Non-zero if the connection is a monitor
+ */
+static inline int kdbus_conn_is_monitor(const struct kdbus_conn *conn)
+{
+	return conn->flags & KDBUS_HELLO_MONITOR;
+}
+
+/**
+ * kdbus_conn_lock2() - Lock two connections
+ * @a:		connection A to lock or NULL
+ * @b:		connection B to lock or NULL
+ *
+ * Lock two connections at once. As we need to have a stable locking order, we
+ * always lock the connection with lower memory address first.
+ */
+static inline void kdbus_conn_lock2(struct kdbus_conn *a, struct kdbus_conn *b)
+{
+	if (a < b) {
+		if (a)
+			mutex_lock(&a->lock);
+		if (b && b != a)
+			mutex_lock_nested(&b->lock, !!a);
+	} else {
+		if (b)
+			mutex_lock(&b->lock);
+		if (a && a != b)
+			mutex_lock_nested(&a->lock, !!b);
+	}
+}
+
+/**
+ * kdbus_conn_unlock2() - Unlock two connections
+ * @a:		connection A to unlock or NULL
+ * @b:		connection B to unlock or NULL
+ *
+ * Unlock two connections at once. See kdbus_conn_lock2().
+ */
+static inline void kdbus_conn_unlock2(struct kdbus_conn *a,
+				      struct kdbus_conn *b)
+{
+	if (a)
+		mutex_unlock(&a->lock);
+	if (b && b != a)
+		mutex_unlock(&b->lock);
+}
+
+/**
+ * kdbus_conn_assert_active() - lockdep assert on active lock
+ * @conn:	connection that shall be active
+ *
+ * This verifies via lockdep that the caller holds an active reference to the
+ * given connection.
+ */
+static inline void kdbus_conn_assert_active(struct kdbus_conn *conn)
+{
+	lockdep_assert_held(conn);
+}
+
+#endif
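
The address-ordered locking used by kdbus_conn_lock2() above can be
tried out in user space. What follows is only a minimal sketch under
stated assumptions: struct conn, conn_lock2(), conn_unlock2() and the
"held" counter are hypothetical names invented for this illustration,
and pthread mutexes stand in for the kernel's struct mutex (there is no
lockdep subclass annotation here).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kdbus_conn; only the lock matters. */
struct conn {
	pthread_mutex_t lock;
	int held;		/* illustration only: 1 while locked */
};

/* Lock two objects in ascending address order; either may be NULL. */
static void conn_lock2(struct conn *a, struct conn *b)
{
	/* Compare as integers so the ordering is well defined. */
	if ((uintptr_t)a > (uintptr_t)b) {
		struct conn *tmp = a;

		a = b;
		b = tmp;
	}

	if (a) {
		pthread_mutex_lock(&a->lock);
		a->held = 1;
	}
	if (b && b != a) {
		pthread_mutex_lock(&b->lock);
		b->held = 1;
	}
}

/* Unlock order does not matter for correctness. */
static void conn_unlock2(struct conn *a, struct conn *b)
{
	if (a) {
		assert(a->held);
		a->held = 0;
		pthread_mutex_unlock(&a->lock);
	}
	if (b && b != a) {
		assert(b->held);
		b->held = 0;
		pthread_mutex_unlock(&b->lock);
	}
}
```

Because every caller takes the lower address first, two threads calling
conn_lock2(x, y) and conn_lock2(y, x) cannot deadlock against each
other. Passing the same object twice, or NULL for one side, takes at
most one lock.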
diff --git a/ipc/kdbus/item.c b/ipc/kdbus/item.c
new file mode 100644
index 000000000000..95bc3822ed45
--- /dev/null
+++ b/ipc/kdbus/item.c
@@ -0,0 +1,309 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/ctype.h>
+#include <linux/fs.h>
+#include <linux/string.h>
+
+#include "item.h"
+#include "limits.h"
+#include "util.h"
+
+#define KDBUS_ITEM_VALID(_i, _is, _s)					\
+	((_i)->size >= KDBUS_ITEM_HEADER_SIZE &&			\
+	 (u8 *)(_i) + (_i)->size > (u8 *)(_i) &&			\
+	 (u8 *)(_i) + (_i)->size <= (u8 *)(_is) + (_s) &&		\
+	 (u8 *)(_i) >= (u8 *)(_is))
+
+#define KDBUS_ITEMS_END(_i, _is, _s)					\
+	((u8 *)(_i) == ((u8 *)(_is) + KDBUS_ALIGN8(_s)))
+
+/**
+ * kdbus_item_validate_name() - validate an item containing a name
+ * @item:		Item to validate
+ *
+ * Return: zero on success or a negative error code on failure
+ */
+int kdbus_item_validate_name(const struct kdbus_item *item)
+{
+	if (item->size < KDBUS_ITEM_HEADER_SIZE + 2)
+		return -EINVAL;
+
+	if (item->size > KDBUS_ITEM_HEADER_SIZE +
+			 KDBUS_SYSNAME_MAX_LEN + 1)
+		return -ENAMETOOLONG;
+
+	if (!kdbus_str_valid(item->str, KDBUS_ITEM_PAYLOAD_SIZE(item)))
+		return -EINVAL;
+
+	return kdbus_sysname_is_valid(item->str);
+}
+
+static int kdbus_item_validate(const struct kdbus_item *item)
+{
+	size_t payload_size = KDBUS_ITEM_PAYLOAD_SIZE(item);
+	size_t l;
+	int ret;
+
+	if (item->size < KDBUS_ITEM_HEADER_SIZE)
+		return -EINVAL;
+
+	switch (item->type) {
+	case KDBUS_ITEM_PAYLOAD_VEC:
+		if (payload_size != sizeof(struct kdbus_vec))
+			return -EINVAL;
+		if (item->vec.size == 0 || item->vec.size > SIZE_MAX)
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_PAYLOAD_OFF:
+		if (payload_size != sizeof(struct kdbus_vec))
+			return -EINVAL;
+		if (item->vec.size == 0 || item->vec.size > SIZE_MAX)
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_PAYLOAD_MEMFD:
+		if (payload_size != sizeof(struct kdbus_memfd))
+			return -EINVAL;
+		if (item->memfd.size == 0 || item->memfd.size > SIZE_MAX)
+			return -EINVAL;
+		if (item->memfd.fd < 0)
+			return -EBADF;
+		break;
+
+	case KDBUS_ITEM_FDS:
+		if (payload_size % sizeof(int) != 0)
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_CANCEL_FD:
+		if (payload_size != sizeof(int))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_BLOOM_PARAMETER:
+		if (payload_size != sizeof(struct kdbus_bloom_parameter))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_BLOOM_FILTER:
+		/* followed by the bloom-mask, depends on the bloom-size */
+		if (payload_size < sizeof(struct kdbus_bloom_filter))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_BLOOM_MASK:
+		/* size depends on bloom-size of bus */
+		break;
+
+	case KDBUS_ITEM_CONN_DESCRIPTION:
+	case KDBUS_ITEM_MAKE_NAME:
+		ret = kdbus_item_validate_name(item);
+		if (ret < 0)
+			return ret;
+		break;
+
+	case KDBUS_ITEM_ATTACH_FLAGS_SEND:
+	case KDBUS_ITEM_ATTACH_FLAGS_RECV:
+	case KDBUS_ITEM_ID:
+		if (payload_size != sizeof(u64))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_TIMESTAMP:
+		if (payload_size != sizeof(struct kdbus_timestamp))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_CREDS:
+		if (payload_size != sizeof(struct kdbus_creds))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_AUXGROUPS:
+		if (payload_size % sizeof(u32) != 0)
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_NAME:
+	case KDBUS_ITEM_DST_NAME:
+	case KDBUS_ITEM_PID_COMM:
+	case KDBUS_ITEM_TID_COMM:
+	case KDBUS_ITEM_EXE:
+	case KDBUS_ITEM_CMDLINE:
+	case KDBUS_ITEM_CGROUP:
+	case KDBUS_ITEM_SECLABEL:
+		if (!kdbus_str_valid(item->str, payload_size))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_CAPS:
+		/* TODO */
+		break;
+
+	case KDBUS_ITEM_AUDIT:
+		if (payload_size != sizeof(struct kdbus_audit))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_POLICY_ACCESS:
+		if (payload_size != sizeof(struct kdbus_policy_access))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_NAME_ADD:
+	case KDBUS_ITEM_NAME_REMOVE:
+	case KDBUS_ITEM_NAME_CHANGE:
+		if (payload_size < sizeof(struct kdbus_notify_name_change))
+			return -EINVAL;
+		l = payload_size - offsetof(struct kdbus_notify_name_change,
+					    name);
+		if (l > 0 && !kdbus_str_valid(item->name_change.name, l))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_ID_ADD:
+	case KDBUS_ITEM_ID_REMOVE:
+		if (payload_size != sizeof(struct kdbus_notify_id_change))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_REPLY_TIMEOUT:
+	case KDBUS_ITEM_REPLY_DEAD:
+		if (payload_size != 0)
+			return -EINVAL;
+		break;
+
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+/**
+ * kdbus_items_validate() - validate items passed by user-space
+ * @items:		items to validate
+ * @items_size:		total size of the item array, in bytes
+ *
+ * This verifies that the passed items pointer is consistent and valid.
+ * Furthermore, each item is checked for:
+ *  - valid "size" value
+ *  - payload is of expected type
+ *  - payload is fully included in the item
+ *  - string payloads are zero-terminated
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_items_validate(const struct kdbus_item *items, size_t items_size)
+{
+	const struct kdbus_item *item;
+	int ret;
+
+	KDBUS_ITEMS_FOREACH(item, items, items_size) {
+		if (!KDBUS_ITEM_VALID(item, items, items_size))
+			return -EINVAL;
+
+		ret = kdbus_item_validate(item);
+		if (ret < 0)
+			return ret;
+	}
+
+	if (!KDBUS_ITEMS_END(item, items, items_size))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * kdbus_items_get() - Find unique item in item-array
+ * @items:		items to search through
+ * @items_size:		total size of item array
+ * @item_type:		item-type to find
+ *
+ * Return: Pointer to found item, ERR_PTR if not found or available multiple
+ *         times.
+ */
+struct kdbus_item *kdbus_items_get(const struct kdbus_item *items,
+				   size_t items_size,
+				   unsigned int item_type)
+{
+	const struct kdbus_item *iter, *found = NULL;
+
+	KDBUS_ITEMS_FOREACH(iter, items, items_size) {
+		if (iter->type == item_type) {
+			if (found)
+				return ERR_PTR(-EEXIST);
+			found = iter;
+		}
+	}
+
+	return (struct kdbus_item *)found ? : ERR_PTR(-EBADMSG);
+}
+
+/**
+ * kdbus_items_get_str() - get string from a list of items
+ * @items:		The items to walk
+ * @items_size:		The size of all items
+ * @item_type:		The item type to look for
+ *
+ * This function walks a list of items and searches for items of type
+ * @item_type. If it finds exactly one such item, the .str member of that
+ * item is returned.
+ *
+ * Return: the string, if the item was found exactly once, ERR_PTR(-EEXIST)
+ * if the item was found more than once, and ERR_PTR(-EBADMSG) if there was
+ * no item of the given type.
+ */
+const char *kdbus_items_get_str(const struct kdbus_item *items,
+				size_t items_size,
+				unsigned int item_type)
+{
+	const struct kdbus_item *item;
+
+	item = kdbus_items_get(items, items_size, item_type);
+	return IS_ERR(item) ? ERR_CAST(item) : item->str;
+}
+
+/**
+ * kdbus_item_set() - Set item content
+ * @item:	The item to modify
+ * @type:	The item type to set (KDBUS_ITEM_*)
+ * @data:	Data to copy to item->data, may be %NULL
+ * @len:	Number of bytes in @data
+ *
+ * This sets type, size and data fields of an item. If @data is NULL, the data
+ * memory is cleared.
+ *
+ * Note that you must align your @data memory to 8 bytes. Trailing padding (in
+ * case @len is not 8-byte aligned) is cleared by this call.
+ *
+ * Return: Pointer to the following item.
+ */
+struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
+				  const void *data, size_t len)
+{
+	item->type = type;
+	item->size = KDBUS_ITEM_HEADER_SIZE + len;
+
+	if (data) {
+		memcpy(item->data, data, len);
+		memset(item->data + len, 0, KDBUS_ALIGN8(len) - len);
+	} else {
+		memset(item->data, 0, KDBUS_ALIGN8(len));
+	}
+
+	return KDBUS_ITEM_NEXT(item);
+}
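
The size and padding arithmetic of kdbus_item_set() above can be checked
in user space. This is a minimal sketch, not the kernel code: struct
item, ALIGN8(), ITEM_HEADER_SIZE and item_set() are hypothetical names,
and the 16-byte header layout (u64 size, u64 type) is assumed from
struct kdbus_item_header in item.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN8(v) (((v) + 7ULL) & ~7ULL)

/* Hypothetical, simplified item: fixed header followed by payload. */
struct item {
	uint64_t size;		/* header + payload, without padding */
	uint64_t type;
	uint8_t data[];
};

#define ITEM_HEADER_SIZE offsetof(struct item, data)

/*
 * Fill in one item and return a pointer to where the next item starts.
 * The caller must provide ALIGN8(ITEM_HEADER_SIZE + len) bytes.
 */
static struct item *item_set(struct item *it, uint64_t type,
			     const void *data, size_t len)
{
	size_t padded = (size_t)ALIGN8(len);

	assert(it != NULL);

	it->type = type;
	it->size = ITEM_HEADER_SIZE + len;

	if (data) {
		memcpy(it->data, data, len);
		/* clear only the trailing padding */
		memset(it->data + len, 0, padded - len);
	} else {
		/* no source data: clear payload and padding */
		memset(it->data, 0, padded);
	}

	return (struct item *)((uint8_t *)it + ALIGN8(it->size));
}
```

The recorded size stays unpadded (header plus @len), while the returned
pointer advances by the padded size, so consecutive items always start
on an 8-byte boundary.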
diff --git a/ipc/kdbus/item.h b/ipc/kdbus/item.h
new file mode 100644
index 000000000000..6c4f26ba226b
--- /dev/null
+++ b/ipc/kdbus/item.h
@@ -0,0 +1,57 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_ITEM_H
+#define __KDBUS_ITEM_H
+
+#include <linux/kernel.h>
+#include <uapi/linux/kdbus.h>
+
+#include "util.h"
+
+/* generic access and iterators over a stream of items */
+#define KDBUS_ITEM_NEXT(_i) (typeof(_i))(((u8 *)_i) + KDBUS_ALIGN8((_i)->size))
+#define KDBUS_ITEMS_SIZE(_h, _is) ((_h)->size - offsetof(typeof(*_h), _is))
+#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
+#define KDBUS_ITEM_SIZE(_s) KDBUS_ALIGN8(KDBUS_ITEM_HEADER_SIZE + (_s))
+#define KDBUS_ITEM_PAYLOAD_SIZE(_i) ((_i)->size - KDBUS_ITEM_HEADER_SIZE)
+
+#define KDBUS_ITEMS_FOREACH(_i, _is, _s)				\
+	for (_i = _is;							\
+	     ((u8 *)(_i) < (u8 *)(_is) + (_s)) &&			\
+	       ((u8 *)(_i) >= (u8 *)(_is));				\
+	     _i = KDBUS_ITEM_NEXT(_i))
+
+/**
+ * struct kdbus_item_header - Describes the fixed part of an item
+ * @size:	The total size of the item
+ * @type:	The item type, one of KDBUS_ITEM_*
+ */
+struct kdbus_item_header {
+	u64 size;
+	u64 type;
+};
+
+int kdbus_item_validate_name(const struct kdbus_item *item);
+int kdbus_items_validate(const struct kdbus_item *items, size_t items_size);
+struct kdbus_item *kdbus_items_get(const struct kdbus_item *items,
+				   size_t items_size,
+				   unsigned int item_type);
+const char *kdbus_items_get_str(const struct kdbus_item *items,
+				size_t items_size,
+				unsigned int item_type);
+struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
+				  const void *data, size_t len);
+
+#endif
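
The walk that KDBUS_ITEMS_FOREACH() performs, combined with the
KDBUS_ITEM_VALID() and KDBUS_ITEMS_END() checks from item.c, can be
modelled in user space roughly as follows. This is a sketch under
assumptions: items_validate(), ALIGN8() and ITEM_HEADER_SIZE are
hypothetical names, the header is assumed to be two u64 fields, and the
explicit "header must fit" check is an addition made for the sketch so
that it never reads past the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN8(v) (((v) + 7ULL) & ~7ULL)
#define ITEM_HEADER_SIZE (2 * sizeof(uint64_t))	/* size + type */

/* Return 0 if buf holds a well-formed item stream, -1 otherwise. */
static int items_validate(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	size_t off = 0;

	assert(buf != NULL || len == 0);

	while (off < len) {
		uint64_t size;

		/* the fixed header must fit into the remaining bytes */
		if (len - off < ITEM_HEADER_SIZE)
			return -1;
		memcpy(&size, p + off, sizeof(size));

		/* item covers its header and ends inside the buffer */
		if (size < ITEM_HEADER_SIZE || size > len - off)
			return -1;

		/* next item starts at the following 8-byte boundary */
		off += (size_t)ALIGN8(size);
	}

	/* the walk must stop exactly at the padded end of the stream */
	return off == ALIGN8(len) ? 0 : -1;
}
```

Comparing each size against the remaining length (len - off), instead
of adding it to a pointer first, avoids wrap-around in the sketch. The
kernel macros handle the same case with an explicit pointer overflow
check.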
diff --git a/ipc/kdbus/message.c b/ipc/kdbus/message.c
new file mode 100644
index 000000000000..3ec2afc8ff5c
--- /dev/null
+++ b/ipc/kdbus/message.c
@@ -0,0 +1,598 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/capability.h>
+#include <linux/cgroup.h>
+#include <linux/cred.h>
+#include <linux/file.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/shmem_fs.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <net/sock.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+#include "names.h"
+#include "policy.h"
+
+#define KDBUS_KMSG_HEADER_SIZE offsetof(struct kdbus_kmsg, msg)
+
+static struct kdbus_msg_resources *kdbus_msg_resources_new(void)
+{
+	struct kdbus_msg_resources *r;
+
+	r = kzalloc(sizeof(*r), GFP_KERNEL);
+	if (!r)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&r->kref);
+
+	return r;
+}
+
+static void __kdbus_msg_resources_free(struct kref *kref)
+{
+	struct kdbus_msg_resources *r =
+		container_of(kref, struct kdbus_msg_resources, kref);
+	size_t i;
+
+	for (i = 0; i < r->data_count; ++i) {
+		switch (r->data[i].type) {
+		case KDBUS_MSG_DATA_VEC:
+			/* nothing to do */
+			break;
+		case KDBUS_MSG_DATA_MEMFD:
+			if (r->data[i].memfd.file)
+				fput(r->data[i].memfd.file);
+			break;
+		}
+	}
+
+	kfree(r->data);
+
+	kdbus_fput_files(r->fds, r->fds_count);
+	kfree(r->fds);
+
+	kfree(r->dst_name);
+	kfree(r);
+}
+
+/**
+ * kdbus_msg_resources_ref() - Acquire reference to msg resources
+ * @r:		resources to acquire ref to
+ *
+ * Return: The acquired resource
+ */
+struct kdbus_msg_resources *
+kdbus_msg_resources_ref(struct kdbus_msg_resources *r)
+{
+	if (r)
+		kref_get(&r->kref);
+	return r;
+}
+
+/**
+ * kdbus_msg_resources_unref() - Drop reference to msg resources
+ * @r:		resources to drop reference of
+ *
+ * Return: NULL
+ */
+struct kdbus_msg_resources *
+kdbus_msg_resources_unref(struct kdbus_msg_resources *r)
+{
+	if (r)
+		kref_put(&r->kref, __kdbus_msg_resources_free);
+	return NULL;
+}
+
+/**
+ * kdbus_kmsg_free() - free allocated message
+ * @kmsg:		Message
+ */
+void kdbus_kmsg_free(struct kdbus_kmsg *kmsg)
+{
+	kdbus_msg_resources_unref(kmsg->res);
+	kdbus_meta_conn_unref(kmsg->conn_meta);
+	kdbus_meta_proc_unref(kmsg->proc_meta);
+	kfree(kmsg->iov);
+	kfree(kmsg);
+}
+
+/**
+ * kdbus_kmsg_new() - allocate message
+ * @extra_size:		Additional size to reserve for data
+ *
+ * Return: new kdbus_kmsg on success, ERR_PTR on failure.
+ */
+struct kdbus_kmsg *kdbus_kmsg_new(size_t extra_size)
+{
+	struct kdbus_kmsg *m;
+	size_t size;
+	int ret;
+
+	size = sizeof(struct kdbus_kmsg) + KDBUS_ITEM_SIZE(extra_size);
+	m = kzalloc(size, GFP_KERNEL);
+	if (!m)
+		return ERR_PTR(-ENOMEM);
+
+	m->msg.size = size - KDBUS_KMSG_HEADER_SIZE;
+	m->msg.items[0].size = KDBUS_ITEM_SIZE(extra_size);
+
+	m->proc_meta = kdbus_meta_proc_new();
+	if (IS_ERR(m->proc_meta)) {
+		ret = PTR_ERR(m->proc_meta);
+		goto exit;
+	}
+
+	m->conn_meta = kdbus_meta_conn_new();
+	if (IS_ERR(m->conn_meta)) {
+		ret = PTR_ERR(m->conn_meta);
+		goto exit;
+	}
+
+	return m;
+
+exit:
+	kdbus_kmsg_free(m);
+	return ERR_PTR(ret);
+}
+
+static int kdbus_handle_check_file(struct file *file)
+{
+	struct inode *inode = file_inode(file);
+	struct socket *sock;
+
+	/*
+	 * Don't allow file descriptors in the transport that themselves allow
+	 * file descriptor queueing. This will eventually be allowed once both
+	 * unix domain sockets and kdbus share a generic garbage collector.
+	 */
+
+	if (file->f_op == &kdbus_handle_ep_ops)
+		return -EOPNOTSUPP;
+
+	if (!S_ISSOCK(inode->i_mode))
+		return 0;
+
+	if (file->f_mode & FMODE_PATH)
+		return 0;
+
+	sock = SOCKET_I(inode);
+	if (sock->sk && sock->ops && sock->ops->family == PF_UNIX)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+static const char * const zeros = "\0\0\0\0\0\0\0";
+
+/*
+ * kdbus_msg_scan_items() - validate incoming data and prepare parsing
+ * @kmsg:		Message
+ * @bus:		Bus the message is sent over
+ *
+ * Return: 0 on success, negative errno on failure.
+ *
+ * File references in MEMFD or FDS items are pinned.
+ *
+ * On errors, the caller should drop any taken reference with
+ * kdbus_kmsg_free()
+ */
+static int kdbus_msg_scan_items(struct kdbus_kmsg *kmsg,
+				struct kdbus_bus *bus)
+{
+	struct kdbus_msg_resources *res = kmsg->res;
+	const struct kdbus_msg *msg = &kmsg->msg;
+	const struct kdbus_item *item;
+	size_t n, n_vecs, n_memfds;
+	bool has_bloom = false;
+	bool has_name = false;
+	bool has_fds = false;
+	bool is_broadcast;
+	bool is_signal;
+	u64 vec_size;
+
+	is_broadcast = (msg->dst_id == KDBUS_DST_ID_BROADCAST);
+	is_signal = !!(msg->flags & KDBUS_MSG_SIGNAL);
+
+	/* count data payloads */
+	n_vecs = 0;
+	n_memfds = 0;
+	KDBUS_ITEMS_FOREACH(item, msg->items, KDBUS_ITEMS_SIZE(msg, items)) {
+		switch (item->type) {
+		case KDBUS_ITEM_PAYLOAD_VEC:
+			++n_vecs;
+			break;
+		case KDBUS_ITEM_PAYLOAD_MEMFD:
+			++n_memfds;
+			if (item->memfd.size % 8)
+				++n_vecs;
+			break;
+		default:
+			break;
+		}
+	}
+
+	n = n_vecs + n_memfds;
+	if (n > 0) {
+		res->data = kcalloc(n, sizeof(*res->data), GFP_KERNEL);
+		if (!res->data)
+			return -ENOMEM;
+	}
+
+	if (n_vecs > 0) {
+		kmsg->iov = kcalloc(n_vecs, sizeof(*kmsg->iov), GFP_KERNEL);
+		if (!kmsg->iov)
+			return -ENOMEM;
+	}
+
+	/* import data payloads */
+	n = 0;
+	vec_size = 0;
+	KDBUS_ITEMS_FOREACH(item, msg->items, KDBUS_ITEMS_SIZE(msg, items)) {
+		size_t payload_size = KDBUS_ITEM_PAYLOAD_SIZE(item);
+		struct iovec *iov = kmsg->iov + kmsg->iov_count;
+
+		if (++n > KDBUS_MSG_MAX_ITEMS)
+			return -E2BIG;
+
+		switch (item->type) {
+		case KDBUS_ITEM_PAYLOAD_VEC: {
+			struct kdbus_msg_data *d = res->data + res->data_count;
+			void __force __user *ptr = KDBUS_PTR(item->vec.address);
+			size_t size = item->vec.size;
+
+			if (vec_size + size < vec_size)
+				return -EMSGSIZE;
+			if (vec_size + size > KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE)
+				return -EMSGSIZE;
+
+			d->type = KDBUS_MSG_DATA_VEC;
+			d->size = size;
+
+			if (ptr) {
+				d->vec.off = kmsg->pool_size;
+				iov->iov_base = ptr;
+				iov->iov_len = size;
+			} else {
+				d->vec.off = ~0ULL;
+				iov->iov_base = (char __user *)zeros;
+				iov->iov_len = size % 8;
+			}
+
+			if (kmsg->pool_size + iov->iov_len < kmsg->pool_size)
+				return -EMSGSIZE;
+
+			kmsg->pool_size += iov->iov_len;
+			++kmsg->iov_count;
+			++res->vec_count;
+			++res->data_count;
+			vec_size += size;
+
+			break;
+		}
+
+		case KDBUS_ITEM_PAYLOAD_MEMFD: {
+			struct kdbus_msg_data *d = res->data + res->data_count;
+			u64 start = item->memfd.start;
+			u64 size = item->memfd.size;
+			size_t pad = size % 8;
+			int seals, mask;
+			struct file *f;
+
+			if (kmsg->pool_size + size % 8 < kmsg->pool_size)
+				return -EMSGSIZE;
+			if (start + size < start)
+				return -EMSGSIZE;
+
+			if (item->memfd.fd < 0)
+				return -EBADF;
+
+			f = fget(item->memfd.fd);
+			if (!f)
+				return -EBADF;
+
+			if (pad) {
+				iov->iov_base = (char __user *)zeros;
+				iov->iov_len = pad;
+
+				kmsg->pool_size += pad;
+				++kmsg->iov_count;
+			}
+
+			++res->data_count;
+			++res->memfd_count;
+
+			d->type = KDBUS_MSG_DATA_MEMFD;
+			d->size = size;
+			d->memfd.start = start;
+			d->memfd.file = f;
+
+			/*
+			 * We only accept a sealed memfd file whose content
+			 * cannot be altered by the sender or anybody else
+			 * while it is shared or in-flight. Other files need
+			 * to be passed with KDBUS_ITEM_FDS.
+			 */
+			seals = shmem_get_seals(f);
+			if (seals < 0)
+				return -EMEDIUMTYPE;
+
+			mask = F_SEAL_SHRINK | F_SEAL_GROW |
+				F_SEAL_WRITE | F_SEAL_SEAL;
+			if ((seals & mask) != mask)
+				return -ETXTBSY;
+
+			if (start + size > (u64)i_size_read(file_inode(f)))
+				return -EBADF;
+
+			break;
+		}
+
+		case KDBUS_ITEM_FDS: {
+			unsigned int i;
+			unsigned int fds_count = payload_size / sizeof(int);
+
+			/* do not allow multiple fd arrays */
+			if (has_fds)
+				return -EEXIST;
+			has_fds = true;
+
+			/* do not allow broadcasting file descriptors */
+			if (is_broadcast)
+				return -ENOTUNIQ;
+
+			if (fds_count > KDBUS_MSG_MAX_FDS)
+				return -EMFILE;
+
+			res->fds = kcalloc(fds_count, sizeof(struct file *),
+					   GFP_KERNEL);
+			if (!res->fds)
+				return -ENOMEM;
+
+			for (i = 0; i < fds_count; i++) {
+				int fd = item->fds[i];
+				int ret;
+
+				/*
+				 * Verify the fd and increment the usage count.
+				 * Use fget_raw() to allow passing O_PATH fds.
+				 */
+				if (fd < 0)
+					return -EBADF;
+
+				res->fds[i] = fget_raw(fd);
+				if (!res->fds[i])
+					return -EBADF;
+
+				res->fds_count++;
+
+				ret = kdbus_handle_check_file(res->fds[i]);
+				if (ret < 0)
+					return ret;
+			}
+
+			break;
+		}
+
+		case KDBUS_ITEM_BLOOM_FILTER: {
+			u64 bloom_size;
+
+			/* do not allow multiple bloom filters */
+			if (has_bloom)
+				return -EEXIST;
+			has_bloom = true;
+
+			bloom_size = payload_size -
+				     offsetof(struct kdbus_bloom_filter, data);
+
+			/*
+			 * Bloom filter sizes must be a multiple of 64 bits.
+			 */
+			if (!KDBUS_IS_ALIGNED8(bloom_size))
+				return -EFAULT;
+
+			/* do not allow mismatching bloom filter sizes */
+			if (bloom_size != bus->bloom.size)
+				return -EDOM;
+
+			kmsg->bloom_filter = &item->bloom_filter;
+			break;
+		}
+
+		case KDBUS_ITEM_DST_NAME:
+			/* do not allow multiple names */
+			if (has_name)
+				return -EEXIST;
+			has_name = true;
+
+			if (!kdbus_name_is_valid(item->str, false))
+				return -EINVAL;
+
+			res->dst_name = kstrdup(item->str, GFP_KERNEL);
+			if (!res->dst_name)
+				return -ENOMEM;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+	}
+
+	/* name is needed if no ID is given */
+	if (msg->dst_id == KDBUS_DST_ID_NAME && !has_name)
+		return -EDESTADDRREQ;
+
+	if (is_broadcast) {
+		/* Broadcasts can't take names */
+		if (has_name)
+			return -EBADMSG;
+
+		/* All broadcasts have to be signals */
+		if (!is_signal)
+			return -EBADMSG;
+
+		/* Timeouts are not allowed for broadcasts */
+		if (msg->timeout_ns > 0)
+			return -ENOTUNIQ;
+	}
+
+	/*
+	 * Signal messages require a bloom filter, and bloom filters are
+	 * only valid with signals.
+	 */
+	if (is_signal ^ has_bloom)
+		return -EBADMSG;
+
+	return 0;
+}
+
+/**
+ * kdbus_kmsg_new_from_cmd() - create kernel message from send payload
+ * @conn:		Connection
+ * @buf:		The user-buffer location of @cmd_send
+ * @cmd_send:		Payload of KDBUS_CMD_SEND
+ *
+ * Return: a new kdbus_kmsg on success, ERR_PTR on failure.
+ */
+struct kdbus_kmsg *kdbus_kmsg_new_from_cmd(struct kdbus_conn *conn,
+					   void __user *buf,
+					   struct kdbus_cmd_send *cmd_send)
+{
+	struct kdbus_kmsg *m;
+	u64 size;
+	int ret;
+
+	ret = kdbus_copy_from_user(&size, KDBUS_PTR(cmd_send->msg_address),
+				   sizeof(size));
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	if (size < sizeof(struct kdbus_msg) || size > KDBUS_MSG_MAX_SIZE)
+		return ERR_PTR(-EINVAL);
+
+	m = kmalloc(size + KDBUS_KMSG_HEADER_SIZE, GFP_KERNEL);
+	if (!m)
+		return ERR_PTR(-ENOMEM);
+
+	memset(m, 0, KDBUS_KMSG_HEADER_SIZE);
+
+	m->proc_meta = kdbus_meta_proc_new();
+	if (IS_ERR(m->proc_meta)) {
+		ret = PTR_ERR(m->proc_meta);
+		m->proc_meta = NULL;
+		goto exit_free;
+	}
+
+	m->conn_meta = kdbus_meta_conn_new();
+	if (IS_ERR(m->conn_meta)) {
+		ret = PTR_ERR(m->conn_meta);
+		m->conn_meta = NULL;
+		goto exit_free;
+	}
+
+	if (copy_from_user(&m->msg, KDBUS_PTR(cmd_send->msg_address), size)) {
+		ret = -EFAULT;
+		goto exit_free;
+	}
+
+	if (m->msg.size != size) {
+		ret = -EINVAL;
+		goto exit_free;
+	}
+
+	ret = kdbus_check_and_write_flags(m->msg.flags, buf,
+					  offsetof(struct kdbus_cmd_send,
+						   kernel_msg_flags),
+					  KDBUS_MSG_EXPECT_REPLY	|
+					  KDBUS_MSG_NO_AUTO_START	|
+					  KDBUS_MSG_SIGNAL);
+	if (ret < 0)
+		goto exit_free;
+
+	ret = kdbus_items_validate(m->msg.items,
+				   KDBUS_ITEMS_SIZE(&m->msg, items));
+	if (ret < 0)
+		goto exit_free;
+
+	m->res = kdbus_msg_resources_new();
+	if (IS_ERR(m->res)) {
+		ret = PTR_ERR(m->res);
+		m->res = NULL;
+		goto exit_free;
+	}
+
+	/* do not accept kernel-generated messages */
+	if (m->msg.payload_type == KDBUS_PAYLOAD_KERNEL) {
+		ret = -EINVAL;
+		goto exit_free;
+	}
+
+	if (m->msg.flags & KDBUS_MSG_EXPECT_REPLY) {
+		/* requests for replies need timeout and cookie */
+		if (m->msg.timeout_ns == 0 || m->msg.cookie == 0) {
+			ret = -EINVAL;
+			goto exit_free;
+		}
+
+		/* replies may not be expected for broadcasts */
+		if (m->msg.dst_id == KDBUS_DST_ID_BROADCAST) {
+			ret = -ENOTUNIQ;
+			goto exit_free;
+		}
+
+		/* replies may not be expected for signals */
+		if (m->msg.flags & KDBUS_MSG_SIGNAL) {
+			ret = -EINVAL;
+			goto exit_free;
+		}
+	} else {
+		/*
+		 * KDBUS_SEND_SYNC_REPLY is only valid together with
+		 * KDBUS_MSG_EXPECT_REPLY
+		 */
+		if (cmd_send->flags & KDBUS_SEND_SYNC_REPLY) {
+			ret = -EINVAL;
+			goto exit_free;
+		}
+	}
+
+	ret = kdbus_msg_scan_items(m, conn->ep->bus);
+	if (ret < 0)
+		goto exit_free;
+
+	/* patch-in the source of this message */
+	if (m->msg.src_id > 0 && m->msg.src_id != conn->id) {
+		ret = -EINVAL;
+		goto exit_free;
+	}
+	m->msg.src_id = conn->id;
+
+	return m;
+
+exit_free:
+	kdbus_kmsg_free(m);
+	return ERR_PTR(ret);
+}
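
kdbus_msg_scan_items() above guards every size accumulation twice:
first against unsigned wrap-around, then against the configured limit.
The pattern can be sketched in isolation like this. vec_total_add() and
MAX_VEC_TOTAL are hypothetical names, and the 2 MiB cap is an arbitrary
value chosen for the example, not the value of
KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap; the real limit lives in the kdbus limits header. */
#define MAX_VEC_TOTAL (2ULL * 1024 * 1024)

/*
 * Add @size to the running total. On failure the total is left
 * untouched and -EMSGSIZE is returned.
 */
static int vec_total_add(uint64_t *total, uint64_t size)
{
	assert(total != NULL);

	/* unsigned addition wrapped around */
	if (*total + size < *total)
		return -EMSGSIZE;

	/* sum is representable but exceeds the limit */
	if (*total + size > MAX_VEC_TOTAL)
		return -EMSGSIZE;

	*total += size;
	return 0;
}
```

The wrap-around test has to come first: without it, a huge @size could
wrap the sum to a small value that passes the limit check.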
diff --git a/ipc/kdbus/message.h b/ipc/kdbus/message.h
new file mode 100644
index 000000000000..28f1893b002a
--- /dev/null
+++ b/ipc/kdbus/message.h
@@ -0,0 +1,133 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_MESSAGE_H
+#define __KDBUS_MESSAGE_H
+
+#include "util.h"
+#include "metadata.h"
+
+/**
+ * enum kdbus_msg_data_type - Type of kdbus_msg_data payloads
+ * @KDBUS_MSG_DATA_VEC:		Data vector provided by user-space
+ * @KDBUS_MSG_DATA_MEMFD:	Memfd payload
+ */
+enum kdbus_msg_data_type {
+	KDBUS_MSG_DATA_VEC,
+	KDBUS_MSG_DATA_MEMFD,
+};
+
+/**
+ * struct kdbus_msg_data - Data payload as stored by messages
+ * @type:	Type of payload (KDBUS_MSG_DATA_*)
+ * @size:	Size of the described payload
+ * @off:	The offset, relative to the vec slice
+ * @start:	Offset inside the memfd
+ * @file:	Backing file referenced by the memfd
+ */
+struct kdbus_msg_data {
+	unsigned int type;
+	u64 size;
+
+	union {
+		struct {
+			u64 off;
+		} vec;
+		struct {
+			u64 start;
+			struct file *file;
+		} memfd;
+	};
+};
+
+/**
+ * struct kdbus_msg_resources - resources of a message
+ * @kref:		Reference counter
+ * @dst_name:		Short-cut to msg for faster lookup
+ * @fds:		Array of file descriptors to pass
+ * @fds_count:		Number of file descriptors to pass
+ * @data:		Array of data payloads
+ * @vec_count:		Number of VEC entries
+ * @memfd_count:	Number of MEMFD entries in @data
+ * @data_count:		Sum of @vec_count + @memfd_count
+ */
+struct kdbus_msg_resources {
+	struct kref kref;
+	const char *dst_name;
+
+	struct file **fds;
+	unsigned int fds_count;
+
+	struct kdbus_msg_data *data;
+	size_t vec_count;
+	size_t memfd_count;
+	size_t data_count;
+};
+
+struct kdbus_msg_resources *
+kdbus_msg_resources_ref(struct kdbus_msg_resources *r);
+struct kdbus_msg_resources *
+kdbus_msg_resources_unref(struct kdbus_msg_resources *r);
+
+/**
+ * struct kdbus_kmsg - internal message handling data
+ * @seq:		Domain-global message sequence number
+ * @notify_type:	Short-cut for faster lookup
+ * @notify_old_id:	Short-cut for faster lookup
+ * @notify_new_id:	Short-cut for faster lookup
+ * @notify_name:	Short-cut for faster lookup
+ * @dst_name_id:	Short-cut to msg for faster lookup
+ * @bloom_filter:	Bloom filter to match message properties
+ * @bloom_generation:	Generation of bloom element set
+ * @notify_entry:	List of kernel-generated notifications
+ * @iov:		Array of iovec, describing the payload to copy
+ * @iov_count:		Number of array members in @iov
+ * @pool_size:		Overall size of inlined data referenced by @iov
+ * @proc_meta:		Appended SCM-like metadata of the sending process
+ * @conn_meta:		Appended SCM-like metadata of the sending connection
+ * @res:		Message resources
+ * @msg:		Message from or to userspace
+ */
+struct kdbus_kmsg {
+	u64 seq;
+	u64 notify_type;
+	u64 notify_old_id;
+	u64 notify_new_id;
+	const char *notify_name;
+
+	u64 dst_name_id;
+	const struct kdbus_bloom_filter *bloom_filter;
+	u64 bloom_generation;
+	struct list_head notify_entry;
+
+	struct iovec *iov;
+	size_t iov_count;
+	u64 pool_size;
+
+	struct kdbus_meta_proc *proc_meta;
+	struct kdbus_meta_conn *conn_meta;
+	struct kdbus_msg_resources *res;
+
+	/* variable size, must be the last member */
+	struct kdbus_msg msg;
+};
+
+struct kdbus_conn;
+
+struct kdbus_kmsg *kdbus_kmsg_new(size_t extra_size);
+struct kdbus_kmsg *kdbus_kmsg_new_from_cmd(struct kdbus_conn *conn,
+					   void __user *buf,
+					   struct kdbus_cmd_send *cmd_send);
+void kdbus_kmsg_free(struct kdbus_kmsg *kmsg);
+
+#endif
diff --git a/ipc/kdbus/queue.c b/ipc/kdbus/queue.c
new file mode 100644
index 000000000000..53ab51a0f791
--- /dev/null
+++ b/ipc/kdbus/queue.c
@@ -0,0 +1,505 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/audit.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/hashtable.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/math64.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/uio.h>
+
+#include "util.h"
+#include "domain.h"
+#include "connection.h"
+#include "item.h"
+#include "message.h"
+#include "metadata.h"
+#include "queue.h"
+#include "reply.h"
+
+/**
+ * kdbus_queue_entry_add() - Add a queue entry to a queue
+ * @queue:	The queue to attach the item to
+ * @entry:	The entry to attach
+ *
+ * Adds a previously allocated queue item to a queue, and maintains the
+ * priority r/b tree.
+ */
+/* add queue entry to connection, maintain priority queue */
+void kdbus_queue_entry_add(struct kdbus_queue *queue,
+			   struct kdbus_queue_entry *entry)
+{
+	struct rb_node **n, *pn = NULL;
+	bool highest = true;
+
+	/* sort into priority entry tree */
+	n = &queue->msg_prio_queue.rb_node;
+	while (*n) {
+		struct kdbus_queue_entry *e;
+
+		pn = *n;
+		e = rb_entry(pn, struct kdbus_queue_entry, prio_node);
+
+		/* existing node for this priority, add to its list */
+		if (likely(entry->msg.priority == e->msg.priority)) {
+			list_add_tail(&entry->prio_entry, &e->prio_entry);
+			goto prio_done;
+		}
+
+		if (entry->msg.priority < e->msg.priority) {
+			n = &pn->rb_left;
+		} else {
+			n = &pn->rb_right;
+			highest = false;
+		}
+	}
+
+	/* cache highest-priority entry */
+	if (highest)
+		queue->msg_prio_highest = &entry->prio_node;
+
+	/* new node for this priority */
+	rb_link_node(&entry->prio_node, pn, n);
+	rb_insert_color(&entry->prio_node, &queue->msg_prio_queue);
+	INIT_LIST_HEAD(&entry->prio_entry);
+
+prio_done:
+	/* add to unsorted fifo list */
+	list_add_tail(&entry->entry, &queue->msg_list);
+	queue->msg_count++;
+}
+
+/**
+ * kdbus_queue_entry_peek() - Retrieves an entry from a queue
+ *
+ * @queue:		The queue
+ * @priority:		The minimum priority of the entry to peek
+ * @use_priority:	Boolean flag whether or not to peek by priority
+ *
+ * Look for an entry in a queue, either by priority, or the oldest one (FIFO).
+ * The entry is neither freed nor removed from any of the queue's lists.
+ *
+ * Return: the peeked queue entry on success, ERR_PTR(-ENOMSG) if there is no
+ * entry with the requested priority, or ERR_PTR(-EAGAIN) if there are no
+ * entries at all.
+ */
+struct kdbus_queue_entry *kdbus_queue_entry_peek(struct kdbus_queue *queue,
+						 s64 priority,
+						 bool use_priority)
+{
+	struct kdbus_queue_entry *e;
+
+	if (queue->msg_count == 0)
+		return ERR_PTR(-EAGAIN);
+
+	if (use_priority) {
+		/* get next entry with highest priority */
+		e = rb_entry(queue->msg_prio_highest,
+			     struct kdbus_queue_entry, prio_node);
+
+		/* no entry with the requested priority */
+		if (e->msg.priority > priority)
+			return ERR_PTR(-ENOMSG);
+	} else {
+		/* ignore the priority, return the oldest entry in the list */
+		e = list_first_entry(&queue->msg_list,
+				     struct kdbus_queue_entry, entry);
+	}
+
+	return e;
+}
+
+/**
+ * kdbus_queue_entry_remove() - Remove an entry from a queue
+ * @conn:	The connection containing the queue
+ * @entry:	The entry to remove
+ *
+ * Remove an entry from both the queue's list and the priority r/b tree.
+ */
+void kdbus_queue_entry_remove(struct kdbus_conn *conn,
+			      struct kdbus_queue_entry *entry)
+{
+	struct kdbus_queue *queue = &conn->queue;
+
+	list_del(&entry->entry);
+	queue->msg_count--;
+
+	/* user quota */
+	if (entry->user) {
+		BUG_ON(conn->msg_users[entry->user->idr] == 0);
+		conn->msg_users[entry->user->idr]--;
+		entry->user = kdbus_domain_user_unref(entry->user);
+	}
+
+	/* the queue is empty, remove the user quota accounting */
+	if (queue->msg_count == 0 && conn->msg_users_max > 0) {
+		kfree(conn->msg_users);
+		conn->msg_users = NULL;
+		conn->msg_users_max = 0;
+	}
+
+	if (list_empty(&entry->prio_entry)) {
+		/*
+		 * Single entry for this priority, update cached
+		 * highest-priority entry, remove the tree node.
+		 */
+		if (queue->msg_prio_highest == &entry->prio_node)
+			queue->msg_prio_highest = rb_next(&entry->prio_node);
+
+		rb_erase(&entry->prio_node, &queue->msg_prio_queue);
+	} else {
+		struct kdbus_queue_entry *q;
+
+		/*
+		 * Multiple entries for this priority entry, get next one in
+		 * the list. Update cached highest-priority entry, store the
+		 * new one as the tree node.
+		 */
+		q = list_first_entry(&entry->prio_entry,
+				     struct kdbus_queue_entry, prio_entry);
+		list_del(&entry->prio_entry);
+
+		if (queue->msg_prio_highest == &entry->prio_node)
+			queue->msg_prio_highest = &q->prio_node;
+
+		rb_replace_node(&entry->prio_node, &q->prio_node,
+				&queue->msg_prio_queue);
+	}
+}
+
+/**
+ * kdbus_queue_entry_alloc() - allocate a queue entry
+ * @pool:	The pool to allocate the slice in
+ * @kmsg:	The kmsg object the queue entry should track
+ *
+ * Allocates a queue entry based on a given kmsg and allocates space for
+ * the message payload and the requested metadata in the connection's pool.
+ * The entry is not actually added to the queue's lists at this point.
+ *
+ * Return: the allocated entry on success, or an ERR_PTR on failures.
+ */
+struct kdbus_queue_entry *kdbus_queue_entry_alloc(struct kdbus_pool *pool,
+						  const struct kdbus_kmsg *kmsg)
+{
+	struct kdbus_msg_resources *res = kmsg->res;
+	const struct kdbus_msg *msg = &kmsg->msg;
+	struct kdbus_queue_entry *entry;
+	int ret = 0;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&entry->entry);
+	entry->msg_res = kdbus_msg_resources_ref(res);
+	entry->proc_meta = kdbus_meta_proc_ref(kmsg->proc_meta);
+	entry->conn_meta = kdbus_meta_conn_ref(kmsg->conn_meta);
+	memcpy(&entry->msg, msg, sizeof(*msg));
+
+	if (kmsg->iov_count) {
+		size_t pool_avail = kdbus_pool_remain(pool);
+
+		/* do not give out more than half of the remaining space */
+		if (kmsg->pool_size < pool_avail &&
+		    kmsg->pool_size > pool_avail / 2) {
+			ret = -EXFULL;
+			goto exit_free_entry;
+		}
+
+		/* allocate the needed space in the pool of the receiver */
+		entry->slice_vecs = kdbus_pool_slice_alloc(pool,
+							   kmsg->pool_size,
+							   NULL, kmsg->iov,
+							   kmsg->iov_count);
+		if (IS_ERR(entry->slice_vecs)) {
+			ret = PTR_ERR(entry->slice_vecs);
+			entry->slice_vecs = NULL;
+			goto exit_free_entry;
+		}
+	}
+
+	if (msg->src_id == KDBUS_SRC_ID_KERNEL) {
+		size_t extra_size = msg->size - sizeof(*msg);
+
+		entry->msg_extra = kmemdup((u8 *)msg + sizeof(*msg),
+					   extra_size, GFP_KERNEL);
+		if (!entry->msg_extra) {
+			ret = -ENOMEM;
+			goto exit_free_slice;
+		}
+
+		entry->msg_extra_size = extra_size;
+	}
+
+	return entry;
+
+exit_free_slice:
+	kdbus_pool_slice_release(entry->slice_vecs);
+exit_free_entry:
+	kdbus_queue_entry_free(entry);
+	return ERR_PTR(ret);
+}
+
+static struct kdbus_item *
+kdbus_msg_make_items(const struct kdbus_msg_resources *res, off_t payload_off,
+		     bool install_fds, u64 *return_flags, size_t *out_size)
+{
+	struct kdbus_item *items, *item;
+	bool incomplete_fds = false;
+	size_t i, size = 0;
+
+	/* sum up how much space we need for the 'control' part */
+	size += res->vec_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
+	size += res->memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
+
+	if (res->fds_count)
+		size += KDBUS_ITEM_SIZE(sizeof(int) * res->fds_count);
+
+	if (res->dst_name)
+		size += KDBUS_ITEM_SIZE(strlen(res->dst_name) + 1);
+
+	items = kzalloc(size, GFP_KERNEL);
+	if (!items)
+		return ERR_PTR(-ENOMEM);
+
+	item = items;
+
+	if (res->dst_name) {
+		kdbus_item_set(item, KDBUS_ITEM_DST_NAME,
+			       res->dst_name, strlen(res->dst_name) + 1);
+		item = KDBUS_ITEM_NEXT(item);
+	}
+
+	for (i = 0; i < res->data_count; ++i) {
+		struct kdbus_msg_data *d = res->data + i;
+		struct kdbus_memfd m = {};
+		struct kdbus_vec v = {};
+
+		switch (d->type) {
+		case KDBUS_MSG_DATA_VEC:
+			v.size = d->size;
+			v.offset = d->vec.off;
+			if (v.offset != ~0ULL)
+				v.offset += payload_off;
+
+			kdbus_item_set(item, KDBUS_ITEM_PAYLOAD_OFF,
+				       &v, sizeof(v));
+			item = KDBUS_ITEM_NEXT(item);
+			break;
+
+		case KDBUS_MSG_DATA_MEMFD:
+			m.start = d->memfd.start;
+			m.size = d->size;
+			m.fd = -1;
+			if (install_fds) {
+				m.fd = get_unused_fd_flags(O_CLOEXEC);
+				if (m.fd >= 0)
+					fd_install(m.fd,
+						   get_file(d->memfd.file));
+				else
+					incomplete_fds = true;
+			}
+
+			kdbus_item_set(item, KDBUS_ITEM_PAYLOAD_MEMFD,
+				       &m, sizeof(m));
+			item = KDBUS_ITEM_NEXT(item);
+			break;
+		}
+	}
+
+	if (res->fds_count) {
+		kdbus_item_set(item, KDBUS_ITEM_FDS,
+			       NULL, sizeof(int) * res->fds_count);
+		for (i = 0; i < res->fds_count; i++) {
+			if (install_fds) {
+				item->fds[i] = get_unused_fd_flags(O_CLOEXEC);
+				if (item->fds[i] >= 0)
+					fd_install(item->fds[i],
+						   get_file(res->fds[i]));
+				else
+					incomplete_fds = true;
+			} else {
+				item->fds[i] = -1;
+			}
+		}
+
+		item = KDBUS_ITEM_NEXT(item);
+	}
+
+	/* Make sure the sizes actually match */
+	BUG_ON((u8 *)item != (u8 *)items + size);
+
+	if (incomplete_fds)
+		*return_flags |= KDBUS_RECV_RETURN_INCOMPLETE_FDS;
+
+	*out_size = size;
+	return items;
+}
+
+/**
+ * kdbus_queue_entry_install() - install message components into the
+ *				 receiver's process
+ * @entry:		The queue entry to install
+ * @conn_dst:		The receiver connection
+ * @return_flags:	Pointer to store the return flags for userspace
+ * @install_fds:	Whether or not to install associated file descriptors
+ *
+ * This function will create a slice to transport the message header, the
+ * metadata items and other items for information stored in @entry, and
+ * store it as entry->slice.
+ *
+ * If @install_fds is %true, file descriptors will be installed as well.
+ * This function must always be called from the task context of the receiver.
+ *
+ * Return: 0 on success.
+ */
+int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
+			      struct kdbus_conn *conn_dst,
+			      u64 *return_flags, bool install_fds)
+{
+	size_t meta_size = 0, payload_items_size = 0;
+	struct kdbus_item *payload_items = NULL;
+	struct kdbus_item *meta_items = NULL;
+	off_t payload_off = 0;
+	struct kvec kvec[4];
+	size_t kvec_count = 0;
+	int ret = 0;
+
+	if (entry->proc_meta || entry->conn_meta) {
+		u64 attach_flags = atomic64_read(&conn_dst->attach_flags_recv);
+
+		meta_items = kdbus_meta_export(entry->proc_meta,
+					       entry->conn_meta,
+					       attach_flags,
+					       &meta_size);
+		if (IS_ERR(meta_items)) {
+			ret = PTR_ERR(meta_items);
+			meta_items = NULL;
+			goto exit_free;
+		}
+	}
+
+	/*
+	 * The offsets stored in the slice are relative to the start
+	 * of the payload slice. When exporting them, they need to become
+	 * relative to the pool, so get the payload slice's offset first.
+	 */
+	if (entry->slice_vecs)
+		payload_off = kdbus_pool_slice_offset(entry->slice_vecs);
+
+	if (entry->msg_res) {
+		payload_items = kdbus_msg_make_items(entry->msg_res,
+						     payload_off,
+						     install_fds, return_flags,
+						     &payload_items_size);
+		if (IS_ERR(payload_items)) {
+			ret = PTR_ERR(payload_items);
+			payload_items = NULL;
+			goto exit_free;
+		}
+	}
+
+	entry->msg.size = 0;
+
+	kdbus_kvec_set(&kvec[kvec_count++], &entry->msg, sizeof(entry->msg),
+		       &entry->msg.size);
+
+	if (entry->msg_extra_size)
+		kdbus_kvec_set(&kvec[kvec_count++], entry->msg_extra,
+			       entry->msg_extra_size, &entry->msg.size);
+
+	if (payload_items_size)
+		kdbus_kvec_set(&kvec[kvec_count++], payload_items,
+			       payload_items_size, &entry->msg.size);
+
+	if (meta_size)
+		kdbus_kvec_set(&kvec[kvec_count++], meta_items, meta_size,
+			       &entry->msg.size);
+
+	entry->slice = kdbus_pool_slice_alloc(conn_dst->pool, entry->msg.size,
+					      kvec, NULL, kvec_count);
+	if (IS_ERR(entry->slice)) {
+		ret = PTR_ERR(entry->slice);
+		entry->slice = NULL;
+		goto exit_free;
+	}
+
+	kdbus_pool_slice_set_child(entry->slice, entry->slice_vecs);
+
+exit_free:
+	kfree(payload_items);
+	kfree(meta_items);
+
+	return ret;
+}
+
+/**
+ * kdbus_queue_entry_move() - move an entry from one queue to another
+ * @conn_dst:	Connection holding the queue to copy to
+ * @entry:	The queue entry to move
+ *
+ * Return: 0 on success, negative error otherwise
+ */
+int kdbus_queue_entry_move(struct kdbus_conn *conn_dst,
+			   struct kdbus_queue_entry *entry)
+{
+	int ret = 0;
+
+	if (entry->slice_vecs)
+		ret = kdbus_pool_slice_move(conn_dst->pool, &entry->slice_vecs);
+
+	if (ret < 0)
+		kdbus_queue_entry_free(entry);
+	else
+		kdbus_queue_entry_add(&conn_dst->queue, entry);
+
+	return ret;
+}
+
+/**
+ * kdbus_queue_entry_free() - free resources of an entry
+ * @entry:	The entry to free
+ *
+ * Removes resources allocated by a queue entry, along with the entry itself.
+ * Note that the entry's slice is not freed at this point.
+ */
+void kdbus_queue_entry_free(struct kdbus_queue_entry *entry)
+{
+	kdbus_msg_resources_unref(entry->msg_res);
+	kdbus_meta_conn_unref(entry->conn_meta);
+	kdbus_meta_proc_unref(entry->proc_meta);
+	kdbus_reply_unref(entry->reply);
+	kfree(entry->msg_extra);
+	kfree(entry);
+}
+
+/**
+ * kdbus_queue_init() - initialize data structure related to a queue
+ * @queue:	The queue to initialize
+ */
+void kdbus_queue_init(struct kdbus_queue *queue)
+{
+	INIT_LIST_HEAD(&queue->msg_list);
+	queue->msg_prio_queue = RB_ROOT;
+}
diff --git a/ipc/kdbus/queue.h b/ipc/kdbus/queue.h
new file mode 100644
index 000000000000..8e9961fd3ecd
--- /dev/null
+++ b/ipc/kdbus/queue.h
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_QUEUE_H
+#define __KDBUS_QUEUE_H
+
+struct kdbus_domain_user;
+
+/**
+ * struct kdbus_queue - a connection's message queue
+ * @msg_count:		Number of messages in the queue
+ * @msg_list:		List head for kdbus_queue_entry objects
+ * @msg_prio_queue:	RB tree root for messages, sorted by priority
+ * @msg_prio_highest:	Link to the RB node referencing the message with the
+ *			highest priority in the tree.
+ */
+struct kdbus_queue {
+	size_t msg_count;
+	struct list_head msg_list;
+	struct rb_root msg_prio_queue;
+	struct rb_node *msg_prio_highest;
+};
+
+/**
+ * struct kdbus_queue_entry - messages waiting to be read
+ * @entry:		Entry in the connection's list
+ * @prio_node:		Entry in the priority queue tree
+ * @prio_entry:		Queue tree node entry in the list of one priority
+ * @msg:		Message header, either as received from userspace
+ *			process, or as crafted by the kernel as notification
+ * @msg_extra:		For notifications, contains more fixed parts of a
+ *			message, which will be copied to the final message
+ *			slice verbatim.
+ * @slice:		Slice in the receiver's pool for the message
+ * @slice_vecs:		Slice in the receiver's pool for message payload
+ * @memfds:		Arrays of offsets where to update the installed
+ *			fd number
+ * @dst_name:		Destination well-known-name
+ * @vecs:		Array of struct kdbus_queue_vecs
+ * @vec_count:		Number of elements in @vecs
+ * @memfds_fp:		Array memfd files queued up for this message
+ * @memfd_size:		Array of size_t values, describing the sizes of memfds
+ * @memfds_count:	Number of elements in @memfds_fp
+ * @fds_fp:		Array of passed files queued up for this message
+ * @fds_count:		Number of elements in @fds_fp
+ * @dst_name_id:	The sequence number of the name this message is
+ *			addressed to, 0 for messages sent to an ID
+ * @proc_meta:		Process metadata, captured at message arrival
+ * @conn_meta:		Connection metadata, captured at message arrival
+ * @reply:		The reply block if a reply to this message is expected.
+ * @user:		Domain user the entry is accounted on, %NULL for unused
+ */
+struct kdbus_queue_entry {
+	struct list_head entry;
+	struct rb_node prio_node;
+	struct list_head prio_entry;
+
+	struct kdbus_msg msg;
+
+	char *msg_extra;
+	size_t msg_extra_size;
+
+	struct kdbus_pool_slice *slice;
+	struct kdbus_pool_slice *slice_vecs;
+
+	u64 dst_name_id;
+
+	struct kdbus_msg_resources *msg_res;
+	struct kdbus_meta_proc *proc_meta;
+	struct kdbus_meta_conn *conn_meta;
+	struct kdbus_reply *reply;
+	struct kdbus_domain_user *user;
+};
+
+struct kdbus_kmsg;
+
+void kdbus_queue_init(struct kdbus_queue *queue);
+
+struct kdbus_queue_entry *
+kdbus_queue_entry_alloc(struct kdbus_pool *pool,
+			const struct kdbus_kmsg *kmsg);
+int kdbus_queue_entry_move(struct kdbus_conn *conn_dst,
+			   struct kdbus_queue_entry *entry);
+void kdbus_queue_entry_free(struct kdbus_queue_entry *entry);
+
+void kdbus_queue_entry_add(struct kdbus_queue *queue,
+			   struct kdbus_queue_entry *entry);
+void kdbus_queue_entry_remove(struct kdbus_conn *conn,
+			      struct kdbus_queue_entry *entry);
+struct kdbus_queue_entry *kdbus_queue_entry_peek(struct kdbus_queue *queue,
+						 s64 priority,
+						 bool use_priority);
+int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
+			      struct kdbus_conn *conn_dst,
+			      u64 *return_flags, bool install_fds);
+
+#endif /* __KDBUS_QUEUE_H */
diff --git a/ipc/kdbus/reply.c b/ipc/kdbus/reply.c
new file mode 100644
index 000000000000..9e3559d1ed4a
--- /dev/null
+++ b/ipc/kdbus/reply.c
@@ -0,0 +1,262 @@
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "message.h"
+#include "metadata.h"
+#include "domain.h"
+#include "item.h"
+#include "notify.h"
+#include "policy.h"
+#include "reply.h"
+#include "util.h"
+
+/**
+ * kdbus_reply_new() - Allocate and set up a new kdbus_reply object
+ * @reply_src:		The connection a reply is expected from
+ * @reply_dst:		The connection this reply object belongs to
+ * @msg:		Message associated with the reply
+ * @name_entry:		Name entry used to send the message
+ * @sync:		Whether or not to make this reply synchronous
+ *
+ * Allocate and fill a new kdbus_reply object.
+ *
+ * Return: New kdbus_reply object on success, ERR_PTR on error.
+ */
+struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
+				    struct kdbus_conn *reply_dst,
+				    const struct kdbus_msg *msg,
+				    struct kdbus_name_entry *name_entry,
+				    bool sync)
+{
+	struct kdbus_reply *r;
+	int ret = 0;
+
+	if (atomic_inc_return(&reply_dst->request_count) >
+	    KDBUS_CONN_MAX_REQUESTS_PENDING) {
+		ret = -EMLINK;
+		goto exit_dec_request_count;
+	}
+
+	r = kzalloc(sizeof(*r), GFP_KERNEL);
+	if (!r) {
+		ret = -ENOMEM;
+		goto exit_dec_request_count;
+	}
+
+	kref_init(&r->kref);
+	INIT_LIST_HEAD(&r->entry);
+	r->reply_src = kdbus_conn_ref(reply_src);
+	r->reply_dst = kdbus_conn_ref(reply_dst);
+	r->cookie = msg->cookie;
+	r->name_id = name_entry ? name_entry->name_id : 0;
+	r->deadline_ns = msg->timeout_ns;
+
+	if (sync) {
+		r->sync = true;
+		r->waiting = true;
+	}
+
+exit_dec_request_count:
+	if (ret < 0) {
+		atomic_dec(&reply_dst->request_count);
+		return ERR_PTR(ret);
+	}
+
+	return r;
+}
+
+static void __kdbus_reply_free(struct kref *kref)
+{
+	struct kdbus_reply *reply =
+		container_of(kref, struct kdbus_reply, kref);
+
+	atomic_dec(&reply->reply_dst->request_count);
+	kdbus_conn_unref(reply->reply_src);
+	kdbus_conn_unref(reply->reply_dst);
+	kfree(reply);
+}
+
+/**
+ * kdbus_reply_ref() - Increase reference on kdbus_reply
+ * @r:		The reply, may be %NULL
+ *
+ * Return: The reply object with an extra reference
+ */
+struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r)
+{
+	if (r)
+		kref_get(&r->kref);
+	return r;
+}
+
+/**
+ * kdbus_reply_unref() - Decrease reference on kdbus_reply
+ * @r:		The reply, may be %NULL
+ *
+ * Return: NULL
+ */
+struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r)
+{
+	if (r)
+		kref_put(&r->kref, __kdbus_reply_free);
+	return NULL;
+}
+
+/**
+ * kdbus_reply_link() - Link reply object into target connection
+ * @r:		Reply to link
+ */
+void kdbus_reply_link(struct kdbus_reply *r)
+{
+	if (WARN_ON(!list_empty(&r->entry)))
+		return;
+
+	list_add(&r->entry, &r->reply_dst->reply_list);
+	kdbus_reply_ref(r);
+}
+
+/**
+ * kdbus_reply_unlink() - Unlink reply object from target connection
+ * @r:		Reply to unlink
+ */
+void kdbus_reply_unlink(struct kdbus_reply *r)
+{
+	if (!list_empty(&r->entry)) {
+		list_del_init(&r->entry);
+		kdbus_reply_unref(r);
+	}
+}
+
+/**
+ * kdbus_sync_reply_wakeup() - Wake a synchronously blocking reply
+ * @reply:	The reply object
+ * @err:	Error code to set on the remote side
+ *
+ * Remove the synchronous reply object from its connection reply_list, and
+ * wake up remote peer (method origin) with the appropriate synchronous reply
+ * code.
+ */
+void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err)
+{
+	if (WARN_ON(!reply->sync))
+		return;
+
+	reply->waiting = false;
+	reply->err = err;
+	wake_up_interruptible(&reply->reply_dst->wait);
+}
+
+/**
+ * kdbus_reply_find() - Find the corresponding reply object
+ * @replying:	The replying connection
+ * @reply_dst:	The connection the reply will be sent to
+ *		(method origin)
+ * @cookie:	The cookie of the requesting message
+ *
+ * Lookup a reply object that should be sent as a reply by
+ * @replying to @reply_dst with the given cookie.
+ *
+ * For optimizations, callers should first check 'request_count' of
+ * @reply_dst to see if the connection has issued any requests
+ * that are waiting for replies, before calling this function.
+ *
+ * Callers must take the @reply_dst lock.
+ *
+ * Return: the corresponding reply object or NULL if not found
+ */
+struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
+				     struct kdbus_conn *reply_dst,
+				     u64 cookie)
+{
+	struct kdbus_reply *r, *reply = NULL;
+
+	list_for_each_entry(r, &reply_dst->reply_list, entry) {
+		if (r->reply_src == replying &&
+		    r->cookie == cookie) {
+			reply = r;
+			break;
+		}
+	}
+
+	return reply;
+}
+
+/**
+ * kdbus_reply_list_scan_work() - Worker callback to scan the replies of a
+ *				  connection for exceeded timeouts
+ * @work:		Work struct of the connection to scan
+ *
+ * Walk the list of replies stored with a connection and look for entries
+ * that have exceeded their timeout. If such an entry is found, a timeout
+ * notification is sent to the waiting peer, and the reply is removed from
+ * the list.
+ *
+ * The work is rescheduled to the nearest timeout found during the list
+ * iteration.
+ */
+void kdbus_reply_list_scan_work(struct work_struct *work)
+{
+	struct kdbus_conn *conn =
+		container_of(work, struct kdbus_conn, work.work);
+	struct kdbus_reply *reply, *reply_tmp;
+	u64 deadline = ~0ULL;
+	struct timespec64 ts;
+	u64 now;
+
+	ktime_get_ts64(&ts);
+	now = timespec64_to_ns(&ts);
+
+	mutex_lock(&conn->lock);
+	if (!kdbus_conn_active(conn)) {
+		mutex_unlock(&conn->lock);
+		return;
+	}
+
+	list_for_each_entry_safe(reply, reply_tmp, &conn->reply_list, entry) {
+		/*
+		 * If the reply block is waiting for synchronous I/O,
+		 * the timeout is handled by wait_event_*_timeout(),
+		 * so we don't have to care for it here.
+		 */
+		if (reply->sync && !reply->interrupted)
+			continue;
+
+		WARN_ON(reply->reply_dst != conn);
+
+		if (reply->deadline_ns > now) {
+			/* remember next timeout */
+			if (deadline > reply->deadline_ns)
+				deadline = reply->deadline_ns;
+
+			continue;
+		}
+
+		/*
+		 * A zero deadline means the connection died, was
+		 * cleaned up already and the notification was sent.
+		 * Don't send notifications for reply trackers that were
+		 * left in an interrupted syscall state.
+		 */
+		if (reply->deadline_ns != 0 && !reply->interrupted)
+			kdbus_notify_reply_timeout(conn->ep->bus, conn->id,
+						   reply->cookie);
+
+		kdbus_reply_unlink(reply);
+	}
+
+	/* rearm delayed work with next timeout */
+	if (deadline != ~0ULL)
+		schedule_delayed_work(&conn->work,
+				      nsecs_to_jiffies(deadline - now));
+
+	mutex_unlock(&conn->lock);
+
+	kdbus_notify_flush(conn->ep->bus);
+}
diff --git a/ipc/kdbus/reply.h b/ipc/kdbus/reply.h
new file mode 100644
index 000000000000..7cecea210bf5
--- /dev/null
+++ b/ipc/kdbus/reply.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_REPLY_H
+#define __KDBUS_REPLY_H
+
+/**
+ * struct kdbus_reply - an entry of kdbus_conn's list of replies
+ * @kref:		Ref-count of this object
+ * @entry:		The entry of the connection's reply_list
+ * @reply_src:		The connection the reply will be sent from
+ * @reply_dst:		The connection the reply will be sent to
+ * @queue_entry:	The queue entry item that is prepared by the replying
+ *			connection
+ * @deadline_ns:	The deadline of the reply, in nanoseconds
+ * @cookie:		The cookie of the requesting message
+ * @name_id:		ID of the well-known name the original msg was sent to
+ * @sync:		The reply block is waiting for synchronous I/O
+ * @waiting:		The condition to synchronously wait for
+ * @interrupted:	The sync reply was left in an interrupted state
+ * @err:		The error code for the synchronous reply
+ */
+struct kdbus_reply {
+	struct kref kref;
+	struct list_head entry;
+	struct kdbus_conn *reply_src;
+	struct kdbus_conn *reply_dst;
+	struct kdbus_queue_entry *queue_entry;
+	u64 deadline_ns;
+	u64 cookie;
+	u64 name_id;
+	bool sync:1;
+	bool waiting:1;
+	bool interrupted:1;
+	int err;
+};
+
+struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
+				    struct kdbus_conn *reply_dst,
+				    const struct kdbus_msg *msg,
+				    struct kdbus_name_entry *name_entry,
+				    bool sync);
+
+struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r);
+struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r);
+
+void kdbus_reply_link(struct kdbus_reply *r);
+void kdbus_reply_unlink(struct kdbus_reply *r);
+
+struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
+				     struct kdbus_conn *reply_dst,
+				     u64 cookie);
+
+void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err);
+void kdbus_reply_list_scan_work(struct work_struct *work);
+
+#endif /* __KDBUS_REPLY_H */
diff --git a/ipc/kdbus/util.h b/ipc/kdbus/util.h
index 33d31f6274e0..241bbcc1c19f 100644
--- a/ipc/kdbus/util.h
+++ b/ipc/kdbus/util.h
@@ -19,7 +19,7 @@
 #include <linux/ioctl.h>
 #include <linux/uidgid.h>
 
-#include "kdbus.h"
+#include <uapi/linux/kdbus.h>
 
 /* all exported addresses are 64 bit */
 #define KDBUS_PTR(addr) ((void __user *)(uintptr_t)(addr))
-- 
2.2.1



* [PATCH 05/13] kdbus: add connection, queue handling and message validation code
@ 2015-01-16 19:16   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

This patch adds code to create and destroy connections, to validate
incoming messages and to maintain the queue of messages that are
associated with a connection.
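
As a rough illustration of the queue part, the peek semantics of
kdbus_queue_entry_peek() can be modelled in plain userspace C. This is
only a sketch under stated assumptions: the pq_* names are hypothetical
and do not exist in the patch, and a linear scan over a fixed array
stands in for the r/b tree and the cached highest-priority node. What it
keeps from the patch is that a numerically lower value means a higher
priority, that entries of equal priority are served oldest first, and
that the two error cases are -EAGAIN (queue empty) and -ENOMSG (best
entry does not meet the requested priority).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of the pq_* names are in the patch. */
#define PQ_MAX 16

struct pq_queue {
	int64_t prio[PQ_MAX];	/* priorities, stored in arrival (FIFO) order */
	size_t count;
};

/* Append an entry; arrival order doubles as the unsorted FIFO list. */
static void pq_add(struct pq_queue *q, int64_t prio)
{
	assert(q->count < PQ_MAX);
	q->prio[q->count++] = prio;
}

/*
 * Return the index of the entry a peek would yield, or a negative errno:
 * -EAGAIN if the queue is empty, -ENOMSG if the best entry has a
 * numerically larger (that is, lower) priority than requested.
 */
static int pq_peek(const struct pq_queue *q, int64_t priority,
		   int use_priority)
{
	size_t i, best = 0;

	if (q->count == 0)
		return -EAGAIN;

	/* FIFO mode: ignore priorities, return the oldest entry */
	if (!use_priority)
		return 0;

	/* strict '<' keeps the oldest entry among equal priorities */
	for (i = 1; i < q->count; i++)
		if (q->prio[i] < q->prio[best])
			best = i;

	if (q->prio[best] > priority)
		return -ENOMSG;

	return (int)best;
}
```

With entries of priority 5, -3 and -3 queued in that order, a FIFO peek
yields the first entry, a priority peek with a threshold of 0 yields the
second one, and a priority peek with a threshold of -10 fails with
-ENOMSG. The real code avoids the scan by keeping one tree node per
priority and caching the leftmost node.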

Note that connection and queue have a 1:1 relation, the code is only
split in two parts for cleaner separation and better readability.
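
One place where the 1:1 relation shows is per-user accounting:
kdbus_queue_entry_remove() takes the connection rather than the bare
queue, because the per-user counters live in the connection while the
message count lives in the queue. A minimal userspace sketch of that
coupling follows. The acct_* names are hypothetical, and the add side in
particular is an assumption made for illustration; only the remove side
mirrors what the patch does (decrement the user's counter, and drop the
counter array once the queue is empty).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model; the acct_* names are illustrative, not kdbus API. */
struct acct_queue {
	size_t msg_count;
};

struct acct_conn {
	struct acct_queue queue;	/* embedded: one queue per connection */
	unsigned int *msg_users;	/* per-user counters, allocated lazily */
	size_t msg_users_max;		/* number of slots in @msg_users */
};

/* Account one queued message for slot @user; 0 on success, -1 on OOM. */
static int acct_add(struct acct_conn *conn, size_t user)
{
	if (user >= conn->msg_users_max) {
		size_t i, n = user + 1;
		unsigned int *p = realloc(conn->msg_users, n * sizeof(*p));

		if (!p)
			return -1;

		/* zero only the newly added slots */
		for (i = conn->msg_users_max; i < n; i++)
			p[i] = 0;

		conn->msg_users = p;
		conn->msg_users_max = n;
	}

	conn->msg_users[user]++;
	conn->queue.msg_count++;
	return 0;
}

/* Undo acct_add(); free the counter array once the queue is empty. */
static void acct_remove(struct acct_conn *conn, size_t user)
{
	assert(user < conn->msg_users_max && conn->msg_users[user] > 0);

	conn->msg_users[user]--;
	conn->queue.msg_count--;

	if (conn->queue.msg_count == 0) {
		free(conn->msg_users);
		conn->msg_users = NULL;
		conn->msg_users_max = 0;
	}
}
```

Embedding the queue in the connection keeps both pieces of state behind
a single owner, so a removal can update them together without a back
pointer from the queue to its connection.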

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 ipc/kdbus/connection.c | 2004 ++++++++++++++++++++++++++++++++++++++++++++++++
 ipc/kdbus/connection.h |  262 +++++++
 ipc/kdbus/item.c       |  309 ++++++++
 ipc/kdbus/item.h       |   57 ++
 ipc/kdbus/message.c    |  598 +++++++++++++++
 ipc/kdbus/message.h    |  133 ++++
 ipc/kdbus/queue.c      |  505 ++++++++++++
 ipc/kdbus/queue.h      |  108 +++
 ipc/kdbus/reply.c      |  262 +++++++
 ipc/kdbus/reply.h      |   68 ++
 ipc/kdbus/util.h       |    2 +-
 11 files changed, 4307 insertions(+), 1 deletion(-)
 create mode 100644 ipc/kdbus/connection.c
 create mode 100644 ipc/kdbus/connection.h
 create mode 100644 ipc/kdbus/item.c
 create mode 100644 ipc/kdbus/item.h
 create mode 100644 ipc/kdbus/message.c
 create mode 100644 ipc/kdbus/message.h
 create mode 100644 ipc/kdbus/queue.c
 create mode 100644 ipc/kdbus/queue.h
 create mode 100644 ipc/kdbus/reply.c
 create mode 100644 ipc/kdbus/reply.h

diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
new file mode 100644
index 000000000000..75e2ea161a0e
--- /dev/null
+++ b/ipc/kdbus/connection.c
@@ -0,0 +1,2004 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/audit.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/fs_struct.h>
+#include <linux/hashtable.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/math64.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/path.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/shmem_fs.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "match.h"
+#include "message.h"
+#include "metadata.h"
+#include "names.h"
+#include "domain.h"
+#include "item.h"
+#include "notify.h"
+#include "policy.h"
+#include "pool.h"
+#include "reply.h"
+#include "util.h"
+#include "queue.h"
+
+#define KDBUS_CONN_ACTIVE_BIAS	(INT_MIN + 2)
+#define KDBUS_CONN_ACTIVE_NEW	(INT_MIN + 1)
+
+/*
+ * Check for maximum number of messages per individual user. This
+ * should prevent a single user from being able to fill the receiver's
+ * queue.
+ */
+static int kdbus_conn_queue_user_quota(const struct kdbus_conn *conn_src,
+				       struct kdbus_conn *conn_dst,
+				       struct kdbus_queue_entry *entry)
+{
+	struct kdbus_domain_user *user;
+
+	/*
+	 * When the kernel is the sender we do not do per-user
+	 * accounting. Instead, we just count how many messages have
+	 * been queued and check the quota limit when inserting the
+	 * message into the receiver's queue.
+	 */
+	if (!conn_src)
+		return 0;
+
+	/*
+	 * Per-user accounting can be expensive if we have many different
+	 * users on the bus. Allow one set of messages to pass through
+	 * un-accounted. Only once we hit that limit, we start accounting.
+	 */
+	if (conn_dst->queue.msg_count < KDBUS_CONN_MAX_MSGS_UNACCOUNTED)
+		return 0;
+
+	user = conn_src->user;
+
+	/* extend array to store the user message counters */
+	if (user->idr >= conn_dst->msg_users_max) {
+		unsigned int *users;
+		unsigned int i;
+
+		i = 8 + KDBUS_ALIGN8(user->idr);
+		users = krealloc(conn_dst->msg_users, i * sizeof(unsigned int),
+				 GFP_KERNEL | __GFP_ZERO);
+		if (!users)
+			return -ENOMEM;
+
+		conn_dst->msg_users = users;
+		conn_dst->msg_users_max = i;
+	}
+
+	if (conn_dst->msg_users[user->idr] >= KDBUS_CONN_MAX_MSGS_PER_USER)
+		return -ENOBUFS;
+
+	conn_dst->msg_users[user->idr]++;
+	entry->user = kdbus_domain_user_ref(user);
+	return 0;
+}
+
+/**
+ * kdbus_cmd_msg_recv() - receive a message from the queue
+ * @conn:		Connection to work on
+ * @recv:		The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_cmd_msg_recv(struct kdbus_conn *conn,
+		       struct kdbus_cmd_recv *recv)
+{
+	bool install = !(recv->flags & KDBUS_RECV_PEEK);
+	struct kdbus_queue_entry *entry = NULL;
+	unsigned int lost_count;
+	int ret = 0;
+
+	if (recv->msg.offset > 0)
+		return -EINVAL;
+
+	mutex_lock(&conn->lock);
+	entry = kdbus_queue_entry_peek(&conn->queue, recv->priority,
+				       recv->flags & KDBUS_RECV_USE_PRIORITY);
+	if (IS_ERR(entry)) {
+		ret = PTR_ERR(entry);
+		goto exit_unlock;
+	}
+
+	/*
+	 * Make sure to never install fds into a connection that has
+	 * refused to receive any.
+	 */
+	if (WARN_ON(!(conn->flags & KDBUS_HELLO_ACCEPT_FD) &&
+	    entry->msg_res && entry->msg_res->fds_count > 0)) {
+		ret = -EINVAL;
+		goto exit_unlock;
+	}
+
+	/* just drop the message */
+	if (recv->flags & KDBUS_RECV_DROP) {
+		struct kdbus_reply *reply = kdbus_reply_ref(entry->reply);
+
+		kdbus_queue_entry_remove(conn, entry);
+		kdbus_pool_slice_release(entry->slice);
+
+		mutex_unlock(&conn->lock);
+
+		if (reply) {
+			/*
+			 * See if the reply object is still linked in
+			 * reply_dst, and kill it. Notify the waiting peer
+			 * that there won't be an answer (-EPIPE).
+			 */
+			mutex_lock(&reply->reply_dst->lock);
+			if (!list_empty(&reply->entry)) {
+				kdbus_reply_unlink(reply);
+				if (reply->sync)
+					kdbus_sync_reply_wakeup(reply, -EPIPE);
+				else
+					kdbus_notify_reply_dead(conn->ep->bus,
+							entry->msg.src_id,
+							entry->msg.cookie);
+			}
+			mutex_unlock(&reply->reply_dst->lock);
+		}
+
+		kdbus_notify_flush(conn->ep->bus);
+		kdbus_queue_entry_free(entry);
+		kdbus_reply_unref(reply);
+
+		return 0;
+	}
+
+	/*
+	 * If there have been lost broadcast messages, report the number
+	 * in the overloaded recv->dropped_msgs field and return -EOVERFLOW.
+	 */
+	lost_count = atomic_read(&conn->lost_count);
+	if (lost_count) {
+		recv->dropped_msgs = lost_count;
+		atomic_sub(lost_count, &conn->lost_count);
+		ret = -EOVERFLOW;
+		goto exit_unlock;
+	}
+
+	/*
+	 * PEEK just returns the location of the next message. Do not install
+	 * file descriptors or anything else. This is usually used to
+	 * determine the sender of the next queued message.
+	 *
+	 * File descriptor numbers referenced in the message items
+	 * are undefined with PEEK; they are only valid after a full
+	 * receive.
+	 *
+	 * Only if PEEK is not specified are the FDs installed and the
+	 * message dropped from the internal queues.
+	 */
+	ret = kdbus_queue_entry_install(entry, conn, &recv->msg.return_flags,
+					install);
+	if (ret < 0)
+		goto exit_unlock;
+
+	/* Give the offset+size back to the caller. */
+	kdbus_pool_slice_publish(entry->slice, &recv->msg.offset,
+				 &recv->msg.msg_size);
+
+	if (install) {
+		kdbus_queue_entry_remove(conn, entry);
+		kdbus_pool_slice_release(entry->slice);
+		kdbus_queue_entry_free(entry);
+	}
+
+exit_unlock:
+	mutex_unlock(&conn->lock);
+	kdbus_notify_flush(conn->ep->bus);
+	return ret;
+}
+
+static int kdbus_conn_check_access(struct kdbus_conn *conn_src,
+				   const struct cred *conn_src_creds,
+				   struct kdbus_conn *conn_dst,
+				   const struct kdbus_msg *msg,
+				   struct kdbus_reply **reply_wake)
+{
+	/*
+	 * If the message is a reply, its cookie_reply field must match any
+	 * of the connection's expected replies. Otherwise, access to send the
+	 * message will be denied.
+	 */
+	if (reply_wake && msg->cookie_reply > 0) {
+		struct kdbus_reply *r;
+
+		/*
+		 * The connection that we are replying to has not
+		 * issued any request, or perhaps we have already
+		 * replied. In any case, the supplied cookie_reply is
+		 * no longer valid, so fail.
+		 */
+		if (atomic_read(&conn_dst->request_count) == 0)
+			return -EPERM;
+
+		mutex_lock(&conn_dst->lock);
+		r = kdbus_reply_find(conn_src, conn_dst, msg->cookie_reply);
+		if (r) {
+			if (r->sync)
+				*reply_wake = kdbus_reply_ref(r);
+			kdbus_reply_unlink(r);
+		}
+		mutex_unlock(&conn_dst->lock);
+
+		return r ? 0 : -EPERM;
+	}
+
+	/* ... otherwise, ask the policy DBs for permission */
+	if (!kdbus_conn_policy_talk(conn_src, conn_src_creds, conn_dst))
+		return -EPERM;
+
+	return 0;
+}
+
+/* Callers should take the conn_dst lock */
+static struct kdbus_queue_entry *
+kdbus_conn_entry_make(struct kdbus_conn *conn_dst,
+		      const struct kdbus_kmsg *kmsg)
+{
+	/* The remote connection was disconnected */
+	if (!kdbus_conn_active(conn_dst))
+		return ERR_PTR(-ECONNRESET);
+
+	/* The connection does not accept file descriptors */
+	if (!(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
+	    kmsg->res && kmsg->res->fds_count > 0)
+		return ERR_PTR(-ECOMM);
+
+	return kdbus_queue_entry_alloc(conn_dst->pool, kmsg);
+}
+
+/*
+ * When synchronously responding to a message, allocate a queue entry
+ * and attach it to the reply tracking object.
+ * The connection's queue will never get to see it.
+ */
+static int kdbus_conn_entry_sync_attach(struct kdbus_conn *conn_dst,
+					const struct kdbus_kmsg *kmsg,
+					struct kdbus_reply *reply_wake)
+{
+	struct kdbus_queue_entry *entry;
+	int remote_ret;
+	int ret = 0;
+
+	mutex_lock(&reply_wake->reply_dst->lock);
+
+	/*
+	 * If we are still waiting then proceed, allocate a queue
+	 * entry and attach it to the reply object
+	 */
+	if (reply_wake->waiting) {
+		entry = kdbus_conn_entry_make(conn_dst, kmsg);
+		if (IS_ERR(entry))
+			ret = PTR_ERR(entry);
+		else
+			/* Attach the entry to the reply object */
+			reply_wake->queue_entry = entry;
+	} else {
+		ret = -ECONNRESET;
+	}
+
+	/*
+	 * Update the reply object and wake up the remote peer only
+	 * on appropriate return codes:
+	 *
+	 * * -ECOMM: if the replying connection failed with -ECOMM,
+	 *           then wake up the remote peer with -EREMOTEIO.
+	 *
+	 *           We do this to differentiate between -ECOMM errors
+	 *           from the original sender's perspective:
+	 *           -ECOMM error during the sync send and
+	 *           -ECOMM error during the sync reply; the latter
+	 *           is rewritten to -EREMOTEIO.
+	 *
+	 * * Wake up on all other return codes.
+	 */
+	remote_ret = ret;
+
+	if (ret == -ECOMM)
+		remote_ret = -EREMOTEIO;
+
+	kdbus_sync_reply_wakeup(reply_wake, remote_ret);
+	kdbus_reply_unlink(reply_wake);
+	mutex_unlock(&reply_wake->reply_dst->lock);
+
+	return ret;
+}
+
+/**
+ * kdbus_conn_entry_insert() - enqueue a message into the receiver's pool
+ * @conn_src:		The sending connection
+ * @conn_dst:		The connection to queue into
+ * @kmsg:		The kmsg to queue
+ * @reply:		The reply tracker to attach to the queue entry
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
+			    struct kdbus_conn *conn_dst,
+			    const struct kdbus_kmsg *kmsg,
+			    struct kdbus_reply *reply)
+{
+	struct kdbus_queue_entry *entry;
+	int ret;
+
+	kdbus_conn_lock2(conn_src, conn_dst);
+
+	/*
+	 * Limit the maximum number of queued messages. This applies
+	 * to all messages, user messages and kernel notifications.
+	 *
+	 * The kernel sends notifications to subscribed connections
+	 * only. If the connection does not drain its queue, no further
+	 * messages are delivered.
+	 * The kernel is able to queue KDBUS_CONN_MAX_MSGS messages; this
+	 * includes all types of notifications.
+	 */
+	if (conn_dst->queue.msg_count >= KDBUS_CONN_MAX_MSGS) {
+		ret = -ENOBUFS;
+		goto exit_unlock;
+	}
+
+	entry = kdbus_conn_entry_make(conn_dst, kmsg);
+	if (IS_ERR(entry)) {
+		ret = PTR_ERR(entry);
+		goto exit_unlock;
+	}
+
+	/* limit the number of queued messages from the same individual user */
+	ret = kdbus_conn_queue_user_quota(conn_src, conn_dst, entry);
+	if (ret < 0)
+		goto exit_queue_free;
+
+	/*
+	 * Remember the reply associated with this queue entry, so we can
+	 * move the reply entry's connection when a connection moves from an
+	 * activator to an implementer.
+	 */
+	entry->reply = kdbus_reply_ref(reply);
+
+	if (reply) {
+		kdbus_reply_link(reply);
+		if (!reply->sync)
+			schedule_delayed_work(&conn_src->work, 0);
+	}
+
+	/* link the message into the receiver's queue */
+	kdbus_queue_entry_add(&conn_dst->queue, entry);
+
+	/* wake up poll() */
+	wake_up_interruptible(&conn_dst->wait);
+
+	ret = 0;
+	goto exit_unlock;
+
+exit_queue_free:
+	kdbus_queue_entry_free(entry);
+exit_unlock:
+	kdbus_conn_unlock2(conn_src, conn_dst);
+	return ret;
+}
+
+/**
+ * kdbus_conn_wait_reply() - Wait for the reply of a synchronous send
+ *			     operation
+ * @conn_src:		The sending connection (origin)
+ * @conn_dst:		The replying connection
+ * @cmd_send:		Payload of SEND command
+ * @ioctl_file:		struct file used to issue this ioctl
+ * @cancel_fd:		Pinned file that reflects KDBUS_ITEM_CANCEL_FD
+ *			item, used to cancel the blocking send call
+ * @reply_wait:		The tracked reply that we are waiting for.
+ * @expire:		Reply timeout
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+static int kdbus_conn_wait_reply(struct kdbus_conn *conn_src,
+				 struct kdbus_conn *conn_dst,
+				 struct kdbus_cmd_send *cmd_send,
+				 struct file *ioctl_file,
+				 struct file *cancel_fd,
+				 struct kdbus_reply *reply_wait,
+				 ktime_t expire)
+{
+	struct kdbus_queue_entry *entry;
+	struct poll_wqueues pwq = {};
+	int ret;
+
+	if (WARN_ON(!reply_wait))
+		return -EIO;
+
+	/*
+	 * Block until the reply arrives. reply_wait is left untouched
+	 * by the timeout scans that might be conducted for other,
+	 * asynchronous replies of conn_src.
+	 */
+
+	poll_initwait(&pwq);
+	poll_wait(ioctl_file, &conn_src->wait, &pwq.pt);
+
+	for (;;) {
+		/*
+		 * Any of the following conditions will stop our synchronously
+		 * blocking SEND command:
+		 *
+		 * a) The origin sender closed its connection
+		 * b) The remote peer answered, setting reply_wait->waiting = 0
+		 * c) The cancel FD was written to
+		 * d) A signal was received
+		 * e) The specified timeout was reached, and none of the above
+		 *    conditions kicked in.
+		 */
+
+		/*
+		 * We have already acquired an active reference when
+		 * entering here, but another thread may call
+		 * KDBUS_CMD_BYEBYE which does not acquire an active
+		 * reference, therefore kdbus_conn_disconnect() will
+		 * not wait for us.
+		 */
+		if (!kdbus_conn_active(conn_src)) {
+			ret = -ECONNRESET;
+			break;
+		}
+
+		/*
+		 * After the replying peer has cleared the waiting flag,
+		 * it will wake us up.
+		 */
+		if (!reply_wait->waiting) {
+			ret = reply_wait->err;
+			break;
+		}
+
+		if (cancel_fd) {
+			unsigned int r;
+
+			r = cancel_fd->f_op->poll(cancel_fd, &pwq.pt);
+			if (r & POLLIN) {
+				ret = -ECANCELED;
+				break;
+			}
+		}
+
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+
+		if (!poll_schedule_timeout(&pwq, TASK_INTERRUPTIBLE,
+					   &expire, 0)) {
+			ret = -ETIMEDOUT;
+			break;
+		}
+
+		/*
+		 * Reset the poll worker func, so the waitqueues are not
+		 * added to the poll table again. We just reuse what we've
+		 * collected earlier for further iterations.
+		 */
+		init_poll_funcptr(&pwq.pt, NULL);
+	}
+
+	poll_freewait(&pwq);
+
+	if (ret == -EINTR) {
+		/*
+		 * Interrupted system call. Unref the reply object, and pass
+		 * the return value down the chain. Mark the reply as
+		 * interrupted, so the cleanup work can remove it, but do not
+		 * unlink it from the list. Once the syscall restarts, we'll
+		 * pick it up and wait on it again.
+		 */
+		mutex_lock(&conn_src->lock);
+		reply_wait->interrupted = true;
+		schedule_delayed_work(&conn_src->work, 0);
+		mutex_unlock(&conn_src->lock);
+
+		return -ERESTARTSYS;
+	}
+
+	mutex_lock(&conn_src->lock);
+	reply_wait->waiting = false;
+	entry = reply_wait->queue_entry;
+	if (entry) {
+		ret = kdbus_queue_entry_install(entry, conn_src,
+						&cmd_send->reply.return_flags,
+						true);
+		kdbus_pool_slice_publish(entry->slice, &cmd_send->reply.offset,
+					 &cmd_send->reply.msg_size);
+		kdbus_pool_slice_release(entry->slice);
+		kdbus_queue_entry_free(entry);
+	}
+	kdbus_reply_unlink(reply_wait);
+	mutex_unlock(&conn_src->lock);
+
+	return ret;
+}
+
+/**
+ * kdbus_cmd_msg_send() - send a message
+ * @conn_src:		Connection
+ * @cmd:		Payload of SEND command
+ * @ioctl_file:		struct file used to issue this ioctl
+ * @kmsg:		Message to send
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_cmd_msg_send(struct kdbus_conn *conn_src,
+		       struct kdbus_cmd_send *cmd,
+		       struct file *ioctl_file,
+		       struct kdbus_kmsg *kmsg)
+{
+	bool sync = cmd->flags & KDBUS_SEND_SYNC_REPLY;
+	struct kdbus_name_entry *name_entry = NULL;
+	struct kdbus_reply *reply_wait = NULL;
+	struct kdbus_reply *reply_wake = NULL;
+	struct kdbus_msg *msg = &kmsg->msg;
+	struct kdbus_conn *conn_dst = NULL;
+	struct kdbus_bus *bus = conn_src->ep->bus;
+	struct file *cancel_fd = NULL;
+	struct kdbus_item *item;
+	int ret = 0;
+
+	/* the domain-global sequence number must not be assigned yet */
+	if (WARN_ON(kmsg->seq > 0))
+		return -EINVAL;
+
+	KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
+		switch (item->type) {
+		case KDBUS_ITEM_CANCEL_FD:
+			/* install cancel_fd only if synchronous */
+			if (!sync)
+				break;
+
+			if (cancel_fd) {
+				ret = -EEXIST;
+				goto exit_put_cancelfd;
+			}
+
+			cancel_fd = fget(item->fds[0]);
+			if (!cancel_fd)
+				return -EBADF;
+
+			if (!cancel_fd->f_op->poll) {
+				ret = -EINVAL;
+				goto exit_put_cancelfd;
+			}
+			break;
+
+		default:
+			ret = -EINVAL;
+			goto exit_put_cancelfd;
+		}
+	}
+
+	kmsg->seq = atomic64_inc_return(&bus->domain->msg_seq_last);
+
+	if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
+		kdbus_bus_broadcast(bus, conn_src, kmsg);
+		goto exit_put_cancelfd;
+	}
+
+	if (kmsg->res && kmsg->res->dst_name) {
+		/*
+		 * Lock the destination name so it will not get dropped or
+		 * moved between activator/implementer while we try to queue a
+		 * message. We also rely on this to read-lock the entire
+		 * registry so kdbus_meta_conn_collect() will have a consistent
+		 * view of all acquired names on both connections.
+		 * If kdbus_name_lock() gets changed to a per-name lock, we
+		 * really need to read-lock the whole registry here.
+		 */
+		name_entry = kdbus_name_lock(bus->name_registry,
+					     kmsg->res->dst_name);
+		if (!name_entry) {
+			ret = -ESRCH;
+			goto exit_put_cancelfd;
+		}
+
+		/*
+		 * If both a name and a connection ID are given as destination
+		 * of a message, check that the currently owning connection of
+		 * the name matches the specified ID.
+		 * This way, we allow userspace to send the message to a
+		 * specific connection by ID only if the connection currently
+		 * owns the given name.
+		 */
+		if (msg->dst_id != KDBUS_DST_ID_NAME &&
+		    msg->dst_id != name_entry->conn->id) {
+			ret = -EREMCHG;
+			goto exit_name_unlock;
+		}
+
+		if (!name_entry->conn && name_entry->activator)
+			conn_dst = kdbus_conn_ref(name_entry->activator);
+		else
+			conn_dst = kdbus_conn_ref(name_entry->conn);
+
+		if ((msg->flags & KDBUS_MSG_NO_AUTO_START) &&
+		    kdbus_conn_is_activator(conn_dst)) {
+			ret = -EADDRNOTAVAIL;
+			goto exit_unref;
+		}
+	} else {
+		/* unicast message to unique name */
+		conn_dst = kdbus_bus_find_conn_by_id(bus, msg->dst_id);
+		if (!conn_dst)
+			return -ENXIO;
+
+		/*
+		 * Special-purpose connections are not allowed to be addressed
+		 * via their unique IDs.
+		 */
+		if (!kdbus_conn_is_ordinary(conn_dst)) {
+			ret = -ENXIO;
+			goto exit_unref;
+		}
+	}
+
+	/*
+	 * Record the sequence number of the registered name;
+	 * it will be passed on to the queue, in case messages
+	 * addressed to a name need to be moved from or to
+	 * activator connections of the same name.
+	 */
+	if (name_entry)
+		kmsg->dst_name_id = name_entry->name_id;
+
+	if (conn_src) {
+		u64 attach_flags;
+
+		/*
+		 * If we got here due to an interrupted system call, our reply
+		 * wait object is still queued on conn_dst, with the former
+		 * cookie. Look it up, and in case it exists, go dormant right
+		 * away again, and don't queue the message again.
+		 *
+		 * Before looking up any reply object, we also need to
+		 * make sure that conn_src really did issue a request
+		 * and that the request was not canceled in the
+		 * meantime.
+		 */
+		if (sync && atomic_read(&conn_src->request_count) > 0) {
+			mutex_lock(&conn_src->lock);
+			reply_wait = kdbus_reply_find(conn_dst, conn_src,
+						      kmsg->msg.cookie);
+			if (reply_wait) {
+				if (reply_wait->interrupted) {
+					kdbus_reply_ref(reply_wait);
+					reply_wait->interrupted = false;
+				} else {
+					reply_wait = NULL;
+				}
+			}
+			mutex_unlock(&conn_src->lock);
+
+			if (reply_wait)
+				goto wait_sync;
+		}
+
+		/* Calculate attach flags of conn_src & conn_dst */
+		attach_flags = kdbus_meta_calc_attach_flags(conn_src, conn_dst);
+
+		/*
+		 * If this connection did not fake its metadata, then
+		 * let's augment its metadata with the currently valid
+		 * metadata.
+		 */
+		if (!conn_src->faked_meta) {
+			ret = kdbus_meta_proc_collect(kmsg->proc_meta,
+						      attach_flags);
+			if (ret < 0)
+				goto exit_unref;
+		}
+
+		/*
+		 * If requested, then we always send the current
+		 * description and owned names of the source connection.
+		 */
+		ret = kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src,
+					      attach_flags);
+		if (ret < 0)
+			goto exit_unref;
+
+		if (msg->flags & KDBUS_MSG_EXPECT_REPLY) {
+			ret = kdbus_conn_check_access(conn_src, current_cred(),
+						      conn_dst, msg, NULL);
+			if (ret < 0)
+				goto exit_unref;
+
+			reply_wait = kdbus_reply_new(conn_dst, conn_src, msg,
+						     name_entry, sync);
+			if (IS_ERR(reply_wait)) {
+				ret = PTR_ERR(reply_wait);
+				reply_wait = NULL;
+				goto exit_unref;
+			}
+		} else if (msg->flags & KDBUS_MSG_SIGNAL) {
+			if (!kdbus_match_db_match_kmsg(conn_dst->match_db,
+						       conn_src, kmsg)) {
+				ret = -EPERM;
+				goto exit_unref;
+			}
+
+			/*
+			 * A receiver needs TALK access to the sender
+			 * in order to receive signals.
+			 */
+			ret = kdbus_conn_check_access(conn_dst, NULL, conn_src,
+						      msg, NULL);
+			if (ret < 0)
+				goto exit_unref;
+		} else {
+			ret = kdbus_conn_check_access(conn_src, current_cred(),
+						      conn_dst, msg,
+						      &reply_wake);
+			if (ret < 0)
+				goto exit_unref;
+		}
+	}
+
+	/*
+	 * Forward to monitors before queuing the message. Otherwise, the
+	 * receiver might queue a reply before the original message is queued
+	 * on the monitors.
+	 * We never guarantee consistent ordering across connections, but for
+	 * monitors we should at least make sure they get the message before
+	 * anyone else.
+	 */
+	kdbus_bus_eavesdrop(bus, conn_src, kmsg);
+
+	if (reply_wake) {
+		/*
+		 * If we're synchronously responding to a message, allocate a
+		 * queue item and attach it to the reply tracking object.
+		 * The connection's queue will never get to see it.
+		 */
+		ret = kdbus_conn_entry_sync_attach(conn_dst, kmsg, reply_wake);
+		if (ret < 0)
+			goto exit_unref;
+	} else {
+		/*
+		 * Otherwise, put it in the queue and wait for the connection
+		 * to dequeue and receive the message.
+		 */
+		ret = kdbus_conn_entry_insert(conn_src, conn_dst,
+					      kmsg, reply_wait);
+		if (ret < 0)
+			goto exit_unref;
+	}
+
+wait_sync:
+	/* no reason to keep names locked for replies */
+	name_entry = kdbus_name_unlock(bus->name_registry, name_entry);
+
+	if (sync) {
+		ktime_t now = ktime_get();
+		ktime_t expire = ns_to_ktime(msg->timeout_ns);
+
+		if (likely(ktime_compare(now, expire) < 0))
+			ret = kdbus_conn_wait_reply(conn_src, conn_dst, cmd,
+						    ioctl_file, cancel_fd,
+						    reply_wait, expire);
+		else
+			ret = -ETIMEDOUT;
+	}
+
+exit_unref:
+	kdbus_reply_unref(reply_wait);
+	kdbus_reply_unref(reply_wake);
+	kdbus_conn_unref(conn_dst);
+exit_name_unlock:
+	kdbus_name_unlock(bus->name_registry, name_entry);
+exit_put_cancelfd:
+	if (cancel_fd)
+		fput(cancel_fd);
+
+	return ret;
+}
+
+/**
+ * kdbus_conn_disconnect() - disconnect a connection
+ * @conn:		The connection to disconnect
+ * @ensure_queue_empty:	Flag to indicate if the call should fail in
+ *			case the connection's message list is not
+ *			empty
+ *
+ * If @ensure_queue_empty is true, and the connection has pending messages,
+ * -EBUSY is returned.
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty)
+{
+	struct kdbus_queue_entry *entry, *tmp;
+	struct kdbus_bus *bus = conn->ep->bus;
+	struct kdbus_reply *r, *r_tmp;
+	struct kdbus_conn *c;
+	int i, v;
+
+	mutex_lock(&conn->lock);
+	v = atomic_read(&conn->active);
+	if (v == KDBUS_CONN_ACTIVE_NEW) {
+		/* was never connected */
+		mutex_unlock(&conn->lock);
+		return 0;
+	}
+	if (v < 0) {
+		/* already dead */
+		mutex_unlock(&conn->lock);
+		return -EALREADY;
+	}
+	if (ensure_queue_empty && !list_empty(&conn->queue.msg_list)) {
+		/* still busy */
+		mutex_unlock(&conn->lock);
+		return -EBUSY;
+	}
+
+	atomic_add(KDBUS_CONN_ACTIVE_BIAS, &conn->active);
+	mutex_unlock(&conn->lock);
+
+	wake_up_interruptible(&conn->wait);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	rwsem_acquire(&conn->dep_map, 0, 0, _RET_IP_);
+	if (atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_BIAS)
+		lock_contended(&conn->dep_map, _RET_IP_);
+#endif
+
+	wait_event(conn->wait,
+		   atomic_read(&conn->active) == KDBUS_CONN_ACTIVE_BIAS);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lock_acquired(&conn->dep_map, _RET_IP_);
+	rwsem_release(&conn->dep_map, 1, _RET_IP_);
+#endif
+
+	cancel_delayed_work_sync(&conn->work);
+	kdbus_policy_remove_owner(&conn->ep->bus->policy_db, conn);
+
+	/* lock order: domain -> bus -> ep -> names -> conn */
+	mutex_lock(&conn->ep->lock);
+	down_write(&bus->conn_rwlock);
+
+	/* remove from bus and endpoint */
+	hash_del(&conn->hentry);
+	list_del(&conn->monitor_entry);
+	list_del(&conn->ep_entry);
+
+	up_write(&bus->conn_rwlock);
+	mutex_unlock(&conn->ep->lock);
+
+	/*
+	 * Remove all names associated with this connection; this possibly
+	 * moves queued messages back to the activator connection.
+	 */
+	kdbus_name_remove_by_conn(bus->name_registry, conn);
+
+	/* if we die while other connections wait for our reply, notify them */
+	mutex_lock(&conn->lock);
+	list_for_each_entry_safe(entry, tmp, &conn->queue.msg_list, entry) {
+		if (entry->reply)
+			kdbus_notify_reply_dead(bus, entry->msg.src_id,
+						entry->msg.cookie);
+
+		kdbus_queue_entry_remove(conn, entry);
+		kdbus_pool_slice_release(entry->slice);
+		kdbus_queue_entry_free(entry);
+	}
+
+	list_for_each_entry_safe(r, r_tmp, &conn->reply_list, entry)
+		kdbus_reply_unlink(r);
+	mutex_unlock(&conn->lock);
+
+	/* lock order: domain -> bus -> ep -> names -> conn */
+	down_read(&bus->conn_rwlock);
+	hash_for_each(bus->conn_hash, i, c, hentry) {
+		mutex_lock(&c->lock);
+		list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
+			if (r->reply_src == conn) {
+				if (r->sync) {
+					kdbus_sync_reply_wakeup(r, -EPIPE);
+					kdbus_reply_unlink(r);
+					continue;
+				}
+
+				/* send a 'connection dead' notification */
+				kdbus_notify_reply_dead(bus, c->id, r->cookie);
+				kdbus_reply_unlink(r);
+			}
+		}
+		mutex_unlock(&c->lock);
+	}
+	up_read(&bus->conn_rwlock);
+
+	if (!kdbus_conn_is_monitor(conn))
+		kdbus_notify_id_change(bus, KDBUS_ITEM_ID_REMOVE,
+				       conn->id, conn->flags);
+
+	kdbus_notify_flush(bus);
+
+	return 0;
+}
+
+/**
+ * kdbus_conn_active() - connection is not disconnected
+ * @conn:		Connection to check
+ *
+ * Return true if the connection has not been disconnected yet. Note that a
+ * connection might be disconnected asynchronously, unless you hold the
+ * connection lock. If that's not suitable for you, see kdbus_conn_acquire() to
+ * suppress connection shutdown for a short period.
+ *
+ * Return: true if the connection is still active
+ */
+bool kdbus_conn_active(const struct kdbus_conn *conn)
+{
+	return atomic_read(&conn->active) >= 0;
+}
+
+static void __kdbus_conn_free(struct kref *kref)
+{
+	struct kdbus_conn *conn = container_of(kref, struct kdbus_conn, kref);
+
+	WARN_ON(kdbus_conn_active(conn));
+	WARN_ON(delayed_work_pending(&conn->work));
+	WARN_ON(!list_empty(&conn->queue.msg_list));
+	WARN_ON(!list_empty(&conn->names_list));
+	WARN_ON(!list_empty(&conn->names_queue_list));
+	WARN_ON(!list_empty(&conn->reply_list));
+
+	if (conn->user) {
+		atomic_dec(&conn->user->connections);
+		kdbus_domain_user_unref(conn->user);
+	}
+
+	kdbus_meta_proc_unref(conn->meta);
+	kdbus_match_db_free(conn->match_db);
+	kdbus_pool_free(conn->pool);
+	kdbus_ep_unref(conn->ep);
+	put_cred(conn->cred);
+	kfree(conn->description);
+	kfree(conn);
+}
+
+/**
+ * kdbus_conn_ref() - take a connection reference
+ * @conn:		Connection, may be %NULL
+ *
+ * Return: the connection itself
+ */
+struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn)
+{
+	if (conn)
+		kref_get(&conn->kref);
+	return conn;
+}
+
+/**
+ * kdbus_conn_unref() - drop a connection reference
+ * @conn:		Connection (may be NULL)
+ *
+ * When the last reference is dropped, the connection's internal structure
+ * is freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn)
+{
+	if (conn)
+		kref_put(&conn->kref, __kdbus_conn_free);
+	return NULL;
+}
+
+/**
+ * kdbus_conn_acquire() - acquire an active connection reference
+ * @conn:		Connection
+ *
+ * Users can close a connection via KDBUS_CMD_BYEBYE (or by destroying the
+ * endpoint/bus/...) at any time. Whenever this happens, we should deny any
+ * user-visible action on this connection and signal ECONNRESET instead.
+ * To avoid testing for connection availability every time you take the
+ * connection-lock, you can acquire a connection for short periods.
+ *
+ * By calling kdbus_conn_acquire(), you gain an "active reference" to the
+ * connection. You must also hold a regular reference at any time! As long as
+ * you hold the active-ref, the connection will not be shut down. However, if
+ * the connection was shut down, you can never acquire an active-ref again.
+ *
+ * kdbus_conn_disconnect() disables the connection and then waits for all active
+ * references to be dropped. It will also wake up any pending operation.
+ * However, you must not sleep for an indefinite period while holding an
+ * active-reference. Otherwise, kdbus_conn_disconnect() might stall. If you need
+ * to sleep for an indefinite period, either release the reference and try to
+ * acquire it again after waking up, or make kdbus_conn_disconnect() wake up
+ * your wait-queue.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_conn_acquire(struct kdbus_conn *conn)
+{
+	if (!atomic_inc_unless_negative(&conn->active))
+		return -ECONNRESET;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
+#endif
+
+	return 0;
+}
+
+/**
+ * kdbus_conn_release() - release an active connection reference
+ * @conn:		Connection
+ *
+ * This releases an active reference that has been acquired via
+ * kdbus_conn_acquire(). If the connection was already disabled and this is the
+ * last active-ref that is dropped, the disconnect-waiter will be woken up and
+ * properly close the connection.
+ */
+void kdbus_conn_release(struct kdbus_conn *conn)
+{
+	int v;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	rwsem_release(&conn->dep_map, 1, _RET_IP_);
+#endif
+
+	v = atomic_dec_return(&conn->active);
+	if (v != KDBUS_CONN_ACTIVE_BIAS)
+		return;
+
+	wake_up_all(&conn->wait);
+}
+
+/**
+ * kdbus_conn_move_messages() - move messages from one connection to another
+ * @conn_dst:		Connection to move messages to
+ * @conn_src:		Connection to move messages from
+ * @name_id:		Filter for the sequence number of the registered
+ *			name, 0 means no filtering.
+ *
+ * Move all messages from one connection to another. This is used when
+ * an implementer connection is taking over/giving back a well-known name
+ * from/to an activator connection.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
+			     struct kdbus_conn *conn_src,
+			     u64 name_id)
+{
+	struct kdbus_queue_entry *q, *q_tmp;
+	struct kdbus_reply *r, *r_tmp;
+	struct kdbus_bus *bus;
+	struct kdbus_conn *c;
+	LIST_HEAD(msg_list);
+	int i, ret = 0;
+
+	if (WARN_ON(!mutex_is_locked(&conn_dst->ep->bus->lock)))
+		return -EINVAL;
+
+	if (WARN_ON(conn_src == conn_dst))
+		return -EINVAL;
+
+	bus = conn_src->ep->bus;
+
+	/* lock order: domain -> bus -> ep -> names -> conn */
+	down_read(&bus->conn_rwlock);
+	hash_for_each(bus->conn_hash, i, c, hentry) {
+		if (c == conn_src || c == conn_dst)
+			continue;
+
+		mutex_lock(&c->lock);
+		list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
+			if (r->reply_src != conn_src)
+				continue;
+
+			/* filter messages for a specific name */
+			if (name_id > 0 && r->name_id != name_id)
+				continue;
+
+			kdbus_conn_unref(r->reply_src);
+			r->reply_src = kdbus_conn_ref(conn_dst);
+		}
+		mutex_unlock(&c->lock);
+	}
+	up_read(&bus->conn_rwlock);
+
+	kdbus_conn_lock2(conn_src, conn_dst);
+	list_for_each_entry_safe(q, q_tmp, &conn_src->queue.msg_list, entry) {
+		/* filter messages for a specific name */
+		if (name_id > 0 && q->dst_name_id != name_id)
+			continue;
+
+		kdbus_queue_entry_remove(conn_src, q);
+
+		if (!(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
+		    q->msg_res && q->msg_res->fds_count > 0) {
+			atomic_inc(&conn_dst->lost_count);
+			continue;
+		}
+
+		ret = kdbus_queue_entry_move(conn_dst, q);
+		if (ret < 0) {
+			atomic_inc(&conn_dst->lost_count);
+			kdbus_queue_entry_free(q);
+		}
+	}
+	kdbus_conn_unlock2(conn_src, conn_dst);
+
+	/* wake up poll() */
+	wake_up_interruptible(&conn_dst->wait);
+
+	return ret;
+}
+
+/**
+ * kdbus_cmd_conn_info() - retrieve info about a connection
+ * @conn:		Connection
+ * @cmd_info:		The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_conn_info(struct kdbus_conn *conn,
+			struct kdbus_cmd_info *cmd_info)
+{
+	struct kdbus_meta_conn *conn_meta = NULL;
+	struct kdbus_pool_slice *slice = NULL;
+	struct kdbus_name_entry *entry = NULL;
+	struct kdbus_conn *owner_conn = NULL;
+	struct kdbus_item *meta_items = NULL;
+	struct kdbus_info info = {};
+	struct kvec kvec[2];
+	size_t meta_size;
+	u64 attach_flags;
+	int ret = 0;
+
+	if (cmd_info->id == 0) {
+		const char *name;
+
+		name = kdbus_items_get_str(cmd_info->items,
+					   KDBUS_ITEMS_SIZE(cmd_info, items),
+					   KDBUS_ITEM_NAME);
+		if (IS_ERR(name))
+			return -EINVAL;
+
+		if (!kdbus_name_is_valid(name, false))
+			return -EINVAL;
+
+		entry = kdbus_name_lock(conn->ep->bus->name_registry, name);
+		if (!entry || !kdbus_conn_policy_see_name(conn, current_cred(),
+							  name)) {
+			/* pretend a name doesn't exist if you cannot see it */
+			ret = -ESRCH;
+			goto exit;
+		}
+
+		if (entry->conn)
+			owner_conn = kdbus_conn_ref(entry->conn);
+	} else {
+		owner_conn = kdbus_bus_find_conn_by_id(conn->ep->bus,
+						       cmd_info->id);
+		if (!owner_conn || !kdbus_conn_policy_see(conn, current_cred(),
+							  owner_conn)) {
+			/* pretend an id doesn't exist if you cannot see it */
+			ret = -ENXIO;
+			goto exit;
+		}
+	}
+
+	info.id = owner_conn->id;
+	info.flags = owner_conn->flags;
+
+	/* mask out what information the connection wants to pass us */
+	attach_flags = cmd_info->flags &
+		       atomic64_read(&owner_conn->attach_flags_send);
+
+	conn_meta = kdbus_meta_conn_new();
+	if (IS_ERR(conn_meta)) {
+		ret = PTR_ERR(conn_meta);
+		conn_meta = NULL;
+		goto exit;
+	}
+
+	ret = kdbus_meta_conn_collect(conn_meta, NULL, owner_conn,
+				      attach_flags);
+	if (ret < 0)
+		goto exit;
+
+	meta_items = kdbus_meta_export(owner_conn->meta, conn_meta,
+				       attach_flags, &meta_size);
+	if (IS_ERR(meta_items)) {
+		ret = PTR_ERR(meta_items);
+		meta_items = NULL;
+		goto exit;
+	}
+
+	kdbus_kvec_set(&kvec[0], &info, sizeof(info), &info.size);
+	kdbus_kvec_set(&kvec[1], meta_items, meta_size, &info.size);
+
+	slice = kdbus_pool_slice_alloc(conn->pool, info.size,
+				       kvec, NULL, ARRAY_SIZE(kvec));
+	if (IS_ERR(slice)) {
+		ret = PTR_ERR(slice);
+		slice = NULL;
+		goto exit;
+	}
+
+	/* write back the offset */
+	kdbus_pool_slice_publish(slice, &cmd_info->offset,
+				 &cmd_info->info_size);
+	ret = 0;
+
+	kdbus_pool_slice_release(slice);
+exit:
+	kfree(meta_items);
+	kdbus_meta_conn_unref(conn_meta);
+	kdbus_conn_unref(owner_conn);
+	kdbus_name_unlock(conn->ep->bus->name_registry, entry);
+
+	return ret;
+}
+
+/**
+ * kdbus_cmd_conn_update() - update the attach-flags of a connection or
+ *			     the policy entries of a policy holding one
+ * @conn:		Connection
+ * @cmd:		The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_conn_update(struct kdbus_conn *conn,
+			  const struct kdbus_cmd_update *cmd)
+{
+	struct kdbus_bus *bus = conn->ep->bus;
+	bool send_flags_provided = false;
+	bool recv_flags_provided = false;
+	bool policy_provided = false;
+	const struct kdbus_item *item;
+	u64 attach_send;
+	u64 attach_recv;
+	int ret;
+
+	KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
+		switch (item->type) {
+		case KDBUS_ITEM_ATTACH_FLAGS_SEND:
+			/*
+			 * Only ordinary or monitor connections may update
+			 * their attach-flags-send. attach-flags-recv can
+			 * additionally be updated by activators.
+			 */
+			if (!kdbus_conn_is_ordinary(conn) &&
+			    !kdbus_conn_is_monitor(conn))
+				return -EOPNOTSUPP;
+
+			ret = kdbus_sanitize_attach_flags(item->data64[0],
+							  &attach_send);
+			if (ret < 0)
+				return ret;
+
+			send_flags_provided = true;
+			break;
+
+		case KDBUS_ITEM_ATTACH_FLAGS_RECV:
+			if (!kdbus_conn_is_ordinary(conn) &&
+			    !kdbus_conn_is_monitor(conn) &&
+			    !kdbus_conn_is_activator(conn))
+				return -EOPNOTSUPP;
+
+			ret = kdbus_sanitize_attach_flags(item->data64[0],
+							  &attach_recv);
+			if (ret < 0)
+				return ret;
+
+			recv_flags_provided = true;
+			break;
+
+		case KDBUS_ITEM_NAME:
+		case KDBUS_ITEM_POLICY_ACCESS:
+			/*
+			 * Only policy holders may update their policy
+			 * entries. Policy holders are privileged
+			 * connections.
+			 */
+			if (!kdbus_conn_is_policy_holder(conn))
+				return -EOPNOTSUPP;
+
+			policy_provided = true;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+	}
+
+	if (policy_provided) {
+		ret = kdbus_policy_set(&conn->ep->bus->policy_db, cmd->items,
+				       KDBUS_ITEMS_SIZE(cmd, items),
+				       1, true, conn);
+		if (ret < 0)
+			return ret;
+	}
+
+	if (send_flags_provided) {
+		/*
+		 * The attach flags send must always satisfy the
+		 * bus requirements.
+		 */
+		if (bus->attach_flags_req & ~attach_send)
+			return -EINVAL;
+
+		atomic64_set(&conn->attach_flags_send, attach_send);
+	}
+
+	if (recv_flags_provided)
+		atomic64_set(&conn->attach_flags_recv, attach_recv);
+
+	return 0;
+}
+
+/**
+ * kdbus_conn_new() - create a new connection
+ * @ep:			The endpoint the connection is connected to
+ * @hello:		The kdbus_cmd_hello as passed in by the user
+ * @privileged:		Whether to create a privileged connection
+ *
+ * Return: a new kdbus_conn on success, ERR_PTR on failure
+ */
+struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep,
+				  struct kdbus_cmd_hello *hello,
+				  bool privileged)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	static struct lock_class_key __key;
+#endif
+	const struct kdbus_creds *creds = NULL;
+	struct kdbus_pool_slice *slice = NULL;
+	const struct kdbus_pids *pids = NULL;
+	struct kdbus_item_list items = {};
+	struct kdbus_bus *bus = ep->bus;
+	const struct kdbus_item *item;
+	const char *conn_description = NULL;
+	const char *seclabel = NULL;
+	const char *name = NULL;
+	struct kdbus_conn *conn;
+	u64 attach_flags_send;
+	u64 attach_flags_recv;
+	bool is_policy_holder;
+	bool is_activator;
+	bool is_monitor;
+	struct kvec kvec[2];
+	int ret;
+
+	struct {
+		/* bloom item */
+		u64 size;
+		u64 type;
+		struct kdbus_bloom_parameter bloom;
+	} bloom_item;
+
+	is_monitor = hello->flags & KDBUS_HELLO_MONITOR;
+	is_activator = hello->flags & KDBUS_HELLO_ACTIVATOR;
+	is_policy_holder = hello->flags & KDBUS_HELLO_POLICY_HOLDER;
+
+	/* can only be one of monitor/activator/policy_holder */
+	if (is_monitor + is_activator + is_policy_holder > 1)
+		return ERR_PTR(-EINVAL);
+
+	/* Monitors are disallowed on custom endpoints */
+	if (is_monitor && ep->has_policy)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/* only privileged connections can be activators, policy holders or monitors */
+	if (!privileged && (is_activator || is_policy_holder || is_monitor))
+		return ERR_PTR(-EPERM);
+
+	KDBUS_ITEMS_FOREACH(item, hello->items,
+			    KDBUS_ITEMS_SIZE(hello, items)) {
+		switch (item->type) {
+		case KDBUS_ITEM_NAME:
+			if (!is_activator && !is_policy_holder)
+				return ERR_PTR(-EINVAL);
+
+			if (name)
+				return ERR_PTR(-EINVAL);
+
+			if (!kdbus_name_is_valid(item->str, true))
+				return ERR_PTR(-EINVAL);
+
+			name = item->str;
+			break;
+
+		case KDBUS_ITEM_CREDS:
+			/* privileged processes can impersonate somebody else */
+			if (!privileged)
+				return ERR_PTR(-EPERM);
+
+			if (item->size != KDBUS_ITEM_SIZE(sizeof(*creds)))
+				return ERR_PTR(-EINVAL);
+
+			creds = &item->creds;
+			break;
+
+		case KDBUS_ITEM_PIDS:
+			/* privileged processes can impersonate somebody else */
+			if (!privileged)
+				return ERR_PTR(-EPERM);
+
+			if (item->size != KDBUS_ITEM_SIZE(sizeof(*pids)))
+				return ERR_PTR(-EINVAL);
+
+			pids = &item->pids;
+			break;
+
+		case KDBUS_ITEM_SECLABEL:
+			/* privileged processes can impersonate somebody else */
+			if (!privileged)
+				return ERR_PTR(-EPERM);
+
+			seclabel = item->str;
+			break;
+
+		case KDBUS_ITEM_CONN_DESCRIPTION:
+			/* human-readable connection name (debugging) */
+			if (conn_description)
+				return ERR_PTR(-EINVAL);
+
+			conn_description = item->str;
+			break;
+
+		case KDBUS_ITEM_POLICY_ACCESS:
+		case KDBUS_ITEM_BLOOM_MASK:
+		case KDBUS_ITEM_ID:
+		case KDBUS_ITEM_NAME_ADD:
+		case KDBUS_ITEM_NAME_REMOVE:
+		case KDBUS_ITEM_NAME_CHANGE:
+		case KDBUS_ITEM_ID_ADD:
+		case KDBUS_ITEM_ID_REMOVE:
+			/* will be handled by policy and match code */
+			break;
+
+		default:
+			return ERR_PTR(-EINVAL);
+		}
+	}
+
+	if ((is_activator || is_policy_holder) && !name)
+		return ERR_PTR(-EINVAL);
+
+	ret = kdbus_sanitize_attach_flags(hello->attach_flags_send,
+					  &attach_flags_send);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	ret = kdbus_sanitize_attach_flags(hello->attach_flags_recv,
+					  &attach_flags_recv);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	/* Let userspace know which flags are enforced by the bus */
+	hello->attach_flags_send = bus->attach_flags_req | KDBUS_FLAG_KERNEL;
+
+	/*
+	 * The attach flags must always satisfy the bus
+	 * requirements.
+	 */
+	if (bus->attach_flags_req & ~attach_flags_send)
+		return ERR_PTR(-ECONNREFUSED);
+
+	conn = kzalloc(sizeof(*conn), GFP_KERNEL);
+	if (!conn)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&conn->kref);
+	atomic_set(&conn->active, KDBUS_CONN_ACTIVE_NEW);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lockdep_init_map(&conn->dep_map, "s_active", &__key, 0);
+#endif
+	mutex_init(&conn->lock);
+	INIT_LIST_HEAD(&conn->names_list);
+	INIT_LIST_HEAD(&conn->names_queue_list);
+	INIT_LIST_HEAD(&conn->reply_list);
+	atomic_set(&conn->name_count, 0);
+	atomic_set(&conn->request_count, 0);
+	atomic_set(&conn->lost_count, 0);
+	INIT_DELAYED_WORK(&conn->work, kdbus_reply_list_scan_work);
+	conn->cred = get_current_cred();
+	init_waitqueue_head(&conn->wait);
+	kdbus_queue_init(&conn->queue);
+	conn->privileged = privileged;
+	conn->ep = kdbus_ep_ref(ep);
+	conn->id = atomic64_inc_return(&bus->conn_seq_last);
+	conn->flags = hello->flags;
+	atomic64_set(&conn->attach_flags_send, attach_flags_send);
+	atomic64_set(&conn->attach_flags_recv, attach_flags_recv);
+	/* init entry, so we can remove it unconditionally */
+	INIT_LIST_HEAD(&conn->monitor_entry);
+
+	if (conn_description) {
+		conn->description = kstrdup(conn_description, GFP_KERNEL);
+		if (!conn->description) {
+			ret = -ENOMEM;
+			goto exit_unref;
+		}
+	}
+
+	conn->pool = kdbus_pool_new(conn->description, hello->pool_size);
+	if (IS_ERR(conn->pool)) {
+		ret = PTR_ERR(conn->pool);
+		conn->pool = NULL;
+		goto exit_unref;
+	}
+
+	conn->match_db = kdbus_match_db_new();
+	if (IS_ERR(conn->match_db)) {
+		ret = PTR_ERR(conn->match_db);
+		conn->match_db = NULL;
+		goto exit_unref;
+	}
+
+	/* return properties of this connection to the caller */
+	hello->bus_flags = bus->bus_flags;
+	hello->id = conn->id;
+
+	BUILD_BUG_ON(sizeof(bus->id128) != sizeof(hello->id128));
+	memcpy(hello->id128, bus->id128, sizeof(hello->id128));
+
+	conn->meta = kdbus_meta_proc_new();
+	if (IS_ERR(conn->meta)) {
+		ret = PTR_ERR(conn->meta);
+		conn->meta = NULL;
+		goto exit_unref;
+	}
+
+	/* privileged processes can impersonate somebody else */
+	if (creds || pids || seclabel) {
+		ret = kdbus_meta_proc_fake(conn->meta, creds, pids, seclabel);
+		if (ret < 0)
+			goto exit_unref;
+
+		conn->faked_meta = true;
+	} else {
+		ret = kdbus_meta_proc_collect(conn->meta,
+					      KDBUS_ATTACH_CREDS |
+					      KDBUS_ATTACH_PIDS |
+					      KDBUS_ATTACH_AUXGROUPS |
+					      KDBUS_ATTACH_TID_COMM |
+					      KDBUS_ATTACH_PID_COMM |
+					      KDBUS_ATTACH_EXE |
+					      KDBUS_ATTACH_CMDLINE |
+					      KDBUS_ATTACH_CGROUP |
+					      KDBUS_ATTACH_CAPS |
+					      KDBUS_ATTACH_SECLABEL |
+					      KDBUS_ATTACH_AUDIT);
+		if (ret < 0)
+			goto exit_unref;
+	}
+
+	/*
+	 * Account the connection against the current user (UID), or for
+	 * custom endpoints use the anonymous user assigned to the endpoint.
+	 * Note that limits are always accounted against the real UID, not
+	 * the effective UID (cred->user always points to the accounting of
+	 * cred->uid, not cred->euid).
+	 */
+	if (ep->user) {
+		conn->user = kdbus_domain_user_ref(ep->user);
+	} else {
+		conn->user = kdbus_domain_get_user(ep->bus->domain,
+						   current_uid());
+		if (IS_ERR(conn->user)) {
+			ret = PTR_ERR(conn->user);
+			conn->user = NULL;
+			goto exit_unref;
+		}
+	}
+
+	if (atomic_inc_return(&conn->user->connections) > KDBUS_USER_MAX_CONN) {
+		/* decremented by destructor as conn->user is valid */
+		ret = -EMFILE;
+		goto exit_unref;
+	}
+
+	bloom_item.size = sizeof(bloom_item);
+	bloom_item.type = KDBUS_ITEM_BLOOM_PARAMETER;
+	bloom_item.bloom = bus->bloom;
+	kdbus_kvec_set(&kvec[0], &items, sizeof(items), &items.size);
+	kdbus_kvec_set(&kvec[1], &bloom_item, bloom_item.size, &items.size);
+
+	slice = kdbus_pool_slice_alloc(conn->pool, items.size, kvec, NULL,
+				       ARRAY_SIZE(kvec));
+	if (IS_ERR(slice)) {
+		ret = PTR_ERR(slice);
+		slice = NULL;
+		goto exit_unref;
+	}
+
+	kdbus_pool_slice_publish(slice, &hello->offset, &hello->items_size);
+	kdbus_pool_slice_release(slice);
+
+	return conn;
+
+exit_unref:
+	kdbus_pool_slice_release(slice);
+	kdbus_conn_unref(conn);
+	return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_conn_connect() - introduce a connection to a bus
+ * @conn:		Connection
+ * @hello:		Hello parameters
+ *
+ * This puts life into a kdbus-conn object. A connection to the bus is
+ * established and the peer will be reachable via the bus (if it is an ordinary
+ * connection).
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_conn_connect(struct kdbus_conn *conn, struct kdbus_cmd_hello *hello)
+{
+	struct kdbus_ep *ep = conn->ep;
+	struct kdbus_bus *bus = ep->bus;
+	int ret;
+
+	if (WARN_ON(atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_NEW))
+		return -EALREADY;
+
+	/* make sure the ep-node is active while we add our connection */
+	if (!kdbus_node_acquire(&ep->node))
+		return -ESHUTDOWN;
+
+	/* lock order: domain -> bus -> ep -> names -> conn */
+	mutex_lock(&bus->lock);
+	mutex_lock(&ep->lock);
+	down_write(&bus->conn_rwlock);
+
+	/* link into monitor list */
+	if (kdbus_conn_is_monitor(conn))
+		list_add_tail(&conn->monitor_entry, &bus->monitors_list);
+
+	/* link into bus and endpoint */
+	list_add_tail(&conn->ep_entry, &ep->conn_list);
+	hash_add(bus->conn_hash, &conn->hentry, conn->id);
+
+	/* enable lookups and acquire active ref */
+	atomic_set(&conn->active, 1);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
+#endif
+
+	up_write(&bus->conn_rwlock);
+	mutex_unlock(&ep->lock);
+	mutex_unlock(&bus->lock);
+
+	kdbus_node_release(&ep->node);
+
+	/*
+	 * Notify subscribers about the new active connection, unless it is
+	 * a monitor. Monitors are invisible on the bus, can't be addressed
+	 * directly, and won't cause any notifications.
+	 */
+	if (!kdbus_conn_is_monitor(conn)) {
+		ret = kdbus_notify_id_change(conn->ep->bus, KDBUS_ITEM_ID_ADD,
+					     conn->id, conn->flags);
+		if (ret < 0)
+			goto exit_disconnect;
+	}
+
+	if (kdbus_conn_is_activator(conn)) {
+		u64 flags = KDBUS_NAME_ACTIVATOR;
+		const char *name;
+
+		name = kdbus_items_get_str(hello->items,
+					   KDBUS_ITEMS_SIZE(hello, items),
+					   KDBUS_ITEM_NAME);
+		if (WARN_ON(IS_ERR_OR_NULL(name))) {
+			ret = -EINVAL;
+			goto exit_disconnect;
+		}
+
+		ret = kdbus_name_acquire(bus->name_registry, conn, name,
+					 &flags);
+		if (ret < 0)
+			goto exit_disconnect;
+	}
+
+	kdbus_conn_release(conn);
+	kdbus_notify_flush(bus);
+	return 0;
+
+exit_disconnect:
+	kdbus_conn_release(conn);
+	kdbus_conn_disconnect(conn, false);
+	return ret;
+}
+
+/**
+ * kdbus_conn_has_name() - check if a connection owns a name
+ * @conn:		Connection
+ * @name:		Well-known name to check for
+ *
+ * Return: true if the name is currently owned by the connection
+ */
+bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name)
+{
+	struct kdbus_name_entry *e;
+	bool match = false;
+
+	/* No need to go further if we do not own names */
+	if (atomic_read(&conn->name_count) == 0)
+		return false;
+
+	mutex_lock(&conn->lock);
+	list_for_each_entry(e, &conn->names_list, conn_entry) {
+		if (strcmp(e->name, name) == 0) {
+			match = true;
+			break;
+		}
+	}
+	mutex_unlock(&conn->lock);
+
+	return match;
+}
+
+/* query the policy-database for all names of @whom */
+static bool kdbus_conn_policy_query_all(struct kdbus_conn *conn,
+					const struct cred *conn_creds,
+					struct kdbus_policy_db *db,
+					struct kdbus_conn *whom,
+					unsigned int access)
+{
+	struct kdbus_name_entry *ne;
+	bool pass = false;
+	int res;
+
+	down_read(&db->entries_rwlock);
+	mutex_lock(&whom->lock);
+
+	list_for_each_entry(ne, &whom->names_list, conn_entry) {
+		res = kdbus_policy_query_unlocked(db, conn_creds ? : conn->cred,
+						  ne->name,
+						  kdbus_strhash(ne->name));
+		if (res >= (int)access) {
+			pass = true;
+			break;
+		}
+	}
+
+	mutex_unlock(&whom->lock);
+	up_read(&db->entries_rwlock);
+
+	return pass;
+}
+
+/**
+ * kdbus_conn_policy_own_name() - verify a connection can own the given name
+ * @conn:		Connection
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @name:		Name
+ *
+ * This verifies that @conn is allowed to acquire the well-known name @name.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
+				const struct cred *conn_creds,
+				const char *name)
+{
+	unsigned int hash = kdbus_strhash(name);
+	int res;
+
+	if (!conn_creds)
+		conn_creds = conn->cred;
+
+	if (conn->ep->has_policy) {
+		res = kdbus_policy_query(&conn->ep->policy_db, conn_creds,
+					 name, hash);
+		if (res < KDBUS_POLICY_OWN)
+			return false;
+	}
+
+	if (conn->privileged)
+		return true;
+
+	res = kdbus_policy_query(&conn->ep->bus->policy_db, conn_creds,
+				 name, hash);
+	return res >= KDBUS_POLICY_OWN;
+}
+
+/**
+ * kdbus_conn_policy_talk() - verify a connection can talk to a given peer
+ * @conn:		Connection that tries to talk
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @to:			Connection that is talked to
+ *
+ * This verifies that @conn is allowed to talk to @to.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
+			    const struct cred *conn_creds,
+			    struct kdbus_conn *to)
+{
+	if (!conn_creds)
+		conn_creds = conn->cred;
+
+	if (conn->ep->has_policy &&
+	    !kdbus_conn_policy_query_all(conn, conn_creds, &conn->ep->policy_db,
+					 to, KDBUS_POLICY_TALK))
+		return false;
+
+	if (conn->privileged)
+		return true;
+	if (uid_eq(conn_creds->euid, to->cred->uid))
+		return true;
+
+	return kdbus_conn_policy_query_all(conn, conn_creds,
+					   &conn->ep->bus->policy_db, to,
+					   KDBUS_POLICY_TALK);
+}
+
+/**
+ * kdbus_conn_policy_see_name_unlocked() - verify a connection can see a given
+ *					   name
+ * @conn:		Connection
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @name:		Name
+ *
+ * This verifies that @conn is allowed to see the well-known name @name. Caller
+ * must hold policy-lock.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
+					 const struct cred *conn_creds,
+					 const char *name)
+{
+	int res;
+
+	/*
+	 * By default, all names are visible on a bus. SEE policies can only be
+	 * installed on custom endpoints, where by default no name is visible.
+	 */
+	if (!conn->ep->has_policy)
+		return true;
+
+	res = kdbus_policy_query_unlocked(&conn->ep->policy_db,
+					  conn_creds ? : conn->cred,
+					  name, kdbus_strhash(name));
+	return res >= KDBUS_POLICY_SEE;
+}
+
+/**
+ * kdbus_conn_policy_see_name() - verify a connection can see a given name
+ * @conn:		Connection
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @name:		Name
+ *
+ * This verifies that @conn is allowed to see the well-known name @name.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_see_name(struct kdbus_conn *conn,
+				const struct cred *conn_creds,
+				const char *name)
+{
+	bool res;
+
+	down_read(&conn->ep->policy_db.entries_rwlock);
+	res = kdbus_conn_policy_see_name_unlocked(conn, conn_creds, name);
+	up_read(&conn->ep->policy_db.entries_rwlock);
+
+	return res;
+}
+
+/**
+ * kdbus_conn_policy_see() - verify a connection can see a given peer
+ * @conn:		Connection to verify whether it sees a peer
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @whom:		Peer destination that is to be 'seen'
+ *
+ * This checks whether @conn is able to see @whom.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_see(struct kdbus_conn *conn,
+			   const struct cred *conn_creds,
+			   struct kdbus_conn *whom)
+{
+	/*
+	 * By default, all names are visible on a bus, so a connection can
+	 * always see other connections. SEE policies can only be installed on
+	 * custom endpoints, where by default no name is visible and we hide
+	 * peers from each other, unless you see at least _one_ name of the
+	 * peer.
+	 */
+	return !conn->ep->has_policy ||
+	       kdbus_conn_policy_query_all(conn, conn_creds,
+					   &conn->ep->policy_db, whom,
+					   KDBUS_POLICY_SEE);
+}
+
+/**
+ * kdbus_conn_policy_see_notification() - verify a connection is allowed to
+ *					  receive a given kernel notification
+ * @conn:		Connection
+ * @conn_creds:		Credentials of @conn to use for policy check
+ * @kmsg:		The message carrying the notification
+ *
+ * This checks whether @conn is allowed to see the kernel notification @kmsg.
+ *
+ * Return: true if allowed, false if not.
+ */
+bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
+					const struct cred *conn_creds,
+					const struct kdbus_kmsg *kmsg)
+{
+	if (WARN_ON(kmsg->msg.src_id != KDBUS_SRC_ID_KERNEL))
+		return false;
+
+	/*
+	 * Depending on the notification type, broadcasted kernel notifications
+	 * have to be filtered:
+	 *
+	 * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}: This notification is forwarded
+	 *     to a peer if, and only if, that peer can see the name this
+	 *     notification is for.
+	 *
+	 * KDBUS_ITEM_ID_{ADD,REMOVE}: As new peers cannot have names, and all
+	 *     names are dropped before a peer is removed, those notifications
+	 *     cannot be seen on custom endpoints. Thus, we only pass them
+	 *     through on default endpoints.
+	 */
+
+	switch (kmsg->notify_type) {
+	case KDBUS_ITEM_NAME_ADD:
+	case KDBUS_ITEM_NAME_REMOVE:
+	case KDBUS_ITEM_NAME_CHANGE:
+		return kdbus_conn_policy_see_name(conn, conn_creds,
+						  kmsg->notify_name);
+
+	case KDBUS_ITEM_ID_ADD:
+	case KDBUS_ITEM_ID_REMOVE:
+		return !conn->ep->has_policy;
+
+	default:
+		WARN(1, "Invalid type for notification broadcast: %llu\n",
+		     (unsigned long long)kmsg->notify_type);
+		return false;
+	}
+}
diff --git a/ipc/kdbus/connection.h b/ipc/kdbus/connection.h
new file mode 100644
index 000000000000..ff25931a4dd0
--- /dev/null
+++ b/ipc/kdbus/connection.h
@@ -0,0 +1,262 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_CONNECTION_H
+#define __KDBUS_CONNECTION_H
+
+#include <linux/atomic.h>
+#include <linux/kref.h>
+#include <linux/lockdep.h>
+#include <linux/path.h>
+
+#include "limits.h"
+#include "metadata.h"
+#include "pool.h"
+#include "queue.h"
+#include "util.h"
+
+#define KDBUS_HELLO_SPECIAL_CONN	(KDBUS_HELLO_ACTIVATOR | \
+					 KDBUS_HELLO_POLICY_HOLDER | \
+					 KDBUS_HELLO_MONITOR)
+
+/**
+ * struct kdbus_conn - connection to a bus
+ * @kref:		Reference count
+ * @active:		Active references to the connection
+ * @id:			Connection ID
+ * @flags:		KDBUS_HELLO_* flags
+ * @attach_flags_send:	KDBUS_ATTACH_* flags for sending
+ * @attach_flags_recv:	KDBUS_ATTACH_* flags for receiving
+ * @description:	Human-readable connection description, used for
+ *			debugging. This field is only set when the
+ *			connection is created.
+ * @ep:			The endpoint this connection belongs to
+ * @lock:		Connection data lock
+ * @msg_users:		Array to account the number of queued messages per
+ *			individual user
+ * @msg_users_max:	Size of the users array
+ * @hentry:		Entry in ID <-> connection map
+ * @ep_entry:		Entry in endpoint
+ * @monitor_entry:	Entry in monitor, if the connection is a monitor
+ * @names_list:		List of well-known names
+ * @names_queue_list:	Well-known names this connection waits for
+ * @reply_list:		List of connections this connection should
+ *			reply to
+ * @work:		Delayed work to handle timeouts
+ * @activator_of:	Well-known name entry this connection acts as an
+ *			activator for
+ * @match_db:		Subscription filter to broadcast messages
+ * @meta:		Active connection creator's metadata/credentials,
+ *			either from the handle or from HELLO
+ * @pool:		The user's buffer to receive messages
+ * @user:		Owner of the connection
+ * @cred:		The credentials of the connection at creation time
+ * @name_count:		Number of owned well-known names
+ * @request_count:	Number of pending requests issued by this
+ *			connection that are waiting for replies from
+ *			other peers
+ * @lost_count:		Number of lost broadcast messages
+ * @wait:		Wait queue used to wake up waiters on this connection
+ * @queue:		The message queue associated with this connection
+ * @privileged:		Whether this connection is privileged on the bus
+ * @faked_meta:		Whether the metadata was faked on HELLO
+ */
+struct kdbus_conn {
+	struct kref kref;
+	atomic_t active;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map dep_map;
+#endif
+	u64 id;
+	u64 flags;
+	atomic64_t attach_flags_send;
+	atomic64_t attach_flags_recv;
+	const char *description;
+	struct kdbus_ep *ep;
+	struct mutex lock;
+	unsigned int *msg_users;
+	unsigned int msg_users_max;
+	struct hlist_node hentry;
+	struct list_head ep_entry;
+	struct list_head monitor_entry;
+	struct list_head names_list;
+	struct list_head names_queue_list;
+	struct list_head reply_list;
+	struct delayed_work work;
+	struct kdbus_name_entry *activator_of;
+	struct kdbus_match_db *match_db;
+	struct kdbus_meta_proc *meta;
+	struct kdbus_pool *pool;
+	struct kdbus_domain_user *user;
+	const struct cred *cred;
+	atomic_t name_count;
+	atomic_t request_count;
+	atomic_t lost_count;
+	wait_queue_head_t wait;
+	struct kdbus_queue queue;
+
+	bool privileged:1;
+	bool faked_meta:1;
+};
+
+struct kdbus_kmsg;
+struct kdbus_name_registry;
+
+struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep,
+				  struct kdbus_cmd_hello *hello,
+				  bool privileged);
+struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn);
+struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn);
+int kdbus_conn_acquire(struct kdbus_conn *conn);
+void kdbus_conn_release(struct kdbus_conn *conn);
+int kdbus_conn_connect(struct kdbus_conn *conn, struct kdbus_cmd_hello *hello);
+int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty);
+bool kdbus_conn_active(const struct kdbus_conn *conn);
+int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
+			    struct kdbus_conn *conn_dst,
+			    const struct kdbus_kmsg *kmsg,
+			    struct kdbus_reply *reply);
+int kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
+			     struct kdbus_conn *conn_src,
+			     u64 name_id);
+bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name);
+
+/* policy */
+bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
+				const struct cred *conn_creds,
+				const char *name);
+bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
+			    const struct cred *conn_creds,
+			    struct kdbus_conn *to);
+bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
+					 const struct cred *conn_creds,
+					 const char *name);
+bool kdbus_conn_policy_see_name(struct kdbus_conn *conn,
+				const struct cred *conn_creds,
+				const char *name);
+bool kdbus_conn_policy_see(struct kdbus_conn *conn,
+			   const struct cred *conn_creds,
+			   struct kdbus_conn *whom);
+bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
+					const struct cred *conn_creds,
+					const struct kdbus_kmsg *kmsg);
+
+/* command dispatcher */
+int kdbus_cmd_msg_send(struct kdbus_conn *conn_src,
+		       struct kdbus_cmd_send *cmd_send,
+		       struct file *ioctl_file,
+		       struct kdbus_kmsg *kmsg);
+int kdbus_cmd_msg_recv(struct kdbus_conn *conn,
+		       struct kdbus_cmd_recv *recv);
+int kdbus_cmd_conn_info(struct kdbus_conn *conn,
+			struct kdbus_cmd_info *cmd_info);
+int kdbus_cmd_conn_update(struct kdbus_conn *conn,
+			  const struct kdbus_cmd_update *cmd);
+
+/**
+ * kdbus_conn_is_ordinary() - Check if connection is ordinary
+ * @conn:		The connection to check
+ *
+ * Return: Non-zero if the connection is an ordinary connection
+ */
+static inline int kdbus_conn_is_ordinary(const struct kdbus_conn *conn)
+{
+	return !(conn->flags & KDBUS_HELLO_SPECIAL_CONN);
+}
+
+/**
+ * kdbus_conn_is_activator() - Check if connection is an activator
+ * @conn:		The connection to check
+ *
+ * Return: Non-zero if the connection is an activator
+ */
+static inline int kdbus_conn_is_activator(const struct kdbus_conn *conn)
+{
+	return conn->flags & KDBUS_HELLO_ACTIVATOR;
+}
+
+/**
+ * kdbus_conn_is_policy_holder() - Check if connection is a policy holder
+ * @conn:		The connection to check
+ *
+ * Return: Non-zero if the connection is a policy holder
+ */
+static inline int kdbus_conn_is_policy_holder(const struct kdbus_conn *conn)
+{
+	return conn->flags & KDBUS_HELLO_POLICY_HOLDER;
+}
+
+/**
+ * kdbus_conn_is_monitor() - Check if connection is a monitor
+ * @conn:		The connection to check
+ *
+ * Return: Non-zero if the connection is a monitor
+ */
+static inline int kdbus_conn_is_monitor(const struct kdbus_conn *conn)
+{
+	return conn->flags & KDBUS_HELLO_MONITOR;
+}
+
+/**
+ * kdbus_conn_lock2() - Lock two connections
+ * @a:		connection A to lock or NULL
+ * @b:		connection B to lock or NULL
+ *
+ * Lock two connections at once. As we need to have a stable locking order, we
+ * always lock the connection with lower memory address first.
+ */
+static inline void kdbus_conn_lock2(struct kdbus_conn *a, struct kdbus_conn *b)
+{
+	if (a < b) {
+		if (a)
+			mutex_lock(&a->lock);
+		if (b && b != a)
+			mutex_lock_nested(&b->lock, !!a);
+	} else {
+		if (b)
+			mutex_lock(&b->lock);
+		if (a && a != b)
+			mutex_lock_nested(&a->lock, !!b);
+	}
+}
+
+/**
+ * kdbus_conn_unlock2() - Unlock two connections
+ * @a:		connection A to unlock or NULL
+ * @b:		connection B to unlock or NULL
+ *
+ * Unlock two connections at once. See kdbus_conn_lock2().
+ */
+static inline void kdbus_conn_unlock2(struct kdbus_conn *a,
+				      struct kdbus_conn *b)
+{
+	if (a)
+		mutex_unlock(&a->lock);
+	if (b && b != a)
+		mutex_unlock(&b->lock);
+}
+
+/**
+ * kdbus_conn_assert_active() - lockdep assert on active lock
+ * @conn:	connection that shall be active
+ *
+ * This verifies via lockdep that the caller holds an active reference to the
+ * given connection.
+ */
+static inline void kdbus_conn_assert_active(struct kdbus_conn *conn)
+{
+	lockdep_assert_held(conn);
+}
+
+#endif
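Reader note on kdbus_conn_lock2(): the helper avoids ABBA deadlocks by always taking the lock of the connection with the lower address first, and it tolerates NULL or identical arguments. The following is only a user-space sketch of that idea, not the kernel code. It assumes POSIX mutexes in place of the kernel mutex API, and `struct conn`, its `held` counter, and the `conn_*` helpers are hypothetical names introduced for illustration.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kdbus_conn. */
struct conn {
	pthread_mutex_t lock;
	unsigned int held;	/* debug counter, only touched under @lock */
};

static void conn_lock(struct conn *c)
{
	pthread_mutex_lock(&c->lock);
	c->held++;
}

static void conn_unlock(struct conn *c)
{
	c->held--;
	pthread_mutex_unlock(&c->lock);
}

/* Lock up to two connections, lower address first; NULL is skipped. */
static void conn_lock2(struct conn *a, struct conn *b)
{
	struct conn *first = a, *second = b;

	/* compare as integers; ordering unrelated pointers with < is not portable C */
	if ((uintptr_t)b < (uintptr_t)a) {
		first = b;
		second = a;
	}

	if (first)
		conn_lock(first);
	if (second && second != first)
		conn_lock(second);
}

/* Unlock order does not matter for correctness. */
static void conn_unlock2(struct conn *a, struct conn *b)
{
	if (a)
		conn_unlock(a);
	if (b && b != a)
		conn_unlock(b);
}
```

Because both callers of conn_lock2(x, y) and conn_lock2(y, x) end up acquiring the two mutexes in the same order, neither can hold one lock while waiting for the other in reverse.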
diff --git a/ipc/kdbus/item.c b/ipc/kdbus/item.c
new file mode 100644
index 000000000000..95bc3822ed45
--- /dev/null
+++ b/ipc/kdbus/item.c
@@ -0,0 +1,309 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/ctype.h>
+#include <linux/fs.h>
+#include <linux/string.h>
+
+#include "item.h"
+#include "limits.h"
+#include "util.h"
+
+#define KDBUS_ITEM_VALID(_i, _is, _s)					\
+	((_i)->size >= KDBUS_ITEM_HEADER_SIZE &&			\
+	 (u8 *)(_i) + (_i)->size > (u8 *)(_i) &&			\
+	 (u8 *)(_i) + (_i)->size <= (u8 *)(_is) + (_s) &&		\
+	 (u8 *)(_i) >= (u8 *)(_is))
+
+#define KDBUS_ITEMS_END(_i, _is, _s)					\
+	((u8 *)(_i) == ((u8 *)(_is) + KDBUS_ALIGN8(_s)))
+
+/**
+ * kdbus_item_validate_name() - validate an item containing a name
+ * @item:		Item to validate
+ *
+ * Return: zero on success or a negative error code on failure
+ */
+int kdbus_item_validate_name(const struct kdbus_item *item)
+{
+	if (item->size < KDBUS_ITEM_HEADER_SIZE + 2)
+		return -EINVAL;
+
+	if (item->size > KDBUS_ITEM_HEADER_SIZE +
+			 KDBUS_SYSNAME_MAX_LEN + 1)
+		return -ENAMETOOLONG;
+
+	if (!kdbus_str_valid(item->str, KDBUS_ITEM_PAYLOAD_SIZE(item)))
+		return -EINVAL;
+
+	return kdbus_sysname_is_valid(item->str);
+}
+
+static int kdbus_item_validate(const struct kdbus_item *item)
+{
+	size_t payload_size = KDBUS_ITEM_PAYLOAD_SIZE(item);
+	size_t l;
+	int ret;
+
+	if (item->size < KDBUS_ITEM_HEADER_SIZE)
+		return -EINVAL;
+
+	switch (item->type) {
+	case KDBUS_ITEM_PAYLOAD_VEC:
+		if (payload_size != sizeof(struct kdbus_vec))
+			return -EINVAL;
+		if (item->vec.size == 0 || item->vec.size > SIZE_MAX)
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_PAYLOAD_OFF:
+		if (payload_size != sizeof(struct kdbus_vec))
+			return -EINVAL;
+		if (item->vec.size == 0 || item->vec.size > SIZE_MAX)
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_PAYLOAD_MEMFD:
+		if (payload_size != sizeof(struct kdbus_memfd))
+			return -EINVAL;
+		if (item->memfd.size == 0 || item->memfd.size > SIZE_MAX)
+			return -EINVAL;
+		if (item->memfd.fd < 0)
+			return -EBADF;
+		break;
+
+	case KDBUS_ITEM_FDS:
+		if (payload_size % sizeof(int) != 0)
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_CANCEL_FD:
+		if (payload_size != sizeof(int))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_BLOOM_PARAMETER:
+		if (payload_size != sizeof(struct kdbus_bloom_parameter))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_BLOOM_FILTER:
+		/* followed by the bloom-mask, depends on the bloom-size */
+		if (payload_size < sizeof(struct kdbus_bloom_filter))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_BLOOM_MASK:
+		/* size depends on bloom-size of bus */
+		break;
+
+	case KDBUS_ITEM_CONN_DESCRIPTION:
+	case KDBUS_ITEM_MAKE_NAME:
+		ret = kdbus_item_validate_name(item);
+		if (ret < 0)
+			return ret;
+		break;
+
+	case KDBUS_ITEM_ATTACH_FLAGS_SEND:
+	case KDBUS_ITEM_ATTACH_FLAGS_RECV:
+	case KDBUS_ITEM_ID:
+		if (payload_size != sizeof(u64))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_TIMESTAMP:
+		if (payload_size != sizeof(struct kdbus_timestamp))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_CREDS:
+		if (payload_size != sizeof(struct kdbus_creds))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_AUXGROUPS:
+		if (payload_size % sizeof(u32) != 0)
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_NAME:
+	case KDBUS_ITEM_DST_NAME:
+	case KDBUS_ITEM_PID_COMM:
+	case KDBUS_ITEM_TID_COMM:
+	case KDBUS_ITEM_EXE:
+	case KDBUS_ITEM_CMDLINE:
+	case KDBUS_ITEM_CGROUP:
+	case KDBUS_ITEM_SECLABEL:
+		if (!kdbus_str_valid(item->str, payload_size))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_CAPS:
+		/* TODO */
+		break;
+
+	case KDBUS_ITEM_AUDIT:
+		if (payload_size != sizeof(struct kdbus_audit))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_POLICY_ACCESS:
+		if (payload_size != sizeof(struct kdbus_policy_access))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_NAME_ADD:
+	case KDBUS_ITEM_NAME_REMOVE:
+	case KDBUS_ITEM_NAME_CHANGE:
+		if (payload_size < sizeof(struct kdbus_notify_name_change))
+			return -EINVAL;
+		l = payload_size - offsetof(struct kdbus_notify_name_change,
+					    name);
+		if (l > 0 && !kdbus_str_valid(item->name_change.name, l))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_ID_ADD:
+	case KDBUS_ITEM_ID_REMOVE:
+		if (payload_size != sizeof(struct kdbus_notify_id_change))
+			return -EINVAL;
+		break;
+
+	case KDBUS_ITEM_REPLY_TIMEOUT:
+	case KDBUS_ITEM_REPLY_DEAD:
+		if (payload_size != 0)
+			return -EINVAL;
+		break;
+
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+/**
+ * kdbus_items_validate() - validate items passed by user-space
+ * @items:		items to validate
+ * @items_size:		total size of the item array, in bytes
+ *
+ * This verifies that the passed items pointer is consistent and valid.
+ * Furthermore, each item is checked for:
+ *  - valid "size" value
+ *  - payload is of expected type
+ *  - payload is fully included in the item
+ *  - string payloads are zero-terminated
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_items_validate(const struct kdbus_item *items, size_t items_size)
+{
+	const struct kdbus_item *item;
+	int ret;
+
+	KDBUS_ITEMS_FOREACH(item, items, items_size) {
+		if (!KDBUS_ITEM_VALID(item, items, items_size))
+			return -EINVAL;
+
+		ret = kdbus_item_validate(item);
+		if (ret < 0)
+			return ret;
+	}
+
+	if (!KDBUS_ITEMS_END(item, items, items_size))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * kdbus_items_get() - Find unique item in item-array
+ * @items:		items to search through
+ * @items_size:		total size of item array
+ * @item_type:		item-type to find
+ *
+ * Return: Pointer to found item, ERR_PTR(-EBADMSG) if not found, or
+ *         ERR_PTR(-EEXIST) if the item type occurs more than once.
+ */
+struct kdbus_item *kdbus_items_get(const struct kdbus_item *items,
+				   size_t items_size,
+				   unsigned int item_type)
+{
+	const struct kdbus_item *iter, *found = NULL;
+
+	KDBUS_ITEMS_FOREACH(iter, items, items_size) {
+		if (iter->type == item_type) {
+			if (found)
+				return ERR_PTR(-EEXIST);
+			found = iter;
+		}
+	}
+
+	return (struct kdbus_item *)found ? : ERR_PTR(-EBADMSG);
+}
+
+/**
+ * kdbus_items_get_str() - get string from a list of items
+ * @items:		The items to walk
+ * @items_size:		The size of all items
+ * @item_type:		The item type to look for
+ *
+ * This function walks a list of items and searches for items of type
+ * @item_type. If it finds exactly one such item, the .str member of that
+ * item is returned.
+ *
+ * Return: the string, if the item was found exactly once, ERR_PTR(-EEXIST)
+ * if the item was found more than once, and ERR_PTR(-EBADMSG) if there was
+ * no item of the given type.
+ */
+const char *kdbus_items_get_str(const struct kdbus_item *items,
+				size_t items_size,
+				unsigned int item_type)
+{
+	const struct kdbus_item *item;
+
+	item = kdbus_items_get(items, items_size, item_type);
+	return IS_ERR(item) ? ERR_CAST(item) : item->str;
+}
+
+/**
+ * kdbus_item_set() - Set item content
+ * @item:	The item to modify
+ * @type:	The item type to set (KDBUS_ITEM_*)
+ * @data:	Data to copy to item->data, may be %NULL
+ * @len:	Number of bytes in @data
+ *
+ * This sets type, size and data fields of an item. If @data is NULL, the data
+ * memory is cleared.
+ *
+ * Note that you must align your @data memory to 8 bytes. Trailing padding (in
+ * case @len is not 8-byte aligned) is cleared by this call.
+ *
+ * Return: Pointer to the following item.
+ */
+struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
+				  const void *data, size_t len)
+{
+	item->type = type;
+	item->size = KDBUS_ITEM_HEADER_SIZE + len;
+
+	if (data) {
+		memcpy(item->data, data, len);
+		memset(item->data + len, 0, KDBUS_ALIGN8(len) - len);
+	} else {
+		memset(item->data, 0, KDBUS_ALIGN8(len));
+	}
+
+	return KDBUS_ITEM_NEXT(item);
+}
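Reader note on kdbus_items_validate(): each item carries its own size, the next item starts at the 8-byte aligned end of the previous one, and the walk has to finish exactly at the aligned end of the buffer. The sketch below restates those bounds checks in plain user-space C. It is an illustration under assumptions, not the kernel implementation: `struct item`, `ITEM_HEADER_SIZE`, `ALIGN8` and `items_validate()` are hypothetical names, and the buffer is assumed to be 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item layout: 16-byte header followed by the payload. */
struct item {
	uint64_t size;	/* header plus payload, excluding trailing padding */
	uint64_t type;
	uint8_t data[];
};

#define ITEM_HEADER_SIZE offsetof(struct item, data)
#define ALIGN8(v) (((v) + 7) & ~(uint64_t)7)

/*
 * Return 0 if @buf holds a well-formed item stream of @buf_size bytes,
 * -1 otherwise. @buf must be 8-byte aligned.
 */
static int items_validate(const void *buf, size_t buf_size)
{
	const uint8_t *p = buf;
	size_t off = 0;

	while (off < buf_size) {
		const struct item *it = (const struct item *)(p + off);
		uint64_t size;

		/* the fixed header must fit into what is left */
		if (buf_size - off < ITEM_HEADER_SIZE)
			return -1;

		/* the item must cover its header and stay inside the buffer */
		size = it->size;
		if (size < ITEM_HEADER_SIZE || size > buf_size - off)
			return -1;

		off += ALIGN8(size);
	}

	/* the walk must stop exactly at the aligned end of the stream */
	return off == ALIGN8(buf_size) ? 0 : -1;
}
```

Comparing `size` against the remaining bytes (rather than adding it to a pointer first) keeps the check free of pointer wrap-around, which is what the second and third conditions of KDBUS_ITEM_VALID guard against.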
diff --git a/ipc/kdbus/item.h b/ipc/kdbus/item.h
new file mode 100644
index 000000000000..6c4f26ba226b
--- /dev/null
+++ b/ipc/kdbus/item.h
@@ -0,0 +1,57 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_ITEM_H
+#define __KDBUS_ITEM_H
+
+#include <linux/kernel.h>
+#include <uapi/linux/kdbus.h>
+
+#include "util.h"
+
+/* generic access and iterators over a stream of items */
+#define KDBUS_ITEM_NEXT(_i) (typeof(_i))(((u8 *)_i) + KDBUS_ALIGN8((_i)->size))
+#define KDBUS_ITEMS_SIZE(_h, _is) ((_h)->size - offsetof(typeof(*_h), _is))
+#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
+#define KDBUS_ITEM_SIZE(_s) KDBUS_ALIGN8(KDBUS_ITEM_HEADER_SIZE + (_s))
+#define KDBUS_ITEM_PAYLOAD_SIZE(_i) ((_i)->size - KDBUS_ITEM_HEADER_SIZE)
+
+#define KDBUS_ITEMS_FOREACH(_i, _is, _s)				\
+	for (_i = _is;							\
+	     ((u8 *)(_i) < (u8 *)(_is) + (_s)) &&			\
+	       ((u8 *)(_i) >= (u8 *)(_is));				\
+	     _i = KDBUS_ITEM_NEXT(_i))
+
+/**
+ * struct kdbus_item_header - Describes the fixed part of an item
+ * @size:	The total size of the item
+ * @type:	The item type, one of KDBUS_ITEM_*
+ */
+struct kdbus_item_header {
+	u64 size;
+	u64 type;
+};
+
+int kdbus_item_validate_name(const struct kdbus_item *item);
+int kdbus_items_validate(const struct kdbus_item *items, size_t items_size);
+struct kdbus_item *kdbus_items_get(const struct kdbus_item *items,
+				   size_t items_size,
+				   unsigned int item_type);
+const char *kdbus_items_get_str(const struct kdbus_item *items,
+				size_t items_size,
+				unsigned int item_type);
+struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
+				  const void *data, size_t len);
+
+#endif
diff --git a/ipc/kdbus/message.c b/ipc/kdbus/message.c
new file mode 100644
index 000000000000..3ec2afc8ff5c
--- /dev/null
+++ b/ipc/kdbus/message.c
@@ -0,0 +1,598 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/capability.h>
+#include <linux/cgroup.h>
+#include <linux/cred.h>
+#include <linux/file.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/shmem_fs.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <net/sock.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+#include "names.h"
+#include "policy.h"
+
+#define KDBUS_KMSG_HEADER_SIZE offsetof(struct kdbus_kmsg, msg)
+
+static struct kdbus_msg_resources *kdbus_msg_resources_new(void)
+{
+	struct kdbus_msg_resources *r;
+
+	r = kzalloc(sizeof(*r), GFP_KERNEL);
+	if (!r)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&r->kref);
+
+	return r;
+}
+
+static void __kdbus_msg_resources_free(struct kref *kref)
+{
+	struct kdbus_msg_resources *r =
+		container_of(kref, struct kdbus_msg_resources, kref);
+	size_t i;
+
+	for (i = 0; i < r->data_count; ++i) {
+		switch (r->data[i].type) {
+		case KDBUS_MSG_DATA_VEC:
+			/* nothing to do */
+			break;
+		case KDBUS_MSG_DATA_MEMFD:
+			if (r->data[i].memfd.file)
+				fput(r->data[i].memfd.file);
+			break;
+		}
+	}
+
+	kfree(r->data);
+
+	kdbus_fput_files(r->fds, r->fds_count);
+	kfree(r->fds);
+
+	kfree(r->dst_name);
+	kfree(r);
+}
+
+/**
+ * kdbus_msg_resources_ref() - Acquire reference to msg resources
+ * @r:		resources to acquire ref to
+ *
+ * Return: The acquired resource
+ */
+struct kdbus_msg_resources *
+kdbus_msg_resources_ref(struct kdbus_msg_resources *r)
+{
+	if (r)
+		kref_get(&r->kref);
+	return r;
+}
+
+/**
+ * kdbus_msg_resources_unref() - Drop reference to msg resources
+ * @r:		resources to drop reference of
+ *
+ * Return: NULL
+ */
+struct kdbus_msg_resources *
+kdbus_msg_resources_unref(struct kdbus_msg_resources *r)
+{
+	if (r)
+		kref_put(&r->kref, __kdbus_msg_resources_free);
+	return NULL;
+}
+
+/**
+ * kdbus_kmsg_free() - free allocated message
+ * @kmsg:		Message
+ */
+void kdbus_kmsg_free(struct kdbus_kmsg *kmsg)
+{
+	kdbus_msg_resources_unref(kmsg->res);
+	kdbus_meta_conn_unref(kmsg->conn_meta);
+	kdbus_meta_proc_unref(kmsg->proc_meta);
+	kfree(kmsg->iov);
+	kfree(kmsg);
+}
+
+/**
+ * kdbus_kmsg_new() - allocate message
+ * @extra_size:		Additional size to reserve for data
+ *
+ * Return: new kdbus_kmsg on success, ERR_PTR on failure.
+ */
+struct kdbus_kmsg *kdbus_kmsg_new(size_t extra_size)
+{
+	struct kdbus_kmsg *m;
+	size_t size;
+	int ret;
+
+	size = sizeof(struct kdbus_kmsg) + KDBUS_ITEM_SIZE(extra_size);
+	m = kzalloc(size, GFP_KERNEL);
+	if (!m)
+		return ERR_PTR(-ENOMEM);
+
+	m->msg.size = size - KDBUS_KMSG_HEADER_SIZE;
+	m->msg.items[0].size = KDBUS_ITEM_SIZE(extra_size);
+
+	m->proc_meta = kdbus_meta_proc_new();
+	if (IS_ERR(m->proc_meta)) {
+		ret = PTR_ERR(m->proc_meta);
+		m->proc_meta = NULL;
+		goto exit;
+	}
+	m->conn_meta = kdbus_meta_conn_new();
+	if (IS_ERR(m->conn_meta)) {
+		ret = PTR_ERR(m->conn_meta);
+		m->conn_meta = NULL;
+		goto exit;
+	}
+	return m;
+
+exit:
+	kdbus_kmsg_free(m);
+	return ERR_PTR(ret);
+}
+
+static int kdbus_handle_check_file(struct file *file)
+{
+	struct inode *inode = file_inode(file);
+	struct socket *sock;
+
+	/*
+	 * Don't allow file descriptors in the transport that themselves allow
+	 * file descriptor queueing. This will eventually be allowed once both
+	 * unix domain sockets and kdbus share a generic garbage collector.
+	 */
+
+	if (file->f_op == &kdbus_handle_ep_ops)
+		return -EOPNOTSUPP;
+
+	if (!S_ISSOCK(inode->i_mode))
+		return 0;
+
+	if (file->f_mode & FMODE_PATH)
+		return 0;
+
+	sock = SOCKET_I(inode);
+	if (sock->sk && sock->ops && sock->ops->family == PF_UNIX)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+static const char * const zeros = "\0\0\0\0\0\0\0";
+
+/*
+ * kdbus_msg_scan_items() - validate incoming data and prepare parsing
+ * @kmsg:		Message
+ * @bus:		Bus the message is sent over
+ *
+ * Return: 0 on success, negative errno on failure.
+ *
+ * File references in MEMFD or FDS items are pinned.
+ *
+ * On errors, the caller should drop any taken reference with
+ * kdbus_kmsg_free()
+ */
+static int kdbus_msg_scan_items(struct kdbus_kmsg *kmsg,
+				struct kdbus_bus *bus)
+{
+	struct kdbus_msg_resources *res = kmsg->res;
+	const struct kdbus_msg *msg = &kmsg->msg;
+	const struct kdbus_item *item;
+	size_t n, n_vecs, n_memfds;
+	bool has_bloom = false;
+	bool has_name = false;
+	bool has_fds = false;
+	bool is_broadcast;
+	bool is_signal;
+	u64 vec_size;
+
+	is_broadcast = (msg->dst_id == KDBUS_DST_ID_BROADCAST);
+	is_signal = !!(msg->flags & KDBUS_MSG_SIGNAL);
+
+	/* count data payloads */
+	n_vecs = 0;
+	n_memfds = 0;
+	KDBUS_ITEMS_FOREACH(item, msg->items, KDBUS_ITEMS_SIZE(msg, items)) {
+		switch (item->type) {
+		case KDBUS_ITEM_PAYLOAD_VEC:
+			++n_vecs;
+			break;
+		case KDBUS_ITEM_PAYLOAD_MEMFD:
+			++n_memfds;
+			if (item->memfd.size % 8)
+				++n_vecs;
+			break;
+		default:
+			break;
+		}
+	}
+
+	n = n_vecs + n_memfds;
+	if (n > 0) {
+		res->data = kcalloc(n, sizeof(*res->data), GFP_KERNEL);
+		if (!res->data)
+			return -ENOMEM;
+	}
+
+	if (n_vecs > 0) {
+		kmsg->iov = kcalloc(n_vecs, sizeof(*kmsg->iov), GFP_KERNEL);
+		if (!kmsg->iov)
+			return -ENOMEM;
+	}
+
+	/* import data payloads */
+	n = 0;
+	vec_size = 0;
+	KDBUS_ITEMS_FOREACH(item, msg->items, KDBUS_ITEMS_SIZE(msg, items)) {
+		size_t payload_size = KDBUS_ITEM_PAYLOAD_SIZE(item);
+		struct iovec *iov = kmsg->iov + kmsg->iov_count;
+
+		if (++n > KDBUS_MSG_MAX_ITEMS)
+			return -E2BIG;
+
+		switch (item->type) {
+		case KDBUS_ITEM_PAYLOAD_VEC: {
+			struct kdbus_msg_data *d = res->data + res->data_count;
+			void __force __user *ptr = KDBUS_PTR(item->vec.address);
+			size_t size = item->vec.size;
+
+			if (vec_size + size < vec_size)
+				return -EMSGSIZE;
+			if (vec_size + size > KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE)
+				return -EMSGSIZE;
+
+			d->type = KDBUS_MSG_DATA_VEC;
+			d->size = size;
+
+			if (ptr) {
+				d->vec.off = kmsg->pool_size;
+				iov->iov_base = ptr;
+				iov->iov_len = size;
+			} else {
+				d->vec.off = ~0ULL;
+				iov->iov_base = (char __user *)zeros;
+				iov->iov_len = size % 8;
+			}
+
+			if (kmsg->pool_size + iov->iov_len < kmsg->pool_size)
+				return -EMSGSIZE;
+
+			kmsg->pool_size += iov->iov_len;
+			++kmsg->iov_count;
+			++res->vec_count;
+			++res->data_count;
+			vec_size += size;
+
+			break;
+		}
+
+		case KDBUS_ITEM_PAYLOAD_MEMFD: {
+			struct kdbus_msg_data *d = res->data + res->data_count;
+			u64 start = item->memfd.start;
+			u64 size = item->memfd.size;
+			size_t pad = size % 8;
+			int seals, mask;
+			struct file *f;
+
+			if (kmsg->pool_size + size % 8 < kmsg->pool_size)
+				return -EMSGSIZE;
+			if (start + size < start)
+				return -EMSGSIZE;
+
+			if (item->memfd.fd < 0)
+				return -EBADF;
+
+			f = fget(item->memfd.fd);
+			if (!f)
+				return -EBADF;
+
+			if (pad) {
+				iov->iov_base = (char __user *)zeros;
+				iov->iov_len = pad;
+
+				kmsg->pool_size += pad;
+				++kmsg->iov_count;
+			}
+
+			++res->data_count;
+			++res->memfd_count;
+
+			d->type = KDBUS_MSG_DATA_MEMFD;
+			d->size = size;
+			d->memfd.start = start;
+			d->memfd.file = f;
+
+			/*
+			 * We only accept a sealed memfd file whose content
+			 * cannot be altered by the sender or anybody else
+			 * while it is shared or in-flight. Other files need
+			 * to be passed with KDBUS_ITEM_FDS.
+			 */
+			seals = shmem_get_seals(f);
+			if (seals < 0)
+				return -EMEDIUMTYPE;
+
+			mask = F_SEAL_SHRINK | F_SEAL_GROW |
+				F_SEAL_WRITE | F_SEAL_SEAL;
+			if ((seals & mask) != mask)
+				return -ETXTBSY;
+
+			if (start + size > (u64)i_size_read(file_inode(f)))
+				return -EBADF;
+
+			break;
+		}
+
+		case KDBUS_ITEM_FDS: {
+			unsigned int i;
+			unsigned int fds_count = payload_size / sizeof(int);
+
+			/* do not allow multiple fd arrays */
+			if (has_fds)
+				return -EEXIST;
+			has_fds = true;
+
+			/* Do not allow broadcasting file descriptors */
+			if (is_broadcast)
+				return -ENOTUNIQ;
+
+			if (fds_count > KDBUS_MSG_MAX_FDS)
+				return -EMFILE;
+
+			res->fds = kcalloc(fds_count, sizeof(struct file *),
+					   GFP_KERNEL);
+			if (!res->fds)
+				return -ENOMEM;
+
+			for (i = 0; i < fds_count; i++) {
+				int fd = item->fds[i];
+				int ret;
+
+				/*
+				 * Verify the fd and increment the usage count.
+				 * Use fget_raw() to allow passing O_PATH fds.
+				 */
+				if (fd < 0)
+					return -EBADF;
+
+				res->fds[i] = fget_raw(fd);
+				if (!res->fds[i])
+					return -EBADF;
+
+				res->fds_count++;
+
+				ret = kdbus_handle_check_file(res->fds[i]);
+				if (ret < 0)
+					return ret;
+			}
+
+			break;
+		}
+
+		case KDBUS_ITEM_BLOOM_FILTER: {
+			u64 bloom_size;
+
+			/* do not allow multiple bloom filters */
+			if (has_bloom)
+				return -EEXIST;
+			has_bloom = true;
+
+			bloom_size = payload_size -
+				     offsetof(struct kdbus_bloom_filter, data);
+
+			/*
+			 * Allow only bloom filter sizes of a multiple of 64bit.
+			 */
+			if (!KDBUS_IS_ALIGNED8(bloom_size))
+				return -EFAULT;
+
+			/* do not allow mismatching bloom filter sizes */
+			if (bloom_size != bus->bloom.size)
+				return -EDOM;
+
+			kmsg->bloom_filter = &item->bloom_filter;
+			break;
+		}
+
+		case KDBUS_ITEM_DST_NAME:
+			/* do not allow multiple names */
+			if (has_name)
+				return -EEXIST;
+			has_name = true;
+
+			if (!kdbus_name_is_valid(item->str, false))
+				return -EINVAL;
+
+			res->dst_name = kstrdup(item->str, GFP_KERNEL);
+			if (!res->dst_name)
+				return -ENOMEM;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+	}
+
+	/* name is needed if no ID is given */
+	if (msg->dst_id == KDBUS_DST_ID_NAME && !has_name)
+		return -EDESTADDRREQ;
+
+	if (is_broadcast) {
+		/* Broadcasts can't take names */
+		if (has_name)
+			return -EBADMSG;
+
+		/* All broadcasts have to be signals */
+		if (!is_signal)
+			return -EBADMSG;
+
+		/* Timeouts are not allowed for broadcasts */
+		if (msg->timeout_ns > 0)
+			return -ENOTUNIQ;
+	}
+
+	/*
+	 * Signal messages require a bloom filter, and bloom filters are
+	 * only valid with signals.
+	 */
+	if (is_signal ^ has_bloom)
+		return -EBADMSG;
+
+	return 0;
+}
+
+/**
+ * kdbus_kmsg_new_from_cmd() - create kernel message from send payload
+ * @conn:		Connection
+ * @buf:		The user-buffer location of @cmd_send
+ * @cmd_send:		Payload of KDBUS_CMD_SEND
+ *
+ * Return: a new kdbus_kmsg on success, ERR_PTR on failure.
+ */
+struct kdbus_kmsg *kdbus_kmsg_new_from_cmd(struct kdbus_conn *conn,
+					   void __user *buf,
+					   struct kdbus_cmd_send *cmd_send)
+{
+	struct kdbus_kmsg *m;
+	u64 size;
+	int ret;
+
+	ret = kdbus_copy_from_user(&size, KDBUS_PTR(cmd_send->msg_address),
+				   sizeof(size));
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	if (size < sizeof(struct kdbus_msg) || size > KDBUS_MSG_MAX_SIZE)
+		return ERR_PTR(-EINVAL);
+
+	m = kmalloc(size + KDBUS_KMSG_HEADER_SIZE, GFP_KERNEL);
+	if (!m)
+		return ERR_PTR(-ENOMEM);
+
+	memset(m, 0, KDBUS_KMSG_HEADER_SIZE);
+
+	m->proc_meta = kdbus_meta_proc_new();
+	if (IS_ERR(m->proc_meta)) {
+		ret = PTR_ERR(m->proc_meta);
+		m->proc_meta = NULL;
+		goto exit_free;
+	}
+
+	m->conn_meta = kdbus_meta_conn_new();
+	if (IS_ERR(m->conn_meta)) {
+		ret = PTR_ERR(m->conn_meta);
+		m->conn_meta = NULL;
+		goto exit_free;
+	}
+
+	if (copy_from_user(&m->msg, KDBUS_PTR(cmd_send->msg_address), size)) {
+		ret = -EFAULT;
+		goto exit_free;
+	}
+
+	if (m->msg.size != size) {
+		ret = -EINVAL;
+		goto exit_free;
+	}
+
+	ret = kdbus_check_and_write_flags(m->msg.flags, buf,
+					  offsetof(struct kdbus_cmd_send,
+						   kernel_msg_flags),
+					  KDBUS_MSG_EXPECT_REPLY	|
+					  KDBUS_MSG_NO_AUTO_START	|
+					  KDBUS_MSG_SIGNAL);
+	if (ret < 0)
+		goto exit_free;
+
+	ret = kdbus_items_validate(m->msg.items,
+				   KDBUS_ITEMS_SIZE(&m->msg, items));
+	if (ret < 0)
+		goto exit_free;
+
+	m->res = kdbus_msg_resources_new();
+	if (IS_ERR(m->res)) {
+		ret = PTR_ERR(m->res);
+		m->res = NULL;
+		goto exit_free;
+	}
+
+	/* do not accept kernel-generated messages */
+	if (m->msg.payload_type == KDBUS_PAYLOAD_KERNEL) {
+		ret = -EINVAL;
+		goto exit_free;
+	}
+
+	if (m->msg.flags & KDBUS_MSG_EXPECT_REPLY) {
+		/* requests for replies need timeout and cookie */
+		if (m->msg.timeout_ns == 0 || m->msg.cookie == 0) {
+			ret = -EINVAL;
+			goto exit_free;
+		}
+
+		/* replies may not be expected for broadcasts */
+		if (m->msg.dst_id == KDBUS_DST_ID_BROADCAST) {
+			ret = -ENOTUNIQ;
+			goto exit_free;
+		}
+
+		/* replies may not be expected for signals */
+		if (m->msg.flags & KDBUS_MSG_SIGNAL) {
+			ret = -EINVAL;
+			goto exit_free;
+		}
+	} else {
+		/*
+		 * KDBUS_SEND_SYNC_REPLY is only valid together with
+		 * KDBUS_MSG_EXPECT_REPLY
+		 */
+		if (cmd_send->flags & KDBUS_SEND_SYNC_REPLY) {
+			ret = -EINVAL;
+			goto exit_free;
+		}
+	}
+
+	ret = kdbus_msg_scan_items(m, conn->ep->bus);
+	if (ret < 0)
+		goto exit_free;
+
+	/* patch-in the source of this message */
+	if (m->msg.src_id > 0 && m->msg.src_id != conn->id) {
+		ret = -EINVAL;
+		goto exit_free;
+	}
+	m->msg.src_id = conn->id;
+
+	return m;
+
+exit_free:
+	kdbus_kmsg_free(m);
+	return ERR_PTR(ret);
+}
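Reader note on the size accounting in kdbus_msg_scan_items(): before a vector is added to the running total, the code first checks for unsigned wrap-around (`vec_size + size < vec_size`) and only then compares the sum against the limit. The sketch below isolates that pattern. It is an illustration only: `vec_account()` is a hypothetical helper, and `MAX_VEC_TOTAL` is a made-up limit, since the value of KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE is not shown in this part of the patch.

```c
#include <stdint.h>

/* Hypothetical limit, chosen for the example only. */
#define MAX_VEC_TOTAL (8u * 1024u * 1024u)

/*
 * Add @size to the running total. Return 0 on success, or -1 if the
 * addition would wrap around or exceed the limit; @total is left
 * unchanged on failure.
 */
static int vec_account(uint64_t *total, uint64_t size)
{
	/* unsigned addition wraps, so a sum below an operand means overflow */
	if (*total + size < *total)
		return -1;

	if (*total + size > MAX_VEC_TOTAL)
		return -1;

	*total += size;
	return 0;
}
```

Without the first check, a huge `size` could wrap the sum to a small value that passes the limit comparison.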
diff --git a/ipc/kdbus/message.h b/ipc/kdbus/message.h
new file mode 100644
index 000000000000..28f1893b002a
--- /dev/null
+++ b/ipc/kdbus/message.h
@@ -0,0 +1,133 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_MESSAGE_H
+#define __KDBUS_MESSAGE_H
+
+#include "util.h"
+#include "metadata.h"
+
+/**
+ * enum kdbus_msg_data_type - Type of kdbus_msg_data payloads
+ * @KDBUS_MSG_DATA_VEC:		Data vector provided by user-space
+ * @KDBUS_MSG_DATA_MEMFD:	Memfd payload
+ */
+enum kdbus_msg_data_type {
+	KDBUS_MSG_DATA_VEC,
+	KDBUS_MSG_DATA_MEMFD,
+};
+
+/**
+ * struct kdbus_msg_data - Data payload as stored by messages
+ * @type:	Type of payload (KDBUS_MSG_DATA_*)
+ * @size:	Size of the described payload
+ * @off:	The offset, relative to the vec slice
+ * @start:	Offset inside the memfd
+ * @file:	Backing file referenced by the memfd
+ */
+struct kdbus_msg_data {
+	unsigned int type;
+	u64 size;
+
+	union {
+		struct {
+			u64 off;
+		} vec;
+		struct {
+			u64 start;
+			struct file *file;
+		} memfd;
+	};
+};
+
+/**
+ * struct kdbus_msg_resources - resources of a message
+ * @kref:		Reference counter
+ * @dst_name:		Short-cut to msg for faster lookup
+ * @fds:		Array of file descriptors to pass
+ * @fds_count:		Number of file descriptors to pass
+ * @data:		Array of data payloads
+ * @vec_count:		Number of VEC entries
+ * @memfd_count:	Number of MEMFD entries in @data
+ * @data_count:		Sum of @vec_count + @memfd_count
+ */
+struct kdbus_msg_resources {
+	struct kref kref;
+	const char *dst_name;
+
+	struct file **fds;
+	unsigned int fds_count;
+
+	struct kdbus_msg_data *data;
+	size_t vec_count;
+	size_t memfd_count;
+	size_t data_count;
+};
+
+struct kdbus_msg_resources *
+kdbus_msg_resources_ref(struct kdbus_msg_resources *r);
+struct kdbus_msg_resources *
+kdbus_msg_resources_unref(struct kdbus_msg_resources *r);
+
+/**
+ * struct kdbus_kmsg - internal message handling data
+ * @seq:		Domain-global message sequence number
+ * @notify_type:	Short-cut for faster lookup
+ * @notify_old_id:	Short-cut for faster lookup
+ * @notify_new_id:	Short-cut for faster lookup
+ * @notify_name:	Short-cut for faster lookup
+ * @dst_name_id:	Short-cut to msg for faster lookup
+ * @bloom_filter:	Bloom filter to match message properties
+ * @bloom_generation:	Generation of bloom element set
+ * @notify_entry:	List of kernel-generated notifications
+ * @iov:		Array of iovec, describing the payload to copy
+ * @iov_count:		Number of array members in @iov
+ * @pool_size:		Overall size of inlined data referenced by @iov
+ * @proc_meta:		Appended SCM-like metadata of the sending process
+ * @conn_meta:		Appended SCM-like metadata of the sending connection
+ * @res:		Message resources
+ * @msg:		Message from or to userspace
+ */
+struct kdbus_kmsg {
+	u64 seq;
+	u64 notify_type;
+	u64 notify_old_id;
+	u64 notify_new_id;
+	const char *notify_name;
+
+	u64 dst_name_id;
+	const struct kdbus_bloom_filter *bloom_filter;
+	u64 bloom_generation;
+	struct list_head notify_entry;
+
+	struct iovec *iov;
+	size_t iov_count;
+	u64 pool_size;
+
+	struct kdbus_meta_proc *proc_meta;
+	struct kdbus_meta_conn *conn_meta;
+	struct kdbus_msg_resources *res;
+
+	/* variable size, must be the last member */
+	struct kdbus_msg msg;
+};
+
+struct kdbus_conn;
+
+struct kdbus_kmsg *kdbus_kmsg_new(size_t extra_size);
+struct kdbus_kmsg *kdbus_kmsg_new_from_cmd(struct kdbus_conn *conn,
+					   void __user *buf,
+					   struct kdbus_cmd_send *cmd_send);
+void kdbus_kmsg_free(struct kdbus_kmsg *kmsg);
+
+#endif
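Reader note on kdbus_msg_resources_ref()/kdbus_msg_resources_unref(): both accept NULL, ref returns its argument, and unref always returns NULL so that callers can write `p = unref(p)` and never keep a stale pointer. The sketch below shows that calling convention in user-space C. It is an assumption-laden illustration: `struct res` and the `res_*` functions are hypothetical, and a plain, non-atomic counter stands in for the kernel's struct kref, so this version is not safe for concurrent use.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct kdbus_msg_resources. */
struct res {
	unsigned int refs;	/* plain counter; the kernel code uses struct kref */
	char *dst_name;		/* owned by the object, freed with it */
};

/* Allocate a new object holding one reference; NULL on failure. */
static struct res *res_new(void)
{
	struct res *r = calloc(1, sizeof(*r));

	if (r)
		r->refs = 1;
	return r;
}

/* Take an additional reference; NULL is passed through. */
static struct res *res_ref(struct res *r)
{
	if (r)
		r->refs++;
	return r;
}

/* Drop one reference, free on the last one; always returns NULL. */
static struct res *res_unref(struct res *r)
{
	if (r && --r->refs == 0) {
		free(r->dst_name);
		free(r);
	}
	return NULL;
}
```

A typical caller releases its reference with `r = res_unref(r);`, which makes a later accidental use fail on a NULL pointer instead of touching freed memory.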
diff --git a/ipc/kdbus/queue.c b/ipc/kdbus/queue.c
new file mode 100644
index 000000000000..53ab51a0f791
--- /dev/null
+++ b/ipc/kdbus/queue.c
@@ -0,0 +1,505 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/audit.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/hashtable.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/math64.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/uio.h>
+
+#include "util.h"
+#include "domain.h"
+#include "connection.h"
+#include "item.h"
+#include "message.h"
+#include "metadata.h"
+#include "queue.h"
+#include "reply.h"
+
+/**
+ * kdbus_queue_entry_add() - Add a queue entry to a queue
+ * @queue:	The queue to attach the entry to
+ * @entry:	The entry to attach
+ *
+ * Adds a previously allocated queue entry to a queue. The entry is sorted
+ * into the priority r/b tree and appended to the unsorted FIFO list of the
+ * queue.
+ */
+void kdbus_queue_entry_add(struct kdbus_queue *queue,
+			   struct kdbus_queue_entry *entry)
+{
+	struct rb_node **n, *pn = NULL;
+	bool highest = true;
+
+	/* sort into priority entry tree */
+	n = &queue->msg_prio_queue.rb_node;
+	while (*n) {
+		struct kdbus_queue_entry *e;
+
+		pn = *n;
+		e = rb_entry(pn, struct kdbus_queue_entry, prio_node);
+
+		/* existing node for this priority, add to its list */
+		if (likely(entry->msg.priority == e->msg.priority)) {
+			list_add_tail(&entry->prio_entry, &e->prio_entry);
+			goto prio_done;
+		}
+
+		if (entry->msg.priority < e->msg.priority) {
+			n = &pn->rb_left;
+		} else {
+			n = &pn->rb_right;
+			highest = false;
+		}
+	}
+
+	/* cache highest-priority entry */
+	if (highest)
+		queue->msg_prio_highest = &entry->prio_node;
+
+	/* new node for this priority */
+	rb_link_node(&entry->prio_node, pn, n);
+	rb_insert_color(&entry->prio_node, &queue->msg_prio_queue);
+	INIT_LIST_HEAD(&entry->prio_entry);
+
+prio_done:
+	/* add to unsorted fifo list */
+	list_add_tail(&entry->entry, &queue->msg_list);
+	queue->msg_count++;
+}
+
+/**
+ * kdbus_queue_entry_peek() - Retrieves an entry from a queue
+ *
+ * @queue:		The queue
+ * @priority:		The minimum priority of the entry to peek
+ * @use_priority:	Boolean flag whether or not to peek by priority
+ *
+ * Look for an entry in a queue, either by priority, or the oldest one (FIFO).
+ * The entry is neither freed nor removed from the queue's lists.
+ *
+ * Return: the peeked queue entry on success, ERR_PTR(-ENOMSG) if there is no
+ * entry with the requested priority, or ERR_PTR(-EAGAIN) if there are no
+ * entries at all.
+ */
+struct kdbus_queue_entry *kdbus_queue_entry_peek(struct kdbus_queue *queue,
+						 s64 priority,
+						 bool use_priority)
+{
+	struct kdbus_queue_entry *e;
+
+	if (queue->msg_count == 0)
+		return ERR_PTR(-EAGAIN);
+
+	if (use_priority) {
+		/* get next entry with highest priority */
+		e = rb_entry(queue->msg_prio_highest,
+			     struct kdbus_queue_entry, prio_node);
+
+		/* no entry with the requested priority */
+		if (e->msg.priority > priority)
+			return ERR_PTR(-ENOMSG);
+	} else {
+		/* ignore the priority, return the next entry in the list */
+		e = list_first_entry(&queue->msg_list,
+				     struct kdbus_queue_entry, entry);
+	}
+
+	return e;
+}
+
+/**
+ * kdbus_queue_entry_remove() - Remove an entry from a queue
+ * @conn:	The connection containing the queue
+ * @entry:	The entry to remove
+ *
+ * Remove an entry from both the queue's list and the priority r/b tree.
+ */
+void kdbus_queue_entry_remove(struct kdbus_conn *conn,
+			      struct kdbus_queue_entry *entry)
+{
+	struct kdbus_queue *queue = &conn->queue;
+
+	list_del(&entry->entry);
+	queue->msg_count--;
+
+	/* user quota */
+	if (entry->user) {
+		BUG_ON(conn->msg_users[entry->user->idr] == 0);
+		conn->msg_users[entry->user->idr]--;
+		entry->user = kdbus_domain_user_unref(entry->user);
+	}
+
+	/* the queue is empty, remove the user quota accounting */
+	if (queue->msg_count == 0 && conn->msg_users_max > 0) {
+		kfree(conn->msg_users);
+		conn->msg_users = NULL;
+		conn->msg_users_max = 0;
+	}
+
+	if (list_empty(&entry->prio_entry)) {
+		/*
+		 * Single entry for this priority, update cached
+		 * highest-priority entry, remove the tree node.
+		 */
+		if (queue->msg_prio_highest == &entry->prio_node)
+			queue->msg_prio_highest = rb_next(&entry->prio_node);
+
+		rb_erase(&entry->prio_node, &queue->msg_prio_queue);
+	} else {
+		struct kdbus_queue_entry *q;
+
+		/*
+		 * Multiple entries for this priority entry, get next one in
+		 * the list. Update cached highest-priority entry, store the
+		 * new one as the tree node.
+		 */
+		q = list_first_entry(&entry->prio_entry,
+				     struct kdbus_queue_entry, prio_entry);
+		list_del(&entry->prio_entry);
+
+		if (queue->msg_prio_highest == &entry->prio_node)
+			queue->msg_prio_highest = &q->prio_node;
+
+		rb_replace_node(&entry->prio_node, &q->prio_node,
+				&queue->msg_prio_queue);
+	}
+}
+
+/**
+ * kdbus_queue_entry_alloc() - allocate a queue entry
+ * @pool:	The pool to allocate the slice in
+ * @kmsg:	The kmsg object the queue entry should track
+ *
+ * Allocates a queue entry based on a given kmsg and allocates space for
+ * the message payload and the requested metadata in the connection's pool.
+ * The entry is not actually added to the queue's lists at this point.
+ *
+ * Return: the allocated entry on success, or an ERR_PTR on failure.
+ */
+struct kdbus_queue_entry *kdbus_queue_entry_alloc(struct kdbus_pool *pool,
+						  const struct kdbus_kmsg *kmsg)
+{
+	struct kdbus_msg_resources *res = kmsg->res;
+	const struct kdbus_msg *msg = &kmsg->msg;
+	struct kdbus_queue_entry *entry;
+	int ret = 0;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&entry->entry);
+	entry->msg_res = kdbus_msg_resources_ref(res);
+	entry->proc_meta = kdbus_meta_proc_ref(kmsg->proc_meta);
+	entry->conn_meta = kdbus_meta_conn_ref(kmsg->conn_meta);
+	memcpy(&entry->msg, msg, sizeof(*msg));
+
+	if (kmsg->iov_count) {
+		size_t pool_avail = kdbus_pool_remain(pool);
+
+		/* do not give out more than half of the remaining space */
+		if (kmsg->pool_size < pool_avail &&
+		    kmsg->pool_size > pool_avail / 2) {
+			ret = -EXFULL;
+			goto exit_free_entry;
+		}
+
+		/* allocate the needed space in the pool of the receiver */
+		entry->slice_vecs = kdbus_pool_slice_alloc(pool,
+							   kmsg->pool_size,
+							   NULL, kmsg->iov,
+							   kmsg->iov_count);
+		if (IS_ERR(entry->slice_vecs)) {
+			ret = PTR_ERR(entry->slice_vecs);
+			entry->slice_vecs = NULL;
+			goto exit_free_entry;
+		}
+	}
+
+	if (msg->src_id == KDBUS_SRC_ID_KERNEL) {
+		size_t extra_size = msg->size - sizeof(*msg);
+
+		entry->msg_extra = kmemdup((u8 *)msg + sizeof(*msg),
+					   extra_size, GFP_KERNEL);
+		if (!entry->msg_extra) {
+			ret = -ENOMEM;
+			goto exit_free_slice;
+		}
+
+		entry->msg_extra_size = extra_size;
+	}
+
+	return entry;
+
+exit_free_slice:
+	kdbus_pool_slice_release(entry->slice_vecs);
+exit_free_entry:
+	kdbus_queue_entry_free(entry);
+	return ERR_PTR(ret);
+}
+
+static struct kdbus_item *
+kdbus_msg_make_items(const struct kdbus_msg_resources *res, off_t payload_off,
+		     bool install_fds, u64 *return_flags, size_t *out_size)
+{
+	struct kdbus_item *items, *item;
+	bool incomplete_fds = false;
+	size_t i, size = 0;
+
+	/* sum up how much space we need for the 'control' part */
+	size += res->vec_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
+	size += res->memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
+
+	if (res->fds_count)
+		size += KDBUS_ITEM_SIZE(sizeof(int) * res->fds_count);
+
+	if (res->dst_name)
+		size += KDBUS_ITEM_SIZE(strlen(res->dst_name) + 1);
+
+	items = kzalloc(size, GFP_KERNEL);
+	if (!items)
+		return ERR_PTR(-ENOMEM);
+
+	item = items;
+
+	if (res->dst_name) {
+		kdbus_item_set(item, KDBUS_ITEM_DST_NAME,
+			       res->dst_name, strlen(res->dst_name) + 1);
+		item = KDBUS_ITEM_NEXT(item);
+	}
+
+	for (i = 0; i < res->data_count; ++i) {
+		struct kdbus_msg_data *d = res->data + i;
+		struct kdbus_memfd m = {};
+		struct kdbus_vec v = {};
+
+		switch (d->type) {
+		case KDBUS_MSG_DATA_VEC:
+			v.size = d->size;
+			v.offset = d->vec.off;
+			if (v.offset != ~0ULL)
+				v.offset += payload_off;
+
+			kdbus_item_set(item, KDBUS_ITEM_PAYLOAD_OFF,
+				       &v, sizeof(v));
+			item = KDBUS_ITEM_NEXT(item);
+			break;
+
+		case KDBUS_MSG_DATA_MEMFD:
+			m.start = d->memfd.start;
+			m.size = d->size;
+			m.fd = -1;
+			if (install_fds) {
+				m.fd = get_unused_fd_flags(O_CLOEXEC);
+				if (m.fd >= 0)
+					fd_install(m.fd,
+						   get_file(d->memfd.file));
+				else
+					incomplete_fds = true;
+			}
+
+			kdbus_item_set(item, KDBUS_ITEM_PAYLOAD_MEMFD,
+				       &m, sizeof(m));
+			item = KDBUS_ITEM_NEXT(item);
+			break;
+		}
+	}
+
+	if (res->fds_count) {
+		kdbus_item_set(item, KDBUS_ITEM_FDS,
+			       NULL, sizeof(int) * res->fds_count);
+		for (i = 0; i < res->fds_count; i++) {
+			if (install_fds) {
+				item->fds[i] = get_unused_fd_flags(O_CLOEXEC);
+				if (item->fds[i] >= 0)
+					fd_install(item->fds[i],
+						   get_file(res->fds[i]));
+				else
+					incomplete_fds = true;
+			} else {
+				item->fds[i] = -1;
+			}
+		}
+
+		item = KDBUS_ITEM_NEXT(item);
+	}
+
+	/* Make sure the sizes actually match */
+	BUG_ON((u8 *)item != (u8 *)items + size);
+
+	if (incomplete_fds)
+		*return_flags |= KDBUS_RECV_RETURN_INCOMPLETE_FDS;
+
+	*out_size = size;
+	return items;
+}
+
+/**
+ * kdbus_queue_entry_install() - install message components into the
+ *				 receiver's process
+ * @entry:		The queue entry to install
+ * @conn_dst:		The receiver connection
+ * @return_flags:	Pointer to store the return flags for userspace
+ * @install_fds:	Whether or not to install associated file descriptors
+ *
+ * This function will create a slice to transport the message header, the
+ * metadata items and other items for information stored in @entry, and
+ * store it as entry->slice.
+ *
+ * If @install_fds is %true, file descriptors will be installed as well.
+ * This function must always be called from the task context of the receiver.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
+			      struct kdbus_conn *conn_dst,
+			      u64 *return_flags, bool install_fds)
+{
+	size_t meta_size = 0, payload_items_size = 0;
+	struct kdbus_item *payload_items = NULL;
+	struct kdbus_item *meta_items = NULL;
+	off_t payload_off = 0;
+	struct kvec kvec[4];
+	size_t kvec_count = 0;
+	int ret = 0;
+
+	if (entry->proc_meta || entry->conn_meta) {
+		u64 attach_flags = atomic64_read(&conn_dst->attach_flags_recv);
+
+		meta_items = kdbus_meta_export(entry->proc_meta,
+					       entry->conn_meta,
+					       attach_flags,
+					       &meta_size);
+		if (IS_ERR(meta_items)) {
+			ret = PTR_ERR(meta_items);
+			meta_items = NULL;
+			goto exit_free;
+		}
+	}
+
+	/*
+	 * The offsets stored in the slice are relative to the start
+	 * of the payload slice. When exporting them, they need to become
+	 * relative to the pool, so get the payload slice's offset first.
+	 */
+	if (entry->slice_vecs)
+		payload_off = kdbus_pool_slice_offset(entry->slice_vecs);
+
+	if (entry->msg_res) {
+		payload_items = kdbus_msg_make_items(entry->msg_res,
+						     payload_off,
+						     install_fds, return_flags,
+						     &payload_items_size);
+		if (IS_ERR(payload_items)) {
+			ret = PTR_ERR(payload_items);
+			payload_items = NULL;
+			goto exit_free;
+		}
+	}
+
+	entry->msg.size = 0;
+
+	kdbus_kvec_set(&kvec[kvec_count++], &entry->msg, sizeof(entry->msg),
+		       &entry->msg.size);
+
+	if (entry->msg_extra_size)
+		kdbus_kvec_set(&kvec[kvec_count++], entry->msg_extra,
+			       entry->msg_extra_size, &entry->msg.size);
+
+	if (payload_items_size)
+		kdbus_kvec_set(&kvec[kvec_count++], payload_items,
+			       payload_items_size, &entry->msg.size);
+
+	if (meta_size)
+		kdbus_kvec_set(&kvec[kvec_count++], meta_items, meta_size,
+			       &entry->msg.size);
+
+	entry->slice = kdbus_pool_slice_alloc(conn_dst->pool, entry->msg.size,
+					      kvec, NULL, kvec_count);
+	if (IS_ERR(entry->slice)) {
+		ret = PTR_ERR(entry->slice);
+		entry->slice = NULL;
+		goto exit_free;
+	}
+
+	kdbus_pool_slice_set_child(entry->slice, entry->slice_vecs);
+
+exit_free:
+	kfree(payload_items);
+	kfree(meta_items);
+
+	return ret;
+}
+
+/**
+ * kdbus_queue_entry_move() - move an entry from one queue to another
+ * @conn_dst:	Connection holding the queue to copy to
+ * @entry:	The queue entry to move
+ *
+ * Return: 0 on success, negative error code otherwise
+ */
+int kdbus_queue_entry_move(struct kdbus_conn *conn_dst,
+			   struct kdbus_queue_entry *entry)
+{
+	int ret = 0;
+
+	if (entry->slice_vecs)
+		ret = kdbus_pool_slice_move(conn_dst->pool, &entry->slice_vecs);
+
+	if (ret < 0)
+		kdbus_queue_entry_free(entry);
+	else
+		kdbus_queue_entry_add(&conn_dst->queue, entry);
+
+	return ret;
+}
+
+/**
+ * kdbus_queue_entry_free() - free resources of an entry
+ * @entry:	The entry to free
+ *
+ * Removes resources allocated by a queue entry, along with the entry itself.
+ * Note that the entry's slice is not freed at this point.
+ */
+void kdbus_queue_entry_free(struct kdbus_queue_entry *entry)
+{
+	kdbus_msg_resources_unref(entry->msg_res);
+	kdbus_meta_conn_unref(entry->conn_meta);
+	kdbus_meta_proc_unref(entry->proc_meta);
+	kdbus_reply_unref(entry->reply);
+	kfree(entry->msg_extra);
+	kfree(entry);
+}
+
+/**
+ * kdbus_queue_init() - initialize data structure related to a queue
+ * @queue:	The queue to initialize
+ */
+void kdbus_queue_init(struct kdbus_queue *queue)
+{
+	INIT_LIST_HEAD(&queue->msg_list);
+	queue->msg_prio_queue = RB_ROOT;
+}
diff --git a/ipc/kdbus/queue.h b/ipc/kdbus/queue.h
new file mode 100644
index 000000000000..8e9961fd3ecd
--- /dev/null
+++ b/ipc/kdbus/queue.h
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_QUEUE_H
+#define __KDBUS_QUEUE_H
+
+struct kdbus_domain_user;
+
+/**
+ * struct kdbus_queue - a connection's message queue
+ * @msg_count:		Number of messages in the queue
+ * @msg_list:		List head for kdbus_queue_entry objects
+ * @msg_prio_queue:	RB tree root for messages, sorted by priority
+ * @msg_prio_highest:	Link to the RB node referencing the message with the
+ *			highest priority in the tree.
+ */
+struct kdbus_queue {
+	size_t msg_count;
+	struct list_head msg_list;
+	struct rb_root msg_prio_queue;
+	struct rb_node *msg_prio_highest;
+};
+
+/**
+ * struct kdbus_queue_entry - messages waiting to be read
+ * @entry:		Entry in the connection's list
+ * @prio_node:		Entry in the priority queue tree
+ * @prio_entry:		Queue tree node entry in the list of one priority
+ * @msg:		Message header, either as received from userspace
+ *			process, or as crafted by the kernel as notification
+ * @msg_extra:		For notifications, contains more fixed parts of a
+ *			message, which will be copied to the final message
+ *			slice verbatim.
+ * @slice:		Slice in the receiver's pool for the message
+ * @slice_vecs:		Slice in the receiver's pool for message payload
+ * @memfds:		Arrays of offsets where to update the installed
+ *			fd number
+ * @dst_name:		Destination well-known-name
+ * @vecs:		Array of struct kdbus_queue_vecs
+ * @vec_count:		Number of elements in @vecs
+ * @memfds_fp:		Array memfd files queued up for this message
+ * @memfd_size:		Array of size_t values, describing the sizes of memfds
+ * @memfds_count:	Number of elements in @memfds_fp
+ * @fds_fp:		Array of passed files queued up for this message
+ * @fds_count:		Number of elements in @fds_fp
+ * @dst_name_id:	The sequence number of the name this message is
+ *			addressed to, 0 for messages sent to an ID
+ * @proc_meta:		Process metadata, captured at message arrival
+ * @conn_meta:		Connection metadata, captured at message arrival
+ * @reply:		The reply block if a reply to this message is expected.
+ * @user:		Domain user accounted for this message, NULL if unused
+ */
+struct kdbus_queue_entry {
+	struct list_head entry;
+	struct rb_node prio_node;
+	struct list_head prio_entry;
+
+	struct kdbus_msg msg;
+
+	char *msg_extra;
+	size_t msg_extra_size;
+
+	struct kdbus_pool_slice *slice;
+	struct kdbus_pool_slice *slice_vecs;
+
+	u64 dst_name_id;
+
+	struct kdbus_msg_resources *msg_res;
+	struct kdbus_meta_proc *proc_meta;
+	struct kdbus_meta_conn *conn_meta;
+	struct kdbus_reply *reply;
+	struct kdbus_domain_user *user;
+};
+
+struct kdbus_kmsg;
+
+void kdbus_queue_init(struct kdbus_queue *queue);
+
+struct kdbus_queue_entry *
+kdbus_queue_entry_alloc(struct kdbus_pool *pool,
+			const struct kdbus_kmsg *kmsg);
+int kdbus_queue_entry_move(struct kdbus_conn *conn_dst,
+			   struct kdbus_queue_entry *entry);
+void kdbus_queue_entry_free(struct kdbus_queue_entry *entry);
+
+void kdbus_queue_entry_add(struct kdbus_queue *queue,
+			   struct kdbus_queue_entry *entry);
+void kdbus_queue_entry_remove(struct kdbus_conn *conn,
+			      struct kdbus_queue_entry *entry);
+struct kdbus_queue_entry *kdbus_queue_entry_peek(struct kdbus_queue *queue,
+						 s64 priority,
+						 bool use_priority);
+int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
+			      struct kdbus_conn *conn_dst,
+			      u64 *return_flags, bool install_fds);
+
+#endif /* __KDBUS_QUEUE_H */
diff --git a/ipc/kdbus/reply.c b/ipc/kdbus/reply.c
new file mode 100644
index 000000000000..9e3559d1ed4a
--- /dev/null
+++ b/ipc/kdbus/reply.c
@@ -0,0 +1,262 @@
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "message.h"
+#include "metadata.h"
+#include "domain.h"
+#include "item.h"
+#include "notify.h"
+#include "policy.h"
+#include "reply.h"
+#include "util.h"
+
+/**
+ * kdbus_reply_new() - Allocate and set up a new kdbus_reply object
+ * @reply_src:		The connection a reply is expected from
+ * @reply_dst:		The connection this reply object belongs to
+ * @msg:		Message associated with the reply
+ * @name_entry:		Name entry used to send the message
+ * @sync:		Whether or not to make this reply synchronous
+ *
+ * Allocate and fill a new kdbus_reply object.
+ *
+ * Return: New kdbus_reply object on success, ERR_PTR on error.
+ */
+struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
+				    struct kdbus_conn *reply_dst,
+				    const struct kdbus_msg *msg,
+				    struct kdbus_name_entry *name_entry,
+				    bool sync)
+{
+	struct kdbus_reply *r;
+	int ret = 0;
+
+	if (atomic_inc_return(&reply_dst->request_count) >
+	    KDBUS_CONN_MAX_REQUESTS_PENDING) {
+		ret = -EMLINK;
+		goto exit_dec_request_count;
+	}
+
+	r = kzalloc(sizeof(*r), GFP_KERNEL);
+	if (!r) {
+		ret = -ENOMEM;
+		goto exit_dec_request_count;
+	}
+
+	kref_init(&r->kref);
+	INIT_LIST_HEAD(&r->entry);
+	r->reply_src = kdbus_conn_ref(reply_src);
+	r->reply_dst = kdbus_conn_ref(reply_dst);
+	r->cookie = msg->cookie;
+	r->name_id = name_entry ? name_entry->name_id : 0;
+	r->deadline_ns = msg->timeout_ns;
+
+	if (sync) {
+		r->sync = true;
+		r->waiting = true;
+	}
+
+exit_dec_request_count:
+	if (ret < 0) {
+		atomic_dec(&reply_dst->request_count);
+		return ERR_PTR(ret);
+	}
+
+	return r;
+}
+
+static void __kdbus_reply_free(struct kref *kref)
+{
+	struct kdbus_reply *reply =
+		container_of(kref, struct kdbus_reply, kref);
+
+	atomic_dec(&reply->reply_dst->request_count);
+	kdbus_conn_unref(reply->reply_src);
+	kdbus_conn_unref(reply->reply_dst);
+	kfree(reply);
+}
+
+/**
+ * kdbus_reply_ref() - Increase reference on kdbus_reply
+ * @r:		The reply, may be %NULL
+ *
+ * Return: The reply object with an extra reference
+ */
+struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r)
+{
+	if (r)
+		kref_get(&r->kref);
+	return r;
+}
+
+/**
+ * kdbus_reply_unref() - Decrease reference on kdbus_reply
+ * @r:		The reply, may be %NULL
+ *
+ * Return: NULL
+ */
+struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r)
+{
+	if (r)
+		kref_put(&r->kref, __kdbus_reply_free);
+	return NULL;
+}
+
+/**
+ * kdbus_reply_link() - Link reply object into target connection
+ * @r:		Reply to link
+ */
+void kdbus_reply_link(struct kdbus_reply *r)
+{
+	if (WARN_ON(!list_empty(&r->entry)))
+		return;
+
+	list_add(&r->entry, &r->reply_dst->reply_list);
+	kdbus_reply_ref(r);
+}
+
+/**
+ * kdbus_reply_unlink() - Unlink reply object from target connection
+ * @r:		Reply to unlink
+ */
+void kdbus_reply_unlink(struct kdbus_reply *r)
+{
+	if (!list_empty(&r->entry)) {
+		list_del_init(&r->entry);
+		kdbus_reply_unref(r);
+	}
+}
+
+/**
+ * kdbus_sync_reply_wakeup() - Wake a synchronously blocking reply
+ * @reply:	The reply object
+ * @err:	Error code to set on the remote side
+ *
+ * Mark the synchronous reply as no longer waiting, store @err as its result,
+ * and wake up the remote peer (method origin) so it can pick up the
+ * synchronous reply code.
+ */
+void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err)
+{
+	if (WARN_ON(!reply->sync))
+		return;
+
+	reply->waiting = false;
+	reply->err = err;
+	wake_up_interruptible(&reply->reply_dst->wait);
+}
+
+/**
+ * kdbus_reply_find() - Find the corresponding reply object
+ * @replying:	The replying connection
+ * @reply_dst:	The connection the reply will be sent to
+ *		(method origin)
+ * @cookie:	The cookie of the requesting message
+ *
+ * Lookup a reply object that should be sent as a reply by
+ * @replying to @reply_dst with the given cookie.
+ *
+ * As an optimization, callers should first check 'request_count' of
+ * @reply_dst to see if the connection has issued any requests
+ * that are waiting for replies, before calling this function.
+ *
+ * Callers must take the @reply_dst lock.
+ *
+ * Return: the corresponding reply object or NULL if not found
+ */
+struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
+				     struct kdbus_conn *reply_dst,
+				     u64 cookie)
+{
+	struct kdbus_reply *r, *reply = NULL;
+
+	list_for_each_entry(r, &reply_dst->reply_list, entry) {
+		if (r->reply_src == replying &&
+		    r->cookie == cookie) {
+			reply = r;
+			break;
+		}
+	}
+
+	return reply;
+}
+
+/**
+ * kdbus_reply_list_scan_work() - Worker callback to scan the replies of a
+ *				  connection for exceeded timeouts
+ * @work:		Work struct of the connection to scan
+ *
+ * Walk the list of replies stored with a connection and look for entries
+ * that have exceeded their timeout. If such an entry is found, a timeout
+ * notification is sent to the waiting peer, and the reply is removed from
+ * the list.
+ *
+ * The work is rescheduled to the nearest timeout found during the list
+ * iteration.
+ */
+void kdbus_reply_list_scan_work(struct work_struct *work)
+{
+	struct kdbus_conn *conn =
+		container_of(work, struct kdbus_conn, work.work);
+	struct kdbus_reply *reply, *reply_tmp;
+	u64 deadline = ~0ULL;
+	struct timespec64 ts;
+	u64 now;
+
+	ktime_get_ts64(&ts);
+	now = timespec64_to_ns(&ts);
+
+	mutex_lock(&conn->lock);
+	if (!kdbus_conn_active(conn)) {
+		mutex_unlock(&conn->lock);
+		return;
+	}
+
+	list_for_each_entry_safe(reply, reply_tmp, &conn->reply_list, entry) {
+		/*
+		 * If the reply block is waiting for synchronous I/O,
+		 * the timeout is handled by wait_event_*_timeout(),
+		 * so we don't have to care for it here.
+		 */
+		if (reply->sync && !reply->interrupted)
+			continue;
+
+		WARN_ON(reply->reply_dst != conn);
+
+		if (reply->deadline_ns > now) {
+			/* remember next timeout */
+			if (deadline > reply->deadline_ns)
+				deadline = reply->deadline_ns;
+
+			continue;
+		}
+
+		/*
+		 * A zero deadline means the connection died, was
+		 * cleaned up already and the notification was sent.
+		 * Don't send notifications for reply trackers that were
+		 * left in an interrupted syscall state.
+		 */
+		if (reply->deadline_ns != 0 && !reply->interrupted)
+			kdbus_notify_reply_timeout(conn->ep->bus, conn->id,
+						   reply->cookie);
+
+		kdbus_reply_unlink(reply);
+	}
+
+	/* rearm delayed work with next timeout */
+	if (deadline != ~0ULL)
+		schedule_delayed_work(&conn->work,
+				      nsecs_to_jiffies(deadline - now));
+
+	mutex_unlock(&conn->lock);
+
+	kdbus_notify_flush(conn->ep->bus);
+}
diff --git a/ipc/kdbus/reply.h b/ipc/kdbus/reply.h
new file mode 100644
index 000000000000..7cecea210bf5
--- /dev/null
+++ b/ipc/kdbus/reply.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_REPLY_H
+#define __KDBUS_REPLY_H
+
+/**
+ * struct kdbus_reply - an entry of kdbus_conn's list of replies
+ * @kref:		Ref-count of this object
+ * @entry:		The entry of the connection's reply_list
+ * @reply_src:		The connection the reply will be sent from
+ * @reply_dst:		The connection the reply will be sent to
+ * @queue_entry:	The queue entry item that is prepared by the replying
+ *			connection
+ * @deadline_ns:	The deadline of the reply, in nanoseconds
+ * @cookie:		The cookie of the requesting message
+ * @name_id:		ID of the well-known name the original msg was sent to
+ * @sync:		The reply block is waiting for synchronous I/O
+ * @waiting:		The condition to synchronously wait for
+ * @interrupted:	The sync reply was left in an interrupted state
+ * @err:		The error code for the synchronous reply
+ */
+struct kdbus_reply {
+	struct kref kref;
+	struct list_head entry;
+	struct kdbus_conn *reply_src;
+	struct kdbus_conn *reply_dst;
+	struct kdbus_queue_entry *queue_entry;
+	u64 deadline_ns;
+	u64 cookie;
+	u64 name_id;
+	bool sync:1;
+	bool waiting:1;
+	bool interrupted:1;
+	int err;
+};
+
+struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
+				    struct kdbus_conn *reply_dst,
+				    const struct kdbus_msg *msg,
+				    struct kdbus_name_entry *name_entry,
+				    bool sync);
+
+struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r);
+struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r);
+
+void kdbus_reply_link(struct kdbus_reply *r);
+void kdbus_reply_unlink(struct kdbus_reply *r);
+
+struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
+				     struct kdbus_conn *reply_dst,
+				     u64 cookie);
+
+void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err);
+void kdbus_reply_list_scan_work(struct work_struct *work);
+
+#endif /* __KDBUS_REPLY_H */
diff --git a/ipc/kdbus/util.h b/ipc/kdbus/util.h
index 33d31f6274e0..241bbcc1c19f 100644
--- a/ipc/kdbus/util.h
+++ b/ipc/kdbus/util.h
@@ -19,7 +19,7 @@
 #include <linux/ioctl.h>
 #include <linux/uidgid.h>
 
-#include "kdbus.h"
+#include <uapi/linux/kdbus.h>
 
 /* all exported addresses are 64 bit */
 #define KDBUS_PTR(addr) ((void __user *)(uintptr_t)(addr))
-- 
2.2.1


* [PATCH 06/13] kdbus: add node and filesystem implementation
  2015-01-16 19:16 ` Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  (?)
@ 2015-01-16 19:16 ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

kdbusfs is a filesystem that will expose a fresh kdbus domain context
each time it is mounted. Per mount point, there will be a 'control'
node, which can be used to create buses. fs.c contains the
implementation of that pseudo-fs. Exported inodes of 'file' type have
their i_fop set to either kdbus_handle_control_ops or
kdbus_handle_ep_ops, depending on their type. The actual dispatching
of file operations is done from handle.c.
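
For readers who want the type-based choice of file operations spelled
out, here is a rough userspace sketch. It is an illustration only: the
enum, the struct fops stand-in and the three table names are made up
for this example and are not the kernel's struct file_operations or
the kdbus_handle_*_ops tables. It only assumes what the patch shows,
namely that domain and bus nodes are directories while control and
endpoint nodes are regular files.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node types, mirroring the four kinds handled in fs.c. */
enum node_type { NODE_DOMAIN, NODE_BUS, NODE_CONTROL, NODE_ENDPOINT };

/* Stand-in for struct file_operations; only carries a label here. */
struct fops {
	const char *name;
};

static const struct fops dir_ops     = { "dir" };
static const struct fops control_ops = { "control" };
static const struct fops ep_ops      = { "endpoint" };

/* Pick the operations table an inode backed by this node type would get. */
static const struct fops *fops_for_node(enum node_type type)
{
	switch (type) {
	case NODE_DOMAIN:
	case NODE_BUS:
		return &dir_ops;	/* exported as directories */
	case NODE_CONTROL:
		return &control_ops;	/* regular file: the 'control' node */
	case NODE_ENDPOINT:
		return &ep_ops;		/* regular file: an endpoint node */
	}

	return NULL;			/* unknown type, nothing to dispatch to */
}
```

A caller would store the returned pointer in the inode once, at inode
creation time, so that later open/ioctl calls need no further type
checks.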

node.c is an implementation of a kdbus object that has an id and
children, organized in an R/B tree. The tree is used by the filesystem
code for lookup and iterator functions, and to deactivate children
once the parent is deactivated. Every inode exported by kdbusfs is
backed by a kdbus_node, hence it is embedded in struct kdbus_ep,
struct kdbus_bus and struct kdbus_domain.
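
The parent/child relationship described above can be sketched in plain
userspace C as follows. This is a simplified model under stated
assumptions, not the code in node.c: struct node and all function names
are hypothetical, children are ordered by id, and a plain unbalanced
binary search tree stands in for the kernel's R/B tree. It shows lookup
of a child by id and how deactivating a parent cascades to every
descendant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kdbus_node. */
struct node {
	unsigned int id;
	bool active;
	struct node *left, *right;	/* links in the parent's child tree */
	struct node *children;		/* root of this node's own child tree */
};

static void node_init(struct node *n, unsigned int id)
{
	n->id = id;
	n->active = true;
	n->left = n->right = n->children = NULL;
}

/* Link @child under @parent, ordered by id; false on a duplicate id. */
static bool node_link(struct node *parent, struct node *child)
{
	struct node **slot = &parent->children;

	/* a node must not already be linked somewhere else */
	assert(child->left == NULL && child->right == NULL);

	while (*slot) {
		if (child->id == (*slot)->id)
			return false;
		slot = child->id < (*slot)->id ? &(*slot)->left
					       : &(*slot)->right;
	}

	*slot = child;
	return true;
}

/* Look up a direct child of @parent by id; NULL if there is none. */
static struct node *node_find_child(struct node *parent, unsigned int id)
{
	struct node *n = parent->children;

	while (n && n->id != id)
		n = id < n->id ? n->left : n->right;

	return n;
}

static void node_deactivate(struct node *n);

/* Deactivate every node in the child tree rooted at @n. */
static void deactivate_subtree(struct node *n)
{
	if (!n)
		return;

	deactivate_subtree(n->left);
	node_deactivate(n);
	deactivate_subtree(n->right);
}

/* Deactivate @n itself first, then all of its descendants. */
static void node_deactivate(struct node *n)
{
	n->active = false;
	deactivate_subtree(n->children);
}
```

Marking the parent inactive before walking its children mirrors the
ordering stated above: once the parent is deactivated, its children
follow. A real implementation additionally has to deal with locking and
with users that still hold an active reference, which this sketch
leaves out.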

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/uapi/linux/magic.h |   2 +
 ipc/kdbus/fs.c             | 519 ++++++++++++++++++++++++++
 ipc/kdbus/fs.h             |  25 ++
 ipc/kdbus/node.c           | 910 +++++++++++++++++++++++++++++++++++++++++++++
 ipc/kdbus/node.h           |  87 +++++
 5 files changed, 1543 insertions(+)
 create mode 100644 ipc/kdbus/fs.c
 create mode 100644 ipc/kdbus/fs.h
 create mode 100644 ipc/kdbus/node.c
 create mode 100644 ipc/kdbus/node.h

diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 7d664ea85ebd..1cf05c066158 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -74,4 +74,6 @@
 #define BTRFS_TEST_MAGIC	0x73727279
 #define NSFS_MAGIC		0x6e736673
 
+#define KDBUS_SUPER_MAGIC	0x44427573
+
 #endif /* __LINUX_MAGIC_H__ */
diff --git a/ipc/kdbus/fs.c b/ipc/kdbus/fs.c
new file mode 100644
index 000000000000..f48b9bd533b5
--- /dev/null
+++ b/ipc/kdbus/fs.c
@@ -0,0 +1,519 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/backing-dev.h>
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/fsnotify.h>
+#include <linux/init.h>
+#include <linux/ipc_namespace.h>
+#include <linux/magic.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/mutex.h>
+#include <linux/namei.h>
+#include <linux/pagemap.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "bus.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "fs.h"
+#include "handle.h"
+#include "node.h"
+
+#define kdbus_node_from_dentry(_dentry) \
+	((struct kdbus_node *)(_dentry)->d_fsdata)
+#define kdbus_node_from_inode(_inode) \
+	((struct kdbus_node *)(_inode)->i_private)
+
+static struct inode *fs_inode_get(struct super_block *sb,
+				  struct kdbus_node *node);
+
+/*
+ * Directory Management
+ */
+
+static inline unsigned char kdbus_dt_type(struct kdbus_node *node)
+{
+	switch (node->type) {
+	case KDBUS_NODE_DOMAIN:
+	case KDBUS_NODE_BUS:
+		return DT_DIR;
+	case KDBUS_NODE_CONTROL:
+	case KDBUS_NODE_ENDPOINT:
+		return DT_REG;
+	}
+
+	return DT_UNKNOWN;
+}
+
+static int fs_dir_fop_iterate(struct file *file, struct dir_context *ctx)
+{
+	struct dentry *dentry = file->f_path.dentry;
+	struct kdbus_node *parent = kdbus_node_from_dentry(dentry);
+	struct kdbus_node *old, *next = file->private_data;
+
+	/*
+	 * kdbusfs directory iterator (modelled after sysfs/kernfs)
+	 * When iterating kdbusfs directories, we iterate all children of the
+	 * parent kdbus_node object. We use ctx->pos to store the hash of the
+	 * child and file->private_data to store a reference to the next node
+	 * object. If ctx->pos is not modified via llseek while you iterate a
+	 * directory, then we use the file->private_data node pointer to
+	 * directly access the next node in the tree.
+	 * However, if you directly seek on the directory, we have to find the
+	 * closest node to that position and cannot use our node pointer. This
+	 * means iterating the rb-tree to find the closest match and start over
+	 * from there.
+	 * Note that hash values are not necessarily unique. Therefore, llseek
+	 * is not guaranteed to seek to the same node that you got when you
+	 * retrieved the position. Seeking to 0, 1, 2 and >=INT_MAX is safe,
+	 * though. We could use the inode-number as position, but this would
+	 * require another rb-tree for fast access. Kernfs and others already
+	 * ignore those conflicts, so we should be fine, too.
+	 */
+
+	if (!dir_emit_dots(file, ctx))
+		return 0;
+
+	/* acquire @next; if deactivated, or seek detected, find next node */
+	old = next;
+	if (next && ctx->pos == next->hash) {
+		if (kdbus_node_acquire(next))
+			kdbus_node_ref(next);
+		else
+			next = kdbus_node_next_child(parent, next);
+	} else {
+		next = kdbus_node_find_closest(parent, ctx->pos);
+	}
+	kdbus_node_unref(old);
+
+	while (next) {
+		/* emit @next */
+		file->private_data = next;
+		ctx->pos = next->hash;
+
+		kdbus_node_release(next);
+
+		if (!dir_emit(ctx, next->name, strlen(next->name), next->id,
+			      kdbus_dt_type(next)))
+			return 0;
+
+		/* find next node after @next */
+		old = next;
+		next = kdbus_node_next_child(parent, next);
+		kdbus_node_unref(old);
+	}
+
+	file->private_data = NULL;
+	ctx->pos = INT_MAX;
+
+	return 0;
+}
+
+static loff_t fs_dir_fop_llseek(struct file *file, loff_t offset, int whence)
+{
+	struct inode *inode = file_inode(file);
+	loff_t ret;
+
+	/* protect f_pos against fop_iterate */
+	mutex_lock(&inode->i_mutex);
+	ret = generic_file_llseek(file, offset, whence);
+	mutex_unlock(&inode->i_mutex);
+
+	return ret;
+}
+
+static int fs_dir_fop_release(struct inode *inode, struct file *file)
+{
+	kdbus_node_unref(file->private_data);
+	return 0;
+}
+
+static const struct file_operations fs_dir_fops = {
+	.read		= generic_read_dir,
+	.iterate	= fs_dir_fop_iterate,
+	.llseek		= fs_dir_fop_llseek,
+	.release	= fs_dir_fop_release,
+};
+
+static struct dentry *fs_dir_iop_lookup(struct inode *dir,
+					struct dentry *dentry,
+					unsigned int flags)
+{
+	struct dentry *dnew = NULL;
+	struct kdbus_node *parent;
+	struct kdbus_node *node;
+	struct inode *inode;
+
+	parent = kdbus_node_from_dentry(dentry->d_parent);
+	if (!kdbus_node_acquire(parent))
+		return NULL;
+
+	/* returns reference to _acquired_ child node */
+	node = kdbus_node_find_child(parent, dentry->d_name.name);
+	if (node) {
+		dentry->d_fsdata = node;
+		inode = fs_inode_get(dir->i_sb, node);
+		if (IS_ERR(inode))
+			dnew = ERR_CAST(inode);
+		else
+			dnew = d_splice_alias(inode, dentry);
+
+		kdbus_node_release(node);
+	}
+
+	kdbus_node_release(parent);
+	return dnew;
+}
+
+static const struct inode_operations fs_dir_iops = {
+	.permission	= generic_permission,
+	.lookup		= fs_dir_iop_lookup,
+};
+
+/*
+ * Inode Management
+ */
+
+static const struct inode_operations fs_inode_iops = {
+	.permission	= generic_permission,
+};
+
+static struct inode *fs_inode_get(struct super_block *sb,
+				  struct kdbus_node *node)
+{
+	struct inode *inode;
+
+	inode = iget_locked(sb, node->id);
+	if (!inode)
+		return ERR_PTR(-ENOMEM);
+	if (!(inode->i_state & I_NEW))
+		return inode;
+
+	inode->i_private = kdbus_node_ref(node);
+	inode->i_mapping->a_ops = &empty_aops;
+	inode->i_mapping->backing_dev_info = &noop_backing_dev_info;
+	inode->i_mode = node->mode & S_IALLUGO;
+	inode->i_atime = inode->i_ctime = inode->i_mtime = CURRENT_TIME;
+	inode->i_uid = node->uid;
+	inode->i_gid = node->gid;
+
+	switch (node->type) {
+	case KDBUS_NODE_DOMAIN:
+	case KDBUS_NODE_BUS:
+		inode->i_mode |= S_IFDIR;
+		inode->i_op = &fs_dir_iops;
+		inode->i_fop = &fs_dir_fops;
+		set_nlink(inode, 2);
+		break;
+	case KDBUS_NODE_CONTROL:
+		inode->i_mode |= S_IFREG;
+		inode->i_op = &fs_inode_iops;
+		inode->i_fop = &kdbus_handle_control_ops;
+		break;
+	case KDBUS_NODE_ENDPOINT:
+		inode->i_mode |= S_IFREG;
+		inode->i_op = &fs_inode_iops;
+		inode->i_fop = &kdbus_handle_ep_ops;
+		break;
+	}
+
+	unlock_new_inode(inode);
+
+	return inode;
+}
+
+/*
+ * Superblock Management
+ */
+
+static int fs_super_dop_revalidate(struct dentry *dentry, unsigned int flags)
+{
+	struct kdbus_node *node;
+
+	/* Force lookup on negatives */
+	if (!dentry->d_inode)
+		return 0;
+
+	node = kdbus_node_from_dentry(dentry);
+
+	/* see whether the node has been removed */
+	if (!kdbus_node_is_active(node))
+		return 0;
+
+	return 1;
+}
+
+static void fs_super_dop_release(struct dentry *dentry)
+{
+	kdbus_node_unref(dentry->d_fsdata);
+}
+
+static const struct dentry_operations fs_super_dops = {
+	.d_revalidate	= fs_super_dop_revalidate,
+	.d_release	= fs_super_dop_release,
+};
+
+static void fs_super_sop_evict_inode(struct inode *inode)
+{
+	struct kdbus_node *node = kdbus_node_from_inode(inode);
+
+	truncate_inode_pages_final(&inode->i_data);
+	clear_inode(inode);
+	kdbus_node_unref(node);
+}
+
+static const struct super_operations fs_super_sops = {
+	.statfs		= simple_statfs,
+	.drop_inode	= generic_delete_inode,
+	.evict_inode	= fs_super_sop_evict_inode,
+};
+
+static int fs_super_fill(struct super_block *sb)
+{
+	struct kdbus_domain *domain = sb->s_fs_info;
+	struct inode *inode;
+	int ret;
+
+	sb->s_blocksize = PAGE_CACHE_SIZE;
+	sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
+	sb->s_magic = KDBUS_SUPER_MAGIC;
+	sb->s_maxbytes = MAX_LFS_FILESIZE;
+	sb->s_op = &fs_super_sops;
+	sb->s_time_gran = 1;
+
+	inode = fs_inode_get(sb, &domain->node);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	sb->s_root = d_make_root(inode);
+	if (!sb->s_root) {
+		/* d_make_root iput()s the inode on failure */
+		return -ENOMEM;
+	}
+
+	/* sb holds domain reference */
+	sb->s_root->d_fsdata = &domain->node;
+	sb->s_d_op = &fs_super_dops;
+
+	/* sb holds root reference */
+	domain->dentry = sb->s_root;
+
+	ret = kdbus_domain_activate(domain);
+	if (ret < 0)
+		return ret;
+
+	sb->s_flags |= MS_ACTIVE;
+	return 0;
+}
+
+static void fs_super_kill(struct super_block *sb)
+{
+	struct kdbus_domain *domain = sb->s_fs_info;
+
+	if (domain) {
+		kdbus_domain_deactivate(domain);
+		domain->dentry = NULL;
+	}
+
+	kill_anon_super(sb);
+
+	if (domain)
+		kdbus_domain_unref(domain);
+}
+
+static int fs_super_set(struct super_block *sb, void *data)
+{
+	int ret;
+
+	ret = set_anon_super(sb, data);
+	if (!ret)
+		sb->s_fs_info = data;
+
+	return ret;
+}
+
+static struct dentry *fs_super_mount(struct file_system_type *fs_type,
+				     int flags, const char *dev_name,
+				     void *data)
+{
+	struct kdbus_domain *domain;
+	struct super_block *sb;
+	int ret;
+
+	domain = kdbus_domain_new(KDBUS_MAKE_ACCESS_WORLD);
+	if (IS_ERR(domain))
+		return ERR_CAST(domain);
+
+	sb = sget(fs_type, NULL, fs_super_set, flags, domain);
+	if (IS_ERR(sb)) {
+		ret = PTR_ERR(sb);
+		goto exit_domain;
+	}
+
+	WARN_ON(sb->s_fs_info != domain);
+	WARN_ON(sb->s_root);
+
+	ret = fs_super_fill(sb);
+	if (ret < 0) {
+		/* calls into ->kill_sb() when done */
+		deactivate_locked_super(sb);
+		return ERR_PTR(ret);
+	}
+
+	return dget(sb->s_root);
+
+exit_domain:
+	kdbus_domain_deactivate(domain);
+	kdbus_domain_unref(domain);
+	return ERR_PTR(ret);
+}
+
+static struct file_system_type fs_type = {
+	.name		= KBUILD_MODNAME "fs",
+	.owner		= THIS_MODULE,
+	.mount		= fs_super_mount,
+	.kill_sb	= fs_super_kill,
+	.fs_flags	= FS_USERNS_MOUNT,
+};
+
+/**
+ * kdbus_fs_init() - register kdbus filesystem
+ *
+ * This registers a filesystem with the VFS layer. The filesystem is called
+ * `KBUILD_MODNAME "fs"', which usually resolves to `kdbusfs'. The naming
+ * scheme allows setting KBUILD_MODNAME to "kdbus2" to get an independent
+ * filesystem for developers.
+ *
+ * Each mount of the kdbusfs filesystem has a kdbus_domain attached.
+ * Operations on this mount will only affect the attached domain. On each mount
+ * a new domain is automatically created and used for this mount exclusively.
+ * If you want to share a domain across multiple mounts, you need to bind-mount
+ * it.
+ *
+ * Mounts of kdbusfs (with a different domain each) are unrelated to each other
+ * and will never have any effect on any domain but their own.
+ *
+ * Return: 0 on success, negative error otherwise.
+ */
+int kdbus_fs_init(void)
+{
+	return register_filesystem(&fs_type);
+}
+
+/**
+ * kdbus_fs_exit() - unregister kdbus filesystem
+ *
+ * This does the reverse to kdbus_fs_init(). It unregisters the kdbusfs
+ * filesystem from VFS and cleans up any allocated resources.
+ */
+void kdbus_fs_exit(void)
+{
+	unregister_filesystem(&fs_type);
+}
+
+/* acquire domain of @node, making sure all ancestors are active */
+static struct kdbus_domain *fs_acquire_domain(struct kdbus_node *node)
+{
+	struct kdbus_domain *domain;
+	struct kdbus_node *iter;
+
+	/* caller must guarantee that @node is linked */
+	for (iter = node; iter->parent; iter = iter->parent)
+		if (!kdbus_node_is_active(iter->parent))
+			return NULL;
+
+	/* root nodes are always domains */
+	if (WARN_ON(iter->type != KDBUS_NODE_DOMAIN))
+		return NULL;
+
+	domain = kdbus_domain_from_node(iter);
+	if (!kdbus_node_acquire(&domain->node))
+		return NULL;
+
+	return domain;
+}
+
+/**
+ * kdbus_fs_flush() - flush dcache entries of a node
+ * @node:		Node to flush entries of
+ *
+ * This flushes all VFS filesystem cache entries for a node and all its
+ * children. This should be called whenever a node is destroyed during
+ * runtime. It will flush the cache entries so the linked objects can be
+ * deallocated.
+ *
+ * This is a no-op if you call it on active nodes (they really should stay in
+ * cache) or on nodes with deactivated parents (flushing the parent is enough).
+ * Furthermore, there is no need to call it on nodes whose lifetime is bound to
+ * their parents'. In those cases, the parent-flush will always also flush the
+ * children.
+ */
+void kdbus_fs_flush(struct kdbus_node *node)
+{
+	struct dentry *dentry, *parent_dentry = NULL;
+	struct kdbus_domain *domain;
+	struct qstr name;
+
+	/* active nodes should remain in cache */
+	if (!kdbus_node_is_deactivated(node))
+		return;
+
+	/* nodes that were never linked were never instantiated */
+	if (!node->parent)
+		return;
+
+	/* acquire domain and verify all ancestors are active */
+	domain = fs_acquire_domain(node);
+	if (!domain)
+		return;
+
+	switch (node->type) {
+	case KDBUS_NODE_ENDPOINT:
+		if (WARN_ON(!node->parent || !node->parent->name))
+			goto exit;
+
+		name.name = node->parent->name;
+		name.len = strlen(node->parent->name);
+		parent_dentry = d_hash_and_lookup(domain->dentry, &name);
+		if (IS_ERR_OR_NULL(parent_dentry))
+			goto exit;
+
+		/* fallthrough */
+	case KDBUS_NODE_BUS:
+		if (WARN_ON(!node->name))
+			goto exit;
+
+		name.name = node->name;
+		name.len = strlen(node->name);
+		dentry = d_hash_and_lookup(parent_dentry ? : domain->dentry,
+					   &name);
+		if (!IS_ERR_OR_NULL(dentry)) {
+			d_invalidate(dentry);
+			dput(dentry);
+		}
+
+		dput(parent_dentry);
+		break;
+
+	default:
+		/* all other types are bound to their parent lifetime */
+		break;
+	}
+
+exit:
+	kdbus_node_release(&domain->node);
+}
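
[Illustration only, not part of the patch.] The comment in
fs_dir_fop_iterate() above describes how ctx->pos doubles as a hash cursor.
The following userspace sketch models that scheme. It rests on assumptions:
kdbus_node_find_closest() is not defined in this file, so the sketch assumes
it returns the first child whose hash is greater than or equal to the
requested position, and it replaces the rb-tree with an array sorted by hash.
The names model_node, model_find_closest() and model_readdir() are
hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a child node; only name and hash matter here. */
struct model_node {
	const char *name;
	unsigned int hash;	/* expected in [2..INT_MAX-1] */
};

/*
 * Assumed semantics of kdbus_node_find_closest(): first child with
 * hash >= pos. @children must be sorted by hash.
 */
static const struct model_node *
model_find_closest(const struct model_node *children, size_t n, long long pos)
{
	size_t i;

	for (i = 0; i < n; i++)
		if ((long long)children[i].hash >= pos)
			return &children[i];
	return NULL;
}

/*
 * Emit up to @max names into @out, starting at *@pos. On return, *@pos is
 * the hash of the next entry to emit, or INT_MAX at end of directory.
 */
static size_t model_readdir(const struct model_node *children, size_t n,
			    long long *pos, const char **out, size_t max)
{
	const struct model_node *next;
	size_t emitted = 0;

	assert(out || max == 0);

	/* positions 0 and 1 are "." and "..", as with dir_emit_dots() */
	if (*pos == 0) {
		if (emitted == max)
			return emitted;
		out[emitted++] = ".";
		*pos = 1;
	}
	if (*pos == 1) {
		if (emitted == max)
			return emitted;
		out[emitted++] = "..";
		*pos = 2;
	}

	next = model_find_closest(children, n, *pos);
	while (next) {
		/* remember where to resume if the buffer is full */
		*pos = next->hash;
		if (emitted == max)
			return emitted;
		out[emitted++] = next->name;
		next = (next + 1 < children + n) ? next + 1 : NULL;
	}

	*pos = INT_MAX;
	return emitted;
}
```

The sketch resumes purely by hash, so two siblings with the same hash could
be emitted twice after a full buffer. The kernel code sidesteps this during
sequential reads by keeping the next node in file->private_data and only
falls back to the hash lookup after a seek or when that node went away.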
diff --git a/ipc/kdbus/fs.h b/ipc/kdbus/fs.h
new file mode 100644
index 000000000000..5c38a5774392
--- /dev/null
+++ b/ipc/kdbus/fs.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUSFS_H
+#define __KDBUSFS_H
+
+#include <linux/kernel.h>
+
+struct kdbus_node;
+
+int kdbus_fs_init(void);
+void kdbus_fs_exit(void);
+void kdbus_fs_flush(struct kdbus_node *node);
+
+#endif
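
[Illustration only, not part of the patch.] The node.c file that follows
documents "active references" and the bias values that encode a node's state
in a single counter. The userspace sketch below models that counter with C11
atomics. It is an assumption-laden reading of the comments, not the kernel
implementation: the kernel's acquire/release/deactivate functions are not
reproduced here, waiting on node->waitq is left out entirely, and every
model_* name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same layout as the KDBUS_NODE_* bias values; names are hypothetical. */
#define MODEL_BIAS		(INT_MIN + 5)
#define MODEL_RELEASE_DIRECT	(MODEL_BIAS - 1)
#define MODEL_RELEASE		(MODEL_BIAS - 2)
#define MODEL_DRAINED		(MODEL_BIAS - 3)
#define MODEL_NEW		(MODEL_BIAS - 4)

struct model_node {
	atomic_int active;
};

static void model_init(struct model_node *n)
{
	atomic_init(&n->active, MODEL_NEW);
}

/* NEW -> 0 (active, no active references); a no-op in any other state */
static bool model_activate(struct model_node *n)
{
	int expected = MODEL_NEW;

	return atomic_compare_exchange_strong(&n->active, &expected, 0);
}

/* Like a try-read-lock: succeeds only while the counter is >= 0. */
static bool model_acquire(struct model_node *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0)
		if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
			return true;
	return false;
}

/* Returns true if this dropped the last active ref of a deactivated node. */
static bool model_release(struct model_node *n)
{
	int old = atomic_fetch_sub(&n->active, 1);

	/* caller must actually hold an active reference */
	assert(old != 0 && old > MODEL_BIAS);
	return old - 1 == MODEL_BIAS;
}

/* Active -> biased (refs still counted), or NEW -> RELEASE_DIRECT. */
static bool model_mark_inactive(struct model_node *n)
{
	int v = atomic_load(&n->active);

	for (;;) {
		int next;

		if (v == MODEL_NEW)
			next = MODEL_RELEASE_DIRECT;
		else if (v >= 0)
			next = v + MODEL_BIAS;
		else
			return false;	/* someone else already did it */

		if (atomic_compare_exchange_weak(&n->active, &v, next))
			return true;
	}
}

/*
 * Exactly one caller wins the transition to RELEASE once all active refs
 * are gone; it performs cleanup and then marks the node DRAINED.
 */
static bool model_try_cleanup(struct model_node *n, bool *was_active)
{
	int expected = MODEL_BIAS;

	if (atomic_compare_exchange_strong(&n->active, &expected,
					   MODEL_RELEASE)) {
		*was_active = true;
	} else {
		expected = MODEL_RELEASE_DIRECT;
		if (!atomic_compare_exchange_strong(&n->active, &expected,
						    MODEL_RELEASE))
			return false;
		*was_active = false;
	}

	/* resources bound to the "active" state would be released here */
	atomic_store(&n->active, MODEL_DRAINED);
	return true;
}
```

Keeping the reference count and the state in one integer lets acquire stay a
single compare-and-swap on the fast path, which is presumably why the patch
uses biased values rather than a separate state field plus a lock.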
diff --git a/ipc/kdbus/node.c b/ipc/kdbus/node.c
new file mode 100644
index 000000000000..2fd9cd09f3fe
--- /dev/null
+++ b/ipc/kdbus/node.c
@@ -0,0 +1,910 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/atomic.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/kdev_t.h>
+#include <linux/rbtree.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/wait.h>
+
+#include "bus.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "fs.h"
+#include "handle.h"
+#include "node.h"
+#include "util.h"
+
+/**
+ * DOC: kdbus nodes
+ *
+ * Nodes unify lifetime management across exposed kdbus objects and provide a
+ * hierarchy. Each kdbus object, that might be exposed to user-space, has a
+ * kdbus_node object embedded and is linked into the hierarchy. Each node can
+ * have any number (0-n) of child nodes linked. Each child retains a reference
+ * to its parent node. For root-nodes, the parent is NULL.
+ *
+ * Each node object goes through a bunch of states during its lifetime:
+ *     * NEW
+ *       * LINKED    (can be skipped by NEW->FREED transition)
+ *         * ACTIVE  (can be skipped by LINKED->INACTIVE transition)
+ *       * INACTIVE
+ *       * DRAINED
+ *     * FREED
+ *
+ * Each node is allocated by the caller and initialized via kdbus_node_init().
+ * This never fails and sets the object into state NEW. From now on, ref-counts
+ * on the node manage its lifetime. During init, the ref-count is set to 1. Once
+ * it drops to 0, the node goes to state FREED and the node->free_cb() callback
+ * is called to deallocate any memory.
+ *
+ * After initializing a node, you usually link it into the hierarchy. You need
+ * to provide a parent node and a name. The node will be linked as child to the
+ * parent and a globally unique ID is assigned to the child. The name of the
+ * child must be unique for all children of this parent. Otherwise, linking the
+ * child will fail with -EEXIST.
+ * Note that the child is not marked active, yet. Admittedly, it prevents any
+ * other node from being linked with the same name (thus, it reserves that
+ * name), but any child-lookup (via name or unique ID) will never return this
+ * child unless it has been marked active.
+ *
+ * Once successfully linked, you can use kdbus_node_activate() to activate a
+ * child. This will mark the child active. This state can be skipped by directly
+ * deactivating the child via kdbus_node_deactivate() (see below).
+ * By activating a child, you enable any lookups on this child to succeed from
+ * now on. Furthermore, any code that got its hands on a reference to the node,
+ * can from now on "acquire" the node.
+ *
+ *     Active References (or: 'acquiring' and 'releasing' a node)
+ *     In addition to normal object references, nodes support something we call
+ *     "active references". An active reference can be acquired via
+ *     kdbus_node_acquire() and released via kdbus_node_release(). A caller
+ *     _must_ own a normal object reference whenever calling those functions.
+ *     Unlike object references, acquiring an active reference can fail (by
+ *     returning 'false' from kdbus_node_acquire()). An active reference can
+ *     only be acquired if the node is marked active. If it is not marked
+ *     active, yet, or if it was already deactivated, no more active references
+ *     can be acquired, ever!
+ *     Active references are used to track tasks working on a node. Whenever a
+ *     task enters kernel-space to perform an action on a node, it acquires an
+ *     active reference, performs the action and releases the reference again.
+ *     While holding an active reference, the node is guaranteed to stay active.
+ *     If the node is deactivated in parallel, the node is marked as
+ *     deactivated, then we wait for all active references to be dropped, before
+ *     we finally proceed with any cleanups. That is, if you hold an active
+ *     reference to a node, any resources that are bound to the "active" state
+ *     are guaranteed to stay accessible until you release your reference.
+ *
+ *     Active-references are very similar to rw-locks, where acquiring a node is
+ *     equal to try-read-lock and releasing to read-unlock. Deactivating a node
+ *     means write-lock and never releasing it again.
+ *     Unlike rw-locks, the 'active reference' concept is more versatile and
+ *     avoids unusual rw-lock usage (never releasing a write-lock..).
+ *
+ *     It is safe to acquire multiple active-references recursively. But you
+ *     need to check the return value of kdbus_node_acquire() on _each_ call. It
+ *     may stop granting references at _any_ time.
+ *
+ *     You're free to perform any operations you want while holding an active
+ *     reference, except sleeping for an indefinite period. Sleeping for a fixed
+ *     amount of time is fine, but you usually should not wait on wait-queues
+ *     without a timeout.
+ *     For example, if you wait for I/O to happen, you should gather all data
+ *     and schedule the I/O operation, then release your active reference and
+ *     wait for it to complete. Then try to acquire a new reference. If it
+ *     fails, perform any cleanup (the node is now dead). Otherwise, you can
+ *     finish your operation.
+ *
+ * All nodes can be deactivated via kdbus_node_deactivate() at any time. You can
+ * call this multiple times, even in parallel or on nodes that were never
+ * linked, and it will just work. The only restriction is, you must not hold an
+ * active reference when calling kdbus_node_deactivate().
+ * By deactivating a node, it is immediately marked inactive. Then, we wait for
+ * all active references to be released (called 'draining' the node). This
+ * shouldn't take very long as we don't perform long-lasting operations while
+ * holding an active reference. Note that once the node is marked inactive, no
+ * new active references can be acquired.
+ * Once all active references are dropped, the node is considered 'drained'. Now
+ * kdbus_node_deactivate() is called on each child of the node before we
+ * continue deactivating our node. That is, once all children are entirely
+ * deactivated, we call ->release_cb() of our node. ->release_cb() can release
+ * any resources on that node which are bound to the "active" state of a node.
+ * When done, we unlink the node from its parent rb-tree, mark it as
+ * 'released' and return.
+ * If kdbus_node_deactivate() is called multiple times (even in parallel), all
+ * but one caller will just wait until the node is fully deactivated. That is,
+ * one random caller of kdbus_node_deactivate() is selected to call
+ * ->release_cb() and cleanup the node. Only once all this is done, all other
+ * callers will return from kdbus_node_deactivate(). That is, it doesn't matter
+ * whether you're the selected caller or not, it will only return after
+ * everything is fully done.
+ *
+ * When a node is activated, we acquire a normal object reference to the node.
+ * This reference is dropped after deactivation is fully done (and only iff the
+ * node really was activated). This allows callers to link+activate a child node
+ * and then drop all refs. The node will be deactivated together with the
+ * parent, and then be freed when this reference is dropped.
+ *
+ * Currently, nodes provide a bunch of resources that external code can use
+ * directly. This includes:
+ *
+ *     * node->waitq: Each node has its own wait-queue that is used to manage
+ *                    the 'active' state. When a node is deactivated, we wait on
+ *                    this queue until all active refs are dropped. Analogously,
+ *                    when you release an active reference on a deactivated
+ *                    node, and the active ref-count drops to 0, we wake up a
+ *                    single thread on this queue. Furthermore, once the
+ *                    ->release_cb() callback finished, we wake up all waiters.
+ *                    The node-owner is free to re-use this wait-queue for other
+ *                    purposes. As node-management uses this queue only during
+ *                    deactivation, it is usually totally fine to re-use the
+ *                    queue for other, preferably low-overhead, use-cases.
+ *
+ *     * node->type: This field defines the type of the owner of this node. It
+ *                   must be set during node initialization and must remain
+ *                   constant. The node management never looks at this value,
+ *                   but external users might use it to gain access to the
+ *                   owner object of a node.
+ *                   It is totally up to the owner of the node to define what
+ *                   their type means. Usually it means you can access the
+ *                   parent structure via container_of(), as long as you hold an
+ *                   active reference to the node.
+ *
+ *     * node->free_cb:    callback after all references are dropped
+ *       node->release_cb: callback during node deactivation
+ *                         These fields must be set by the node owner during
+ *                         node initialization. They must remain constant. If
+ *                         NULL, they're skipped.
+ *
+ *     * node->mode: filesystem access modes
+ *       node->uid:  filesystem owner uid
+ *       node->gid:  filesystem owner gid
+ *                   These fields must be set by the node owner during node
+ *                   initialization. They must remain constant and may be
+ *                   accessed by other callers to properly initialize
+ *                   filesystem nodes.
+ *
+ *     * node->id: This is an unsigned 32bit integer allocated by an IDR. It is
+ *                 always kept as small as possible during allocation and is
+ *                 globally unique across all nodes allocated by this module. 0
+ *                 is reserved as "not assigned" and is the default.
+ *                 The ID is assigned during kdbus_node_link() and is kept until
+ *                 the object is freed. Thus, the ID surpasses the active
+ *                 lifetime of a node. As long as you hold an object reference
+ *                 to a node (and the node was linked once), the ID is valid and
+ *                 unique.
+ *
+ *     * node->name: name of this node
+ *       node->hash: 31bit hash-value of @name (range [2..INT_MAX-1])
+ *                   These values follow the same lifetime rules as node->id.
+ *                   They're initialized when the node is linked and then remain
+ *                   constant until the last object reference is dropped.
+ *                   Unlike the id, the name is only unique across all siblings
+ *                   and only until the node is deactivated. Currently, the name
+ *                   is even unique if linked but not activated, yet. This might
+ *                   change in the future, though. Code should not rely on this.
+ *
+ *     * node->lock:     lock to protect node->children, node->rb, node->parent
+ *     * node->parent: Reference to parent node. This is set during LINK time
+ *                     and is dropped during destruction. You must not access
+ *                     it unless you hold an active reference to the node or if
+ *                     you know the node is dead.
+ *     * node->children: rb-tree of all linked children of this node. You must
+ *                       not access this directly, but use one of the iterator
+ *                       or lookup helpers.
+ */
+
+/*
+ * Bias values track states of "active references". They're all negative. If a
+ * node is active, its active-ref-counter is >=0 and tracks all active
+ * references. Once a node is deactivated, we subtract NODE_BIAS. This means, the
+ * counter is now negative but still counts the active references. Once it drops
+ * to exactly NODE_BIAS, we know all active references were dropped. Exactly one
+ * thread will change it to NODE_RELEASE now, perform cleanup and then put it
+ * into NODE_DRAINED. Once drained, all other threads that tried deactivating
+ * the node will now be woken up (thus, they wait until the node is fully done).
+ * The initial state during node-setup is NODE_NEW. If a node is directly
+ * deactivated without having ever been active, it is put into
+ * NODE_RELEASE_DIRECT instead of NODE_BIAS. This tracks this one-bit state
+ * across node-deactivation. The task putting it into NODE_RELEASE now knows
+ * whether the node was active before or not.
+ *
+ * Some archs implement atomic_sub(v) with atomic_add(-v), so reserve INT_MIN
+ * to avoid overflows if multiplied by -1.
+ */
+#define KDBUS_NODE_BIAS			(INT_MIN + 5)
+#define KDBUS_NODE_RELEASE_DIRECT	(KDBUS_NODE_BIAS - 1)
+#define KDBUS_NODE_RELEASE		(KDBUS_NODE_BIAS - 2)
+#define KDBUS_NODE_DRAINED		(KDBUS_NODE_BIAS - 3)
+#define KDBUS_NODE_NEW			(KDBUS_NODE_BIAS - 4)
+
+/* global unique ID mapping for kdbus nodes */
+static DEFINE_IDR(kdbus_node_idr);
+static DECLARE_RWSEM(kdbus_node_idr_lock);
+
+/**
+ * kdbus_node_name_hash() - hash a name
+ * @name:	The string to hash
+ *
+ * This computes the hash of @name. It is guaranteed to be in the range
+ * [2..INT_MAX-1]. The values 0, 1 and INT_MAX are unused as they are reserved
+ * for the filesystem code.
+ *
+ * Return: hash value of the passed string
+ */
+static unsigned int kdbus_node_name_hash(const char *name)
+{
+	unsigned int hash;
+
+	/* reserve hash numbers 0, 1 and >=INT_MAX for magic directories */
+	hash = kdbus_strhash(name) & INT_MAX;
+	if (hash < 2)
+		hash += 2;
+	if (hash >= INT_MAX)
+		hash = INT_MAX - 1;
+
+	return hash;
+}
+
+/**
+ * kdbus_node_name_compare() - compare a name with a node's name
+ * @hash:	hash of the string to compare the node with
+ * @name:	name to compare the node with
+ * @node:	node to compare the name with
+ *
+ * Return: 0 if @name and @hash exactly match the information in @node, or
+ * an integer less than or greater than zero if @name is found, respectively,
+ * to be less than or be greater than the string stored in @node.
+ */
+static int kdbus_node_name_compare(unsigned int hash, const char *name,
+				   const struct kdbus_node *node)
+{
+	if (hash != node->hash)
+		return hash - node->hash;
+
+	return strcmp(name, node->name);
+}
+
+/**
+ * kdbus_node_init() - initialize a kdbus_node
+ * @node:	Pointer to the node to initialize
+ * @type:	The type the node will have (KDBUS_NODE_*)
+ *
+ * The caller is responsible for allocating @node and initializing it to zero.
+ * Once this call returns, you must use the node_ref() and node_unref()
+ * functions to manage this node.
+ */
+void kdbus_node_init(struct kdbus_node *node, unsigned int type)
+{
+	atomic_set(&node->refcnt, 1);
+	mutex_init(&node->lock);
+	node->id = 0;
+	node->type = type;
+	RB_CLEAR_NODE(&node->rb);
+	node->children = RB_ROOT;
+	init_waitqueue_head(&node->waitq);
+	atomic_set(&node->active, KDBUS_NODE_NEW);
+}
+
+/**
+ * kdbus_node_link() - link a node into the nodes system
+ * @node:	Pointer to the node to initialize
+ * @parent:	Pointer to a parent node, may be %NULL
+ * @name:	The name of the node (or NULL if root node)
+ *
+ * This links a node into the hierarchy. This must not be called multiple times.
+ * If @parent is NULL, the node becomes a new root node.
+ *
+ * This call will fail if @name is not unique across all its siblings or if no
+ * ID could be allocated. You must not activate a node if linking failed! It is
+ * safe to deactivate it, though.
+ *
+ * Once you linked a node, you must call kdbus_node_deactivate() before you drop
+ * the last reference (even if you never activate the node).
+ *
+ * Return: 0 on success, negative error otherwise.
+ */
+int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
+		    const char *name)
+{
+	int ret;
+
+	if (WARN_ON(node->type != KDBUS_NODE_DOMAIN && !parent))
+		return -EINVAL;
+
+	if (WARN_ON(parent && !name))
+		return -EINVAL;
+
+	if (name) {
+		node->name = kstrdup(name, GFP_KERNEL);
+		if (!node->name)
+			return -ENOMEM;
+
+		node->hash = kdbus_node_name_hash(name);
+	}
+
+	down_write(&kdbus_node_idr_lock);
+	ret = idr_alloc(&kdbus_node_idr, node, 1, 0, GFP_KERNEL);
+	if (ret >= 0)
+		node->id = ret;
+	up_write(&kdbus_node_idr_lock);
+
+	if (ret < 0)
+		return ret;
+
+	ret = 0;
+
+	if (parent) {
+		struct rb_node **n, *prev;
+
+		if (!kdbus_node_acquire(parent))
+			return -ESHUTDOWN;
+
+		mutex_lock(&parent->lock);
+
+		n = &parent->children.rb_node;
+		prev = NULL;
+
+		while (*n) {
+			struct kdbus_node *pos;
+			int result;
+
+			pos = kdbus_node_from_rb(*n);
+			prev = *n;
+			result = kdbus_node_name_compare(node->hash,
+							 node->name,
+							 pos);
+			if (result == 0) {
+				ret = -EEXIST;
+				goto exit_unlock;
+			}
+
+			if (result < 0)
+				n = &pos->rb.rb_left;
+			else
+				n = &pos->rb.rb_right;
+		}
+
+		/* add new node and rebalance the tree */
+		rb_link_node(&node->rb, prev, n);
+		rb_insert_color(&node->rb, &parent->children);
+		node->parent = kdbus_node_ref(parent);
+
+exit_unlock:
+		mutex_unlock(&parent->lock);
+		kdbus_node_release(parent);
+	}
+
+	return ret;
+}
+
+/**
+ * kdbus_node_ref() - Acquire object reference
+ * @node:	node to acquire reference to (or NULL)
+ *
+ * This acquires a new reference to @node. You must already own a reference when
+ * calling this!
+ * If @node is NULL, this is a no-op.
+ *
+ * Return: @node is returned
+ */
+struct kdbus_node *kdbus_node_ref(struct kdbus_node *node)
+{
+	if (node)
+		atomic_inc(&node->refcnt);
+	return node;
+}
+
+/**
+ * kdbus_node_unref() - Drop object reference
+ * @node:	node to drop reference to (or NULL)
+ *
+ * This drops an object reference to @node. You must not access the node if you
+ * no longer own a reference.
+ * If the ref-count drops to 0, the object will be destroyed (->free_cb will be
+ * called).
+ *
+ * If you linked or activated the node, you must deactivate the node before you
+ * drop your last reference! If you didn't link or activate the node, you can
+ * drop any reference you want.
+ *
+ * Note that this calls into ->free_cb() and thus _might_ sleep. The ->free_cb()
+ * callbacks must not acquire any outer locks, though. So you can safely drop
+ * references while holding locks.
+ *
+ * If @node is NULL, this is a no-op.
+ *
+ * Return: This always returns NULL
+ */
+struct kdbus_node *kdbus_node_unref(struct kdbus_node *node)
+{
+	if (node && atomic_dec_and_test(&node->refcnt)) {
+		struct kdbus_node safe = *node;
+
+		WARN_ON(atomic_read(&node->active) != KDBUS_NODE_DRAINED);
+		WARN_ON(!RB_EMPTY_NODE(&node->rb));
+
+		if (node->free_cb)
+			node->free_cb(node);
+
+		down_write(&kdbus_node_idr_lock);
+		if (safe.id > 0)
+			idr_remove(&kdbus_node_idr, safe.id);
+		/* drop caches after last node to not leak memory on unload */
+		if (idr_is_empty(&kdbus_node_idr)) {
+			idr_destroy(&kdbus_node_idr);
+			idr_init(&kdbus_node_idr);
+		}
+		up_write(&kdbus_node_idr_lock);
+
+		kfree(safe.name);
+
+		/*
+		 * kdbusfs relies on the parent to be available even after the
+		 * node was deactivated and unlinked. Therefore, we pin it
+		 * until a node is destroyed.
+		 */
+		kdbus_node_unref(safe.parent);
+	}
+
+	return NULL;
+}
+
+/**
+ * kdbus_node_is_active() - test whether a node is active
+ * @node:	node to test
+ *
+ * This checks whether @node is active. That means, @node was linked and
+ * activated by the node owner and hasn't been deactivated, yet. If, and only
+ * if, a node is active, kdbus_node_acquire() will be able to acquire active
+ * references.
+ *
+ * Note that this function does not give any lifetime guarantees. After this
+ * call returns, the node might be deactivated immediately. Normally, what you
+ * want is to acquire a real active reference via kdbus_node_acquire().
+ *
+ * Return: true if @node is active, false otherwise
+ */
+bool kdbus_node_is_active(struct kdbus_node *node)
+{
+	return atomic_read(&node->active) >= 0;
+}
+
+/**
+ * kdbus_node_is_deactivated() - test whether a node was already deactivated
+ * @node:	node to test
+ *
+ * This checks whether kdbus_node_deactivate() was called on @node. Note that
+ * this might be true even if you never deactivated the node directly, but only
+ * one of its ancestors.
+ *
+ * Note that even if this returns 'false', the node might get deactivated
+ * immediately after the call returns.
+ *
+ * Return: true if @node was already deactivated, false if not
+ */
+bool kdbus_node_is_deactivated(struct kdbus_node *node)
+{
+	int v;
+
+	v = atomic_read(&node->active);
+	return v != KDBUS_NODE_NEW && v < 0;
+}
+
+/**
+ * kdbus_node_activate() - activate a node
+ * @node:	node to activate
+ *
+ * This marks @node as active if, and only if, the node has been neither
+ * activated nor deactivated yet and the parent is still active. Any call to
+ * kdbus_node_activate() other than the first is a no-op.
+ * If you called kdbus_node_deactivate() before, then even the first call to
+ * kdbus_node_activate() will be a no-op.
+ *
+ * This call doesn't give any lifetime guarantees. The node might get
+ * deactivated immediately after this call returns. Or the parent might already
+ * be deactivated, which will make this call a no-op.
+ *
+ * If this call successfully activated a node, it will take an object reference
+ * to it. This reference is dropped after the node is deactivated. Therefore,
+ * the object owner can safely drop their reference to @node iff they know that
+ * its parent node will get deactivated at some point. Once the parent node is
+ * deactivated, it will deactivate all its children and thus drop this
+ * reference again.
+ *
+ * Return: True if this call successfully activated the node, otherwise false.
+ *         Note that this might return false, even if the node is still active
+ *         (e.g., if you called this a second time).
+ */
+bool kdbus_node_activate(struct kdbus_node *node)
+{
+	bool res = false;
+
+	mutex_lock(&node->lock);
+	if (atomic_read(&node->active) == KDBUS_NODE_NEW) {
+		atomic_sub(KDBUS_NODE_NEW, &node->active);
+		/* activated nodes have ref +1 */
+		kdbus_node_ref(node);
+		res = true;
+	}
+	mutex_unlock(&node->lock);
+
+	return res;
+}
+
+/**
+ * kdbus_node_deactivate() - deactivate a node
+ * @node:	The node to deactivate.
+ *
+ * This function recursively deactivates this node and all its children. It
+ * returns only once all children and the node itself were recursively disabled
+ * (even if you call this function multiple times in parallel).
+ *
+ * It is safe to call this function on _any_ node that was initialized _any_
+ * number of times.
+ *
+ * This call may sleep, as it waits for all active references to be dropped.
+ */
+void kdbus_node_deactivate(struct kdbus_node *node)
+{
+	struct kdbus_node *pos, *child;
+	struct rb_node *rb;
+	int v_pre, v_post;
+
+	pos = node;
+
+	/*
+	 * To avoid recursion, we perform back-tracking while deactivating
+	 * nodes. For each node we enter, we first mark the active-counter as
+	 * deactivated by adding BIAS. If the node has children, we set the first
+	 * child as current position and start over. If the node has no
+	 * children, we drain the node by waiting for all active refs to be
+	 * dropped and then releasing the node.
+	 *
+	 * After the node is released, we set its parent as current position
+	 * and start over. If the current position was the initial node, we're
+	 * done.
+	 *
+	 * Note that this function can be called in parallel by multiple
+	 * callers. We make sure that each node is only released once, and any
+	 * racing caller will wait until the other thread fully released that
+	 * node.
+	 */
+
+	for (;;) {
+		/*
+		 * Add BIAS to node->active to mark it as inactive. If it was
+		 * never active before, immediately mark it as RELEASE_DIRECT
+		 * so we remember this state.
+		 * We cannot remember v_pre as we might iterate into the
+		 * children, overwriting v_pre, before we can release our node.
+		 */
+		mutex_lock(&pos->lock);
+		v_pre = atomic_read(&pos->active);
+		if (v_pre >= 0)
+			atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
+		else if (v_pre == KDBUS_NODE_NEW)
+			atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
+		mutex_unlock(&pos->lock);
+
+		/* wait until all active references were dropped */
+		wait_event(pos->waitq,
+			   atomic_read(&pos->active) <= KDBUS_NODE_BIAS);
+
+		mutex_lock(&pos->lock);
+		/* recurse into first child if any */
+		rb = rb_first(&pos->children);
+		if (rb) {
+			child = kdbus_node_ref(kdbus_node_from_rb(rb));
+			mutex_unlock(&pos->lock);
+			pos = child;
+			continue;
+		}
+
+		/* mark object as RELEASE */
+		v_post = atomic_read(&pos->active);
+		if (v_post == KDBUS_NODE_BIAS ||
+		    v_post == KDBUS_NODE_RELEASE_DIRECT)
+			atomic_set(&pos->active, KDBUS_NODE_RELEASE);
+		mutex_unlock(&pos->lock);
+
+		/*
+		 * If this is the thread that marked the object as RELEASE, we
+		 * perform the actual release. Otherwise, we wait until the
+		 * release is done and the node is marked as DRAINED.
+		 */
+		if (v_post == KDBUS_NODE_BIAS ||
+		    v_post == KDBUS_NODE_RELEASE_DIRECT) {
+			if (pos->release_cb)
+				pos->release_cb(pos, v_post == KDBUS_NODE_BIAS);
+
+			if (pos->parent) {
+				mutex_lock(&pos->parent->lock);
+				if (!RB_EMPTY_NODE(&pos->rb)) {
+					rb_erase(&pos->rb,
+						 &pos->parent->children);
+					RB_CLEAR_NODE(&pos->rb);
+				}
+				mutex_unlock(&pos->parent->lock);
+			}
+
+			/* mark as DRAINED */
+			atomic_set(&pos->active, KDBUS_NODE_DRAINED);
+			wake_up_all(&pos->waitq);
+
+			/* drop VFS cache */
+			kdbus_fs_flush(pos);
+
+			/*
+			 * If the node was activated and someone added BIAS to
+			 * it to deactivate it, we, and only we, are
+			 * responsible for releasing the extra ref-count that
+			 * was taken once in kdbus_node_activate().
+			 * If the node was never activated, no-one ever
+			 * added BIAS, but instead skipped that state and
+			 * immediately went to NODE_RELEASE_DIRECT. In that case
+			 * we must not drop the reference.
+			 */
+			if (v_post == KDBUS_NODE_BIAS)
+				kdbus_node_unref(pos);
+		} else {
+			/* wait until object is DRAINED */
+			wait_event(pos->waitq,
+			    atomic_read(&pos->active) == KDBUS_NODE_DRAINED);
+		}
+
+		/*
+		 * We're done with the current node. Continue on its parent
+		 * again, which will try deactivating its next child, or itself
+		 * if no child is left.
+		 * If we've reached our initial node again, we are done and
+		 * can safely return.
+		 */
+		if (pos == node)
+			break;
+
+		child = pos;
+		pos = pos->parent;
+		kdbus_node_unref(child);
+	}
+}
+
+/**
+ * kdbus_node_acquire() - Acquire an active ref on a node
+ * @node:	The node
+ *
+ * This acquires an active-reference to @node. This will only succeed if the
+ * node is active. You must release this active reference via
+ * kdbus_node_release() again.
+ *
+ * See the introduction to "active references" for more details.
+ *
+ * Return: %true if @node was non-NULL and active
+ */
+bool kdbus_node_acquire(struct kdbus_node *node)
+{
+	return node && atomic_inc_unless_negative(&node->active);
+}
+
+/**
+ * kdbus_node_release() - Release an active ref on a node
+ * @node:	The node
+ *
+ * This releases an active reference that was previously acquired via
+ * kdbus_node_acquire(). See kdbus_node_acquire() for details.
+ */
+void kdbus_node_release(struct kdbus_node *node)
+{
+	if (node && atomic_dec_return(&node->active) == KDBUS_NODE_BIAS)
+		wake_up(&node->waitq);
+}
+
+/**
+ * kdbus_node_find_child() - Find child by name
+ * @node:	parent node to search through
+ * @name:	name of child node
+ *
+ * This searches through all children of @node for a child-node with name @name.
+ * If not found, or if the child is deactivated, NULL is returned. Otherwise,
+ * the child is acquired and a new reference is returned.
+ *
+ * If you're done with the child, you need to release it and drop your
+ * reference.
+ *
+ * This function does not acquire the parent node. However, if the parent was
+ * already deactivated, then kdbus_node_deactivate() will, at some point, also
+ * deactivate the child. Therefore, we can rely on the explicit ordering during
+ * deactivation.
+ *
+ * Return: Reference to acquired child node, or NULL if not found / not active.
+ */
+struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
+					 const char *name)
+{
+	struct kdbus_node *child;
+	struct rb_node *rb;
+	unsigned int hash;
+	int ret;
+
+	hash = kdbus_node_name_hash(name);
+
+	mutex_lock(&node->lock);
+	rb = node->children.rb_node;
+	while (rb) {
+		child = kdbus_node_from_rb(rb);
+		ret = kdbus_node_name_compare(hash, name, child);
+		if (ret < 0)
+			rb = rb->rb_left;
+		else if (ret > 0)
+			rb = rb->rb_right;
+		else
+			break;
+	}
+	if (rb && kdbus_node_acquire(child))
+		kdbus_node_ref(child);
+	else
+		child = NULL;
+	mutex_unlock(&node->lock);
+
+	return child;
+}
+
+static struct kdbus_node *node_find_closest_unlocked(struct kdbus_node *node,
+						     unsigned int hash,
+						     const char *name)
+{
+	struct kdbus_node *n, *pos = NULL;
+	struct rb_node *rb;
+	int res;
+
+	/*
+	 * Find the closest child with ``node->hash >= hash'', or, if @name is
+	 * valid, ``node->name >= name'' (where '>=' is the lex. order).
+	 */
+
+	rb = node->children.rb_node;
+	while (rb) {
+		n = kdbus_node_from_rb(rb);
+
+		if (name)
+			res = kdbus_node_name_compare(hash, name, n);
+		else
+			res = hash - n->hash;
+
+		if (res <= 0) {
+			rb = rb->rb_left;
+			pos = n;
+		} else { /* ``hash > n->hash'', ``name > n->name'' */
+			rb = rb->rb_right;
+		}
+	}
+
+	return pos;
+}
+
+/**
+ * kdbus_node_find_closest() - Find closest child-match
+ * @node:	parent node to search through
+ * @hash:	hash value to find closest match for
+ *
+ * Find the closest child of @node with a hash greater than or equal to @hash.
+ * The closest match is the left-most child of @node with this property. That
+ * is, it is the first such child that kdbus_node_next_child() would return if
+ * you iterated over all children of @node.
+ *
+ * Return: Reference to acquired child, or NULL if none found.
+ */
+struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
+					   unsigned int hash)
+{
+	struct kdbus_node *child;
+	struct rb_node *rb;
+
+	mutex_lock(&node->lock);
+
+	child = node_find_closest_unlocked(node, hash, NULL);
+	while (child && !kdbus_node_acquire(child)) {
+		rb = rb_next(&child->rb);
+		if (rb)
+			child = kdbus_node_from_rb(rb);
+		else
+			child = NULL;
+	}
+	kdbus_node_ref(child);
+
+	mutex_unlock(&node->lock);
+
+	return child;
+}
+
+/**
+ * kdbus_node_next_child() - Acquire next child
+ * @node:	parent node
+ * @prev:	previous child-node position or NULL
+ *
+ * This function returns a reference to the next active child of @node, after
+ * the passed position @prev. If @prev is NULL, a reference to the first active
+ * child is returned. If no more active children are found, NULL is returned.
+ *
+ * This function acquires the next child it returns. If you're done with the
+ * returned pointer, you need to release _and_ unref it.
+ *
+ * The passed in pointer @prev is not modified by this function, and it does
+ * *not* have to be active. If @prev was acquired via different means, or if it
+ * was unlinked from its parent before you pass it in, then this iterator will
+ * still return the next active child (it will have to search through the
+ * rb-tree based on the node-name, though).
+ * However, @prev must not be linked to a different parent than @node!
+ *
+ * Return: Reference to next acquired child, or NULL if at the end.
+ */
+struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
+					 struct kdbus_node *prev)
+{
+	struct kdbus_node *pos = NULL;
+	struct rb_node *rb;
+
+	mutex_lock(&node->lock);
+
+	if (!prev) {
+		/*
+		 * New iteration; find first node in rb-tree and try to acquire
+		 * it. If we got it, directly return it as first element.
+		 * Otherwise, the loop below will find the next active node.
+		 */
+		rb = rb_first(&node->children);
+		if (!rb)
+			goto exit;
+		pos = kdbus_node_from_rb(rb);
+		if (kdbus_node_acquire(pos))
+			goto exit;
+	} else if (RB_EMPTY_NODE(&prev->rb)) {
+		/*
+		 * The current iterator is no longer linked to the rb-tree. Use
+		 * its hash value and name to find the next _higher_ node and
+		 * acquire it. If we got it, return it as next element.
+		 * Otherwise, the loop below will find the next active node.
+		 */
+		pos = node_find_closest_unlocked(node, prev->hash, prev->name);
+		if (!pos)
+			goto exit;
+		if (kdbus_node_acquire(pos))
+			goto exit;
+	} else {
+		/*
+		 * The current iterator is still linked to the parent. Set it
+		 * as current position and use the loop below to find the next
+		 * active element.
+		 */
+		pos = prev;
+	}
+
+	/* @pos was already returned or is inactive; find next active node */
+	do {
+		rb = rb_next(&pos->rb);
+		if (rb)
+			pos = kdbus_node_from_rb(rb);
+		else
+			pos = NULL;
+	} while (pos && !kdbus_node_acquire(pos));
+
+exit:
+	/* @pos is NULL or acquired. Take ref if non-NULL and return it */
+	kdbus_node_ref(pos);
+	mutex_unlock(&node->lock);
+	return pos;
+}
diff --git a/ipc/kdbus/node.h b/ipc/kdbus/node.h
new file mode 100644
index 000000000000..fa9339123b8f
--- /dev/null
+++ b/ipc/kdbus/node.h
@@ -0,0 +1,87 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_NODE_H
+#define __KDBUS_NODE_H
+
+#include <linux/atomic.h>
+#include <linux/kernel.h>
+#include <linux/mutex.h>
+#include <linux/wait.h>
+
+struct kdbus_bus;
+struct kdbus_domain;
+struct kdbus_ep;
+struct kdbus_node;
+
+enum kdbus_node_type {
+	KDBUS_NODE_DOMAIN,
+	KDBUS_NODE_CONTROL,
+	KDBUS_NODE_BUS,
+	KDBUS_NODE_ENDPOINT,
+};
+
+typedef void (*kdbus_node_free_t) (struct kdbus_node *node);
+typedef void (*kdbus_node_release_t) (struct kdbus_node *node, bool was_active);
+
+struct kdbus_node {
+	atomic_t refcnt;
+	atomic_t active;
+	wait_queue_head_t waitq;
+
+	/* static members */
+	unsigned int type;
+	kdbus_node_free_t free_cb;
+	kdbus_node_release_t release_cb;
+	umode_t mode;
+	kuid_t uid;
+	kgid_t gid;
+
+	/* valid once linked */
+	char *name;
+	unsigned int hash;
+	unsigned int id;
+	struct kdbus_node *parent; /* may be NULL */
+
+	/* valid iff active */
+	struct mutex lock;
+	struct rb_node rb;
+	struct rb_root children;
+};
+
+#define kdbus_node_from_rb(_node) rb_entry((_node), struct kdbus_node, rb)
+
+void kdbus_node_init(struct kdbus_node *node, unsigned int type);
+
+int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
+		    const char *name);
+
+struct kdbus_node *kdbus_node_ref(struct kdbus_node *node);
+struct kdbus_node *kdbus_node_unref(struct kdbus_node *node);
+
+bool kdbus_node_is_active(struct kdbus_node *node);
+bool kdbus_node_is_deactivated(struct kdbus_node *node);
+bool kdbus_node_activate(struct kdbus_node *node);
+void kdbus_node_deactivate(struct kdbus_node *node);
+
+bool kdbus_node_acquire(struct kdbus_node *node);
+void kdbus_node_release(struct kdbus_node *node);
+
+struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
+					 const char *name);
+struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
+					   unsigned int hash);
+struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
+					 struct kdbus_node *prev);
+
+#endif
-- 
2.2.1
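
For readers following the active-reference logic in node.c above, here is a
minimal user-space sketch of the biased counter, assuming C11 atomics. The
names (struct node, node_acquire(), NODE_BIAS, ...) and the constant values
are hypothetical stand-ins, not the definitions from the patch. It also
replaces the node->lock mutex used by kdbus_node_activate() and
kdbus_node_deactivate() with compare-and-swap loops to stay self-contained,
and it leaves out the wait queue entirely.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed values for illustration; the patch defines its own constants. */
#define NODE_BIAS (INT_MIN + 5)   /* added once to mark a node inactive */
#define NODE_NEW  (NODE_BIAS - 4) /* initialized, never activated */

struct node {
	atomic_int active; /* >= 0: active, value = number of active refs */
};

static void node_init(struct node *n)
{
	atomic_init(&n->active, NODE_NEW);
}

/* NEW -> 0; succeeds only for the first caller on a fresh node. */
static bool node_activate(struct node *n)
{
	int expected = NODE_NEW;

	return atomic_compare_exchange_strong(&n->active, &expected, 0);
}

/* Take an active ref; fails once the counter is negative. */
static bool node_acquire(struct node *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0) {
		/* on failure, v is reloaded and the sign is checked again */
		if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
			return true;
	}
	return false;
}

/* Add BIAS so the counter turns negative and new acquires fail. */
static void node_begin_deactivate(struct node *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0) {
		if (atomic_compare_exchange_weak(&n->active, &v, v + NODE_BIAS))
			return;
	}
}

/* Drop an active ref; true if this was the last ref on a biased node. */
static bool node_release(struct node *n)
{
	return atomic_fetch_sub(&n->active, 1) - 1 == NODE_BIAS;
}

/* True once a deactivated node has no active refs left. */
static bool node_is_idle(struct node *n)
{
	return atomic_load(&n->active) == NODE_BIAS;
}
```

A true return from node_release() corresponds to the point where the kernel
code calls wake_up() on the node's wait queue, so that the deactivating
thread can go on to release the node.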



* [PATCH 07/13] kdbus: add code to gather metadata
  2015-01-16 19:16 ` Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  (?)
@ 2015-01-16 19:16 ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

A connection chooses which metadata it wants to have attached to each
message it receives with kdbus_cmd_hello.attach_flags. The metadata
will be attached as items to the messages. All metadata refers to
information about the sending task at sending time, unless otherwise
stated. Also, the metadata is copied, not referenced, so even if the
sending task doesn't exist anymore at the time the message is received,
the information is still preserved.

See kdbus.txt for more details on which metadata can currently be
attached to messages.
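
As a rough illustration of the copy-once behaviour described above, the
sketch below mirrors the "collected" versus "valid" bookkeeping that
kdbus_meta_proc_collect() uses, in plain user-space C. Everything in it is a
simplified assumption for illustration: the flag values, struct meta and the
collect helpers are hypothetical and much smaller than the real
struct kdbus_meta_proc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag values; the real KDBUS_ATTACH_* bits may differ. */
#define ATTACH_CREDS    (UINT64_C(1) << 0)
#define ATTACH_SECLABEL (UINT64_C(1) << 1)

struct meta {
	uint64_t collected; /* items we already tried to gather */
	uint64_t valid;     /* items that actually carry data */
	unsigned int uid;
	char seclabel[32];
};

static void collect_creds(struct meta *m, unsigned int uid)
{
	m->uid = uid; /* copied, so later changes to the sender do not matter */
	m->valid |= ATTACH_CREDS;
}

static void collect_seclabel(struct meta *m, const char *label)
{
	if (!label)
		return; /* nothing available: collected, but never valid */
	snprintf(m->seclabel, sizeof(m->seclabel), "%s", label);
	m->valid |= ATTACH_SECLABEL;
}

/* Gather each requested item at most once per metadata object. */
static void meta_collect(struct meta *m, uint64_t what,
			 unsigned int uid, const char *label)
{
	if ((what & ATTACH_CREDS) && !(m->collected & ATTACH_CREDS)) {
		collect_creds(m, uid);
		m->collected |= ATTACH_CREDS;
	}

	if ((what & ATTACH_SECLABEL) && !(m->collected & ATTACH_SECLABEL)) {
		collect_seclabel(m, label);
		m->collected |= ATTACH_SECLABEL;
	}
}
```

Keeping two masks lets a second call skip items that were already attempted,
while still recording that an item such as the seclabel was requested but
could not be provided.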

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 ipc/kdbus/metadata.c | 1066 ++++++++++++++++++++++++++++++++++++++++++++++++++
 ipc/kdbus/metadata.h |   52 +++
 2 files changed, 1118 insertions(+)
 create mode 100644 ipc/kdbus/metadata.c
 create mode 100644 ipc/kdbus/metadata.h

diff --git a/ipc/kdbus/metadata.c b/ipc/kdbus/metadata.c
new file mode 100644
index 000000000000..101561dfe5ac
--- /dev/null
+++ b/ipc/kdbus/metadata.c
@@ -0,0 +1,1066 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/audit.h>
+#include <linux/capability.h>
+#include <linux/cgroup.h>
+#include <linux/cred.h>
+#include <linux/file.h>
+#include <linux/fs_struct.h>
+#include <linux/init.h>
+#include <linux/kref.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/security.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/user_namespace.h>
+#include <linux/version.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "item.h"
+#include "message.h"
+#include "metadata.h"
+#include "names.h"
+
+/**
+ * struct kdbus_meta_proc - Process metadata
+ * @kref:		Reference counting
+ * @lock:		Object lock
+ * @collected:		Bitmask of collected items
+ * @valid:		Bitmask of collected and valid items
+ * @uid:		UID of process
+ * @euid:		EUID of process
+ * @suid:		SUID of process
+ * @fsuid:		FSUID of process
+ * @gid:		GID of process
+ * @egid:		EGID of process
+ * @sgid:		SGID of process
+ * @fsgid:		FSGID of process
+ * @pid:		PID of process
+ * @tgid:		TGID of process
+ * @ppid:		PPID of process
+ * @auxgrps:		Auxiliary groups
+ * @n_auxgrps:		Number of items in @auxgrps
+ * @tid_comm:		TID comm line
+ * @pid_comm:		PID comm line
+ * @exe_path:		Executable path
+ * @root_path:		Root-FS path
+ * @cmdline:		Command-line
+ * @cgroup:		Full cgroup path
+ * @caps:		Capabilities
+ * @caps_namespace:	User-namespace of @caps
+ * @seclabel:		Seclabel
+ * @audit_loginuid:	Audit login-UID
+ * @audit_sessionid:	Audit session-ID
+ */
+struct kdbus_meta_proc {
+	struct kref kref;
+	struct mutex lock;
+	u64 collected;
+	u64 valid;
+
+	/* KDBUS_ITEM_CREDS */
+	kuid_t uid, euid, suid, fsuid;
+	kgid_t gid, egid, sgid, fsgid;
+
+	/* KDBUS_ITEM_PIDS */
+	struct pid *pid;
+	struct pid *tgid;
+	struct pid *ppid;
+
+	/* KDBUS_ITEM_AUXGROUPS */
+	kgid_t *auxgrps;
+	size_t n_auxgrps;
+
+	/* KDBUS_ITEM_TID_COMM */
+	char tid_comm[TASK_COMM_LEN];
+	/* KDBUS_ITEM_PID_COMM */
+	char pid_comm[TASK_COMM_LEN];
+
+	/* KDBUS_ITEM_EXE */
+	struct path exe_path;
+	struct path root_path;
+
+	/* KDBUS_ITEM_CMDLINE */
+	char *cmdline;
+
+	/* KDBUS_ITEM_CGROUP */
+	char *cgroup;
+
+	/* KDBUS_ITEM_CAPS */
+	struct caps {
+		/* binary compatible to kdbus_caps */
+		u32 last_cap;
+		struct {
+			u32 caps[_KERNEL_CAPABILITY_U32S];
+		} set[4];
+	} caps;
+	struct user_namespace *caps_namespace;
+
+	/* KDBUS_ITEM_SECLABEL */
+	char *seclabel;
+
+	/* KDBUS_ITEM_AUDIT */
+	kuid_t audit_loginuid;
+	unsigned int audit_sessionid;
+};
+
+/**
+ * struct kdbus_meta_conn - Connection metadata
+ * @kref:		Reference counting
+ * @lock:		Object lock
+ * @collected:		Bitmask of collected items
+ * @valid:		Bitmask of collected and valid items
+ * @ts:			Timestamp values
+ * @owned_names_items:	Serialized items for owned names
+ * @owned_names_size:	Size of @owned_names_items
+ * @conn_description:	Connection description
+ */
+struct kdbus_meta_conn {
+	struct kref kref;
+	struct mutex lock;
+	u64 collected;
+	u64 valid;
+
+	/* KDBUS_ITEM_TIMESTAMP */
+	struct kdbus_timestamp ts;
+
+	/* KDBUS_ITEM_OWNED_NAME */
+	struct kdbus_item *owned_names_items;
+	size_t owned_names_size;
+
+	/* KDBUS_ITEM_CONN_DESCRIPTION */
+	char *conn_description;
+};
+
+/**
+ * kdbus_meta_proc_new() - Create process metadata object
+ *
+ * Return: Pointer to new object on success, ERR_PTR on failure.
+ */
+struct kdbus_meta_proc *kdbus_meta_proc_new(void)
+{
+	struct kdbus_meta_proc *mp;
+
+	mp = kzalloc(sizeof(*mp), GFP_KERNEL);
+	if (!mp)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&mp->kref);
+	mutex_init(&mp->lock);
+
+	return mp;
+}
+
+static void kdbus_meta_proc_free(struct kref *kref)
+{
+	struct kdbus_meta_proc *mp = container_of(kref, struct kdbus_meta_proc,
+						  kref);
+
+	path_put(&mp->exe_path);
+	path_put(&mp->root_path);
+	put_user_ns(mp->caps_namespace);
+	put_pid(mp->ppid);
+	put_pid(mp->tgid);
+	put_pid(mp->pid);
+
+	kfree(mp->seclabel);
+	kfree(mp->auxgrps);
+	kfree(mp->cmdline);
+	kfree(mp->cgroup);
+	kfree(mp);
+}
+
+/**
+ * kdbus_meta_proc_ref() - Gain reference
+ * @mp:		Process metadata object
+ *
+ * Return: @mp is returned
+ */
+struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp)
+{
+	if (mp)
+		kref_get(&mp->kref);
+	return mp;
+}
+
+/**
+ * kdbus_meta_proc_unref() - Drop reference
+ * @mp:		Process metadata object
+ *
+ * Return: NULL
+ */
+struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp)
+{
+	if (mp)
+		kref_put(&mp->kref, kdbus_meta_proc_free);
+	return NULL;
+}
+
+static void kdbus_meta_proc_collect_creds(struct kdbus_meta_proc *mp)
+{
+	mp->uid		= current_uid();
+	mp->euid	= current_euid();
+	mp->suid	= current_suid();
+	mp->fsuid	= current_fsuid();
+
+	mp->gid		= current_gid();
+	mp->egid	= current_egid();
+	mp->sgid	= current_sgid();
+	mp->fsgid	= current_fsgid();
+
+	mp->valid |= KDBUS_ATTACH_CREDS;
+}
+
+static void kdbus_meta_proc_collect_pids(struct kdbus_meta_proc *mp)
+{
+	struct task_struct *parent;
+
+	mp->pid = get_pid(task_pid(current));
+	mp->tgid = get_pid(task_tgid(current));
+
+	rcu_read_lock();
+	parent = rcu_dereference(current->real_parent);
+	mp->ppid = get_pid(task_tgid(parent));
+	rcu_read_unlock();
+
+	mp->valid |= KDBUS_ATTACH_PIDS;
+}
+
+static int kdbus_meta_proc_collect_auxgroups(struct kdbus_meta_proc *mp)
+{
+	struct group_info *info;
+	size_t i;
+
+	info = get_current_groups();
+
+	if (info->ngroups > 0) {
+		mp->auxgrps = kmalloc_array(info->ngroups, sizeof(kgid_t),
+					    GFP_KERNEL);
+		if (!mp->auxgrps) {
+			put_group_info(info);
+			return -ENOMEM;
+		}
+
+		for (i = 0; i < info->ngroups; i++)
+			mp->auxgrps[i] = GROUP_AT(info, i);
+	}
+
+	mp->n_auxgrps = info->ngroups;
+	put_group_info(info);
+	mp->valid |= KDBUS_ATTACH_AUXGROUPS;
+
+	return 0;
+}
+
+static void kdbus_meta_proc_collect_tid_comm(struct kdbus_meta_proc *mp)
+{
+	get_task_comm(mp->tid_comm, current);
+	mp->valid |= KDBUS_ATTACH_TID_COMM;
+}
+
+static void kdbus_meta_proc_collect_pid_comm(struct kdbus_meta_proc *mp)
+{
+	get_task_comm(mp->pid_comm, current->group_leader);
+	mp->valid |= KDBUS_ATTACH_PID_COMM;
+}
+
+static void kdbus_meta_proc_collect_exe(struct kdbus_meta_proc *mp)
+{
+	struct mm_struct *mm;
+
+	mm = get_task_mm(current);
+	if (!mm)
+		return;
+
+	down_read(&mm->mmap_sem);
+	if (mm->exe_file) {
+		mp->exe_path = mm->exe_file->f_path;
+		path_get(&mp->exe_path);
+		get_fs_root(current->fs, &mp->root_path);
+		mp->valid |= KDBUS_ATTACH_EXE;
+	}
+	up_read(&mm->mmap_sem);
+
+	mmput(mm);
+}
+
+static int kdbus_meta_proc_collect_cmdline(struct kdbus_meta_proc *mp)
+{
+	struct mm_struct *mm;
+	char *cmdline;
+
+	mm = get_task_mm(current);
+	if (!mm)
+		return 0;
+
+	if (!mm->arg_end) {
+		mmput(mm);
+		return 0;
+	}
+
+	cmdline = strndup_user((const char __user *)mm->arg_start,
+			       mm->arg_end - mm->arg_start);
+	mmput(mm);
+
+	if (IS_ERR(cmdline))
+		return PTR_ERR(cmdline);
+
+	mp->cmdline = cmdline;
+	mp->valid |= KDBUS_ATTACH_CMDLINE;
+
+	return 0;
+}
+
+static int kdbus_meta_proc_collect_cgroup(struct kdbus_meta_proc *mp)
+{
+#ifdef CONFIG_CGROUPS
+	void *page;
+	char *s;
+
+	page = (void *)__get_free_page(GFP_TEMPORARY);
+	if (!page)
+		return -ENOMEM;
+
+	s = task_cgroup_path(current, page, PAGE_SIZE);
+	if (s) {
+		mp->cgroup = kstrdup(s, GFP_KERNEL);
+		if (!mp->cgroup) {
+			free_page((unsigned long)page);
+			return -ENOMEM;
+		}
+	}
+
+	free_page((unsigned long)page);
+	mp->valid |= KDBUS_ATTACH_CGROUP;
+#endif
+
+	return 0;
+}
+
+static void kdbus_meta_proc_collect_caps(struct kdbus_meta_proc *mp)
+{
+	const struct cred *c = current_cred();
+	int i;
+
+	/* ABI: "last_cap" equals /proc/sys/kernel/cap_last_cap */
+	mp->caps.last_cap = CAP_LAST_CAP;
+	mp->caps_namespace = get_user_ns(current_user_ns());
+
+	CAP_FOR_EACH_U32(i) {
+		mp->caps.set[0].caps[i] = c->cap_inheritable.cap[i];
+		mp->caps.set[1].caps[i] = c->cap_permitted.cap[i];
+		mp->caps.set[2].caps[i] = c->cap_effective.cap[i];
+		mp->caps.set[3].caps[i] = c->cap_bset.cap[i];
+	}
+
+	/* clear unused bits */
+	for (i = 0; i < 4; i++)
+		mp->caps.set[i].caps[CAP_TO_INDEX(CAP_LAST_CAP)] &=
+						CAP_LAST_U32_VALID_MASK;
+
+	mp->valid |= KDBUS_ATTACH_CAPS;
+}
+
+static int kdbus_meta_proc_collect_seclabel(struct kdbus_meta_proc *mp)
+{
+#ifdef CONFIG_SECURITY
+	char *ctx = NULL;
+	u32 sid, len;
+	int ret;
+
+	security_task_getsecid(current, &sid);
+	ret = security_secid_to_secctx(sid, &ctx, &len);
+	if (ret < 0) {
+		/*
+		 * EOPNOTSUPP means no security module is active,
+		 * let's skip adding the seclabel then. This effectively
+		 * drops the SECLABEL item.
+		 */
+		return (ret == -EOPNOTSUPP) ? 0 : ret;
+	}
+
+	mp->seclabel = kstrdup(ctx, GFP_KERNEL);
+	security_release_secctx(ctx, len);
+	if (!mp->seclabel)
+		return -ENOMEM;
+
+	mp->valid |= KDBUS_ATTACH_SECLABEL;
+#endif
+
+	return 0;
+}
+
+static void kdbus_meta_proc_collect_audit(struct kdbus_meta_proc *mp)
+{
+#ifdef CONFIG_AUDITSYSCALL
+	mp->audit_loginuid = audit_get_loginuid(current);
+	mp->audit_sessionid = audit_get_sessionid(current);
+	mp->valid |= KDBUS_ATTACH_AUDIT;
+#endif
+}
+
+/**
+ * kdbus_meta_proc_collect() - Collect process metadata
+ * @mp:		Process metadata object
+ * @what:	Attach flags to collect
+ *
+ * This collects process metadata from current and saves it in @mp.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what)
+{
+	int ret;
+
+	if (!mp)
+		return 0;
+
+	mutex_lock(&mp->lock);
+
+	if ((what & KDBUS_ATTACH_CREDS) &&
+	    !(mp->collected & KDBUS_ATTACH_CREDS)) {
+		kdbus_meta_proc_collect_creds(mp);
+		mp->collected |= KDBUS_ATTACH_CREDS;
+	}
+
+	if ((what & KDBUS_ATTACH_PIDS) &&
+	    !(mp->collected & KDBUS_ATTACH_PIDS)) {
+		kdbus_meta_proc_collect_pids(mp);
+		mp->collected |= KDBUS_ATTACH_PIDS;
+	}
+
+	if ((what & KDBUS_ATTACH_AUXGROUPS) &&
+	    !(mp->collected & KDBUS_ATTACH_AUXGROUPS)) {
+		ret = kdbus_meta_proc_collect_auxgroups(mp);
+		if (ret < 0)
+			goto exit_unlock;
+		mp->collected |= KDBUS_ATTACH_AUXGROUPS;
+	}
+
+	if ((what & KDBUS_ATTACH_TID_COMM) &&
+	    !(mp->collected & KDBUS_ATTACH_TID_COMM)) {
+		kdbus_meta_proc_collect_tid_comm(mp);
+		mp->collected |= KDBUS_ATTACH_TID_COMM;
+	}
+
+	if ((what & KDBUS_ATTACH_PID_COMM) &&
+	    !(mp->collected & KDBUS_ATTACH_PID_COMM)) {
+		kdbus_meta_proc_collect_pid_comm(mp);
+		mp->collected |= KDBUS_ATTACH_PID_COMM;
+	}
+
+	if ((what & KDBUS_ATTACH_EXE) &&
+	    !(mp->collected & KDBUS_ATTACH_EXE)) {
+		kdbus_meta_proc_collect_exe(mp);
+		mp->collected |= KDBUS_ATTACH_EXE;
+	}
+
+	if ((what & KDBUS_ATTACH_CMDLINE) &&
+	    !(mp->collected & KDBUS_ATTACH_CMDLINE)) {
+		ret = kdbus_meta_proc_collect_cmdline(mp);
+		if (ret < 0)
+			goto exit_unlock;
+		mp->collected |= KDBUS_ATTACH_CMDLINE;
+	}
+
+	if ((what & KDBUS_ATTACH_CGROUP) &&
+	    !(mp->collected & KDBUS_ATTACH_CGROUP)) {
+		ret = kdbus_meta_proc_collect_cgroup(mp);
+		if (ret < 0)
+			goto exit_unlock;
+		mp->collected |= KDBUS_ATTACH_CGROUP;
+	}
+
+	if ((what & KDBUS_ATTACH_CAPS) &&
+	    !(mp->collected & KDBUS_ATTACH_CAPS)) {
+		kdbus_meta_proc_collect_caps(mp);
+		mp->collected |= KDBUS_ATTACH_CAPS;
+	}
+
+	if ((what & KDBUS_ATTACH_SECLABEL) &&
+	    !(mp->collected & KDBUS_ATTACH_SECLABEL)) {
+		ret = kdbus_meta_proc_collect_seclabel(mp);
+		if (ret < 0)
+			goto exit_unlock;
+		mp->collected |= KDBUS_ATTACH_SECLABEL;
+	}
+
+	if ((what & KDBUS_ATTACH_AUDIT) &&
+	    !(mp->collected & KDBUS_ATTACH_AUDIT)) {
+		kdbus_meta_proc_collect_audit(mp);
+		mp->collected |= KDBUS_ATTACH_AUDIT;
+	}
+
+	ret = 0;
+
+exit_unlock:
+	mutex_unlock(&mp->lock);
+	return ret;
+}
+
+/**
+ * kdbus_meta_proc_fake() - Fill process metadata from faked credentials
+ * @mp:		Metadata
+ * @creds:	Creds to set, may be %NULL
+ * @pids:	PIDs to set, may be %NULL
+ * @seclabel:	Seclabel to set, may be %NULL
+ *
+ * This function takes information stored in @creds, @pids and @seclabel and
+ * resolves them to kernel-representations, if possible. A call to this function
+ * is considered an alternative to calling kdbus_meta_proc_collect(), which
+ * derives the same information from the 'current' task.
+ *
+ * This call uses the current task's namespaces to resolve the given
+ * information.
+ *
+ * Return: 0 on success, negative error number otherwise.
+ */
+int kdbus_meta_proc_fake(struct kdbus_meta_proc *mp,
+			 const struct kdbus_creds *creds,
+			 const struct kdbus_pids *pids,
+			 const char *seclabel)
+{
+	int ret;
+
+	if (!mp)
+		return 0;
+
+	mutex_lock(&mp->lock);
+
+	if (creds && !(mp->collected & KDBUS_ATTACH_CREDS)) {
+		struct user_namespace *ns = current_user_ns();
+
+		mp->uid		= make_kuid(ns, creds->uid);
+		mp->euid	= make_kuid(ns, creds->euid);
+		mp->suid	= make_kuid(ns, creds->suid);
+		mp->fsuid	= make_kuid(ns, creds->fsuid);
+
+		mp->gid		= make_kgid(ns, creds->gid);
+		mp->egid	= make_kgid(ns, creds->egid);
+		mp->sgid	= make_kgid(ns, creds->sgid);
+		mp->fsgid	= make_kgid(ns, creds->fsgid);
+
+		if ((creds->uid   != (uid_t)-1 && !uid_valid(mp->uid))   ||
+		    (creds->euid  != (uid_t)-1 && !uid_valid(mp->euid))  ||
+		    (creds->suid  != (uid_t)-1 && !uid_valid(mp->suid))  ||
+		    (creds->fsuid != (uid_t)-1 && !uid_valid(mp->fsuid)) ||
+		    (creds->gid   != (gid_t)-1 && !gid_valid(mp->gid))   ||
+		    (creds->egid  != (gid_t)-1 && !gid_valid(mp->egid))  ||
+		    (creds->sgid  != (gid_t)-1 && !gid_valid(mp->sgid))  ||
+		    (creds->fsgid != (gid_t)-1 && !gid_valid(mp->fsgid))) {
+			ret = -EINVAL;
+			goto exit_unlock;
+		}
+
+		mp->valid |= KDBUS_ATTACH_CREDS;
+		mp->collected |= KDBUS_ATTACH_CREDS;
+	}
+
+	if (pids && !(mp->collected & KDBUS_ATTACH_PIDS)) {
+		mp->pid = get_pid(find_vpid(pids->tid));
+		mp->tgid = get_pid(find_vpid(pids->pid));
+		mp->ppid = get_pid(find_vpid(pids->ppid));
+
+		if ((pids->tid != 0 && !mp->pid) ||
+		    (pids->pid != 0 && !mp->tgid) ||
+		    (pids->ppid != 0 && !mp->ppid)) {
+			put_pid(mp->pid);
+			put_pid(mp->tgid);
+			put_pid(mp->ppid);
+			mp->pid = NULL;
+			mp->tgid = NULL;
+			mp->ppid = NULL;
+			ret = -EINVAL;
+			goto exit_unlock;
+		}
+
+		mp->valid |= KDBUS_ATTACH_PIDS;
+		mp->collected |= KDBUS_ATTACH_PIDS;
+	}
+
+	if (seclabel && !(mp->collected & KDBUS_ATTACH_SECLABEL)) {
+		mp->seclabel = kstrdup(seclabel, GFP_KERNEL);
+		if (!mp->seclabel) {
+			ret = -ENOMEM;
+			goto exit_unlock;
+		}
+
+		mp->valid |= KDBUS_ATTACH_SECLABEL;
+		mp->collected |= KDBUS_ATTACH_SECLABEL;
+	}
+
+	ret = 0;
+
+exit_unlock:
+	mutex_unlock(&mp->lock);
+	return ret;
+}
+
+/**
+ * kdbus_meta_conn_new() - Create connection metadata object
+ *
+ * Return: Pointer to new object on success, ERR_PTR on failure.
+ */
+struct kdbus_meta_conn *kdbus_meta_conn_new(void)
+{
+	struct kdbus_meta_conn *mc;
+
+	mc = kzalloc(sizeof(*mc), GFP_KERNEL);
+	if (!mc)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&mc->kref);
+	mutex_init(&mc->lock);
+
+	return mc;
+}
+
+static void kdbus_meta_conn_free(struct kref *kref)
+{
+	struct kdbus_meta_conn *mc = container_of(kref, struct kdbus_meta_conn,
+						  kref);
+
+	kfree(mc->conn_description);
+	kfree(mc->owned_names_items);
+	kfree(mc);
+}
+
+/**
+ * kdbus_meta_conn_ref() - Gain reference
+ * @mc:		Connection metadata object
+ */
+struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc)
+{
+	if (mc)
+		kref_get(&mc->kref);
+	return mc;
+}
+
+/**
+ * kdbus_meta_conn_unref() - Drop reference
+ * @mc:		Connection metadata object
+ */
+struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc)
+{
+	if (mc)
+		kref_put(&mc->kref, kdbus_meta_conn_free);
+	return NULL;
+}
+
+static void kdbus_meta_conn_collect_timestamp(struct kdbus_meta_conn *mc,
+					      struct kdbus_kmsg *kmsg)
+{
+	struct timespec ts;
+
+	ktime_get_ts(&ts);
+	mc->ts.monotonic_ns = timespec_to_ns(&ts);
+
+	ktime_get_real_ts(&ts);
+	mc->ts.realtime_ns = timespec_to_ns(&ts);
+
+	if (kmsg)
+		mc->ts.seqnum = kmsg->seq;
+
+	mc->valid |= KDBUS_ATTACH_TIMESTAMP;
+}
+
+static int kdbus_meta_conn_collect_names(struct kdbus_meta_conn *mc,
+					 struct kdbus_conn *conn)
+{
+	const struct kdbus_name_entry *e;
+	struct kdbus_item *item;
+	size_t slen, size;
+
+	size = 0;
+	list_for_each_entry(e, &conn->names_list, conn_entry)
+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_name) +
+					strlen(e->name) + 1);
+
+	if (!size)
+		return 0;
+
+	item = kmalloc(size, GFP_KERNEL);
+	if (!item)
+		return -ENOMEM;
+
+	mc->owned_names_items = item;
+	mc->owned_names_size = size;
+
+	list_for_each_entry(e, &conn->names_list, conn_entry) {
+		slen = strlen(e->name) + 1;
+		kdbus_item_set(item, KDBUS_ITEM_OWNED_NAME, NULL,
+			       sizeof(struct kdbus_name) + slen);
+		item->name.flags = e->flags;
+		memcpy(item->name.name, e->name, slen);
+		item = KDBUS_ITEM_NEXT(item);
+	}
+
+	/* sanity check: the buffer should be completely written now */
+	WARN_ON((u8 *)item != (u8 *)mc->owned_names_items + size);
+
+	mc->valid |= KDBUS_ATTACH_NAMES;
+	return 0;
+}
+
+static int kdbus_meta_conn_collect_description(struct kdbus_meta_conn *mc,
+					       struct kdbus_conn *conn)
+{
+	if (!conn->description)
+		return 0;
+
+	mc->conn_description = kstrdup(conn->description, GFP_KERNEL);
+	if (!mc->conn_description)
+		return -ENOMEM;
+
+	mc->valid |= KDBUS_ATTACH_CONN_DESCRIPTION;
+	return 0;
+}
+
+/**
+ * kdbus_meta_conn_collect() - Collect connection metadata
+ * @mc:		Connection metadata object
+ * @kmsg:	Kmsg to collect data from
+ * @conn:	Connection to collect data from
+ * @what:	Attach flags to collect
+ *
+ * This collects connection metadata from @kmsg and @conn and saves it in @mc.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
+			    struct kdbus_kmsg *kmsg,
+			    struct kdbus_conn *conn,
+			    u64 what)
+{
+	int ret;
+
+	if (!mc)
+		return 0;
+
+	if (conn)
+		mutex_lock(&conn->lock);
+	mutex_lock(&mc->lock);
+
+	if (kmsg && (what & KDBUS_ATTACH_TIMESTAMP) &&
+	    !(mc->collected & KDBUS_ATTACH_TIMESTAMP)) {
+		kdbus_meta_conn_collect_timestamp(mc, kmsg);
+		mc->collected |= KDBUS_ATTACH_TIMESTAMP;
+	}
+
+
+	if (conn && (what & KDBUS_ATTACH_NAMES) &&
+	    !(mc->collected & KDBUS_ATTACH_NAMES)) {
+		ret = kdbus_meta_conn_collect_names(mc, conn);
+		if (ret < 0)
+			goto exit_unlock;
+		mc->collected |= KDBUS_ATTACH_NAMES;
+	}
+
+	if (conn && (what & KDBUS_ATTACH_CONN_DESCRIPTION) &&
+	    !(mc->collected & KDBUS_ATTACH_CONN_DESCRIPTION)) {
+		ret = kdbus_meta_conn_collect_description(mc, conn);
+		if (ret < 0)
+			goto exit_unlock;
+		mc->collected |= KDBUS_ATTACH_CONN_DESCRIPTION;
+	}
+
+	ret = 0;
+
+exit_unlock:
+	mutex_unlock(&mc->lock);
+	if (conn)
+		mutex_unlock(&conn->lock);
+	return ret;
+}
+
+/**
+ * kdbus_meta_export() - export information from metadata into buffer
+ * @mp:		Process metadata, or NULL
+ * @mc:		Connection metadata, or NULL
+ * @mask:	Mask of KDBUS_ATTACH_* flags to export
+ * @sz:		Pointer to return the buffer size
+ *
+ * This function exports information from metadata to an allocated buffer.
+ * Only information that is requested in @mask and that has been collected
+ * before is exported.
+ *
+ * All information will be translated using the current namespaces.
+ *
+ * Return: An array of items on success, ERR_PTR value on errors. On success,
+ * @sz is also set to the number of bytes returned in the items array. The
+ * caller must release the buffer via kfree().
+ */
+struct kdbus_item *kdbus_meta_export(struct kdbus_meta_proc *mp,
+				     struct kdbus_meta_conn *mc,
+				     u64 mask,
+				     size_t *sz)
+{
+	struct user_namespace *user_ns = current_user_ns();
+	struct kdbus_item *item, *items = NULL;
+	char *exe_pathname = NULL;
+	void *exe_page = NULL;
+	size_t size = 0;
+	u64 valid = 0;
+	int ret;
+
+	if (WARN_ON(!sz))
+		return ERR_PTR(-EINVAL);
+
+	if (mp) {
+		mutex_lock(&mp->lock);
+		valid |= mp->valid;
+		mutex_unlock(&mp->lock);
+	}
+
+	if (mc) {
+		mutex_lock(&mc->lock);
+		valid |= mc->valid;
+		mutex_unlock(&mc->lock);
+	}
+
+	mask &= valid;
+	mask &= kdbus_meta_attach_mask;
+
+	/*
+	 * TODO: We currently have no sane way of translating a set of caps
+	 * between different user namespaces. Until that changes, we have
+	 * to drop such items.
+	 */
+	if (mp && mp->caps_namespace != user_ns)
+		mask &= ~KDBUS_ATTACH_CAPS;
+
+	if (!mask) {
+		*sz = 0;
+		return NULL;
+	}
+
+	/* process metadata */
+
+	if (mp && (mask & KDBUS_ATTACH_CREDS))
+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_creds));
+
+	if (mp && (mask & KDBUS_ATTACH_PIDS))
+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_pids));
+
+	if (mp && (mask & KDBUS_ATTACH_AUXGROUPS))
+		size += KDBUS_ITEM_SIZE(mp->n_auxgrps * sizeof(u32));
+
+	if (mp && (mask & KDBUS_ATTACH_TID_COMM))
+		size += KDBUS_ITEM_SIZE(strlen(mp->tid_comm) + 1);
+
+	if (mp && (mask & KDBUS_ATTACH_PID_COMM))
+		size += KDBUS_ITEM_SIZE(strlen(mp->pid_comm) + 1);
+
+	if (mp && (mask & KDBUS_ATTACH_EXE)) {
+		struct path p;
+
+		/*
+		 * TODO: We need access to __d_path() so we can write the path
+		 * relative to conn->root_path. Once upstream, we need
+		 * EXPORT_SYMBOL(__d_path) or an equivalent of d_path() that
+		 * takes the root path directly. Until then, we drop this item
+		 * if the root-paths differ.
+		 */
+
+		get_fs_root(current->fs, &p);
+		if (path_equal(&p, &mp->root_path)) {
+			exe_page = (void *)__get_free_page(GFP_TEMPORARY);
+			if (!exe_page) {
+				path_put(&p);
+				ret = -ENOMEM;
+				goto exit;
+			}
+
+			exe_pathname = d_path(&mp->exe_path, exe_page,
+					      PAGE_SIZE);
+			if (IS_ERR(exe_pathname)) {
+				path_put(&p);
+				ret = PTR_ERR(exe_pathname);
+				goto exit;
+			}
+
+			size += KDBUS_ITEM_SIZE(strlen(exe_pathname) + 1);
+		} else {
+			mask &= ~KDBUS_ATTACH_EXE;
+		}
+		path_put(&p);
+	}
+
+	if (mp && (mask & KDBUS_ATTACH_CMDLINE))
+		size += KDBUS_ITEM_SIZE(strlen(mp->cmdline) + 1);
+
+	if (mp && (mask & KDBUS_ATTACH_CGROUP))
+		size += KDBUS_ITEM_SIZE(strlen(mp->cgroup) + 1);
+
+	if (mp && (mask & KDBUS_ATTACH_CAPS))
+		size += KDBUS_ITEM_SIZE(sizeof(mp->caps));
+
+	if (mp && (mask & KDBUS_ATTACH_SECLABEL))
+		size += KDBUS_ITEM_SIZE(strlen(mp->seclabel) + 1);
+
+	if (mp && (mask & KDBUS_ATTACH_AUDIT))
+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_audit));
+
+	/* connection metadata */
+
+	if (mc && (mask & KDBUS_ATTACH_NAMES))
+		size += mc->owned_names_size;
+
+	if (mc && (mask & KDBUS_ATTACH_CONN_DESCRIPTION))
+		size += KDBUS_ITEM_SIZE(strlen(mc->conn_description) + 1);
+
+	if (mc && (mask & KDBUS_ATTACH_TIMESTAMP))
+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_timestamp));
+
+	if (!size) {
+		*sz = 0;
+		ret = 0;
+		goto exit;
+	}
+
+	items = kzalloc(size, GFP_KERNEL);
+	if (!items) {
+		ret = -ENOMEM;
+		goto exit;
+	}
+
+	item = items;
+
+	/* process metadata */
+
+	if (mp && (mask & KDBUS_ATTACH_CREDS)) {
+		struct kdbus_creds creds = {
+			.uid	= kdbus_from_kuid_keep(mp->uid),
+			.euid	= kdbus_from_kuid_keep(mp->euid),
+			.suid	= kdbus_from_kuid_keep(mp->suid),
+			.fsuid	= kdbus_from_kuid_keep(mp->fsuid),
+			.gid	= kdbus_from_kgid_keep(mp->gid),
+			.egid	= kdbus_from_kgid_keep(mp->egid),
+			.sgid	= kdbus_from_kgid_keep(mp->sgid),
+			.fsgid	= kdbus_from_kgid_keep(mp->fsgid),
+		};
+
+		item = kdbus_item_set(item, KDBUS_ITEM_CREDS, &creds,
+				      sizeof(creds));
+	}
+
+	if (mp && (mask & KDBUS_ATTACH_PIDS)) {
+		struct kdbus_pids pids = {
+			.pid = pid_vnr(mp->tgid),
+			.tid = pid_vnr(mp->pid),
+			.ppid = pid_vnr(mp->ppid),
+		};
+
+		item = kdbus_item_set(item, KDBUS_ITEM_PIDS, &pids,
+				      sizeof(pids));
+	}
+
+	if (mp && (mask & KDBUS_ATTACH_AUXGROUPS)) {
+		int i;
+
+		kdbus_item_set(item, KDBUS_ITEM_AUXGROUPS, NULL,
+			       mp->n_auxgrps * sizeof(u32));
+
+		for (i = 0; i < mp->n_auxgrps; i++)
+			item->data32[i] = from_kgid_munged(user_ns,
+							   mp->auxgrps[i]);
+
+		item = KDBUS_ITEM_NEXT(item);
+	}
+
+	if (mp && (mask & KDBUS_ATTACH_TID_COMM))
+		item = kdbus_item_set(item, KDBUS_ITEM_TID_COMM, mp->tid_comm,
+				      strlen(mp->tid_comm) + 1);
+
+	if (mp && (mask & KDBUS_ATTACH_PID_COMM))
+		item = kdbus_item_set(item, KDBUS_ITEM_PID_COMM, mp->pid_comm,
+				      strlen(mp->pid_comm) + 1);
+
+	if (mp && (mask & KDBUS_ATTACH_EXE))
+		item = kdbus_item_set(item, KDBUS_ITEM_EXE, exe_pathname,
+				      strlen(exe_pathname) + 1);
+
+	if (mp && (mask & KDBUS_ATTACH_CMDLINE))
+		item = kdbus_item_set(item, KDBUS_ITEM_CMDLINE, mp->cmdline,
+				      strlen(mp->cmdline) + 1);
+
+	if (mp && (mask & KDBUS_ATTACH_CGROUP))
+		item = kdbus_item_set(item, KDBUS_ITEM_CGROUP, mp->cgroup,
+				      strlen(mp->cgroup) + 1);
+
+	if (mp && (mask & KDBUS_ATTACH_CAPS))
+		item = kdbus_item_set(item, KDBUS_ITEM_CAPS, &mp->caps,
+				      sizeof(mp->caps));
+
+	if (mp && (mask & KDBUS_ATTACH_SECLABEL))
+		item = kdbus_item_set(item, KDBUS_ITEM_SECLABEL, mp->seclabel,
+				      strlen(mp->seclabel) + 1);
+
+	if (mp && (mask & KDBUS_ATTACH_AUDIT)) {
+		struct kdbus_audit a = {
+			.loginuid = from_kuid(user_ns, mp->audit_loginuid),
+			.sessionid = mp->audit_sessionid,
+		};
+
+		item = kdbus_item_set(item, KDBUS_ITEM_AUDIT, &a, sizeof(a));
+	}
+
+	/* connection metadata */
+
+	if (mc && (mask & KDBUS_ATTACH_NAMES)) {
+		memcpy(item, mc->owned_names_items, mc->owned_names_size);
+		item = (struct kdbus_item *)
+				((u8 *)item + mc->owned_names_size);
+	}
+
+	if (mc && (mask & KDBUS_ATTACH_CONN_DESCRIPTION))
+		item = kdbus_item_set(item, KDBUS_ITEM_CONN_DESCRIPTION,
+				      mc->conn_description,
+				      strlen(mc->conn_description) + 1);
+
+	if (mc && (mask & KDBUS_ATTACH_TIMESTAMP))
+		item = kdbus_item_set(item, KDBUS_ITEM_TIMESTAMP, &mc->ts,
+				      sizeof(mc->ts));
+
+	/* sanity check: the buffer should be completely written now */
+	WARN_ON((u8 *)item != (u8 *)items + size);
+
+	*sz = size;
+	ret = 0;
+
+exit:
+	if (exe_page)
+		free_page((unsigned long)exe_page);
+	return ret < 0 ? ERR_PTR(ret) : items;
+}
+
+/**
+ * kdbus_meta_calc_attach_flags() - calculate attach flags for a sender
+ *				    and a receiver
+ * @sender:		Sending connection
+ * @receiver:		Receiving connection
+ *
+ * Return: the attach flags that both the sender and the receiver have
+ * opted in to.
+ */
+u64 kdbus_meta_calc_attach_flags(const struct kdbus_conn *sender,
+				 const struct kdbus_conn *receiver)
+{
+	return atomic64_read(&sender->attach_flags_send) &
+	       atomic64_read(&receiver->attach_flags_recv);
+}
diff --git a/ipc/kdbus/metadata.h b/ipc/kdbus/metadata.h
new file mode 100644
index 000000000000..4ddac7bf630d
--- /dev/null
+++ b/ipc/kdbus/metadata.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_METADATA_H
+#define __KDBUS_METADATA_H
+
+struct kdbus_conn;
+struct kdbus_domain;
+struct kdbus_kmsg;
+struct kdbus_pool_slice;
+
+struct kdbus_meta_proc;
+struct kdbus_meta_conn;
+
+extern unsigned long long kdbus_meta_attach_mask;
+
+struct kdbus_meta_proc *kdbus_meta_proc_new(void);
+struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp);
+struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp);
+int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what);
+int kdbus_meta_proc_fake(struct kdbus_meta_proc *mp,
+			 const struct kdbus_creds *creds,
+			 const struct kdbus_pids *pids,
+			 const char *seclabel);
+
+struct kdbus_meta_conn *kdbus_meta_conn_new(void);
+struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc);
+struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc);
+int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
+			    struct kdbus_kmsg *kmsg,
+			    struct kdbus_conn *conn,
+			    u64 what);
+
+struct kdbus_item *kdbus_meta_export(struct kdbus_meta_proc *mp,
+				     struct kdbus_meta_conn *mc,
+				     u64 mask,
+				     size_t *sz);
+u64 kdbus_meta_calc_attach_flags(const struct kdbus_conn *sender,
+				 const struct kdbus_conn *receiver);
+
+#endif
-- 
2.2.1


* [PATCH 08/13] kdbus: add code for notifications and matches
  2015-01-16 19:16 ` Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  (?)
@ 2015-01-16 19:16 ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

This patch adds code for matches and notifications.

Notifications are broadcast messages generated by the kernel, which
notify subscribers when connections are created or destroyed, when
well-known names have been claimed, released or changed ownership,
or when reply messages have timed out.

Matches are used to tell the kernel driver which broadcast messages
a connection is interested in. Matches can either be specific to one
of the kernel-generated notification types, or carry a bloom filter
mask to match against a message from userspace. The latter is a way
to pre-filter messages from other connections in order to avoid
unnecessary wakeups.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
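
The bloom mask check described in the commit message can be sketched in
plain userspace C. This is only a minimal sketch under stated
assumptions: struct bloom_filter, struct bloom_mask, bloom_match() and
the explicit word count n are hypothetical stand-ins for the kernel's
kdbus_bloom_filter, kdbus_bloom_mask and the bus-wide bloom size that
kdbus_match_bloom() in match.c works with; it is not the code from this
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: the filter a sender attaches to a message. */
struct bloom_filter {
	uint64_t generation;	/* generation the sender built the filter for */
	const uint64_t *data;	/* n 64-bit words */
};

/* Hypothetical stand-in: the mask a receiver installs with a match. */
struct bloom_mask {
	uint64_t generations;	/* number of generations carried, >= 1 */
	const uint64_t *data;	/* generations * n 64-bit words */
};

/*
 * Return true if every bit set in the selected mask generation is also
 * set in the filter, i.e. the message possibly carries the properties
 * the receiver subscribed to. n is the number of 64-bit words per
 * generation (assumed to be the bus-wide bloom size divided by 8).
 */
bool bloom_match(const struct bloom_filter *filter,
		 const struct bloom_mask *mask, size_t n)
{
	const uint64_t *m;
	uint64_t gen;
	size_t i;

	assert(mask->generations > 0);

	/* Clamp to the newest generation the mask actually carries. */
	gen = filter->generation;
	if (gen > mask->generations - 1)
		gen = mask->generations - 1;
	m = mask->data + gen * n;

	for (i = 0; i < n; i++)
		if ((filter->data[i] & m[i]) != m[i])
			return false;

	return true;
}
```

With n = 2, a mask of {0x05, 0x80} matches a filter of {0x0f, 0x80},
since every mask bit is also set in the filter. As with any bloom
filter, a hit only means the message possibly carries the requested
properties: false positives are expected, false negatives are not.
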
 ipc/kdbus/match.c  | 535 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 ipc/kdbus/match.h  |  32 ++++
 ipc/kdbus/notify.c | 244 ++++++++++++++++++++++++
 ipc/kdbus/notify.h |  30 +++
 4 files changed, 841 insertions(+)
 create mode 100644 ipc/kdbus/match.c
 create mode 100644 ipc/kdbus/match.h
 create mode 100644 ipc/kdbus/notify.c
 create mode 100644 ipc/kdbus/notify.h

diff --git a/ipc/kdbus/match.c b/ipc/kdbus/match.c
new file mode 100644
index 000000000000..d4f21848f66b
--- /dev/null
+++ b/ipc/kdbus/match.c
@@ -0,0 +1,535 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/hash.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+
+/**
+ * struct kdbus_match_db - message filters
+ * @entries_list:	List of matches
+ * @mdb_rwlock:		Match data lock
+ * @entries_count:	Number of entries in database
+ */
+struct kdbus_match_db {
+	struct list_head entries_list;
+	struct rw_semaphore mdb_rwlock;
+	unsigned int entries_count;
+};
+
+/**
+ * struct kdbus_match_entry - a match database entry
+ * @cookie:		User-supplied cookie to lookup the entry
+ * @list_entry:		The list entry element for the db list
+ * @rules_list:		The list head for tracking rules of this entry
+ */
+struct kdbus_match_entry {
+	u64 cookie;
+	struct list_head list_entry;
+	struct list_head rules_list;
+};
+
+/**
+ * struct kdbus_bloom_mask - mask to match against filter
+ * @generations:	Number of generations carried
+ * @data:		Array of bloom bit fields
+ */
+struct kdbus_bloom_mask {
+	u64 generations;
+	u64 *data;
+};
+
+/**
+ * struct kdbus_match_rule - a rule appended to a match entry
+ * @type:		An item type to match against
+ * @bloom_mask:		Bloom mask to match a message's filter against, used
+ *			with KDBUS_ITEM_BLOOM_MASK
+ * @name:		Name to match against, used with KDBUS_ITEM_NAME,
+ *			KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}
+ * @old_id:		ID to match against, used with
+ *			KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
+ *			KDBUS_ITEM_ID_REMOVE
+ * @new_id:		ID to match against, used with
+ *			KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
+ *			KDBUS_ITEM_ID_ADD
+ * @src_id:		ID to match against, used with KDBUS_ITEM_ID
+ * @rules_entry:	Entry in the entry's rules list
+ */
+struct kdbus_match_rule {
+	u64 type;
+	union {
+		struct kdbus_bloom_mask bloom_mask;
+		struct {
+			char *name;
+			u64 old_id;
+			u64 new_id;
+		};
+		u64 src_id;
+	};
+	struct list_head rules_entry;
+};
+
+static void kdbus_match_rule_free(struct kdbus_match_rule *rule)
+{
+	if (!rule)
+		return;
+
+	switch (rule->type) {
+	case KDBUS_ITEM_BLOOM_MASK:
+		kfree(rule->bloom_mask.data);
+		break;
+
+	case KDBUS_ITEM_NAME:
+	case KDBUS_ITEM_NAME_ADD:
+	case KDBUS_ITEM_NAME_REMOVE:
+	case KDBUS_ITEM_NAME_CHANGE:
+		kfree(rule->name);
+		break;
+
+	case KDBUS_ITEM_ID:
+	case KDBUS_ITEM_ID_ADD:
+	case KDBUS_ITEM_ID_REMOVE:
+		break;
+
+	default:
+		BUG();
+	}
+
+	list_del(&rule->rules_entry);
+	kfree(rule);
+}
+
+static void kdbus_match_entry_free(struct kdbus_match_entry *entry)
+{
+	struct kdbus_match_rule *r, *tmp;
+
+	list_for_each_entry_safe(r, tmp, &entry->rules_list, rules_entry)
+		kdbus_match_rule_free(r);
+
+	list_del(&entry->list_entry);
+	kfree(entry);
+}
+
+/**
+ * kdbus_match_db_free() - free match db resources
+ * @mdb:		The match database
+ */
+void kdbus_match_db_free(struct kdbus_match_db *mdb)
+{
+	struct kdbus_match_entry *entry, *tmp;
+
+	if (!mdb)
+		return;
+
+	down_write(&mdb->mdb_rwlock);
+	list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
+		kdbus_match_entry_free(entry);
+	up_write(&mdb->mdb_rwlock);
+
+	kfree(mdb);
+}
+
+/**
+ * kdbus_match_db_new() - create a new match database
+ *
+ * Return: a new kdbus_match_db on success, ERR_PTR on failure.
+ */
+struct kdbus_match_db *kdbus_match_db_new(void)
+{
+	struct kdbus_match_db *d;
+
+	d = kzalloc(sizeof(*d), GFP_KERNEL);
+	if (!d)
+		return ERR_PTR(-ENOMEM);
+
+	init_rwsem(&d->mdb_rwlock);
+	INIT_LIST_HEAD(&d->entries_list);
+
+	return d;
+}
+
+static bool kdbus_match_bloom(const struct kdbus_bloom_filter *filter,
+			      const struct kdbus_bloom_mask *mask,
+			      const struct kdbus_conn *conn)
+{
+	size_t n = conn->ep->bus->bloom.size / sizeof(u64);
+	const u64 *m;
+	size_t i;
+
+	/*
+	 * The message's filter carries a generation identifier, the
+	 * match's mask possibly carries an array of multiple generations
+	 * of the mask. Select the mask with the closest match of the
+	 * filter's generation.
+	 */
+	m = mask->data + (min(filter->generation, mask->generations - 1) * n);
+
+	/*
+	 * The message's filter contains the message's properties,
+	 * the match's mask contains the properties to look for in the
+	 * message. Check the mask bit field against the filter bit field
+	 * to see whether the message possibly carries the properties the
+	 * connection has subscribed to.
+	 */
+	for (i = 0; i < n; i++)
+		if ((filter->data[i] & m[i]) != m[i])
+			return false;
+
+	return true;
+}
+
+static bool kdbus_match_rules(const struct kdbus_match_entry *entry,
+			      struct kdbus_conn *conn_src,
+			      struct kdbus_kmsg *kmsg)
+{
+	struct kdbus_match_rule *r;
+
+	/*
+	 * Walk all the rules and bail out immediately
+	 * if any of them is unsatisfied.
+	 */
+
+	list_for_each_entry(r, &entry->rules_list, rules_entry) {
+		if (conn_src) {
+			/* messages from userspace */
+
+			switch (r->type) {
+			case KDBUS_ITEM_BLOOM_MASK:
+				if (!kdbus_match_bloom(kmsg->bloom_filter,
+						       &r->bloom_mask,
+						       conn_src))
+					return false;
+				break;
+
+			case KDBUS_ITEM_ID:
+				if (r->src_id != conn_src->id &&
+				    r->src_id != KDBUS_MATCH_ID_ANY)
+					return false;
+
+				break;
+
+			case KDBUS_ITEM_NAME:
+				if (!kdbus_conn_has_name(conn_src, r->name))
+					return false;
+
+				break;
+
+			default:
+				return false;
+			}
+		} else {
+			/* kernel notifications */
+
+			if (kmsg->notify_type != r->type)
+				return false;
+
+			switch (r->type) {
+			case KDBUS_ITEM_ID_ADD:
+				if (r->new_id != KDBUS_MATCH_ID_ANY &&
+				    r->new_id != kmsg->notify_new_id)
+					return false;
+
+				break;
+
+			case KDBUS_ITEM_ID_REMOVE:
+				if (r->old_id != KDBUS_MATCH_ID_ANY &&
+				    r->old_id != kmsg->notify_old_id)
+					return false;
+
+				break;
+
+			case KDBUS_ITEM_NAME_ADD:
+			case KDBUS_ITEM_NAME_CHANGE:
+			case KDBUS_ITEM_NAME_REMOVE:
+				if ((r->old_id != KDBUS_MATCH_ID_ANY &&
+				     r->old_id != kmsg->notify_old_id) ||
+				    (r->new_id != KDBUS_MATCH_ID_ANY &&
+				     r->new_id != kmsg->notify_new_id) ||
+				    (r->name && kmsg->notify_name &&
+				     strcmp(r->name, kmsg->notify_name) != 0))
+					return false;
+
+				break;
+
+			default:
+				return false;
+			}
+		}
+	}
+
+	return true;
+}
+
+/**
+ * kdbus_match_db_match_kmsg() - match a kmsg object against database entries
+ * @mdb:		The match database
+ * @conn_src:		The connection object originating the message
+ * @kmsg:		The kmsg to perform the match on
+ *
+ * This function will walk through all the database entries previously uploaded
+ * with kdbus_match_db_add(). As soon as any of them has an all-satisfied rule
+ * set, this function will return true.
+ *
+ * Return: true if there was a matching database entry, false otherwise.
+ */
+bool kdbus_match_db_match_kmsg(struct kdbus_match_db *mdb,
+			       struct kdbus_conn *conn_src,
+			       struct kdbus_kmsg *kmsg)
+{
+	struct kdbus_match_entry *entry;
+	bool matched = false;
+
+	down_read(&mdb->mdb_rwlock);
+	list_for_each_entry(entry, &mdb->entries_list, list_entry) {
+		matched = kdbus_match_rules(entry, conn_src, kmsg);
+		if (matched)
+			break;
+	}
+	up_read(&mdb->mdb_rwlock);
+
+	return matched;
+}
+
+static int kdbus_match_db_remove_unlocked(struct kdbus_match_db *mdb,
+					  u64 cookie)
+{
+	struct kdbus_match_entry *entry, *tmp;
+	bool found = false;
+
+	list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
+		if (entry->cookie == cookie) {
+			kdbus_match_entry_free(entry);
+			--mdb->entries_count;
+			found = true;
+		}
+
+	return found ? 0 : -ENOENT;
+}
+
+/**
+ * kdbus_match_db_add() - add an entry to the match database
+ * @conn:		The connection that was used in the ioctl call
+ * @cmd:		The command as provided by the ioctl call
+ *
+ * This function is used in the context of the KDBUS_CMD_MATCH_ADD ioctl
+ * interface.
+ *
+ * One call to this function (or one ioctl(KDBUS_CMD_MATCH_ADD), respectively)
+ * adds one new database entry with n rules attached to it. Each rule is
+ * described with a kdbus_item, and an entry is considered matching if all
+ * its rules are satisfied.
+ *
+ * The items attached to a kdbus_cmd_match struct have the following mapping:
+ *
+ * KDBUS_ITEM_BLOOM_MASK:	A bloom mask
+ * KDBUS_ITEM_NAME:		A connection's source name
+ * KDBUS_ITEM_ID:		A connection ID
+ * KDBUS_ITEM_NAME_ADD:
+ * KDBUS_ITEM_NAME_REMOVE:
+ * KDBUS_ITEM_NAME_CHANGE:	Well-known name changes, carry
+ *				kdbus_notify_name_change
+ * KDBUS_ITEM_ID_ADD:
+ * KDBUS_ITEM_ID_REMOVE:	Connection ID changes, carry
+ *				kdbus_notify_id_change
+ *
+ * For kdbus_notify_{id,name}_change structs, only the ID and name fields
+ * are looked at when adding an entry. The flags are unused.
+ *
+ * Also note that KDBUS_ITEM_BLOOM_MASK, KDBUS_ITEM_NAME and KDBUS_ITEM_ID
+ * are used to match messages from userspace, while the others apply to
+ * kernel-generated notifications.
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_match_db_add(struct kdbus_conn *conn,
+		       struct kdbus_cmd_match *cmd)
+{
+	struct kdbus_match_entry *entry = NULL;
+	struct kdbus_match_db *mdb = conn->match_db;
+	struct kdbus_item *item;
+	int ret = 0;
+
+	kdbus_conn_assert_active(conn);
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return -ENOMEM;
+
+	entry->cookie = cmd->cookie;
+	INIT_LIST_HEAD(&entry->list_entry);
+	INIT_LIST_HEAD(&entry->rules_list);
+
+	KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
+		struct kdbus_match_rule *rule;
+		size_t size = item->size - offsetof(struct kdbus_item, data);
+
+		rule = kzalloc(sizeof(*rule), GFP_KERNEL);
+		if (!rule) {
+			ret = -ENOMEM;
+			break;
+		}
+
+		rule->type = item->type;
+		INIT_LIST_HEAD(&rule->rules_entry);
+
+		switch (item->type) {
+		/* First matches for userspace messages */
+		case KDBUS_ITEM_BLOOM_MASK: {
+			u64 bsize = conn->ep->bus->bloom.size;
+			u64 generations;
+			u64 remainder;
+
+			generations = div64_u64_rem(size, bsize, &remainder);
+			if (size < bsize || remainder > 0) {
+				ret = -EDOM;
+				break;
+			}
+
+			rule->bloom_mask.data = kmemdup(item->data,
+							size, GFP_KERNEL);
+			if (!rule->bloom_mask.data) {
+				ret = -ENOMEM;
+				break;
+			}
+
+			/* we get an array of n generations of bloom masks */
+			rule->bloom_mask.generations = generations;
+
+			break;
+		}
+
+		case KDBUS_ITEM_NAME:
+			/*
+			 * Do not allow wildcard for now, since we
+			 * must validate the wildcard first
+			 */
+			if (!kdbus_name_is_valid(item->str, false)) {
+				ret = -EINVAL;
+				break;
+			}
+
+			rule->name = kstrdup(item->str, GFP_KERNEL);
+			if (!rule->name)
+				ret = -ENOMEM;
+
+			break;
+
+		case KDBUS_ITEM_ID:
+			rule->src_id = item->id;
+			break;
+
+		/* Now matches for kernel messages */
+		case KDBUS_ITEM_NAME_ADD:
+		case KDBUS_ITEM_NAME_REMOVE:
+		case KDBUS_ITEM_NAME_CHANGE: {
+			rule->old_id = item->name_change.old_id.id;
+			rule->new_id = item->name_change.new_id.id;
+
+			if (size > sizeof(struct kdbus_notify_name_change)) {
+				rule->name = kstrdup(item->name_change.name,
+						     GFP_KERNEL);
+				if (!rule->name)
+					ret = -ENOMEM;
+			}
+
+			break;
+		}
+
+		case KDBUS_ITEM_ID_ADD:
+		case KDBUS_ITEM_ID_REMOVE:
+			if (item->type == KDBUS_ITEM_ID_ADD)
+				rule->new_id = item->id_change.id;
+			else
+				rule->old_id = item->id_change.id;
+
+			break;
+
+		default:
+			ret = -EINVAL;
+			break;
+		}
+
+		if (ret < 0) {
+			kdbus_match_rule_free(rule);
+			break;
+		}
+
+		list_add_tail(&rule->rules_entry, &entry->rules_list);
+	}
+
+	if (ret < 0)
+		goto exit;
+
+	down_write(&mdb->mdb_rwlock);
+
+	/* Remove any entry that has the same cookie as the current one. */
+	if (cmd->flags & KDBUS_MATCH_REPLACE)
+		kdbus_match_db_remove_unlocked(mdb, entry->cookie);
+
+	/*
+	 * If the above removal caught any entry, there will be room for the
+	 * new one.
+	 */
+	if (++mdb->entries_count > KDBUS_MATCH_MAX) {
+		--mdb->entries_count;
+		ret = -EMFILE;
+	} else {
+		list_add_tail(&entry->list_entry, &mdb->entries_list);
+	}
+
+	up_write(&mdb->mdb_rwlock);
+
+exit:
+	if (ret < 0)
+		kdbus_match_entry_free(entry);
+
+	return ret;
+}
+
+/**
+ * kdbus_match_db_remove() - remove an entry from the match database
+ * @conn:		The connection that was used in the ioctl call
+ * @cmd:		Pointer to the match data structure
+ *
+ * This function is used in the context of the KDBUS_CMD_MATCH_REMOVE
+ * ioctl interface.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_match_db_remove(struct kdbus_conn *conn,
+			  struct kdbus_cmd_match *cmd)
+{
+	struct kdbus_match_db *mdb = conn->match_db;
+	int ret;
+
+	kdbus_conn_assert_active(conn);
+
+	down_write(&mdb->mdb_rwlock);
+	ret = kdbus_match_db_remove_unlocked(mdb, cmd->cookie);
+	up_write(&mdb->mdb_rwlock);
+
+	return ret;
+}
diff --git a/ipc/kdbus/match.h b/ipc/kdbus/match.h
new file mode 100644
index 000000000000..81aed4a99721
--- /dev/null
+++ b/ipc/kdbus/match.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_MATCH_H
+#define __KDBUS_MATCH_H
+
+struct kdbus_conn;
+struct kdbus_kmsg;
+struct kdbus_match_db;
+
+struct kdbus_match_db *kdbus_match_db_new(void);
+void kdbus_match_db_free(struct kdbus_match_db *db);
+int kdbus_match_db_add(struct kdbus_conn *conn,
+		       struct kdbus_cmd_match *cmd);
+int kdbus_match_db_remove(struct kdbus_conn *conn,
+			  struct kdbus_cmd_match *cmd);
+bool kdbus_match_db_match_kmsg(struct kdbus_match_db *db,
+			       struct kdbus_conn *conn_src,
+			       struct kdbus_kmsg *kmsg);
+
+#endif
diff --git a/ipc/kdbus/notify.c b/ipc/kdbus/notify.c
new file mode 100644
index 000000000000..b429d42bf9f6
--- /dev/null
+++ b/ipc/kdbus/notify.c
@@ -0,0 +1,244 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "item.h"
+#include "message.h"
+#include "notify.h"
+
+static inline void kdbus_notify_add_tail(struct kdbus_kmsg *kmsg,
+					 struct kdbus_bus *bus)
+{
+	spin_lock(&bus->notify_lock);
+	list_add_tail(&kmsg->notify_entry, &bus->notify_list);
+	spin_unlock(&bus->notify_lock);
+}
+
+static int kdbus_notify_reply(struct kdbus_bus *bus, u64 id,
+			      u64 cookie, u64 msg_type)
+{
+	struct kdbus_kmsg *kmsg = NULL;
+
+	WARN_ON(id == 0);
+
+	kmsg = kdbus_kmsg_new(0);
+	if (IS_ERR(kmsg))
+		return PTR_ERR(kmsg);
+
+	/*
+	 * a kernel-generated notification can only contain one
+	 * struct kdbus_item, so make a shortcut here for
+	 * faster lookup in the match db.
+	 */
+	kmsg->notify_type = msg_type;
+	kmsg->msg.dst_id = id;
+	kmsg->msg.src_id = KDBUS_SRC_ID_KERNEL;
+	kmsg->msg.payload_type = KDBUS_PAYLOAD_KERNEL;
+	kmsg->msg.cookie_reply = cookie;
+	kmsg->msg.items[0].type = msg_type;
+
+	kdbus_notify_add_tail(kmsg, bus);
+
+	return 0;
+}
+
+/**
+ * kdbus_notify_reply_timeout() - queue a timeout reply
+ * @bus:		Bus which queues the messages
+ * @id:			The destination's connection ID
+ * @cookie:		The cookie to set in the reply.
+ *
+ * Queues a message that has a KDBUS_ITEM_REPLY_TIMEOUT item attached.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie)
+{
+	return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_TIMEOUT);
+}
+
+/**
+ * kdbus_notify_reply_dead() - queue a 'dead' reply
+ * @bus:		Bus which queues the messages
+ * @id:			The destination's connection ID
+ * @cookie:		The cookie to set in the reply.
+ *
+ * Queues a message that has a KDBUS_ITEM_REPLY_DEAD item attached.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie)
+{
+	return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_DEAD);
+}
+
+/**
+ * kdbus_notify_name_change() - queue a notification about a name owner change
+ * @bus:		Bus which queues the messages
+ * @type:		The type of the notification; KDBUS_ITEM_NAME_ADD,
+ *			KDBUS_ITEM_NAME_CHANGE or KDBUS_ITEM_NAME_REMOVE
+ * @old_id:		The id of the connection that used to own the name
+ * @new_id:		The id of the new owner connection
+ * @old_flags:		The flags to pass in the KDBUS_ITEM flags field for
+ *			the old owner
+ * @new_flags:		The flags to pass in the KDBUS_ITEM flags field for
+ *			the new owner
+ * @name:		The name that was removed or assigned to a new owner
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
+			     u64 old_id, u64 new_id,
+			     u64 old_flags, u64 new_flags,
+			     const char *name)
+{
+	struct kdbus_kmsg *kmsg = NULL;
+	size_t name_len, extra_size;
+
+	name_len = strlen(name) + 1;
+	extra_size = sizeof(struct kdbus_notify_name_change) + name_len;
+	kmsg = kdbus_kmsg_new(extra_size);
+	if (IS_ERR(kmsg))
+		return PTR_ERR(kmsg);
+
+	kmsg->msg.dst_id = KDBUS_DST_ID_BROADCAST;
+	kmsg->msg.src_id = KDBUS_SRC_ID_KERNEL;
+	kmsg->msg.payload_type = KDBUS_PAYLOAD_KERNEL;
+	kmsg->notify_type = type;
+	kmsg->notify_old_id = old_id;
+	kmsg->notify_new_id = new_id;
+	kmsg->msg.items[0].type = type;
+	kmsg->msg.items[0].name_change.old_id.id = old_id;
+	kmsg->msg.items[0].name_change.old_id.flags = old_flags;
+	kmsg->msg.items[0].name_change.new_id.id = new_id;
+	kmsg->msg.items[0].name_change.new_id.flags = new_flags;
+	memcpy(kmsg->msg.items[0].name_change.name, name, name_len);
+	kmsg->notify_name = kmsg->msg.items[0].name_change.name;
+
+	kdbus_notify_add_tail(kmsg, bus);
+
+	return 0;
+}
+
+/**
+ * kdbus_notify_id_change() - queue a notification about a unique ID change
+ * @bus:		Bus which queues the messages
+ * @type:		The type of the notification; KDBUS_ITEM_ID_ADD or
+ *			KDBUS_ITEM_ID_REMOVE
+ * @id:			The id of the connection that was added or removed
+ * @flags:		The flags to pass in the KDBUS_ITEM flags field
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags)
+{
+	struct kdbus_kmsg *kmsg = NULL;
+
+	kmsg = kdbus_kmsg_new(sizeof(struct kdbus_notify_id_change));
+	if (IS_ERR(kmsg))
+		return PTR_ERR(kmsg);
+
+	kmsg->msg.dst_id = KDBUS_DST_ID_BROADCAST;
+	kmsg->msg.src_id = KDBUS_SRC_ID_KERNEL;
+	kmsg->msg.payload_type = KDBUS_PAYLOAD_KERNEL;
+	kmsg->notify_type = type;
+
+	switch (type) {
+	case KDBUS_ITEM_ID_ADD:
+		kmsg->notify_new_id = id;
+		break;
+
+	case KDBUS_ITEM_ID_REMOVE:
+		kmsg->notify_old_id = id;
+		break;
+
+	default:
+		BUG();
+	}
+
+	kmsg->msg.items[0].type = type;
+	kmsg->msg.items[0].id_change.id = id;
+	kmsg->msg.items[0].id_change.flags = flags;
+
+	kdbus_notify_add_tail(kmsg, bus);
+
+	return 0;
+}
+
+/**
+ * kdbus_notify_flush() - send a list of collected messages
+ * @bus:		Bus which queues the messages
+ *
+ * The list is empty after sending the messages.
+ */
+void kdbus_notify_flush(struct kdbus_bus *bus)
+{
+	LIST_HEAD(notify_list);
+	struct kdbus_kmsg *kmsg, *tmp;
+
+	mutex_lock(&bus->notify_flush_lock);
+
+	spin_lock(&bus->notify_lock);
+	list_splice_init(&bus->notify_list, &notify_list);
+	spin_unlock(&bus->notify_lock);
+
+	list_for_each_entry_safe(kmsg, tmp, &notify_list, notify_entry) {
+		kmsg->seq = atomic64_inc_return(&bus->domain->msg_seq_last);
+		kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, NULL,
+					KDBUS_ATTACH_TIMESTAMP);
+
+		if (kmsg->msg.dst_id != KDBUS_DST_ID_BROADCAST) {
+			struct kdbus_conn *conn;
+
+			conn = kdbus_bus_find_conn_by_id(bus, kmsg->msg.dst_id);
+			if (conn) {
+				kdbus_bus_eavesdrop(bus, NULL, kmsg);
+				kdbus_conn_entry_insert(NULL, conn, kmsg, NULL);
+				kdbus_conn_unref(conn);
+			}
+		} else {
+			kdbus_bus_broadcast(bus, NULL, kmsg);
+		}
+
+		list_del(&kmsg->notify_entry);
+		kdbus_kmsg_free(kmsg);
+	}
+
+	mutex_unlock(&bus->notify_flush_lock);
+}
+
+/**
+ * kdbus_notify_free() - free a list of collected messages
+ * @bus:		Bus which queues the messages
+ */
+void kdbus_notify_free(struct kdbus_bus *bus)
+{
+	struct kdbus_kmsg *kmsg, *tmp;
+
+	list_for_each_entry_safe(kmsg, tmp, &bus->notify_list, notify_entry) {
+		list_del(&kmsg->notify_entry);
+		kdbus_kmsg_free(kmsg);
+	}
+}
diff --git a/ipc/kdbus/notify.h b/ipc/kdbus/notify.h
new file mode 100644
index 000000000000..22039c123647
--- /dev/null
+++ b/ipc/kdbus/notify.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_NOTIFY_H
+#define __KDBUS_NOTIFY_H
+
+struct kdbus_bus;
+
+int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags);
+int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie);
+int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie);
+int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
+			     u64 old_id, u64 new_id,
+			     u64 old_flags, u64 new_flags,
+			     const char *name);
+void kdbus_notify_flush(struct kdbus_bus *bus);
+void kdbus_notify_free(struct kdbus_bus *bus);
+
+#endif
-- 
2.2.1
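
The queueing scheme in notify.c above (append under a short lock,
then splice the whole list out and deliver it while holding only the
flush lock) can be tried out in userspace. What follows is a minimal
sketch, assuming pthread mutexes in place of the kernel spinlock and
mutex. All names (struct note, note_queue_*, note_append_digit) are
hypothetical and are not part of kdbus.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a queued kernel notification. */
struct note {
	struct note *next;
	int payload;
};

struct note_queue {
	pthread_mutex_t list_lock;	/* role of bus->notify_lock */
	pthread_mutex_t flush_lock;	/* role of bus->notify_flush_lock */
	struct note *head;
	struct note **tail;
};

static void note_queue_init(struct note_queue *q)
{
	pthread_mutex_init(&q->list_lock, NULL);
	pthread_mutex_init(&q->flush_lock, NULL);
	q->head = NULL;
	q->tail = &q->head;
}

/* Append one notification; only the short list lock is taken. */
static int note_queue_add(struct note_queue *q, int payload)
{
	struct note *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->next = NULL;
	n->payload = payload;

	pthread_mutex_lock(&q->list_lock);
	*q->tail = n;
	q->tail = &n->next;
	pthread_mutex_unlock(&q->list_lock);
	return 0;
}

/*
 * Detach everything queued so far, then deliver it in FIFO order
 * without holding the list lock, so producers are never blocked by a
 * slow delivery. Returns the number of delivered notifications.
 */
static size_t note_queue_flush(struct note_queue *q,
			       void (*deliver)(int payload, void *ctx),
			       void *ctx)
{
	struct note *list, *next;
	size_t count = 0;

	assert(deliver);

	pthread_mutex_lock(&q->flush_lock);

	pthread_mutex_lock(&q->list_lock);
	list = q->head;
	q->head = NULL;
	q->tail = &q->head;
	pthread_mutex_unlock(&q->list_lock);

	for (; list; list = next) {
		next = list->next;
		deliver(list->payload, ctx);
		free(list);
		count++;
	}

	pthread_mutex_unlock(&q->flush_lock);
	return count;
}

/* Example callback: records the delivery order as decimal digits. */
static void note_append_digit(int payload, void *ctx)
{
	int *acc = ctx;

	*acc = *acc * 10 + payload;
}
```

The separate flush lock serializes concurrent flushers, so two callers
cannot interleave their batches, while the list lock is held only for
the pointer updates.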



* [PATCH 09/13] kdbus: add code for buses, domains and endpoints
@ 2015-01-16 19:16   ` Greg Kroah-Hartman
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

Add the logic to handle the following entities:

Domain:
  A domain is an unnamed object containing a number of buses. A
  domain is automatically created when an instance of kdbusfs
  is mounted, and destroyed when it is unmounted.
  Every domain offers its own "control" device node to create
  buses.  Domains have no connection to each other and can neither
  see nor talk to each other.

Bus:
  A bus is a named object inside a domain. Clients exchange messages
  over a bus. Multiple buses themselves have no connection to each
  other; messages can only be exchanged on the same bus. The default
  entry point to a bus, to which clients connect, is the "bus" device
  node /sys/fs/kdbus/<bus name>/bus.  Common operating system setups
  create one "system bus" per system, and one "user bus" for every
  logged-in user. Applications or services may create their own
  private named buses.

Endpoint:
  An endpoint provides the device node to talk to a bus. Opening an
  endpoint creates a new connection to the bus to which the endpoint
  belongs. Every bus has a default endpoint called "bus". A bus can
  optionally offer additional endpoints with custom names to provide
  restricted access to the same bus. Custom endpoints carry
  additional policy which can be used to give sandboxed processes
  only locked-down, limited, filtered access to the same bus.

See Documentation/kdbus.txt for more details.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 ipc/kdbus/bus.c      | 553 +++++++++++++++++++++++++++++++++++++++++++++++++++
 ipc/kdbus/bus.h      | 103 ++++++++++
 ipc/kdbus/domain.c   | 350 ++++++++++++++++++++++++++++++++
 ipc/kdbus/domain.h   |  84 ++++++++
 ipc/kdbus/endpoint.c | 232 +++++++++++++++++++++
 ipc/kdbus/endpoint.h |  68 +++++++
 6 files changed, 1390 insertions(+)
 create mode 100644 ipc/kdbus/bus.c
 create mode 100644 ipc/kdbus/bus.h
 create mode 100644 ipc/kdbus/domain.c
 create mode 100644 ipc/kdbus/domain.h
 create mode 100644 ipc/kdbus/endpoint.c
 create mode 100644 ipc/kdbus/endpoint.h
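
Note on the parameter checks in kdbus_bus_new(): the mapping from
access flags to node permissions and the bloom parameter validation
can be exercised in userspace. This is a minimal sketch, assuming
placeholder flag values and a placeholder size limit. The SKETCH_*
names and sketch_* functions are hypothetical; the real
KDBUS_MAKE_ACCESS_* flags and KDBUS_BUS_BLOOM_MAX_SIZE are defined in
kdbus headers that are not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumed values, for illustration only. */
#define SKETCH_ACCESS_GROUP	(1ULL << 0)
#define SKETCH_ACCESS_WORLD	(1ULL << 1)
#define SKETCH_BLOOM_MAX_SIZE	4096ULL

/*
 * Derive the bus directory mode from the make flags: the owner can
 * always read and traverse, group access is implied by world access.
 */
static mode_t sketch_bus_mode(uint64_t make_flags)
{
	uint64_t access = make_flags &
			  (SKETCH_ACCESS_WORLD | SKETCH_ACCESS_GROUP);
	mode_t mode = S_IRUSR | S_IXUSR;

	if (access & (SKETCH_ACCESS_GROUP | SKETCH_ACCESS_WORLD))
		mode |= S_IRGRP | S_IXGRP;
	if (access & SKETCH_ACCESS_WORLD)
		mode |= S_IROTH | S_IXOTH;

	/* Owner access is never dropped by the flag handling. */
	assert(mode & S_IRUSR);
	return mode;
}

/*
 * Mirror the bloom checks: size within [8, max], a multiple of 8,
 * and at least one hash function.
 */
static bool sketch_bloom_valid(uint64_t size, uint64_t n_hash)
{
	if (size < 8 || size > SKETCH_BLOOM_MAX_SIZE)
		return false;
	if (size % 8 != 0)
		return false;
	return n_hash >= 1;
}
```

With these assumptions, no access flag yields an owner-only directory,
the group flag adds group read/traverse, and the world flag opens the
directory to everyone, including the group.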

diff --git a/ipc/kdbus/bus.c b/ipc/kdbus/bus.c
new file mode 100644
index 000000000000..ffcde832116c
--- /dev/null
+++ b/ipc/kdbus/bus.c
@@ -0,0 +1,553 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/hashtable.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "notify.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+#include "metadata.h"
+#include "names.h"
+#include "policy.h"
+#include "util.h"
+
+static void kdbus_bus_free(struct kdbus_node *node)
+{
+	struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
+
+	WARN_ON(!list_empty(&bus->monitors_list));
+	WARN_ON(!hash_empty(bus->conn_hash));
+
+	kdbus_notify_free(bus);
+
+	kdbus_domain_user_unref(bus->creator);
+	kdbus_name_registry_free(bus->name_registry);
+	kdbus_domain_unref(bus->domain);
+	kdbus_policy_db_clear(&bus->policy_db);
+	kdbus_meta_proc_unref(bus->creator_meta);
+	kfree(bus);
+}
+
+static void kdbus_bus_release(struct kdbus_node *node, bool was_active)
+{
+	struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
+
+	if (was_active)
+		atomic_dec(&bus->creator->buses);
+}
+
+/**
+ * kdbus_bus_new() - create a new kdbus_bus from user-supplied data
+ * @domain:		The domain to work on
+ * @make:		Information as passed in by userspace
+ * @uid:		The uid of the bus node
+ * @gid:		The gid of the bus node
+ *
+ * This function is part of the connection ioctl() interface and will parse
+ * the user-supplied data in order to create a new kdbus_bus.
+ *
+ * Return: the new bus on success, ERR_PTR on failure.
+ */
+struct kdbus_bus *kdbus_bus_new(struct kdbus_domain *domain,
+				const struct kdbus_cmd_make *make,
+				kuid_t uid, kgid_t gid)
+{
+	const struct kdbus_bloom_parameter *bloom = NULL;
+	const u64 *pattach_owner = NULL;
+	const u64 *pattach_recv = NULL;
+	const struct kdbus_item *item;
+	const char *name = NULL;
+	struct kdbus_bus *b;
+	u64 attach_owner;
+	u64 attach_recv;
+	int ret;
+
+	KDBUS_ITEMS_FOREACH(item, make->items, KDBUS_ITEMS_SIZE(make, items)) {
+		switch (item->type) {
+		case KDBUS_ITEM_MAKE_NAME:
+			if (name)
+				return ERR_PTR(-EEXIST);
+
+			name = item->str;
+			break;
+
+		case KDBUS_ITEM_BLOOM_PARAMETER:
+			if (bloom)
+				return ERR_PTR(-EEXIST);
+
+			bloom = &item->bloom_parameter;
+			break;
+
+		case KDBUS_ITEM_ATTACH_FLAGS_SEND:
+			if (pattach_owner)
+				return ERR_PTR(-EEXIST);
+
+			pattach_owner = &item->data64[0];
+			break;
+
+		case KDBUS_ITEM_ATTACH_FLAGS_RECV:
+			if (pattach_recv)
+				return ERR_PTR(-EEXIST);
+
+			pattach_recv = &item->data64[0];
+			break;
+
+		default:
+			return ERR_PTR(-EINVAL);
+		}
+	}
+
+	if (!name || !bloom)
+		return ERR_PTR(-EBADMSG);
+
+	if (bloom->size < 8 || bloom->size > KDBUS_BUS_BLOOM_MAX_SIZE)
+		return ERR_PTR(-EINVAL);
+	if (!KDBUS_IS_ALIGNED8(bloom->size))
+		return ERR_PTR(-EINVAL);
+	if (bloom->n_hash < 1)
+		return ERR_PTR(-EINVAL);
+
+	ret = kdbus_sanitize_attach_flags(pattach_recv ? *pattach_recv : 0,
+					  &attach_recv);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	ret = kdbus_sanitize_attach_flags(pattach_owner ? *pattach_owner : 0,
+					  &attach_owner);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	ret = kdbus_verify_uid_prefix(name, domain->user_namespace, uid);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	b = kzalloc(sizeof(*b), GFP_KERNEL);
+	if (!b)
+		return ERR_PTR(-ENOMEM);
+
+	kdbus_node_init(&b->node, KDBUS_NODE_BUS);
+
+	b->node.free_cb = kdbus_bus_free;
+	b->node.release_cb = kdbus_bus_release;
+	b->node.uid = uid;
+	b->node.gid = gid;
+	b->node.mode = S_IRUSR | S_IXUSR;
+
+	b->access = make->flags & (KDBUS_MAKE_ACCESS_WORLD |
+				   KDBUS_MAKE_ACCESS_GROUP);
+	if (b->access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+		b->node.mode |= S_IRGRP | S_IXGRP;
+	if (b->access & KDBUS_MAKE_ACCESS_WORLD)
+		b->node.mode |= S_IROTH | S_IXOTH;
+
+	b->bus_flags = make->flags;
+	b->bloom = *bloom;
+	b->attach_flags_req = attach_recv;
+	b->attach_flags_owner = attach_owner;
+	mutex_init(&b->lock);
+	init_rwsem(&b->conn_rwlock);
+	hash_init(b->conn_hash);
+	INIT_LIST_HEAD(&b->monitors_list);
+	INIT_LIST_HEAD(&b->notify_list);
+	spin_lock_init(&b->notify_lock);
+	mutex_init(&b->notify_flush_lock);
+	atomic64_set(&b->conn_seq_last, 0);
+	b->domain = kdbus_domain_ref(domain);
+	kdbus_policy_db_init(&b->policy_db);
+	b->id = atomic64_inc_return(&domain->bus_seq_last);
+
+	/* generate unique bus id */
+	generate_random_uuid(b->id128);
+
+	ret = kdbus_node_link(&b->node, &domain->node, name);
+	if (ret < 0)
+		goto exit_unref;
+
+	/* cache the metadata/credentials of the creator */
+	b->creator_meta = kdbus_meta_proc_new();
+	if (IS_ERR(b->creator_meta)) {
+		ret = PTR_ERR(b->creator_meta);
+		b->creator_meta = NULL;
+		goto exit_unref;
+	}
+
+	ret = kdbus_meta_proc_collect(b->creator_meta,
+				      KDBUS_ATTACH_CREDS |
+				      KDBUS_ATTACH_PIDS |
+				      KDBUS_ATTACH_AUXGROUPS |
+				      KDBUS_ATTACH_TID_COMM |
+				      KDBUS_ATTACH_PID_COMM |
+				      KDBUS_ATTACH_EXE |
+				      KDBUS_ATTACH_CMDLINE |
+				      KDBUS_ATTACH_CGROUP |
+				      KDBUS_ATTACH_CAPS |
+				      KDBUS_ATTACH_SECLABEL |
+				      KDBUS_ATTACH_AUDIT);
+	if (ret < 0)
+		goto exit_unref;
+
+	b->name_registry = kdbus_name_registry_new();
+	if (IS_ERR(b->name_registry)) {
+		ret = PTR_ERR(b->name_registry);
+		b->name_registry = NULL;
+		goto exit_unref;
+	}
+
+	/*
+	 * Bus-limits of the creator are accounted on its real UID, just like
+	 * all other per-user limits.
+	 */
+	b->creator = kdbus_domain_get_user(domain, current_uid());
+	if (IS_ERR(b->creator)) {
+		ret = PTR_ERR(b->creator);
+		b->creator = NULL;
+		goto exit_unref;
+	}
+
+	return b;
+
+exit_unref:
+	kdbus_node_deactivate(&b->node);
+	kdbus_node_unref(&b->node);
+	return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_bus_ref() - increase the reference counter of a kdbus_bus
+ * @bus:		The bus to reference
+ *
+ * Every user of a bus, except for its creator, must add a reference to the
+ * kdbus_bus using this function.
+ *
+ * Return: the bus itself
+ */
+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus)
+{
+	if (bus)
+		kdbus_node_ref(&bus->node);
+	return bus;
+}
+
+/**
+ * kdbus_bus_unref() - decrease the reference counter of a kdbus_bus
+ * @bus:		The bus to unref
+ *
+ * Release a reference. If the reference count drops to 0, the bus will be
+ * freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus)
+{
+	if (bus)
+		kdbus_node_unref(&bus->node);
+	return NULL;
+}
+
+/**
+ * kdbus_bus_activate() - activate a bus
+ * @bus:		Bus
+ *
+ * Activate a bus and make it available to user-space.
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kdbus_bus_activate(struct kdbus_bus *bus)
+{
+	struct kdbus_ep *ep;
+	int ret;
+
+	if (atomic_inc_return(&bus->creator->buses) > KDBUS_USER_MAX_BUSES) {
+		atomic_dec(&bus->creator->buses);
+		return -EMFILE;
+	}
+
+	/*
+	 * kdbus_bus_activate() must not be called multiple times, so if
+	 * kdbus_node_activate() didn't activate the node, it must already be
+	 * dead.
+	 */
+	if (!kdbus_node_activate(&bus->node)) {
+		atomic_dec(&bus->creator->buses);
+		return -ESHUTDOWN;
+	}
+
+	/*
+	 * Create a new default endpoint for this bus. If activation succeeds,
+	 * we drop our own reference, effectively causing the endpoint to be
+	 * deactivated and released when the parent domain is.
+	 */
+	ep = kdbus_ep_new(bus, "bus", bus->access,
+			  bus->node.uid, bus->node.gid, false);
+	if (IS_ERR(ep))
+		return PTR_ERR(ep);
+
+	ret = kdbus_ep_activate(ep);
+	if (ret < 0)
+		kdbus_ep_deactivate(ep);
+	kdbus_ep_unref(ep);
+
+	return ret;
+}
+
+/**
+ * kdbus_bus_deactivate() - deactivate a bus
+ * @bus:		The bus to deactivate
+ *
+ * The passed bus will be disconnected and the associated endpoint will be
+ * unref'ed.
+ */
+void kdbus_bus_deactivate(struct kdbus_bus *bus)
+{
+	kdbus_node_deactivate(&bus->node);
+}
+
+/**
+ * kdbus_bus_find_conn_by_id() - find a connection with a given id
+ * @bus:		The bus to look for the connection
+ * @id:			The 64-bit connection id
+ *
+ * Looks up a connection with a given id. The returned connection
+ * is ref'ed, and needs to be unref'ed by the user. Returns NULL if
+ * the connection can't be found.
+ */
+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id)
+{
+	struct kdbus_conn *conn, *found = NULL;
+
+	down_read(&bus->conn_rwlock);
+	hash_for_each_possible(bus->conn_hash, conn, hentry, id)
+		if (conn->id == id) {
+			found = kdbus_conn_ref(conn);
+			break;
+		}
+	up_read(&bus->conn_rwlock);
+
+	return found;
+}
+
+/**
+ * kdbus_bus_broadcast() - send a message to all subscribed connections
+ * @bus:	The bus the connections are connected to
+ * @conn_src:	The source connection, may be %NULL for kernel notifications
+ * @kmsg:	The message to send.
+ *
+ * Send @kmsg to all connections that are currently active on the bus.
+ * Connections must still have matches installed in order to let the message
+ * pass.
+ */
+void kdbus_bus_broadcast(struct kdbus_bus *bus,
+			 struct kdbus_conn *conn_src,
+			 struct kdbus_kmsg *kmsg)
+{
+	struct kdbus_conn *conn_dst;
+	unsigned int i;
+	int ret;
+
+	/*
+	 * Make sure broadcasts are queued on monitors before we send them out
+	 * to anyone else. Otherwise, connections might react to broadcasts
+	 * before the monitor gets the broadcast queued. In the worst case, the
+	 * monitor sees a reaction to the broadcast before the broadcast itself.
+	 * We don't give ordering guarantees across connections (and monitors
+	 * can re-construct order via sequence numbers), but we should at least
+	 * try to avoid re-ordering for monitors.
+	 */
+	kdbus_bus_eavesdrop(bus, conn_src, kmsg);
+
+	down_read(&bus->conn_rwlock);
+
+	hash_for_each(bus->conn_hash, i, conn_dst, hentry) {
+		if (conn_dst->id == kmsg->msg.src_id)
+			continue;
+		if (!kdbus_conn_is_ordinary(conn_dst))
+			continue;
+
+		/*
+		 * Check if there is a match for the kmsg object in
+		 * the destination connection match db
+		 */
+		if (!kdbus_match_db_match_kmsg(conn_dst->match_db, conn_src,
+					       kmsg))
+			continue;
+
+		if (conn_src) {
+			u64 attach_flags;
+
+			/*
+			 * Anyone can send broadcasts, as they have no
+			 * destination. But a receiver needs TALK access to
+			 * the sender in order to receive broadcasts.
+			 */
+			if (!kdbus_conn_policy_talk(conn_dst, NULL, conn_src))
+				continue;
+
+			attach_flags = kdbus_meta_calc_attach_flags(conn_src,
+								    conn_dst);
+
+			/*
+			 * Keep sending messages even if we cannot acquire the
+			 * requested metadata. It's up to the receiver to drop
+			 * messages that lack expected metadata.
+			 */
+			if (!conn_src->faked_meta)
+				kdbus_meta_proc_collect(kmsg->proc_meta,
+							attach_flags);
+			kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src,
+						attach_flags);
+		} else {
+			/*
+			 * Check if there is a policy db that prevents the
+			 * destination connection from receiving this kernel
+			 * notification
+			 */
+			if (!kdbus_conn_policy_see_notification(conn_dst, NULL,
+								kmsg))
+				continue;
+		}
+
+		ret = kdbus_conn_entry_insert(conn_src, conn_dst, kmsg, NULL);
+		if (ret < 0)
+			atomic_inc(&conn_dst->lost_count);
+	}
+
+	up_read(&bus->conn_rwlock);
+}
+
+/**
+ * kdbus_bus_eavesdrop() - send a message to all subscribed monitors
+ * @bus:	The bus the monitors are connected to
+ * @conn_src:	The source connection, may be %NULL for kernel notifications
+ * @kmsg:	The message to send.
+ *
+ * Send @kmsg to all monitors that are currently active on the bus. Monitors
+ * must still have matches installed in order to let the message pass.
+ */
+void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
+			 struct kdbus_conn *conn_src,
+			 struct kdbus_kmsg *kmsg)
+{
+	struct kdbus_conn *conn_dst;
+	int ret;
+
+	/*
+	 * Monitor connections get all messages; ignore possible errors
+	 * when sending messages to monitor connections.
+	 */
+
+	down_read(&bus->conn_rwlock);
+	list_for_each_entry(conn_dst, &bus->monitors_list, monitor_entry) {
+		/*
+		 * Collect metadata requested by the destination connection.
+		 * Ignore errors, as receivers need to check metadata
+		 * availability, anyway. So it's still better to send messages
+		 * that lack data, than to skip it entirely.
+		 */
+		if (conn_src) {
+			u64 attach_flags;
+
+			attach_flags = kdbus_meta_calc_attach_flags(conn_src,
+								    conn_dst);
+			if (!conn_src->faked_meta)
+				kdbus_meta_proc_collect(kmsg->proc_meta,
+							attach_flags);
+			kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src,
+						attach_flags);
+		}
+
+		ret = kdbus_conn_entry_insert(conn_src, conn_dst, kmsg, NULL);
+		if (ret < 0)
+			atomic_inc(&conn_dst->lost_count);
+	}
+	up_read(&bus->conn_rwlock);
+}
+
+/**
+ * kdbus_cmd_bus_creator_info() - get information on a bus creator
+ * @conn:	The querying connection
+ * @cmd_info:	The command buffer, as passed in from the ioctl
+ *
+ * Gather information on the creator of the bus @conn is connected to.
+ *
+ * Return: 0 on success, error otherwise.
+ */
+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn,
+			       struct kdbus_cmd_info *cmd_info)
+{
+	struct kdbus_bus *bus = conn->ep->bus;
+	struct kdbus_pool_slice *slice = NULL;
+	struct kdbus_item_header item_hdr;
+	struct kdbus_item *meta_items;
+	struct kdbus_info info = {};
+	size_t meta_size, name_len;
+	struct kvec kvec[5];
+	u64 attach_flags;
+	size_t cnt = 0;
+	int ret;
+
+	info.id = bus->id;
+	info.flags = bus->bus_flags;
+
+	name_len = strlen(bus->node.name) + 1;
+
+	/* mask out what information the bus owner wants to pass us */
+	attach_flags = cmd_info->flags & bus->attach_flags_owner;
+
+	meta_items = kdbus_meta_export(bus->creator_meta, NULL, attach_flags,
+				       &meta_size);
+	if (IS_ERR(meta_items))
+		return PTR_ERR(meta_items);
+
+	item_hdr.type = KDBUS_ITEM_MAKE_NAME;
+	item_hdr.size = KDBUS_ITEM_HEADER_SIZE + name_len;
+
+	kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &info.size);
+	kdbus_kvec_set(&kvec[cnt++], &item_hdr, sizeof(item_hdr), &info.size);
+	kdbus_kvec_set(&kvec[cnt++], bus->node.name, name_len, &info.size);
+	cnt += !!kdbus_kvec_pad(&kvec[cnt], &info.size);
+
+	if (meta_items && meta_size)
+		kdbus_kvec_set(&kvec[cnt++], meta_items, meta_size, &info.size);
+
+	slice = kdbus_pool_slice_alloc(conn->pool, info.size, kvec, NULL, cnt);
+	if (IS_ERR(slice)) {
+		ret = PTR_ERR(slice);
+		slice = NULL;
+		goto exit;
+	}
+
+	/* write back the offset */
+	kdbus_pool_slice_publish(slice, &cmd_info->offset,
+				 &cmd_info->info_size);
+	ret = 0;
+
+	kdbus_pool_slice_release(slice);
+exit:
+	kfree(meta_items);
+	return ret;
+}
diff --git a/ipc/kdbus/bus.h b/ipc/kdbus/bus.h
new file mode 100644
index 000000000000..3fa57373165c
--- /dev/null
+++ b/ipc/kdbus/bus.h
@@ -0,0 +1,103 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_BUS_H
+#define __KDBUS_BUS_H
+
+#include <linux/hashtable.h>
+#include <linux/spinlock.h>
+#include <linux/rwsem.h>
+
+#include "node.h"
+#include "policy.h"
+#include "util.h"
+
+/**
+ * struct kdbus_bus - bus in a domain
+ * @node:		kdbus_node
+ * @domain:		Domain of this bus
+ * @id:			ID of this bus in the domain
+ * @lock:		Bus data lock
+ * @access:		The access flags for the bus directory
+ * @ep_seq_last:	Last used endpoint id sequence number
+ * @conn_seq_last:	Last used connection id sequence number
+ * @bus_flags:		Simple pass-through flags from userspace to userspace
+ * @attach_flags_req:	KDBUS_ATTACH_* flags required by connecting peers
+ * @attach_flags_owner:	KDBUS_ATTACH_* flags of bus creator that other
+ *			connections can see or query
+ * @name_registry:	Name registry of this bus
+ * @bloom:		Bloom parameters
+ * @id128:		Unique random 128 bit ID of this bus
+ * @creator:		Creator of the bus
+ * @policy_db:		Policy database for this bus
+ * @notify_list:	List of pending kernel-generated messages
+ * @notify_lock:	Notification list lock
+ * @notify_flush_lock:	Notification flushing lock
+ * @conn_rwlock:	Read/Write lock for all lists of child connections
+ * @conn_hash:		Map of connection IDs
+ * @monitors_list:	Connections that monitor this bus
+ * @creator_meta:	Meta information about the bus creator
+ *
+ * A bus provides a "bus" endpoint node.
+ *
+ * A bus is created by opening the control node and issuing the
+ * KDBUS_CMD_BUS_MAKE ioctl. Closing this file immediately destroys
+ * the bus.
+ */
+struct kdbus_bus {
+	struct kdbus_node node;
+	struct kdbus_domain *domain;
+	u64 id;
+	struct mutex lock;
+	unsigned int access;
+	atomic64_t ep_seq_last;
+	atomic64_t conn_seq_last;
+	u64 bus_flags;
+	u64 attach_flags_req;
+	u64 attach_flags_owner;
+	struct kdbus_name_registry *name_registry;
+	struct kdbus_bloom_parameter bloom;
+	u8 id128[16];
+	struct kdbus_domain_user *creator;
+	struct kdbus_policy_db policy_db;
+	struct list_head notify_list;
+	spinlock_t notify_lock;
+	struct mutex notify_flush_lock;
+
+	struct rw_semaphore conn_rwlock;
+	DECLARE_HASHTABLE(conn_hash, 8);
+	struct list_head monitors_list;
+
+	struct kdbus_meta_proc *creator_meta;
+};
+
+struct kdbus_kmsg;
+
+struct kdbus_bus *kdbus_bus_new(struct kdbus_domain *domain,
+				const struct kdbus_cmd_make *make,
+				kuid_t uid, kgid_t gid);
+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus);
+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus);
+int kdbus_bus_activate(struct kdbus_bus *bus);
+void kdbus_bus_deactivate(struct kdbus_bus *bus);
+
+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn,
+			       struct kdbus_cmd_info *cmd_info);
+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id);
+void kdbus_bus_broadcast(struct kdbus_bus *bus, struct kdbus_conn *conn_src,
+			 struct kdbus_kmsg *kmsg);
+void kdbus_bus_eavesdrop(struct kdbus_bus *bus, struct kdbus_conn *conn_src,
+			 struct kdbus_kmsg *kmsg);
+
+#endif
diff --git a/ipc/kdbus/domain.c b/ipc/kdbus/domain.c
new file mode 100644
index 000000000000..81d1fb52b73c
--- /dev/null
+++ b/ipc/kdbus/domain.c
@@ -0,0 +1,350 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "domain.h"
+#include "handle.h"
+#include "item.h"
+#include "limits.h"
+#include "util.h"
+
+static void kdbus_domain_control_free(struct kdbus_node *node)
+{
+	kfree(node);
+}
+
+static struct kdbus_node *kdbus_domain_control_new(struct kdbus_domain *domain,
+						   unsigned int access)
+{
+	struct kdbus_node *node;
+	int ret;
+
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return ERR_PTR(-ENOMEM);
+
+	kdbus_node_init(node, KDBUS_NODE_CONTROL);
+
+	node->free_cb = kdbus_domain_control_free;
+	/* owner may always read and write; @access widens this below */
+	node->mode = S_IRUSR | S_IWUSR;
+	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+		node->mode |= S_IRGRP | S_IWGRP;
+	if (access & KDBUS_MAKE_ACCESS_WORLD)
+		node->mode |= S_IROTH | S_IWOTH;
+
+	ret = kdbus_node_link(node, &domain->node, "control");
+	if (ret < 0)
+		goto exit_free;
+
+	return node;
+
+exit_free:
+	kdbus_node_deactivate(node);
+	kdbus_node_unref(node);
+	return ERR_PTR(ret);
+}
+
+static void kdbus_domain_free(struct kdbus_node *node)
+{
+	struct kdbus_domain *domain =
+		container_of(node, struct kdbus_domain, node);
+
+	WARN_ON(!hash_empty(domain->user_hash));
+
+	put_user_ns(domain->user_namespace);
+	idr_destroy(&domain->user_idr);
+	kfree(domain);
+}
+
+/**
+ * kdbus_domain_new() - create a new domain
+ * @access:		The access mode for this node (KDBUS_MAKE_ACCESS_*)
+ *
+ * Return: a new kdbus_domain on success, ERR_PTR on failure
+ */
+struct kdbus_domain *kdbus_domain_new(unsigned int access)
+{
+	struct kdbus_domain *d;
+	int ret;
+
+	d = kzalloc(sizeof(*d), GFP_KERNEL);
+	if (!d)
+		return ERR_PTR(-ENOMEM);
+
+	kdbus_node_init(&d->node, KDBUS_NODE_DOMAIN);
+
+	d->node.free_cb = kdbus_domain_free;
+	d->node.mode = S_IRUSR | S_IXUSR;
+	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+		d->node.mode |= S_IRGRP | S_IXGRP;
+	if (access & KDBUS_MAKE_ACCESS_WORLD)
+		d->node.mode |= S_IROTH | S_IXOTH;
+
+	d->access = access;
+	mutex_init(&d->lock);
+	atomic64_set(&d->msg_seq_last, 0);
+	idr_init(&d->user_idr);
+
+	/* Pin user namespace so we can guarantee domain-unique bus names. */
+	d->user_namespace = get_user_ns(current_user_ns());
+
+	ret = kdbus_node_link(&d->node, NULL, NULL);
+	if (ret < 0)
+		goto exit_unref;
+
+	return d;
+
+exit_unref:
+	kdbus_node_deactivate(&d->node);
+	kdbus_node_unref(&d->node);
+	return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_domain_ref() - take a domain reference
+ * @domain:		Domain
+ *
+ * Return: the domain itself
+ */
+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain)
+{
+	if (domain)
+		kdbus_node_ref(&domain->node);
+	return domain;
+}
+
+/**
+ * kdbus_domain_unref() - drop a domain reference
+ * @domain:		Domain
+ *
+ * When the last reference is dropped, the domain internal structure
+ * is freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain)
+{
+	if (domain)
+		kdbus_node_unref(&domain->node);
+	return NULL;
+}
+
+/**
+ * kdbus_domain_activate() - activate a domain
+ * @domain:		Domain
+ *
+ * Activate a domain so it will be visible to user-space and can be accessed
+ * by external entities.
+ *
+ * Returns: 0 on success, negative error-code on failure
+ */
+int kdbus_domain_activate(struct kdbus_domain *domain)
+{
+	struct kdbus_node *control;
+
+	/*
+	 * kdbus_domain_activate() must not be called multiple times, so if
+	 * kdbus_node_activate() didn't activate the node, it must already be
+	 * dead.
+	 */
+	if (!kdbus_node_activate(&domain->node))
+		return -ESHUTDOWN;
+
+	/*
+	 * Create a control-node for this domain. We drop our own reference
+	 * immediately, effectively causing the node to be deactivated and
+	 * released when the parent domain is.
+	 */
+	control = kdbus_domain_control_new(domain, domain->access);
+	if (IS_ERR(control))
+		return PTR_ERR(control);
+
+	kdbus_node_activate(control);
+	kdbus_node_unref(control);
+
+	return 0;
+}
+
+/**
+ * kdbus_domain_deactivate() - invalidate a domain
+ * @domain:		Domain
+ */
+void kdbus_domain_deactivate(struct kdbus_domain *domain)
+{
+	kdbus_node_deactivate(&domain->node);
+}
+
+/**
+ * kdbus_domain_user_assign_id() - allocate ID and assign it to the
+ *				   domain user
+ * @domain:		The domain of the user
+ * @user:		The kdbus_domain_user object of the user
+ *
+ * Returns 0 if ID in [0, INT_MAX] is successfully assigned to the
+ * domain user. Negative errno on failure.
+ *
+ * The user index is used in arrays for accounting user quota in
+ * receiver queues.
+ *
+ * Caller must have the domain lock held and must ensure that the
+ * domain was not disconnected.
+ */
+static int kdbus_domain_user_assign_id(struct kdbus_domain *domain,
+				       struct kdbus_domain_user *user)
+{
+	int ret;
+
+	/*
+	 * Allocate the smallest possible index for this user; used
+	 * in arrays for accounting user quota in receiver queues.
+	 */
+	ret = idr_alloc(&domain->user_idr, user, 0, 0, GFP_KERNEL);
+	if (ret < 0)
+		return ret;
+
+	user->idr = ret;
+
+	return 0;
+}
+
+/**
+ * kdbus_domain_get_user() - get a kdbus_domain_user object
+ * @domain:		The domain of the user
+ * @uid:		The uid of the user; INVALID_UID for an
+ *			anonymous user like a custom endpoint
+ *
+ * If there is a uid matching, then use the already accounted
+ * kdbus_domain_user, increment its reference counter and return it.
+ * Otherwise allocate a new one, link it into the domain and return it.
+ *
+ * Return: the accounted domain user on success, ERR_PTR on failure.
+ */
+struct kdbus_domain_user *kdbus_domain_get_user(struct kdbus_domain *domain,
+						kuid_t uid)
+{
+	struct kdbus_domain_user *tmp_user;
+	struct kdbus_domain_user *u = NULL;
+	int ret;
+
+	mutex_lock(&domain->lock);
+
+	/* find uid and reference it */
+	if (uid_valid(uid)) {
+		hash_for_each_possible(domain->user_hash, tmp_user,
+				       hentry, __kuid_val(uid)) {
+			if (!uid_eq(tmp_user->uid, uid))
+				continue;
+
+			/*
+			 * If the ref-count is already 0, the destructor is
+			 * about to unlink and destroy the object. Continue
+			 * looking for a next one or create one, if none found.
+			 */
+			if (kref_get_unless_zero(&tmp_user->kref)) {
+				mutex_unlock(&domain->lock);
+				return tmp_user;
+			}
+		}
+	}
+
+	u = kzalloc(sizeof(*u), GFP_KERNEL);
+	if (!u) {
+		ret = -ENOMEM;
+		goto exit_unlock;
+	}
+
+	kref_init(&u->kref);
+	u->domain = kdbus_domain_ref(domain);
+	u->uid = uid;
+	atomic_set(&u->buses, 0);
+	atomic_set(&u->connections, 0);
+
+	/* Assign user ID and link into domain */
+	ret = kdbus_domain_user_assign_id(domain, u);
+	if (ret < 0)
+		goto exit_free;
+
+	/* UID hash map */
+	hash_add(domain->user_hash, &u->hentry, __kuid_val(u->uid));
+
+	mutex_unlock(&domain->lock);
+	return u;
+
+exit_free:
+	kdbus_domain_unref(u->domain);
+	kfree(u);
+exit_unlock:
+	mutex_unlock(&domain->lock);
+	return ERR_PTR(ret);
+}
+
+static void __kdbus_domain_user_free(struct kref *kref)
+{
+	struct kdbus_domain_user *user =
+		container_of(kref, struct kdbus_domain_user, kref);
+
+	WARN_ON(atomic_read(&user->buses) > 0);
+	WARN_ON(atomic_read(&user->connections) > 0);
+
+	/*
+	 * Lookups ignore objects with a ref-count of 0. Therefore, we can
+	 * safely remove it from the table after dropping the last reference.
+	 * No-one will acquire a ref in parallel.
+	 */
+	mutex_lock(&user->domain->lock);
+	idr_remove(&user->domain->user_idr, user->idr);
+	hash_del(&user->hentry);
+	mutex_unlock(&user->domain->lock);
+
+	kdbus_domain_unref(user->domain);
+	kfree(user);
+}
+
+/**
+ * kdbus_domain_user_ref() - take a domain user reference
+ * @u:		User
+ *
+ * Return: the domain user itself
+ */
+struct kdbus_domain_user *kdbus_domain_user_ref(struct kdbus_domain_user *u)
+{
+	kref_get(&u->kref);
+	return u;
+}
+
+/**
+ * kdbus_domain_user_unref() - drop a domain user reference
+ * @u:		User
+ *
+ * When the last reference is dropped, the domain user's internal structure
+ * is freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_domain_user *kdbus_domain_user_unref(struct kdbus_domain_user *u)
+{
+	if (u)
+		kref_put(&u->kref, __kdbus_domain_user_free);
+	return NULL;
+}
diff --git a/ipc/kdbus/domain.h b/ipc/kdbus/domain.h
new file mode 100644
index 000000000000..0154d566eb2f
--- /dev/null
+++ b/ipc/kdbus/domain.h
@@ -0,0 +1,84 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_DOMAIN_H
+#define __KDBUS_DOMAIN_H
+
+#include <linux/fs.h>
+#include <linux/hashtable.h>
+#include <linux/idr.h>
+#include <linux/kref.h>
+#include <linux/user_namespace.h>
+
+#include "node.h"
+
+/**
+ * struct kdbus_domain - domain for buses
+ * @node:		Underlying API node
+ * @access:		Access mode for this domain
+ * @lock:		Domain data lock
+ * @bus_seq_last:	Last used bus id sequence number
+ * @msg_seq_last:	Last used message id sequence number
+ * @user_hash:		Accounting of user resources
+ * @user_idr:		Map of all users; smallest possible index
+ * @user_namespace:	User namespace, pinned at creation time
+ * @dentry:		Root dentry of VFS mount (don't use outside of kdbusfs)
+ */
+struct kdbus_domain {
+	struct kdbus_node node;
+	unsigned int access;
+	struct mutex lock;
+	atomic64_t bus_seq_last;
+	atomic64_t msg_seq_last;
+	DECLARE_HASHTABLE(user_hash, 6);
+	struct idr user_idr;
+	struct user_namespace *user_namespace;
+	struct dentry *dentry;
+};
+
+/**
+ * struct kdbus_domain_user - resource accounting for users
+ * @kref:		Reference counter
+ * @domain:		Domain of the user
+ * @hentry:		Entry in domain user map
+ * @idr:		Smallest possible index number of all users
+ * @uid:		UID of the user
+ * @buses:		Number of buses the user has created
+ * @connections:	Number of connections the user has created
+ */
+struct kdbus_domain_user {
+	struct kref kref;
+	struct kdbus_domain *domain;
+	struct hlist_node hentry;
+	unsigned int idr;
+	kuid_t uid;
+	atomic_t buses;
+	atomic_t connections;
+};
+
+#define kdbus_domain_from_node(_node) \
+	container_of((_node), struct kdbus_domain, node)
+
+struct kdbus_domain *kdbus_domain_new(unsigned int access);
+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain);
+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain);
+int kdbus_domain_activate(struct kdbus_domain *domain);
+void kdbus_domain_deactivate(struct kdbus_domain *domain);
+
+struct kdbus_domain_user *kdbus_domain_get_user(struct kdbus_domain *domain,
+						kuid_t uid);
+struct kdbus_domain_user *kdbus_domain_user_ref(struct kdbus_domain_user *u);
+struct kdbus_domain_user *kdbus_domain_user_unref(struct kdbus_domain_user *u);
+
+#endif
diff --git a/ipc/kdbus/endpoint.c b/ipc/kdbus/endpoint.c
new file mode 100644
index 000000000000..bf37e7574ede
--- /dev/null
+++ b/ipc/kdbus/endpoint.c
@@ -0,0 +1,232 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "message.h"
+#include "policy.h"
+
+static void kdbus_ep_free(struct kdbus_node *node)
+{
+	struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
+
+	WARN_ON(!list_empty(&ep->conn_list));
+
+	kdbus_policy_db_clear(&ep->policy_db);
+	kdbus_bus_unref(ep->bus);
+	kdbus_domain_user_unref(ep->user);
+	kfree(ep);
+}
+
+static void kdbus_ep_release(struct kdbus_node *node, bool was_active)
+{
+	struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
+
+	/* disconnect all connections to this endpoint */
+	for (;;) {
+		struct kdbus_conn *conn;
+
+		mutex_lock(&ep->lock);
+		conn = list_first_entry_or_null(&ep->conn_list,
+						struct kdbus_conn,
+						ep_entry);
+		if (!conn) {
+			mutex_unlock(&ep->lock);
+			break;
+		}
+
+		/* take reference, release lock, disconnect without lock */
+		kdbus_conn_ref(conn);
+		mutex_unlock(&ep->lock);
+
+		kdbus_conn_disconnect(conn, false);
+		kdbus_conn_unref(conn);
+	}
+}
+
+/**
+ * kdbus_ep_new() - create a new endpoint
+ * @bus:		The bus this endpoint will be created for
+ * @name:		The name of the endpoint
+ * @access:		The access flags for this node (KDBUS_MAKE_ACCESS_*)
+ * @uid:		The uid of the node
+ * @gid:		The gid of the node
+ * @is_custom:		Whether this is a custom endpoint
+ *
+ * This function will create a new endpoint with the given
+ * name and properties for a given bus.
+ *
+ * Return: a new kdbus_ep on success, ERR_PTR on failure.
+ */
+struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
+			      unsigned int access, kuid_t uid, kgid_t gid,
+			      bool is_custom)
+{
+	struct kdbus_ep *e;
+	int ret;
+
+	/*
+	 * Validate only custom endpoint names; default endpoints
+	 * with a "bus" name are created when the bus is created.
+	 */
+	if (is_custom) {
+		ret = kdbus_verify_uid_prefix(name,
+					      bus->domain->user_namespace,
+					      uid);
+		if (ret < 0)
+			return ERR_PTR(ret);
+	}
+
+	e = kzalloc(sizeof(*e), GFP_KERNEL);
+	if (!e)
+		return ERR_PTR(-ENOMEM);
+
+	kdbus_node_init(&e->node, KDBUS_NODE_ENDPOINT);
+
+	e->node.free_cb = kdbus_ep_free;
+	e->node.release_cb = kdbus_ep_release;
+	e->node.uid = uid;
+	e->node.gid = gid;
+	e->node.mode = S_IRUSR | S_IWUSR;
+	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+		e->node.mode |= S_IRGRP | S_IWGRP;
+	if (access & KDBUS_MAKE_ACCESS_WORLD)
+		e->node.mode |= S_IROTH | S_IWOTH;
+
+	mutex_init(&e->lock);
+	INIT_LIST_HEAD(&e->conn_list);
+	kdbus_policy_db_init(&e->policy_db);
+	e->has_policy = is_custom;
+	e->bus = kdbus_bus_ref(bus);
+	e->id = atomic64_inc_return(&bus->ep_seq_last);
+
+	ret = kdbus_node_link(&e->node, &bus->node, name);
+	if (ret < 0)
+		goto exit_unref;
+
+	/*
+	 * Transactions on custom endpoints are never accounted on the global
+	 * user limits. Instead, for each custom endpoint, we create a custom,
+	 * unique user, which all transactions are accounted on. Regardless of
+	 * the user using that endpoint, it is always accounted on the same
+	 * user-object. This budget is not shared with ordinary users on
+	 * non-custom endpoints.
+	 */
+	if (is_custom) {
+		e->user = kdbus_domain_get_user(bus->domain, INVALID_UID);
+		if (IS_ERR(e->user)) {
+			ret = PTR_ERR(e->user);
+			e->user = NULL;
+			goto exit_unref;
+		}
+	}
+
+	return e;
+
+exit_unref:
+	kdbus_node_deactivate(&e->node);
+	kdbus_node_unref(&e->node);
+	return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_ep_ref() - increase the reference counter of a kdbus_ep
+ * @ep:			The endpoint to reference
+ *
+ * Every user of an endpoint, except for its creator, must add a reference to
+ * the kdbus_ep instance using this function.
+ *
+ * Return: the ep itself
+ */
+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep)
+{
+	if (ep)
+		kdbus_node_ref(&ep->node);
+	return ep;
+}
+
+/**
+ * kdbus_ep_unref() - decrease the reference counter of a kdbus_ep
+ * @ep:		The ep to unref
+ *
+ * Release a reference. If the reference count drops to 0, the ep will be
+ * freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep)
+{
+	if (ep)
+		kdbus_node_unref(&ep->node);
+	return NULL;
+}
+
+/**
+ * kdbus_ep_activate() - activate an endpoint
+ * @ep:			Endpoint
+ *
+ * Return: 0 on success, negative error otherwise.
+ */
+int kdbus_ep_activate(struct kdbus_ep *ep)
+{
+	/*
+	 * kdbus_ep_activate() must not be called multiple times, so if
+	 * kdbus_node_activate() didn't activate the node, it must already be
+	 * dead.
+	 */
+	if (!kdbus_node_activate(&ep->node))
+		return -ESHUTDOWN;
+
+	return 0;
+}
+
+/**
+ * kdbus_ep_deactivate() - invalidate an endpoint
+ * @ep:			Endpoint
+ */
+void kdbus_ep_deactivate(struct kdbus_ep *ep)
+{
+	kdbus_node_deactivate(&ep->node);
+}
+
+/**
+ * kdbus_ep_policy_set() - set policy for an endpoint
+ * @ep:			The endpoint
+ * @items:		The kdbus items containing policy information
+ * @items_size:		The total length of the items
+ *
+ * Only the endpoint owner should be able to call this function.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_ep_policy_set(struct kdbus_ep *ep,
+			const struct kdbus_item *items,
+			size_t items_size)
+{
+	return kdbus_policy_set(&ep->policy_db, items, items_size, 0, true, ep);
+}
diff --git a/ipc/kdbus/endpoint.h b/ipc/kdbus/endpoint.h
new file mode 100644
index 000000000000..be395c9c64dc
--- /dev/null
+++ b/ipc/kdbus/endpoint.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_ENDPOINT_H
+#define __KDBUS_ENDPOINT_H
+
+#include "limits.h"
+#include "names.h"
+#include "node.h"
+#include "policy.h"
+#include "util.h"
+
+struct kdbus_kmsg;
+
+/**
+ * struct kdbus_ep - endpoint to access a bus
+ * @node:		The kdbus node
+ * @bus:		Bus behind this endpoint
+ * @id:			ID of this endpoint on the bus
+ * @conn_list:		Connections of this endpoint
+ * @lock:		Endpoint data lock
+ * @user:		Custom endpoints account against an anonymous user
+ * @policy_db:		Uploaded policy
+ * @has_policy:		The policy-db is valid and should be used
+ *
+ * An endpoint offers access to a bus; the default endpoint node name is "bus".
+ * Additional custom endpoints to the same bus can be created and they can
+ * carry their own policies/filters.
+ */
+struct kdbus_ep {
+	struct kdbus_node node;
+	struct kdbus_bus *bus;
+	u64 id;
+	struct list_head conn_list;
+	struct mutex lock;
+	struct kdbus_domain_user *user;
+	struct kdbus_policy_db policy_db;
+
+	bool has_policy:1;
+};
+
+#define kdbus_ep_from_node(_node) \
+	container_of((_node), struct kdbus_ep, node)
+
+struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
+			      unsigned int access, kuid_t uid, kgid_t gid,
+			      bool is_custom);
+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep);
+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep);
+int kdbus_ep_activate(struct kdbus_ep *ep);
+void kdbus_ep_deactivate(struct kdbus_ep *ep);
+
+int kdbus_ep_policy_set(struct kdbus_ep *ep,
+			const struct kdbus_item *items,
+			size_t items_size);
+
+#endif
-- 
2.2.1



* [PATCH 09/13] kdbus: add code for buses, domains and endpoints
@ 2015-01-16 19:16   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

Add the logic to handle the following entities:

Domain:
  A domain is an unnamed object containing a number of buses. A
  domain is automatically created when an instance of kdbusfs
  is mounted, and destroyed when it is unmounted.
  Every domain offers its own "control" device node to create
  buses.  Domains have no connection to each other and can neither
  see nor talk to each other.

Bus:
  A bus is a named object inside a domain. Clients exchange messages
  over a bus. Multiple buses themselves have no connection to each
  other; messages can only be exchanged on the same bus. The default
  entry point to a bus, through which clients connect, is the "bus"
  device node /sys/fs/kdbus/<bus name>/bus.  Common operating
  system setups create one "system bus" per system, and one "user
  bus" for every logged-in user. Applications or services may create
  their own private named buses.

Endpoint:
  An endpoint provides the device node to talk to a bus. Opening an
  endpoint creates a new connection to the bus to which the endpoint
  belongs. Every bus has a default endpoint called "bus". A bus can
  optionally offer additional endpoints with custom names to provide
  a restricted access to the same bus. Custom endpoints carry
  additional policy which can be used to give sandboxed processes
  only a locked-down, limited, filtered access to the same bus.
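
The node layout described above can be sketched as a small path helper.
This is only an illustration: kdbus_ep_path() is a hypothetical userspace
function, not part of this patch, and it assumes that custom endpoints
appear next to the default "bus" node in the bus directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Mount point as named in the commit message. */
#define KDBUS_FS_ROOT "/sys/fs/kdbus"

/*
 * Hypothetical helper (not part of kdbus): build the path of an endpoint
 * node. Pass ep_name == NULL for the default endpoint, which is always
 * called "bus".
 *
 * Returns 0 on success, -1 if the path does not fit into buf.
 */
static int kdbus_ep_path(char *buf, size_t len,
			 const char *bus_name, const char *ep_name)
{
	int n;

	assert(buf && bus_name);

	n = snprintf(buf, len, "%s/%s/%s", KDBUS_FS_ROOT, bus_name,
		     ep_name ? ep_name : "bus");
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For a bus named "1000-user" (an example name only), the default endpoint
resolves to /sys/fs/kdbus/1000-user/bus.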

See Documentation/kdbus.txt for more details.
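
Domains, buses, control nodes and endpoints all derive their file mode
from the KDBUS_MAKE_ACCESS_* flags in the same way. The sketch below
restates that mapping in userspace C. The SKETCH_ACCESS_* values are
placeholders (the real flags live in the kdbus UAPI header), and
sketch_node_mode() is an illustrative name, not a function in this patch.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Placeholder flag values, standing in for KDBUS_MAKE_ACCESS_*. */
#define SKETCH_ACCESS_GROUP (1u << 0)
#define SKETCH_ACCESS_WORLD (1u << 1)

/*
 * Container nodes (domain, bus) get read+execute bits; leaf nodes
 * (control, endpoint) get read+write bits. WORLD access implies GROUP
 * access, mirroring the checks in the patch.
 */
static mode_t sketch_node_mode(unsigned int access, bool is_container)
{
	mode_t mode = S_IRUSR | (is_container ? S_IXUSR : S_IWUSR);

	if (access & (SKETCH_ACCESS_GROUP | SKETCH_ACCESS_WORLD))
		mode |= S_IRGRP | (is_container ? S_IXGRP : S_IWGRP);
	if (access & SKETCH_ACCESS_WORLD)
		mode |= S_IROTH | (is_container ? S_IXOTH : S_IWOTH);

	return mode;
}
```

With no flags set, a leaf node ends up with mode 0600 and a container
node with 0500; world access widens these to 0666 and 0555.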

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 ipc/kdbus/bus.c      | 553 +++++++++++++++++++++++++++++++++++++++++++++++++++
 ipc/kdbus/bus.h      | 103 ++++++++++
 ipc/kdbus/domain.c   | 350 ++++++++++++++++++++++++++++++++
 ipc/kdbus/domain.h   |  84 ++++++++
 ipc/kdbus/endpoint.c | 232 +++++++++++++++++++++
 ipc/kdbus/endpoint.h |  68 +++++++
 6 files changed, 1390 insertions(+)
 create mode 100644 ipc/kdbus/bus.c
 create mode 100644 ipc/kdbus/bus.h
 create mode 100644 ipc/kdbus/domain.c
 create mode 100644 ipc/kdbus/domain.h
 create mode 100644 ipc/kdbus/endpoint.c
 create mode 100644 ipc/kdbus/endpoint.h

diff --git a/ipc/kdbus/bus.c b/ipc/kdbus/bus.c
new file mode 100644
index 000000000000..ffcde832116c
--- /dev/null
+++ b/ipc/kdbus/bus.c
@@ -0,0 +1,553 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/hashtable.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "notify.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+#include "metadata.h"
+#include "names.h"
+#include "policy.h"
+#include "util.h"
+
+static void kdbus_bus_free(struct kdbus_node *node)
+{
+	struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
+
+	WARN_ON(!list_empty(&bus->monitors_list));
+	WARN_ON(!hash_empty(bus->conn_hash));
+
+	kdbus_notify_free(bus);
+
+	kdbus_domain_user_unref(bus->creator);
+	kdbus_name_registry_free(bus->name_registry);
+	kdbus_domain_unref(bus->domain);
+	kdbus_policy_db_clear(&bus->policy_db);
+	kdbus_meta_proc_unref(bus->creator_meta);
+	kfree(bus);
+}
+
+static void kdbus_bus_release(struct kdbus_node *node, bool was_active)
+{
+	struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
+
+	if (was_active)
+		atomic_dec(&bus->creator->buses);
+}
+
+/**
+ * kdbus_bus_new() - create a new bus from user-supplied data
+ * @domain:		The domain to work on
+ * @make:		Information as passed in by userspace
+ * @uid:		The uid of the bus node
+ * @gid:		The gid of the bus node
+ *
+ * This function is part of the connection ioctl() interface and will parse
+ * the user-supplied data in order to create a new kdbus_bus.
+ *
+ * Return: the new bus on success, ERR_PTR on failure.
+ */
+struct kdbus_bus *kdbus_bus_new(struct kdbus_domain *domain,
+				const struct kdbus_cmd_make *make,
+				kuid_t uid, kgid_t gid)
+{
+	const struct kdbus_bloom_parameter *bloom = NULL;
+	const u64 *pattach_owner = NULL;
+	const u64 *pattach_recv = NULL;
+	const struct kdbus_item *item;
+	const char *name = NULL;
+	struct kdbus_bus *b;
+	u64 attach_owner;
+	u64 attach_recv;
+	int ret;
+
+	KDBUS_ITEMS_FOREACH(item, make->items, KDBUS_ITEMS_SIZE(make, items)) {
+		switch (item->type) {
+		case KDBUS_ITEM_MAKE_NAME:
+			if (name)
+				return ERR_PTR(-EEXIST);
+
+			name = item->str;
+			break;
+
+		case KDBUS_ITEM_BLOOM_PARAMETER:
+			if (bloom)
+				return ERR_PTR(-EEXIST);
+
+			bloom = &item->bloom_parameter;
+			break;
+
+		case KDBUS_ITEM_ATTACH_FLAGS_SEND:
+			if (pattach_owner)
+				return ERR_PTR(-EEXIST);
+
+			pattach_owner = &item->data64[0];
+			break;
+
+		case KDBUS_ITEM_ATTACH_FLAGS_RECV:
+			if (pattach_recv)
+				return ERR_PTR(-EEXIST);
+
+			pattach_recv = &item->data64[0];
+			break;
+
+		default:
+			return ERR_PTR(-EINVAL);
+		}
+	}
+
+	if (!name || !bloom)
+		return ERR_PTR(-EBADMSG);
+
+	if (bloom->size < 8 || bloom->size > KDBUS_BUS_BLOOM_MAX_SIZE)
+		return ERR_PTR(-EINVAL);
+	if (!KDBUS_IS_ALIGNED8(bloom->size))
+		return ERR_PTR(-EINVAL);
+	if (bloom->n_hash < 1)
+		return ERR_PTR(-EINVAL);
+
+	ret = kdbus_sanitize_attach_flags(pattach_recv ? *pattach_recv : 0,
+					  &attach_recv);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	ret = kdbus_sanitize_attach_flags(pattach_owner ? *pattach_owner : 0,
+					  &attach_owner);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	ret = kdbus_verify_uid_prefix(name, domain->user_namespace, uid);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	b = kzalloc(sizeof(*b), GFP_KERNEL);
+	if (!b)
+		return ERR_PTR(-ENOMEM);
+
+	kdbus_node_init(&b->node, KDBUS_NODE_BUS);
+
+	b->node.free_cb = kdbus_bus_free;
+	b->node.release_cb = kdbus_bus_release;
+	b->node.uid = uid;
+	b->node.gid = gid;
+	b->node.mode = S_IRUSR | S_IXUSR;
+
+	b->access = make->flags & (KDBUS_MAKE_ACCESS_WORLD |
+				   KDBUS_MAKE_ACCESS_GROUP);
+	if (b->access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+		b->node.mode |= S_IRGRP | S_IXGRP;
+	if (b->access & KDBUS_MAKE_ACCESS_WORLD)
+		b->node.mode |= S_IROTH | S_IXOTH;
+
+	b->bus_flags = make->flags;
+	b->bloom = *bloom;
+	b->attach_flags_req = attach_recv;
+	b->attach_flags_owner = attach_owner;
+	mutex_init(&b->lock);
+	init_rwsem(&b->conn_rwlock);
+	hash_init(b->conn_hash);
+	INIT_LIST_HEAD(&b->monitors_list);
+	INIT_LIST_HEAD(&b->notify_list);
+	spin_lock_init(&b->notify_lock);
+	mutex_init(&b->notify_flush_lock);
+	atomic64_set(&b->conn_seq_last, 0);
+	b->domain = kdbus_domain_ref(domain);
+	kdbus_policy_db_init(&b->policy_db);
+	b->id = atomic64_inc_return(&domain->bus_seq_last);
+
+	/* generate unique bus id */
+	generate_random_uuid(b->id128);
+
+	ret = kdbus_node_link(&b->node, &domain->node, name);
+	if (ret < 0)
+		goto exit_unref;
+
+	/* cache the metadata/credentials of the creator */
+	b->creator_meta = kdbus_meta_proc_new();
+	if (IS_ERR(b->creator_meta)) {
+		ret = PTR_ERR(b->creator_meta);
+		b->creator_meta = NULL;
+		goto exit_unref;
+	}
+
+	ret = kdbus_meta_proc_collect(b->creator_meta,
+				      KDBUS_ATTACH_CREDS |
+				      KDBUS_ATTACH_PIDS |
+				      KDBUS_ATTACH_AUXGROUPS |
+				      KDBUS_ATTACH_TID_COMM |
+				      KDBUS_ATTACH_PID_COMM |
+				      KDBUS_ATTACH_EXE |
+				      KDBUS_ATTACH_CMDLINE |
+				      KDBUS_ATTACH_CGROUP |
+				      KDBUS_ATTACH_CAPS |
+				      KDBUS_ATTACH_SECLABEL |
+				      KDBUS_ATTACH_AUDIT);
+	if (ret < 0)
+		goto exit_unref;
+
+	b->name_registry = kdbus_name_registry_new();
+	if (IS_ERR(b->name_registry)) {
+		ret = PTR_ERR(b->name_registry);
+		b->name_registry = NULL;
+		goto exit_unref;
+	}
+
+	/*
+	 * Bus-limits of the creator are accounted on its real UID, just like
+	 * all other per-user limits.
+	 */
+	b->creator = kdbus_domain_get_user(domain, current_uid());
+	if (IS_ERR(b->creator)) {
+		ret = PTR_ERR(b->creator);
+		b->creator = NULL;
+		goto exit_unref;
+	}
+
+	return b;
+
+exit_unref:
+	kdbus_node_deactivate(&b->node);
+	kdbus_node_unref(&b->node);
+	return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_bus_ref() - increase the reference counter of a kdbus_bus
+ * @bus:		The bus to reference
+ *
+ * Every user of a bus, except for its creator, must add a reference to the
+ * kdbus_bus using this function.
+ *
+ * Return: the bus itself
+ */
+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus)
+{
+	if (bus)
+		kdbus_node_ref(&bus->node);
+	return bus;
+}
+
+/**
+ * kdbus_bus_unref() - decrease the reference counter of a kdbus_bus
+ * @bus:		The bus to unref
+ *
+ * Release a reference. If the reference count drops to 0, the bus will be
+ * freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus)
+{
+	if (bus)
+		kdbus_node_unref(&bus->node);
+	return NULL;
+}
+
+/**
+ * kdbus_bus_activate() - activate a bus
+ * @bus:		Bus
+ *
+ * Activate a bus and make it available to user-space.
+ *
+ * Returns: 0 on success, negative error code on failure
+ */
+int kdbus_bus_activate(struct kdbus_bus *bus)
+{
+	struct kdbus_ep *ep;
+	int ret;
+
+	if (atomic_inc_return(&bus->creator->buses) > KDBUS_USER_MAX_BUSES) {
+		atomic_dec(&bus->creator->buses);
+		return -EMFILE;
+	}
+
+	/*
+	 * kdbus_bus_activate() must not be called multiple times, so if
+	 * kdbus_node_activate() didn't activate the node, it must already be
+	 * dead.
+	 */
+	if (!kdbus_node_activate(&bus->node)) {
+		atomic_dec(&bus->creator->buses);
+		return -ESHUTDOWN;
+	}
+
+	/*
+	 * Create a new default endpoint for this bus. If activation succeeds,
+	 * we drop our own reference, effectively causing the endpoint to be
+	 * deactivated and released when the parent bus is.
+	 */
+	ep = kdbus_ep_new(bus, "bus", bus->access,
+			  bus->node.uid, bus->node.gid, false);
+	if (IS_ERR(ep))
+		return PTR_ERR(ep);
+
+	ret = kdbus_ep_activate(ep);
+	if (ret < 0)
+		kdbus_ep_deactivate(ep);
+	kdbus_ep_unref(ep);
+
+	return 0;
+}
+
+/**
+ * kdbus_bus_deactivate() - deactivate a bus
+ * @bus:               The kdbus reference
+ *
+ * The passed bus will be disconnected and the associated endpoint will be
+ * unref'ed.
+ */
+void kdbus_bus_deactivate(struct kdbus_bus *bus)
+{
+	kdbus_node_deactivate(&bus->node);
+}
+
+/**
+ * kdbus_bus_find_conn_by_id() - find a connection with a given id
+ * @bus:		The bus to look for the connection
+ * @id:			The 64-bit connection id
+ *
+ * Looks up a connection with a given id. The returned connection
+ * is ref'ed, and needs to be unref'ed by the user. Returns NULL if
+ * the connection can't be found.
+ */
+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id)
+{
+	struct kdbus_conn *conn, *found = NULL;
+
+	down_read(&bus->conn_rwlock);
+	hash_for_each_possible(bus->conn_hash, conn, hentry, id)
+		if (conn->id == id) {
+			found = kdbus_conn_ref(conn);
+			break;
+		}
+	up_read(&bus->conn_rwlock);
+
+	return found;
+}
+
+/**
+ * kdbus_bus_broadcast() - send a message to all subscribed connections
+ * @bus:	The bus the connections are connected to
+ * @conn_src:	The source connection, may be %NULL for kernel notifications
+ * @kmsg:	The message to send.
+ *
+ * Send @kmsg to all connections that are currently active on the bus.
+ * Connections must still have matches installed in order to let the message
+ * pass.
+ */
+void kdbus_bus_broadcast(struct kdbus_bus *bus,
+			 struct kdbus_conn *conn_src,
+			 struct kdbus_kmsg *kmsg)
+{
+	struct kdbus_conn *conn_dst;
+	unsigned int i;
+	int ret;
+
+	/*
+	 * Make sure broadcasts are queued on monitors before we send them to
+	 * anyone else. Otherwise, connections might react to broadcasts before
+	 * the monitor gets the broadcast queued. In the worst case, the
+	 * monitor sees a reaction to the broadcast before the broadcast itself.
+	 * We don't give ordering guarantees across connections (and monitors
+	 * can re-construct order via sequence numbers), but we should at least
+	 * try to avoid re-ordering for monitors.
+	 */
+	kdbus_bus_eavesdrop(bus, conn_src, kmsg);
+
+	down_read(&bus->conn_rwlock);
+
+	hash_for_each(bus->conn_hash, i, conn_dst, hentry) {
+		if (conn_dst->id == kmsg->msg.src_id)
+			continue;
+		if (!kdbus_conn_is_ordinary(conn_dst))
+			continue;
+
+		/*
+		 * Check if there is a match for the kmsg object in
+		 * the destination connection match db
+		 */
+		if (!kdbus_match_db_match_kmsg(conn_dst->match_db, conn_src,
+					       kmsg))
+			continue;
+
+		if (conn_src) {
+			u64 attach_flags;
+
+			/*
+			 * Anyone can send broadcasts, as they have no
+			 * destination. But a receiver needs TALK access to
+			 * the sender in order to receive broadcasts.
+			 */
+			if (!kdbus_conn_policy_talk(conn_dst, NULL, conn_src))
+				continue;
+
+			attach_flags = kdbus_meta_calc_attach_flags(conn_src,
+								    conn_dst);
+
+			/*
+			 * Keep sending messages even if we cannot acquire the
+			 * requested metadata. It's up to the receiver to drop
+			 * messages that lack expected metadata.
+			 */
+			if (!conn_src->faked_meta)
+				kdbus_meta_proc_collect(kmsg->proc_meta,
+							attach_flags);
+			kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src,
+						attach_flags);
+		} else {
+			/*
+			 * Check if there is a policy db that prevents the
+			 * destination connection from receiving this kernel
+			 * notification
+			 */
+			if (!kdbus_conn_policy_see_notification(conn_dst, NULL,
+								kmsg))
+				continue;
+		}
+
+		ret = kdbus_conn_entry_insert(conn_src, conn_dst, kmsg, NULL);
+		if (ret < 0)
+			atomic_inc(&conn_dst->lost_count);
+	}
+
+	up_read(&bus->conn_rwlock);
+}
+
+/**
+ * kdbus_bus_eavesdrop() - send a message to all subscribed monitors
+ * @bus:	The bus the monitors are connected to
+ * @conn_src:	The source connection, may be %NULL for kernel notifications
+ * @kmsg:	The message to send.
+ *
+ * Send @kmsg to all monitors that are currently active on the bus. Monitors
+ * must still have matches installed in order to let the message pass.
+ */
+void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
+			 struct kdbus_conn *conn_src,
+			 struct kdbus_kmsg *kmsg)
+{
+	struct kdbus_conn *conn_dst;
+	int ret;
+
+	/*
+	 * Monitor connections get all messages; ignore possible errors
+	 * when sending messages to monitor connections.
+	 */
+
+	down_read(&bus->conn_rwlock);
+	list_for_each_entry(conn_dst, &bus->monitors_list, monitor_entry) {
+		/*
+		 * Collect metadata requested by the destination connection.
+		 * Ignore errors, as receivers need to check metadata
+		 * availability, anyway. So it's still better to send messages
+		 * that lack data than to skip them entirely.
+		 */
+		if (conn_src) {
+			u64 attach_flags;
+
+			attach_flags = kdbus_meta_calc_attach_flags(conn_src,
+								    conn_dst);
+			if (!conn_src->faked_meta)
+				kdbus_meta_proc_collect(kmsg->proc_meta,
+							attach_flags);
+			kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src,
+						attach_flags);
+		}
+
+		ret = kdbus_conn_entry_insert(conn_src, conn_dst, kmsg, NULL);
+		if (ret < 0)
+			atomic_inc(&conn_dst->lost_count);
+	}
+	up_read(&bus->conn_rwlock);
+}
+
+/**
+ * kdbus_cmd_bus_creator_info() - get information on a bus creator
+ * @conn:	The querying connection
+ * @cmd_info:	The command buffer, as passed in from the ioctl
+ *
+ * Gather information on the creator of the bus @conn is connected to.
+ *
+ * Return: 0 on success, error otherwise.
+ */
+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn,
+			       struct kdbus_cmd_info *cmd_info)
+{
+	struct kdbus_bus *bus = conn->ep->bus;
+	struct kdbus_pool_slice *slice = NULL;
+	struct kdbus_item_header item_hdr;
+	struct kdbus_item *meta_items;
+	struct kdbus_info info = {};
+	size_t meta_size, name_len;
+	struct kvec kvec[5];
+	u64 attach_flags;
+	size_t cnt = 0;
+	int ret;
+
+	info.id = bus->id;
+	info.flags = bus->bus_flags;
+
+	name_len = strlen(bus->node.name) + 1;
+
+	/* mask out what information the bus owner wants to pass us */
+	attach_flags = cmd_info->flags & bus->attach_flags_owner;
+
+	meta_items = kdbus_meta_export(bus->creator_meta, NULL, attach_flags,
+				       &meta_size);
+	if (IS_ERR(meta_items))
+		return PTR_ERR(meta_items);
+
+	item_hdr.type = KDBUS_ITEM_MAKE_NAME;
+	item_hdr.size = KDBUS_ITEM_HEADER_SIZE + name_len;
+
+	kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &info.size);
+	kdbus_kvec_set(&kvec[cnt++], &item_hdr, sizeof(item_hdr), &info.size);
+	kdbus_kvec_set(&kvec[cnt++], bus->node.name, name_len, &info.size);
+	cnt += !!kdbus_kvec_pad(&kvec[cnt], &info.size);
+
+	if (meta_items && meta_size)
+		kdbus_kvec_set(&kvec[cnt++], meta_items, meta_size, &info.size);
+
+	slice = kdbus_pool_slice_alloc(conn->pool, info.size, kvec, NULL, cnt);
+	if (IS_ERR(slice)) {
+		ret = PTR_ERR(slice);
+		slice = NULL;
+		goto exit;
+	}
+
+	/* write back the offset */
+	kdbus_pool_slice_publish(slice, &cmd_info->offset,
+				 &cmd_info->info_size);
+	ret = 0;
+
+	kdbus_pool_slice_release(slice);
+exit:
+	kfree(meta_items);
+	return ret;
+}
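
Note on the accounting in kdbus_bus_activate() above: the per-user bus limit is enforced with an optimistic increment that is rolled back when the limit is exceeded. The following is a minimal userspace sketch of that pattern, assuming C11 atomics in place of the kernel's atomic_t. MAX_BUSES, struct user_account and the account_bus_*() helpers are hypothetical names used only for illustration; they are not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical limit; stands in for KDBUS_USER_MAX_BUSES. */
#define MAX_BUSES 16

/* Hypothetical per-user accounting object; mirrors the 'buses' counter. */
struct user_account {
	atomic_int buses;
};

/*
 * Optimistically take a slot, then roll back if the limit was exceeded.
 * Returns 0 on success, -EMFILE if the user already owns MAX_BUSES buses.
 */
static int account_bus_get(struct user_account *u)
{
	assert(u);
	/* atomic_fetch_add() returns the old value, so +1 is the new one */
	if (atomic_fetch_add(&u->buses, 1) + 1 > MAX_BUSES) {
		atomic_fetch_sub(&u->buses, 1);
		return -EMFILE;
	}
	return 0;
}

/* Give a slot back, e.g. on a later failure path or on teardown. */
static void account_bus_put(struct user_account *u)
{
	assert(u);
	atomic_fetch_sub(&u->buses, 1);
}
```

Incrementing first and checking afterwards avoids a separate read-then-write window, so two racing callers cannot both slip under the limit.
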
diff --git a/ipc/kdbus/bus.h b/ipc/kdbus/bus.h
new file mode 100644
index 000000000000..3fa57373165c
--- /dev/null
+++ b/ipc/kdbus/bus.h
@@ -0,0 +1,103 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_BUS_H
+#define __KDBUS_BUS_H
+
+#include <linux/hashtable.h>
+#include <linux/spinlock.h>
+#include <linux/rwsem.h>
+
+#include "node.h"
+#include "policy.h"
+#include "util.h"
+
+/**
+ * struct kdbus_bus - bus in a domain
+ * @node:		kdbus_node
+ * @domain:		Domain of this bus
+ * @id:			ID of this bus in the domain
+ * @lock:		Bus data lock
+ * @access:		The access flags for the bus directory
+ * @ep_seq_last:	Last used endpoint id sequence number
+ * @conn_seq_last:	Last used connection id sequence number
+ * @bus_flags:		Simple pass-through flags from userspace to userspace
+ * @attach_flags_req:	KDBUS_ATTACH_* flags required by connecting peers
+ * @attach_flags_owner:	KDBUS_ATTACH_* flags of bus creator that other
+ *			connections can see or query
+ * @name_registry:	Name registry of this bus
+ * @bloom:		Bloom parameters
+ * @id128:		Unique random 128 bit ID of this bus
+ * @creator:		Creator of the bus
+ * @policy_db:		Policy database for this bus
+ * @notify_list:	List of pending kernel-generated messages
+ * @notify_lock:	Notification list lock
+ * @notify_flush_lock:	Notification flushing lock
+ * @conn_rwlock:	Read/Write lock for all lists of child connections
+ * @conn_hash:		Map of connection IDs
+ * @monitors_list:	Connections that monitor this bus
+ * @creator_meta:	Meta information about the bus creator
+ *
+ * A bus provides a "bus" endpoint node.
+ *
+ * A bus is created by opening the control node and issuing the
+ * KDBUS_CMD_BUS_MAKE ioctl. Closing this file immediately destroys
+ * the bus.
+ */
+struct kdbus_bus {
+	struct kdbus_node node;
+	struct kdbus_domain *domain;
+	u64 id;
+	struct mutex lock;
+	unsigned int access;
+	atomic64_t ep_seq_last;
+	atomic64_t conn_seq_last;
+	u64 bus_flags;
+	u64 attach_flags_req;
+	u64 attach_flags_owner;
+	struct kdbus_name_registry *name_registry;
+	struct kdbus_bloom_parameter bloom;
+	u8 id128[16];
+	struct kdbus_domain_user *creator;
+	struct kdbus_policy_db policy_db;
+	struct list_head notify_list;
+	spinlock_t notify_lock;
+	struct mutex notify_flush_lock;
+
+	struct rw_semaphore conn_rwlock;
+	DECLARE_HASHTABLE(conn_hash, 8);
+	struct list_head monitors_list;
+
+	struct kdbus_meta_proc *creator_meta;
+};
+
+struct kdbus_kmsg;
+
+struct kdbus_bus *kdbus_bus_new(struct kdbus_domain *domain,
+				const struct kdbus_cmd_make *make,
+				kuid_t uid, kgid_t gid);
+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus);
+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus);
+int kdbus_bus_activate(struct kdbus_bus *bus);
+void kdbus_bus_deactivate(struct kdbus_bus *bus);
+
+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn,
+			       struct kdbus_cmd_info *cmd_info);
+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id);
+void kdbus_bus_broadcast(struct kdbus_bus *bus, struct kdbus_conn *conn_src,
+			 struct kdbus_kmsg *kmsg);
+void kdbus_bus_eavesdrop(struct kdbus_bus *bus, struct kdbus_conn *conn_src,
+			 struct kdbus_kmsg *kmsg);
+
+#endif
diff --git a/ipc/kdbus/domain.c b/ipc/kdbus/domain.c
new file mode 100644
index 000000000000..81d1fb52b73c
--- /dev/null
+++ b/ipc/kdbus/domain.c
@@ -0,0 +1,350 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "domain.h"
+#include "handle.h"
+#include "item.h"
+#include "limits.h"
+#include "util.h"
+
+static void kdbus_domain_control_free(struct kdbus_node *node)
+{
+	kfree(node);
+}
+
+static struct kdbus_node *kdbus_domain_control_new(struct kdbus_domain *domain,
+						   unsigned int access)
+{
+	struct kdbus_node *node;
+	int ret;
+
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return ERR_PTR(-ENOMEM);
+
+	kdbus_node_init(node, KDBUS_NODE_CONTROL);
+
+	node->free_cb = kdbus_domain_control_free;
+	/* owner may always read and write; access flags widen this below */
+	node->mode = S_IRUSR | S_IWUSR;
+	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+		node->mode |= S_IRGRP | S_IWGRP;
+	if (access & KDBUS_MAKE_ACCESS_WORLD)
+		node->mode |= S_IROTH | S_IWOTH;
+
+	ret = kdbus_node_link(node, &domain->node, "control");
+	if (ret < 0)
+		goto exit_free;
+
+	return node;
+
+exit_free:
+	kdbus_node_deactivate(node);
+	kdbus_node_unref(node);
+	return ERR_PTR(ret);
+}
+
+static void kdbus_domain_free(struct kdbus_node *node)
+{
+	struct kdbus_domain *domain =
+		container_of(node, struct kdbus_domain, node);
+
+	WARN_ON(!hash_empty(domain->user_hash));
+
+	put_user_ns(domain->user_namespace);
+	idr_destroy(&domain->user_idr);
+	kfree(domain);
+}
+
+/**
+ * kdbus_domain_new() - create a new domain
+ * @access:		The access mode for this node (KDBUS_MAKE_ACCESS_*)
+ *
+ * Return: a new kdbus_domain on success, ERR_PTR on failure
+ */
+struct kdbus_domain *kdbus_domain_new(unsigned int access)
+{
+	struct kdbus_domain *d;
+	int ret;
+
+	d = kzalloc(sizeof(*d), GFP_KERNEL);
+	if (!d)
+		return ERR_PTR(-ENOMEM);
+
+	kdbus_node_init(&d->node, KDBUS_NODE_DOMAIN);
+
+	d->node.free_cb = kdbus_domain_free;
+	d->node.mode = S_IRUSR | S_IXUSR;
+	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+		d->node.mode |= S_IRGRP | S_IXGRP;
+	if (access & KDBUS_MAKE_ACCESS_WORLD)
+		d->node.mode |= S_IROTH | S_IXOTH;
+
+	d->access = access;
+	mutex_init(&d->lock);
+	atomic64_set(&d->msg_seq_last, 0);
+	idr_init(&d->user_idr);
+
+	/* Pin user namespace so we can guarantee domain-unique bus names. */
+	d->user_namespace = get_user_ns(current_user_ns());
+
+	ret = kdbus_node_link(&d->node, NULL, NULL);
+	if (ret < 0)
+		goto exit_unref;
+
+	return d;
+
+exit_unref:
+	kdbus_node_deactivate(&d->node);
+	kdbus_node_unref(&d->node);
+	return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_domain_ref() - take a domain reference
+ * @domain:		Domain
+ *
+ * Return: the domain itself
+ */
+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain)
+{
+	if (domain)
+		kdbus_node_ref(&domain->node);
+	return domain;
+}
+
+/**
+ * kdbus_domain_unref() - drop a domain reference
+ * @domain:		Domain
+ *
+ * When the last reference is dropped, the domain internal structure
+ * is freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain)
+{
+	if (domain)
+		kdbus_node_unref(&domain->node);
+	return NULL;
+}
+
+/**
+ * kdbus_domain_activate() - activate a domain
+ * @domain:		Domain
+ *
+ * Activate a domain so it will be visible to user-space and can be accessed
+ * by external entities.
+ *
+ * Return: 0 on success, negative error-code on failure
+ */
+int kdbus_domain_activate(struct kdbus_domain *domain)
+{
+	struct kdbus_node *control;
+
+	/*
+	 * kdbus_domain_activate() must not be called multiple times, so if
+	 * kdbus_node_activate() didn't activate the node, it must already be
+	 * dead.
+	 */
+	if (!kdbus_node_activate(&domain->node))
+		return -ESHUTDOWN;
+
+	/*
+	 * Create a control-node for this domain. We drop our own reference
+	 * immediately, effectively causing the node to be deactivated and
+	 * released when the parent domain is.
+	 */
+	control = kdbus_domain_control_new(domain, domain->access);
+	if (IS_ERR(control))
+		return PTR_ERR(control);
+
+	kdbus_node_activate(control);
+	kdbus_node_unref(control);
+
+	return 0;
+}
+
+/**
+ * kdbus_domain_deactivate() - invalidate a domain
+ * @domain:		Domain
+ */
+void kdbus_domain_deactivate(struct kdbus_domain *domain)
+{
+	kdbus_node_deactivate(&domain->node);
+}
+
+/**
+ * kdbus_domain_user_assign_id() - allocate ID and assign it to the
+ *				   domain user
+ * @domain:		The domain of the user
+ * @user:		The kdbus_domain_user object of the user
+ *
+ * Returns 0 if ID in [0, INT_MAX] is successfully assigned to the
+ * domain user. Negative errno on failure.
+ *
+ * The user index is used in arrays for accounting user quota in
+ * receiver queues.
+ *
+ * Caller must have the domain lock held and must ensure that the
+ * domain was not disconnected.
+ */
+static int kdbus_domain_user_assign_id(struct kdbus_domain *domain,
+				       struct kdbus_domain_user *user)
+{
+	int ret;
+
+	/*
+	 * Allocate the smallest possible index for this user; used
+	 * in arrays for accounting user quota in receiver queues.
+	 */
+	ret = idr_alloc(&domain->user_idr, user, 0, 0, GFP_KERNEL);
+	if (ret < 0)
+		return ret;
+
+	user->idr = ret;
+
+	return 0;
+}
+
+/**
+ * kdbus_domain_get_user() - get a kdbus_domain_user object
+ * @domain:		The domain of the user
+ * @uid:		The uid of the user; INVALID_UID for an
+ *			anonymous user like a custom endpoint
+ *
+ * If there is a uid matching, then use the already accounted
+ * kdbus_domain_user, increment its reference counter and return it.
+ * Otherwise allocate a new one, link it into the domain and return it.
+ *
+ * Return: the accounted domain user on success, ERR_PTR on failure.
+ */
+struct kdbus_domain_user *kdbus_domain_get_user(struct kdbus_domain *domain,
+						kuid_t uid)
+{
+	struct kdbus_domain_user *tmp_user;
+	struct kdbus_domain_user *u = NULL;
+	int ret;
+
+	mutex_lock(&domain->lock);
+
+	/* find uid and reference it */
+	if (uid_valid(uid)) {
+		hash_for_each_possible(domain->user_hash, tmp_user,
+				       hentry, __kuid_val(uid)) {
+			if (!uid_eq(tmp_user->uid, uid))
+				continue;
+
+			/*
+			 * If the ref-count is already 0, the destructor is
+			 * about to unlink and destroy the object. Continue
+			 * looking for a next one or create one, if none found.
+			 */
+			if (kref_get_unless_zero(&tmp_user->kref)) {
+				mutex_unlock(&domain->lock);
+				return tmp_user;
+			}
+		}
+	}
+
+	u = kzalloc(sizeof(*u), GFP_KERNEL);
+	if (!u) {
+		ret = -ENOMEM;
+		goto exit_unlock;
+	}
+
+	kref_init(&u->kref);
+	u->domain = kdbus_domain_ref(domain);
+	u->uid = uid;
+	atomic_set(&u->buses, 0);
+	atomic_set(&u->connections, 0);
+
+	/* Assign user ID and link into domain */
+	ret = kdbus_domain_user_assign_id(domain, u);
+	if (ret < 0)
+		goto exit_free;
+
+	/* UID hash map */
+	hash_add(domain->user_hash, &u->hentry, __kuid_val(u->uid));
+
+	mutex_unlock(&domain->lock);
+	return u;
+
+exit_free:
+	kdbus_domain_unref(u->domain);
+	kfree(u);
+exit_unlock:
+	mutex_unlock(&domain->lock);
+	return ERR_PTR(ret);
+}
+
+static void __kdbus_domain_user_free(struct kref *kref)
+{
+	struct kdbus_domain_user *user =
+		container_of(kref, struct kdbus_domain_user, kref);
+
+	WARN_ON(atomic_read(&user->buses) > 0);
+	WARN_ON(atomic_read(&user->connections) > 0);
+
+	/*
+	 * Lookups ignore objects with a ref-count of 0. Therefore, we can
+	 * safely remove it from the table after dropping the last reference.
+	 * No-one will acquire a ref in parallel.
+	 */
+	mutex_lock(&user->domain->lock);
+	idr_remove(&user->domain->user_idr, user->idr);
+	hash_del(&user->hentry);
+	mutex_unlock(&user->domain->lock);
+
+	kdbus_domain_unref(user->domain);
+	kfree(user);
+}
+
+/**
+ * kdbus_domain_user_ref() - take a domain user reference
+ * @u:		User
+ *
+ * Return: the domain user itself
+ */
+struct kdbus_domain_user *kdbus_domain_user_ref(struct kdbus_domain_user *u)
+{
+	kref_get(&u->kref);
+	return u;
+}
+
+/**
+ * kdbus_domain_user_unref() - drop a domain user reference
+ * @u:		User
+ *
+ * When the last reference is dropped, the domain user's internal structure
+ * is freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_domain_user *kdbus_domain_user_unref(struct kdbus_domain_user *u)
+{
+	if (u)
+		kref_put(&u->kref, __kdbus_domain_user_free);
+	return NULL;
+}
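
Note on kdbus_domain_get_user() above: the lookup only reuses an entry if a reference can still be taken, because an object whose count already reached zero is about to be unlinked by its destructor. Below is a minimal userspace sketch of that "get unless zero" step, assuming C11 atomics. struct refcount and the ref_*() helpers are hypothetical stand-ins for struct kref and its helpers, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kref. */
struct refcount {
	atomic_uint count;
};

/*
 * Take a reference only if the object is still alive. Returns false when
 * the count already hit zero, i.e. the destructor is running or pending.
 */
static bool ref_get_unless_zero(struct refcount *r)
{
	unsigned int old;

	assert(r);
	old = atomic_load(&r->count);
	while (old != 0) {
		/* on failure, 'old' is reloaded with the current value */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool ref_put(struct refcount *r)
{
	assert(r);
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

A caller that sees ref_get_unless_zero() fail treats the entry as absent and allocates a fresh one, which matches the "continue looking or create one" comment in the lookup loop.
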
diff --git a/ipc/kdbus/domain.h b/ipc/kdbus/domain.h
new file mode 100644
index 000000000000..0154d566eb2f
--- /dev/null
+++ b/ipc/kdbus/domain.h
@@ -0,0 +1,84 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_DOMAIN_H
+#define __KDBUS_DOMAIN_H
+
+#include <linux/fs.h>
+#include <linux/hashtable.h>
+#include <linux/idr.h>
+#include <linux/kref.h>
+#include <linux/user_namespace.h>
+
+#include "node.h"
+
+/**
+ * struct kdbus_domain - domain for buses
+ * @node:		Underlying API node
+ * @access:		Access mode for this domain
+ * @lock:		Domain data lock
+ * @bus_seq_last:	Last used bus id sequence number
+ * @msg_seq_last:	Last used message id sequence number
+ * @user_hash:		Accounting of user resources
+ * @user_idr:		Map of all users; smallest possible index
+ * @user_namespace:	User namespace, pinned at creation time
+ * @dentry:		Root dentry of VFS mount (don't use outside of kdbusfs)
+ */
+struct kdbus_domain {
+	struct kdbus_node node;
+	unsigned int access;
+	struct mutex lock;
+	atomic64_t bus_seq_last;
+	atomic64_t msg_seq_last;
+	DECLARE_HASHTABLE(user_hash, 6);
+	struct idr user_idr;
+	struct user_namespace *user_namespace;
+	struct dentry *dentry;
+};
+
+/**
+ * struct kdbus_domain_user - resource accounting for users
+ * @kref:		Reference counter
+ * @domain:		Domain of the user
+ * @hentry:		Entry in domain user map
+ * @idr:		Smallest possible index number of all users
+ * @uid:		UID of the user
+ * @buses:		Number of buses the user has created
+ * @connections:	Number of connections the user has created
+ */
+struct kdbus_domain_user {
+	struct kref kref;
+	struct kdbus_domain *domain;
+	struct hlist_node hentry;
+	unsigned int idr;
+	kuid_t uid;
+	atomic_t buses;
+	atomic_t connections;
+};
+
+#define kdbus_domain_from_node(_node) \
+	container_of((_node), struct kdbus_domain, node)
+
+struct kdbus_domain *kdbus_domain_new(unsigned int access);
+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain);
+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain);
+int kdbus_domain_activate(struct kdbus_domain *domain);
+void kdbus_domain_deactivate(struct kdbus_domain *domain);
+
+struct kdbus_domain_user *kdbus_domain_get_user(struct kdbus_domain *domain,
+						kuid_t uid);
+struct kdbus_domain_user *kdbus_domain_user_ref(struct kdbus_domain_user *u);
+struct kdbus_domain_user *kdbus_domain_user_unref(struct kdbus_domain_user *u);
+
+#endif
diff --git a/ipc/kdbus/endpoint.c b/ipc/kdbus/endpoint.c
new file mode 100644
index 000000000000..bf37e7574ede
--- /dev/null
+++ b/ipc/kdbus/endpoint.c
@@ -0,0 +1,232 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "message.h"
+#include "policy.h"
+
+static void kdbus_ep_free(struct kdbus_node *node)
+{
+	struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
+
+	WARN_ON(!list_empty(&ep->conn_list));
+
+	kdbus_policy_db_clear(&ep->policy_db);
+	kdbus_bus_unref(ep->bus);
+	kdbus_domain_user_unref(ep->user);
+	kfree(ep);
+}
+
+static void kdbus_ep_release(struct kdbus_node *node, bool was_active)
+{
+	struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
+
+	/* disconnect all connections to this endpoint */
+	for (;;) {
+		struct kdbus_conn *conn;
+
+		mutex_lock(&ep->lock);
+		conn = list_first_entry_or_null(&ep->conn_list,
+						struct kdbus_conn,
+						ep_entry);
+		if (!conn) {
+			mutex_unlock(&ep->lock);
+			break;
+		}
+
+		/* take reference, release lock, disconnect without lock */
+		kdbus_conn_ref(conn);
+		mutex_unlock(&ep->lock);
+
+		kdbus_conn_disconnect(conn, false);
+		kdbus_conn_unref(conn);
+	}
+}
+
+/**
+ * kdbus_ep_new() - create a new endpoint
+ * @bus:		The bus this endpoint will be created for
+ * @name:		The name of the endpoint
+ * @access:		The access flags for this node (KDBUS_MAKE_ACCESS_*)
+ * @uid:		The uid of the node
+ * @gid:		The gid of the node
+ * @is_custom:		Whether this is a custom endpoint
+ *
+ * This function will create a new endpoint with the given
+ * name and properties for a given bus.
+ *
+ * Return: a new kdbus_ep on success, ERR_PTR on failure.
+ */
+struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
+			      unsigned int access, kuid_t uid, kgid_t gid,
+			      bool is_custom)
+{
+	struct kdbus_ep *e;
+	int ret;
+
+	/*
+	 * Validate only custom endpoint names; default endpoints
+	 * with a "bus" name are created when the bus is created
+	 */
+	if (is_custom) {
+		ret = kdbus_verify_uid_prefix(name,
+					      bus->domain->user_namespace,
+					      uid);
+		if (ret < 0)
+			return ERR_PTR(ret);
+	}
+
+	e = kzalloc(sizeof(*e), GFP_KERNEL);
+	if (!e)
+		return ERR_PTR(-ENOMEM);
+
+	kdbus_node_init(&e->node, KDBUS_NODE_ENDPOINT);
+
+	e->node.free_cb = kdbus_ep_free;
+	e->node.release_cb = kdbus_ep_release;
+	e->node.uid = uid;
+	e->node.gid = gid;
+	e->node.mode = S_IRUSR | S_IWUSR;
+	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
+		e->node.mode |= S_IRGRP | S_IWGRP;
+	if (access & KDBUS_MAKE_ACCESS_WORLD)
+		e->node.mode |= S_IROTH | S_IWOTH;
+
+	mutex_init(&e->lock);
+	INIT_LIST_HEAD(&e->conn_list);
+	kdbus_policy_db_init(&e->policy_db);
+	e->has_policy = is_custom;
+	e->bus = kdbus_bus_ref(bus);
+	e->id = atomic64_inc_return(&bus->ep_seq_last);
+
+	ret = kdbus_node_link(&e->node, &bus->node, name);
+	if (ret < 0)
+		goto exit_unref;
+
+	/*
+	 * Transactions on custom endpoints are never accounted on the global
+	 * user limits. Instead, for each custom endpoint, we create a custom,
+	 * unique user, which all transactions are accounted on. Regardless of
+	 * the user using that endpoint, it is always accounted on the same
+	 * user-object. This budget is not shared with ordinary users on
+	 * non-custom endpoints.
+	 */
+	if (is_custom) {
+		e->user = kdbus_domain_get_user(bus->domain, INVALID_UID);
+		if (IS_ERR(e->user)) {
+			ret = PTR_ERR(e->user);
+			e->user = NULL;
+			goto exit_unref;
+		}
+	}
+
+	return e;
+
+exit_unref:
+	kdbus_node_deactivate(&e->node);
+	kdbus_node_unref(&e->node);
+	return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_ep_ref() - increase the reference counter of a kdbus_ep
+ * @ep:			The endpoint to reference
+ *
+ * Every user of an endpoint, except for its creator, must add a reference to
+ * the kdbus_ep instance using this function.
+ *
+ * Return: the ep itself
+ */
+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep)
+{
+	if (ep)
+		kdbus_node_ref(&ep->node);
+	return ep;
+}
+
+/**
+ * kdbus_ep_unref() - decrease the reference counter of a kdbus_ep
+ * @ep:		The ep to unref
+ *
+ * Release a reference. If the reference count drops to 0, the ep will be
+ * freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep)
+{
+	if (ep)
+		kdbus_node_unref(&ep->node);
+	return NULL;
+}
+
+/**
+ * kdbus_ep_activate() - activate an endpoint
+ * @ep:			Endpoint
+ *
+ * Return: 0 on success, negative error otherwise.
+ */
+int kdbus_ep_activate(struct kdbus_ep *ep)
+{
+	/*
+	 * kdbus_ep_activate() must not be called multiple times, so if
+	 * kdbus_node_activate() didn't activate the node, it must already be
+	 * dead.
+	 */
+	if (!kdbus_node_activate(&ep->node))
+		return -ESHUTDOWN;
+
+	return 0;
+}
+
+/**
+ * kdbus_ep_deactivate() - invalidate an endpoint
+ * @ep:			Endpoint
+ */
+void kdbus_ep_deactivate(struct kdbus_ep *ep)
+{
+	kdbus_node_deactivate(&ep->node);
+}
+
+/**
+ * kdbus_ep_policy_set() - set policy for an endpoint
+ * @ep:			The endpoint
+ * @items:		The kdbus items containing policy information
+ * @items_size:		The total length of the items
+ *
+ * Only the endpoint owner should be able to call this function.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_ep_policy_set(struct kdbus_ep *ep,
+			const struct kdbus_item *items,
+			size_t items_size)
+{
+	return kdbus_policy_set(&ep->policy_db, items, items_size, 0, true, ep);
+}
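
Note on the mode computation shared by kdbus_ep_new() and the control node in domain.c: the owner always gets read/write, "group" access adds group bits, and "world" access adds both group and other bits. A minimal userspace sketch of that mapping follows. ACCESS_GROUP, ACCESS_WORLD and their bit values are placeholders assumed for illustration; the real KDBUS_MAKE_ACCESS_* values are defined in the kdbus UAPI header, which is not part of this patch.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical flag values, standing in for KDBUS_MAKE_ACCESS_*. */
#define ACCESS_GROUP (1u << 0)
#define ACCESS_WORLD (1u << 1)

/* Map access flags to file mode bits for a read/write node. */
static mode_t access_to_mode(unsigned int access)
{
	mode_t mode = S_IRUSR | S_IWUSR;

	/* world access implies group access */
	if (access & (ACCESS_GROUP | ACCESS_WORLD))
		mode |= S_IRGRP | S_IWGRP;
	if (access & ACCESS_WORLD)
		mode |= S_IROTH | S_IWOTH;
	return mode;
}

/* Usage example: check the three possible outcomes. */
static void access_to_mode_selfcheck(void)
{
	assert(access_to_mode(0) == (S_IRUSR | S_IWUSR));
	assert(access_to_mode(ACCESS_GROUP) ==
	       (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP));
	assert(access_to_mode(ACCESS_WORLD) ==
	       (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH));
}
```

The domain directory node uses the same scheme with read/execute bits instead of read/write, since it is traversed rather than opened for I/O.
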
diff --git a/ipc/kdbus/endpoint.h b/ipc/kdbus/endpoint.h
new file mode 100644
index 000000000000..be395c9c64dc
--- /dev/null
+++ b/ipc/kdbus/endpoint.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_ENDPOINT_H
+#define __KDBUS_ENDPOINT_H
+
+#include "limits.h"
+#include "names.h"
+#include "node.h"
+#include "policy.h"
+#include "util.h"
+
+struct kdbus_kmsg;
+
+/**
+ * struct kdbus_ep - endpoint to access a bus
+ * @node:		The kdbus node
+ * @bus:		Bus behind this endpoint
+ * @id:			ID of this endpoint on the bus
+ * @conn_list:		Connections of this endpoint
+ * @lock:		Endpoint data lock
+ * @user:		Custom endpoints account against an anonymous user
+ * @policy_db:		Uploaded policy
+ * @has_policy:		The policy-db is valid and should be used
+ *
+ * An endpoint offers access to a bus; the default endpoint node name is "bus".
+ * Additional custom endpoints to the same bus can be created and they can
+ * carry their own policies/filters.
+ */
+struct kdbus_ep {
+	struct kdbus_node node;
+	struct kdbus_bus *bus;
+	u64 id;
+	struct list_head conn_list;
+	struct mutex lock;
+	struct kdbus_domain_user *user;
+	struct kdbus_policy_db policy_db;
+
+	bool has_policy:1;
+};
+
+#define kdbus_ep_from_node(_node) \
+	container_of((_node), struct kdbus_ep, node)
+
+struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
+			      unsigned int access, kuid_t uid, kgid_t gid,
+			      bool is_custom);
+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep);
+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep);
+int kdbus_ep_activate(struct kdbus_ep *ep);
+void kdbus_ep_deactivate(struct kdbus_ep *ep);
+
+int kdbus_ep_policy_set(struct kdbus_ep *ep,
+			const struct kdbus_item *items,
+			size_t items_size);
+
+#endif
-- 
2.2.1


* [PATCH 10/13] kdbus: add name registry implementation
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

This patch adds the name registry implementation.

Each bus instantiates a name registry to resolve well-known names
into unique connection IDs for message delivery. The registry will
be queried when a message is sent with kdbus_msg.dst_id set to
KDBUS_DST_ID_NAME, or when a registry dump is requested.
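
For illustration only, the syntax rules that the registry applies to
well-known names (see the kernel-doc of kdbus_name_is_valid() in this
patch) can be sketched in userspace as follows. The names
toy_name_is_valid() and TOY_NAME_MAX_LEN are hypothetical, the limit of
255 is an assumption standing in for KDBUS_NAME_MAX_LEN, and the sketch
assumes the "C" locale for isalpha()/isdigit():

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit for this sketch; stands in for KDBUS_NAME_MAX_LEN. */
#define TOY_NAME_MAX_LEN 255

/* Userspace sketch of the name syntax check; not the kernel function. */
static bool toy_name_is_valid(const char *p, bool allow_wildcard)
{
	bool at_start = true;	/* true while at the start of an element */
	bool found_dot = false;
	const char *q;

	assert(p != NULL);

	for (q = p; *q; q++) {
		unsigned char c = (unsigned char)*q;

		if (c == '.') {
			/* empty element: leading dot or two dots in a row */
			if (at_start)
				return false;
			found_dot = true;
			at_start = true;
		} else {
			/* digits are not allowed as first character */
			bool good = isalpha(c) || (!at_start && isdigit(c)) ||
				    c == '_' || c == '-' ||
				    (allow_wildcard && at_start &&
				     c == '*' && q[1] == '\0');

			if (!good)
				return false;
			at_start = false;
		}
	}

	if ((size_t)(q - p) > TOY_NAME_MAX_LEN)
		return false;

	/* must not end in a dot and must have at least two elements */
	return !at_start && found_dot;
}
```

With this sketch, "org.freedesktop.DBus" is accepted, while "org",
"org..foo" and "org.1foo" are rejected; "org.foo.*" is accepted only
when allow_wildcard is true.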

It's important to have this registry implemented in the kernel to
implement lookups and take-overs in a race-free way.
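
As a rough sketch of the take-over decision made in kdbus_name_acquire(),
the following userspace model walks through the same checks in the same
order, ignoring activators, locking and notifications. All toy_* and
TOY_* identifiers and the flag values are made up for this illustration;
the real KDBUS_NAME_* flags come from the kdbus UAPI header, and the
waiter queue is reduced to a counter:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values, for this sketch only. */
#define TOY_NAME_REPLACE_EXISTING	(1ULL << 0)
#define TOY_NAME_ALLOW_REPLACEMENT	(1ULL << 1)
#define TOY_NAME_QUEUE			(1ULL << 2)
#define TOY_NAME_IN_QUEUE		(1ULL << 3)

struct toy_name_entry {
	uint64_t owner_id;	/* 0 means the name has no owner */
	uint64_t flags;		/* flags the current owner acquired with */
	unsigned int queued;	/* number of waiters (stand-in for a list) */
};

/* Returns 0 on success (owned or queued), negative errno otherwise. */
static int toy_name_acquire(struct toy_name_entry *e, uint64_t conn_id,
			    uint64_t *flags)
{
	assert(e && flags && conn_id != 0);

	/* name is free: the caller becomes the owner */
	if (e->owner_id == 0) {
		e->owner_id = conn_id;
		e->flags = *flags;
		return 0;
	}

	/* connection already owns that name */
	if (e->owner_id == conn_id)
		return -EALREADY;

	/* take over the name if both parties agree */
	if ((*flags & TOY_NAME_REPLACE_EXISTING) &&
	    (e->flags & TOY_NAME_ALLOW_REPLACEMENT)) {
		/* an old owner that asked for queuing goes back to the queue */
		if (e->flags & TOY_NAME_QUEUE)
			e->queued++;
		e->owner_id = conn_id;
		e->flags = *flags;
		return 0;
	}

	/* queue up and tell the caller that it was queued */
	if (*flags & TOY_NAME_QUEUE) {
		e->queued++;
		*flags |= TOY_NAME_IN_QUEUE;
		return 0;
	}

	/* the name is busy */
	return -EEXIST;
}
```

In the kernel, this sequence runs with the registry's write lock held, so
no other connection can observe or change the owner between the lookup
and the hand-over.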

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 ipc/kdbus/names.c | 891 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 ipc/kdbus/names.h |  82 +++++
 2 files changed, 973 insertions(+)
 create mode 100644 ipc/kdbus/names.c
 create mode 100644 ipc/kdbus/names.h

diff --git a/ipc/kdbus/names.c b/ipc/kdbus/names.c
new file mode 100644
index 000000000000..f8a69c249e17
--- /dev/null
+++ b/ipc/kdbus/names.c
@@ -0,0 +1,891 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/ctype.h>
+#include <linux/fs.h>
+#include <linux/hash.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uio.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "item.h"
+#include "names.h"
+#include "notify.h"
+#include "policy.h"
+
+/**
+ * struct kdbus_name_queue_item - a queue item for a name
+ * @conn:		The associated connection
+ * @entry:		Name entry queuing up for
+ * @entry_entry:	List element for the list in @entry
+ * @conn_entry:		List element for the list in @conn
+ * @flags:		The queuing flags
+ */
+struct kdbus_name_queue_item {
+	struct kdbus_conn *conn;
+	struct kdbus_name_entry *entry;
+	struct list_head entry_entry;
+	struct list_head conn_entry;
+	u64 flags;
+};
+
+static void kdbus_name_entry_free(struct kdbus_name_entry *e)
+{
+	hash_del(&e->hentry);
+	kfree(e->name);
+	kfree(e);
+}
+
+/**
+ * kdbus_name_registry_free() - drop a name reg's reference
+ * @reg:		The name registry, may be %NULL
+ *
+ * Clean up the name registry's internal structures.
+ */
+void kdbus_name_registry_free(struct kdbus_name_registry *reg)
+{
+	struct kdbus_name_entry *e;
+	struct hlist_node *tmp;
+	unsigned int i;
+
+	if (!reg)
+		return;
+
+	hash_for_each_safe(reg->entries_hash, i, tmp, e, hentry)
+		kdbus_name_entry_free(e);
+
+	kfree(reg);
+}
+
+/**
+ * kdbus_name_registry_new() - create a new name registry
+ *
+ * Return: a new kdbus_name_registry on success, ERR_PTR on failure.
+ */
+struct kdbus_name_registry *kdbus_name_registry_new(void)
+{
+	struct kdbus_name_registry *r;
+
+	r = kzalloc(sizeof(*r), GFP_KERNEL);
+	if (!r)
+		return ERR_PTR(-ENOMEM);
+
+	hash_init(r->entries_hash);
+	init_rwsem(&r->rwlock);
+
+	return r;
+}
+
+static struct kdbus_name_entry *
+kdbus_name_lookup(struct kdbus_name_registry *reg, u32 hash, const char *name)
+{
+	struct kdbus_name_entry *e;
+
+	hash_for_each_possible(reg->entries_hash, e, hentry, hash)
+		if (strcmp(e->name, name) == 0)
+			return e;
+
+	return NULL;
+}
+
+static void kdbus_name_queue_item_free(struct kdbus_name_queue_item *q)
+{
+	list_del(&q->entry_entry);
+	list_del(&q->conn_entry);
+	kfree(q);
+}
+
+/*
+ * The caller must hold the lock so we decrement the counter and
+ * delete the entry.
+ *
+ * The caller needs to hold its own reference, so the connection does not go
+ * away while the entry's reference is dropped under lock.
+ */
+static void kdbus_name_entry_remove_owner(struct kdbus_name_entry *e)
+{
+	if (WARN_ON(!e->conn))
+		return;
+
+	if (WARN_ON(!mutex_is_locked(&e->conn->lock)))
+		return;
+
+	atomic_dec(&e->conn->name_count);
+	list_del(&e->conn_entry);
+	e->conn = kdbus_conn_unref(e->conn);
+}
+
+static void kdbus_name_entry_set_owner(struct kdbus_name_entry *e,
+				       struct kdbus_conn *conn)
+{
+	if (WARN_ON(e->conn))
+		return;
+
+	if (WARN_ON(!mutex_is_locked(&conn->lock)))
+		return;
+
+	e->conn = kdbus_conn_ref(conn);
+	atomic_inc(&conn->name_count);
+	list_add_tail(&e->conn_entry, &e->conn->names_list);
+}
+
+static int kdbus_name_replace_owner(struct kdbus_name_entry *e,
+				    struct kdbus_conn *conn, u64 flags)
+{
+	struct kdbus_conn *conn_old = kdbus_conn_ref(e->conn);
+	int ret = 0;
+
+	if (WARN_ON(conn == conn_old))
+		return -EALREADY;
+
+	if (WARN_ON(!conn_old))
+		return -EINVAL;
+
+	kdbus_conn_lock2(conn, conn_old);
+
+	if (!kdbus_conn_active(conn)) {
+		ret = -ECONNRESET;
+		goto exit_unlock;
+	}
+
+	kdbus_notify_name_change(conn->ep->bus, KDBUS_ITEM_NAME_CHANGE,
+				 e->conn->id, conn->id,
+				 e->flags, flags, e->name);
+
+	/* hand over name ownership */
+	kdbus_name_entry_remove_owner(e);
+	kdbus_name_entry_set_owner(e, conn);
+	e->flags = flags;
+
+exit_unlock:
+	kdbus_conn_unlock2(conn, conn_old);
+	kdbus_conn_unref(conn_old);
+	return ret;
+}
+
+static int kdbus_name_entry_release(struct kdbus_name_entry *e)
+{
+	struct kdbus_conn *conn;
+	int ret;
+
+	/* give it to first active waiter in the queue */
+	while (!list_empty(&e->queue_list)) {
+		struct kdbus_name_queue_item *q;
+
+		q = list_first_entry(&e->queue_list,
+				     struct kdbus_name_queue_item,
+				     entry_entry);
+
+		ret = kdbus_name_replace_owner(e, q->conn, q->flags);
+		if (ret < 0)
+			continue;
+
+		kdbus_name_queue_item_free(q);
+		return 0;
+	}
+
+	/* hand it back to an active activator connection */
+	if (e->activator && e->activator != e->conn) {
+		u64 flags = KDBUS_NAME_ACTIVATOR;
+
+		/*
+		 * Move messages still queued in the old connection
+		 * and addressed to that name to the new connection.
+		 * This allows a race and loss-free name and message
+		 * takeover and exit-on-idle services.
+		 */
+		ret = kdbus_conn_move_messages(e->activator, e->conn,
+					       e->name_id);
+		if (ret < 0)
+			return ret;
+
+		return kdbus_name_replace_owner(e, e->activator, flags);
+	}
+
+	/* release the name */
+	kdbus_notify_name_change(e->conn->ep->bus, KDBUS_ITEM_NAME_REMOVE,
+				 e->conn->id, 0, e->flags, 0, e->name);
+
+	conn = kdbus_conn_ref(e->conn);
+	mutex_lock(&conn->lock);
+	kdbus_name_entry_remove_owner(e);
+	mutex_unlock(&conn->lock);
+	kdbus_conn_unref(conn);
+
+	kdbus_name_entry_free(e);
+
+	return 0;
+}
+
+static int kdbus_name_release(struct kdbus_name_registry *reg,
+			      struct kdbus_conn *conn,
+			      const char *name)
+{
+	struct kdbus_name_queue_item *tmp, *q;
+	struct kdbus_name_entry *e = NULL;
+	int ret = -ESRCH;
+	u32 hash;
+
+	hash = kdbus_strhash(name);
+
+	/* lock order: domain -> bus -> ep -> names -> connection */
+	mutex_lock(&conn->ep->bus->lock);
+	down_write(&reg->rwlock);
+
+	e = kdbus_name_lookup(reg, hash, name);
+	if (!e)
+		goto exit_unlock;
+
+	/* Is the connection already the real owner of the name? */
+	if (e->conn == conn) {
+		ret = kdbus_name_entry_release(e);
+	} else {
+		/*
+		 * Otherwise, walk the list of queued entries and search
+		 * for items for connection.
+		 */
+
+		list_for_each_entry_safe(q, tmp, &e->queue_list, entry_entry) {
+			if (q->conn != conn)
+				continue;
+
+			kdbus_name_queue_item_free(q);
+			ret = 0;
+			break;
+		}
+	}
+
+exit_unlock:
+	up_write(&reg->rwlock);
+	mutex_unlock(&conn->ep->bus->lock);
+
+	return ret;
+}
+
+/**
+ * kdbus_name_remove_by_conn() - remove all name entries of a given connection
+ * @reg:		The name registry
+ * @conn:		The connection whose entries to remove
+ *
+ * This function removes all name entries held by a given connection.
+ */
+void kdbus_name_remove_by_conn(struct kdbus_name_registry *reg,
+			       struct kdbus_conn *conn)
+{
+	struct kdbus_name_queue_item *q_tmp, *q;
+	struct kdbus_conn *activator = NULL;
+	struct kdbus_name_entry *e_tmp, *e;
+	LIST_HEAD(names_queue_list);
+	LIST_HEAD(names_list);
+
+	/* lock order: domain -> bus -> ep -> names -> conn */
+	mutex_lock(&conn->ep->bus->lock);
+	down_write(&reg->rwlock);
+
+	mutex_lock(&conn->lock);
+	list_splice_init(&conn->names_list, &names_list);
+	list_splice_init(&conn->names_queue_list, &names_queue_list);
+	mutex_unlock(&conn->lock);
+
+	if (kdbus_conn_is_activator(conn)) {
+		activator = conn->activator_of->activator;
+		conn->activator_of->activator = NULL;
+	}
+	list_for_each_entry_safe(q, q_tmp, &names_queue_list, conn_entry)
+		kdbus_name_queue_item_free(q);
+	list_for_each_entry_safe(e, e_tmp, &names_list, conn_entry)
+		kdbus_name_entry_release(e);
+
+	up_write(&reg->rwlock);
+	mutex_unlock(&conn->ep->bus->lock);
+
+	kdbus_conn_unref(activator);
+	kdbus_notify_flush(conn->ep->bus);
+}
+
+/**
+ * kdbus_name_lock() - look up a name in a name registry and lock it
+ * @reg:		The name registry
+ * @name:		The name to look up
+ *
+ * Search for a name in a given name registry and return it with the
+ * registry-lock held. If the object is not found, the lock is not acquired and
+ * NULL is returned. The caller is responsible for unlocking the name via
+ * kdbus_name_unlock() again. Note that kdbus_name_unlock() can be safely called
+ * with NULL as name. In this case, it's a no-op as nothing was locked.
+ *
+ * The *_lock() + *_unlock() logic is only required for callers that need to
+ * protect their code against concurrent activator/implementer name changes.
+ * Multiple readers can lock names concurrently. However, you may not change
+ * name-ownership while holding a name-lock.
+ *
+ * Return: NULL if name is unknown, otherwise return a pointer to the name
+ *         entry with the name-lock held (reader lock only).
+ */
+struct kdbus_name_entry *kdbus_name_lock(struct kdbus_name_registry *reg,
+					 const char *name)
+{
+	struct kdbus_name_entry *e = NULL;
+	u32 hash = kdbus_strhash(name);
+
+	down_read(&reg->rwlock);
+	e = kdbus_name_lookup(reg, hash, name);
+	if (e)
+		return e;
+	up_read(&reg->rwlock);
+
+	return NULL;
+}
+
+/**
+ * kdbus_name_unlock() - unlock one name in a name registry
+ * @reg:		The name registry
+ * @entry:		The locked name entry or NULL
+ *
+ * This is the unlock-counterpart of kdbus_name_lock(). It unlocks a name that
+ * was previously successfully locked. You can safely pass NULL as entry and
+ * this will become a no-op. Therefore, it's safe to always call this on the
+ * return-value of kdbus_name_lock().
+ *
+ * Return: This always returns NULL.
+ */
+struct kdbus_name_entry *kdbus_name_unlock(struct kdbus_name_registry *reg,
+					   struct kdbus_name_entry *entry)
+{
+	if (entry) {
+		BUG_ON(!rwsem_is_locked(&reg->rwlock));
+		up_read(&reg->rwlock);
+	}
+
+	return NULL;
+}
+
+static int kdbus_name_queue_conn(struct kdbus_conn *conn, u64 flags,
+				 struct kdbus_name_entry *e)
+{
+	struct kdbus_name_queue_item *q;
+
+	q = kzalloc(sizeof(*q), GFP_KERNEL);
+	if (!q)
+		return -ENOMEM;
+
+	q->conn = conn;
+	q->flags = flags;
+	q->entry = e;
+
+	list_add_tail(&q->entry_entry, &e->queue_list);
+	list_add_tail(&q->conn_entry, &conn->names_queue_list);
+
+	return 0;
+}
+
+/**
+ * kdbus_name_is_valid() - check if a name is valid
+ * @p:			The name to check
+ * @allow_wildcard:	Whether or not to allow a wildcard name
+ *
+ * A name is valid if all of the following criteria are met:
+ *
+ *  - The name has two or more elements separated by a period ('.') character.
+ *  - All elements must contain at least one character.
+ *  - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_-"
+ *    and must not begin with a digit.
+ *  - The name must not exceed KDBUS_NAME_MAX_LEN.
+ *  - If @allow_wildcard is true, the name may end in '.*'
+ */
+bool kdbus_name_is_valid(const char *p, bool allow_wildcard)
+{
+	bool dot, found_dot = false;
+	const char *q;
+
+	for (dot = true, q = p; *q; q++) {
+		if (*q == '.') {
+			if (dot)
+				return false;
+
+			found_dot = true;
+			dot = true;
+		} else {
+			bool good;
+
+			good = isalpha(*q) || (!dot && isdigit(*q)) ||
+				*q == '_' || *q == '-' ||
+				(allow_wildcard && dot &&
+					*q == '*' && *(q + 1) == '\0');
+
+			if (!good)
+				return false;
+
+			dot = false;
+		}
+	}
+
+	if (q - p > KDBUS_NAME_MAX_LEN)
+		return false;
+
+	if (dot)
+		return false;
+
+	if (!found_dot)
+		return false;
+
+	return true;
+}
+
+/**
+ * kdbus_name_acquire() - acquire a name
+ * @reg:		The name registry
+ * @conn:		The connection to pin this entry to
+ * @name:		The name to acquire
+ * @flags:		Acquisition flags (KDBUS_NAME_*)
+ *
+ * Callers must ensure that @conn is either a privileged bus user or has
+ * sufficient privileges in the policy-db to own the well-known name @name.
+ *
+ * Return: 0 on success, negative error number on failure.
+ */
+int kdbus_name_acquire(struct kdbus_name_registry *reg,
+		       struct kdbus_conn *conn,
+		       const char *name, u64 *flags)
+{
+	struct kdbus_name_entry *e = NULL;
+	int ret = 0;
+	u32 hash;
+
+	/* lock order: domain -> bus -> ep -> names -> conn */
+	mutex_lock(&conn->ep->bus->lock);
+	down_write(&reg->rwlock);
+
+	hash = kdbus_strhash(name);
+	e = kdbus_name_lookup(reg, hash, name);
+	if (e) {
+		/* connection already owns that name */
+		if (e->conn == conn) {
+			ret = -EALREADY;
+			goto exit_unlock;
+		}
+
+		if (kdbus_conn_is_activator(conn)) {
+			/* An activator can only own a single name */
+			if (conn->activator_of) {
+				if (conn->activator_of == e)
+					ret = -EALREADY;
+				else
+					ret = -EINVAL;
+			} else if (!e->activator && !conn->activator_of) {
+				/*
+				 * Activator registers for name that is
+				 * already owned
+				 */
+				e->activator = kdbus_conn_ref(conn);
+				conn->activator_of = e;
+			}
+
+			goto exit_unlock;
+		}
+
+		/* take over the name of an activator connection */
+		if (e->flags & KDBUS_NAME_ACTIVATOR) {
+			/*
+			 * Take over the messages queued in the activator
+			 * connection, the activator itself never reads them.
+			 */
+			ret = kdbus_conn_move_messages(conn, e->activator, 0);
+			if (ret < 0)
+				goto exit_unlock;
+
+			ret = kdbus_name_replace_owner(e, conn, *flags);
+			goto exit_unlock;
+		}
+
+		/* take over the name if both parties agree */
+		if ((*flags & KDBUS_NAME_REPLACE_EXISTING) &&
+		    (e->flags & KDBUS_NAME_ALLOW_REPLACEMENT)) {
+			/*
+			 * Move name back to the queue, in case we take it away
+			 * from a connection which asked for queuing.
+			 */
+			if (e->flags & KDBUS_NAME_QUEUE) {
+				ret = kdbus_name_queue_conn(e->conn,
+							    e->flags, e);
+				if (ret < 0)
+					goto exit_unlock;
+			}
+
+			ret = kdbus_name_replace_owner(e, conn, *flags);
+			goto exit_unlock;
+		}
+
+		/* add it to the queue waiting for the name */
+		if (*flags & KDBUS_NAME_QUEUE) {
+			ret = kdbus_name_queue_conn(conn, *flags, e);
+			if (ret < 0)
+				goto exit_unlock;
+
+			/* tell the caller that we queued it */
+			*flags |= KDBUS_NAME_IN_QUEUE;
+
+			goto exit_unlock;
+		}
+
+		/* the name is busy, return a failure */
+		ret = -EEXIST;
+		goto exit_unlock;
+	} else {
+		/* An activator can only own a single name */
+		if (kdbus_conn_is_activator(conn) &&
+		    conn->activator_of) {
+			ret = -EINVAL;
+			goto exit_unlock;
+		}
+	}
+
+	/* new name entry */
+	e = kzalloc(sizeof(*e), GFP_KERNEL);
+	if (!e) {
+		ret = -ENOMEM;
+		goto exit_unlock;
+	}
+
+	e->name = kstrdup(name, GFP_KERNEL);
+	if (!e->name) {
+		kfree(e);
+		ret = -ENOMEM;
+		goto exit_unlock;
+	}
+
+	if (kdbus_conn_is_activator(conn)) {
+		e->activator = kdbus_conn_ref(conn);
+		conn->activator_of = e;
+	}
+
+	e->flags = *flags;
+	INIT_LIST_HEAD(&e->queue_list);
+	e->name_id = ++reg->name_seq_last;
+
+	mutex_lock(&conn->lock);
+	if (!kdbus_conn_active(conn)) {
+		mutex_unlock(&conn->lock);
+		kfree(e->name);
+		kfree(e);
+		ret = -ECONNRESET;
+		goto exit_unlock;
+	}
+	hash_add(reg->entries_hash, &e->hentry, hash);
+	kdbus_name_entry_set_owner(e, conn);
+	mutex_unlock(&conn->lock);
+
+	kdbus_notify_name_change(e->conn->ep->bus, KDBUS_ITEM_NAME_ADD,
+				 0, e->conn->id,
+				 0, e->flags, e->name);
+
+exit_unlock:
+	up_write(&reg->rwlock);
+	mutex_unlock(&conn->ep->bus->lock);
+	kdbus_notify_flush(conn->ep->bus);
+	return ret;
+}
+
+/**
+ * kdbus_cmd_name_acquire() - acquire a name from an ioctl command buffer
+ * @reg:		The name registry
+ * @conn:		The connection to pin this entry to
+ * @cmd:		The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_name_acquire(struct kdbus_name_registry *reg,
+			   struct kdbus_conn *conn,
+			   struct kdbus_cmd_name *cmd)
+{
+	const char *name;
+	int ret;
+
+	name = kdbus_items_get_str(cmd->items, KDBUS_ITEMS_SIZE(cmd, items),
+				   KDBUS_ITEM_NAME);
+	if (IS_ERR(name))
+		return -EINVAL;
+
+	if (!kdbus_name_is_valid(name, false))
+		return -EINVAL;
+
+	/*
+	 * Do atomic_inc_return here to reserve our slot, then decrement
+	 * it before returning.
+	 */
+	if (atomic_inc_return(&conn->name_count) > KDBUS_CONN_MAX_NAMES) {
+		ret = -E2BIG;
+		goto out_dec;
+	}
+
+	if (!kdbus_conn_policy_own_name(conn, current_cred(), name)) {
+		ret = -EPERM;
+		goto out_dec;
+	}
+
+	ret = kdbus_name_acquire(reg, conn, name, &cmd->flags);
+
+out_dec:
+	/* Decrement the previously allocated slot */
+	atomic_dec(&conn->name_count);
+	return ret;
+}
+
+/**
+ * kdbus_cmd_name_release() - release a name entry from an ioctl command buffer
+ * @reg:		The name registry
+ * @conn:		The connection that holds the name
+ * @cmd:		The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_name_release(struct kdbus_name_registry *reg,
+			   struct kdbus_conn *conn,
+			   const struct kdbus_cmd_name *cmd)
+{
+	int ret;
+	const char *name;
+
+	name = kdbus_items_get_str(cmd->items, KDBUS_ITEMS_SIZE(cmd, items),
+				   KDBUS_ITEM_NAME);
+	if (IS_ERR(name))
+		return -EINVAL;
+
+	if (!kdbus_name_is_valid(name, false))
+		return -EINVAL;
+
+	ret = kdbus_name_release(reg, conn, name);
+
+	kdbus_notify_flush(conn->ep->bus);
+	return ret;
+}
+
+static int kdbus_name_list_write(struct kdbus_conn *conn,
+				 struct kdbus_conn *c,
+				 struct kdbus_pool_slice *slice,
+				 size_t *pos,
+				 struct kdbus_name_entry *e,
+				 bool write)
+{
+	struct kvec kvec[4];
+	size_t cnt = 0;
+	int ret;
+
+	/* info header */
+	struct kdbus_name_info info = {
+		.size = 0,
+		.owner_id = c->id,
+		.conn_flags = c->flags,
+	};
+
+	/* fake the header of a kdbus_name item */
+	struct {
+		u64 size;
+		u64 type;
+		u64 flags;
+	} h = {};
+
+	if (e && !kdbus_conn_policy_see_name_unlocked(conn, current_cred(),
+						      e->name))
+		return 0;
+
+	kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &info.size);
+
+	/* append name */
+	if (e) {
+		size_t slen = strlen(e->name) + 1;
+
+		h.size = offsetof(struct kdbus_item, name.name) + slen;
+		h.type = KDBUS_ITEM_OWNED_NAME;
+		h.flags = e->flags;
+
+		kdbus_kvec_set(&kvec[cnt++], &h, sizeof(h), &info.size);
+		kdbus_kvec_set(&kvec[cnt++], e->name, slen, &info.size);
+		cnt += !!kdbus_kvec_pad(&kvec[cnt], &info.size);
+	}
+
+	if (write) {
+		ret = kdbus_pool_slice_copy_kvec(slice, *pos, kvec,
+						 cnt, info.size);
+		if (ret < 0)
+			return ret;
+	}
+
+	*pos += info.size;
+	return 0;
+}
+
+static int kdbus_name_list_all(struct kdbus_conn *conn, u64 flags,
+			       struct kdbus_pool_slice *slice,
+			       size_t *pos, bool write)
+{
+	struct kdbus_conn *c;
+	size_t p = *pos;
+	int ret, i;
+
+	hash_for_each(conn->ep->bus->conn_hash, i, c, hentry) {
+		bool added = false;
+
+		/* skip activators */
+		if (!(flags & KDBUS_NAME_LIST_ACTIVATORS) &&
+		    kdbus_conn_is_activator(c))
+			continue;
+
+		/* all names the connection owns */
+		if (flags & (KDBUS_NAME_LIST_NAMES |
+			     KDBUS_NAME_LIST_ACTIVATORS)) {
+			struct kdbus_name_entry *e;
+
+			mutex_lock(&c->lock);
+			list_for_each_entry(e, &c->names_list, conn_entry) {
+				struct kdbus_conn *a = e->activator;
+
+				if ((flags & KDBUS_NAME_LIST_ACTIVATORS) &&
+				    a && a != c) {
+					ret = kdbus_name_list_write(conn, a,
+							slice, &p, e, write);
+					if (ret < 0) {
+						mutex_unlock(&c->lock);
+						return ret;
+					}
+
+					added = true;
+				}
+
+				if (flags & KDBUS_NAME_LIST_NAMES ||
+				    kdbus_conn_is_activator(c)) {
+					ret = kdbus_name_list_write(conn, c,
+							slice, &p, e, write);
+					if (ret < 0) {
+						mutex_unlock(&c->lock);
+						return ret;
+					}
+
+					added = true;
+				}
+			}
+			mutex_unlock(&c->lock);
+		}
+
+		/* queue of names the connection is currently waiting for */
+		if (flags & KDBUS_NAME_LIST_QUEUED) {
+			struct kdbus_name_queue_item *q;
+
+			mutex_lock(&c->lock);
+			list_for_each_entry(q, &c->names_queue_list,
+					    conn_entry) {
+				ret = kdbus_name_list_write(conn, c,
+						slice, &p, q->entry, write);
+				if (ret < 0) {
+					mutex_unlock(&c->lock);
+					return ret;
+				}
+
+				added = true;
+			}
+			mutex_unlock(&c->lock);
+		}
+
+		/* nothing added so far, just add the unique ID */
+		if (!added && flags & KDBUS_NAME_LIST_UNIQUE) {
+			ret = kdbus_name_list_write(conn, c,
+					slice, &p, NULL, write);
+			if (ret < 0)
+				return ret;
+		}
+	}
+
+	*pos = p;
+	return 0;
+}
+
+/**
+ * kdbus_cmd_name_list() - list names of a connection
+ * @reg:		The name registry
+ * @conn:		The connection holding the name entries
+ * @cmd:		The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_name_list(struct kdbus_name_registry *reg,
+			struct kdbus_conn *conn,
+			struct kdbus_cmd_name_list *cmd)
+{
+	struct kdbus_pool_slice *slice = NULL;
+	struct kdbus_name_list list = {};
+	const struct kdbus_item *item;
+	struct kvec kvec;
+	size_t pos;
+	int ret;
+
+	KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
+		/* no items supported so far */
+		switch (item->type) {
+		default:
+			return -EINVAL;
+		}
+	}
+
+	/* lock order: domain -> bus -> ep -> names -> conn */
+	down_read(&reg->rwlock);
+	down_read(&conn->ep->bus->conn_rwlock);
+	down_read(&conn->ep->policy_db.entries_rwlock);
+
+	/* size of header + records */
+	pos = sizeof(struct kdbus_name_list);
+	ret = kdbus_name_list_all(conn, cmd->flags, NULL, &pos, false);
+	if (ret < 0)
+		goto exit_unlock;
+
+	/* copy the header, specifying the overall size */
+	list.size = pos;
+	kvec.iov_base = &list;
+	kvec.iov_len = sizeof(list);
+
+	slice = kdbus_pool_slice_alloc(conn->pool, list.size, NULL, NULL, 0);
+	if (IS_ERR(slice)) {
+		ret = PTR_ERR(slice);
+		slice = NULL;
+		goto exit_unlock;
+	}
+
+	ret = kdbus_pool_slice_copy_kvec(slice, 0, &kvec, 1, kvec.iov_len);
+	if (ret < 0)
+		goto exit_unlock;
+
+	/* copy the records */
+	pos = sizeof(struct kdbus_name_list);
+	ret = kdbus_name_list_all(conn, cmd->flags, slice, &pos, true);
+	if (ret < 0)
+		goto exit_unlock;
+
+	kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->list_size);
+	ret = 0;
+
+exit_unlock:
+	kdbus_pool_slice_release(slice);
+	up_read(&conn->ep->policy_db.entries_rwlock);
+	up_read(&conn->ep->bus->conn_rwlock);
+	up_read(&reg->rwlock);
+	return ret;
+}
diff --git a/ipc/kdbus/names.h b/ipc/kdbus/names.h
new file mode 100644
index 000000000000..81a2e2ac49c1
--- /dev/null
+++ b/ipc/kdbus/names.h
@@ -0,0 +1,82 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni <tixxdz@opendz.org>
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_NAMES_H
+#define __KDBUS_NAMES_H
+
+#include <linux/hashtable.h>
+#include <linux/rwsem.h>
+
+/**
+ * struct kdbus_name_registry - names registered for a bus
+ * @entries_hash:	Map of entries
+ * @rwlock:		Registry data lock
+ * @name_seq_last:	Last used sequence number to assign to a name entry
+ */
+struct kdbus_name_registry {
+	DECLARE_HASHTABLE(entries_hash, 8);
+	struct rw_semaphore rwlock;
+	u64 name_seq_last;
+};
+
+/**
+ * struct kdbus_name_entry - well-known name entry
+ * @name:		The well-known name
+ * @name_id:		Sequence number of name entry to be able to uniquely
+ *			identify a name over its registration lifetime
+ * @flags:		KDBUS_NAME_* flags
+ * @queue_list:		List of queued waiters for the well-known name
+ * @conn_entry:		Entry in connection
+ * @hentry:		Entry in registry map
+ * @conn:		Connection owning the name
+ * @activator:		Connection of the activator queuing incoming messages
+ */
+struct kdbus_name_entry {
+	char *name;
+	u64 name_id;
+	u64 flags;
+	struct list_head queue_list;
+	struct list_head conn_entry;
+	struct hlist_node hentry;
+	struct kdbus_conn *conn;
+	struct kdbus_conn *activator;
+};
+
+struct kdbus_name_registry *kdbus_name_registry_new(void);
+void kdbus_name_registry_free(struct kdbus_name_registry *reg);
+
+int kdbus_name_acquire(struct kdbus_name_registry *reg,
+		       struct kdbus_conn *conn,
+		       const char *name, u64 *flags);
+int kdbus_cmd_name_acquire(struct kdbus_name_registry *reg,
+			   struct kdbus_conn *conn,
+			   struct kdbus_cmd_name *cmd);
+int kdbus_cmd_name_release(struct kdbus_name_registry *reg,
+			   struct kdbus_conn *conn,
+			   const struct kdbus_cmd_name *cmd);
+int kdbus_cmd_name_list(struct kdbus_name_registry *reg,
+			struct kdbus_conn *conn,
+			struct kdbus_cmd_name_list *cmd);
+
+struct kdbus_name_entry *kdbus_name_lock(struct kdbus_name_registry *reg,
+					 const char *name);
+struct kdbus_name_entry *kdbus_name_unlock(struct kdbus_name_registry *reg,
+					   struct kdbus_name_entry *entry);
+
+void kdbus_name_remove_by_conn(struct kdbus_name_registry *reg,
+			       struct kdbus_conn *conn);
+
+bool kdbus_name_is_valid(const char *p, bool allow_wildcard);
+
+#endif
-- 
2.2.1


* [PATCH 11/13] kdbus: add policy database implementation
@ 2015-01-16 19:16   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

This patch adds the policy database implementation.

A policy database restricts the ability of connections to own,
see and talk to well-known names. It can be associated with a bus
(through a policy holder connection) or a custom endpoint.
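
To make the matching rules concrete, here is a small userspace sketch of
how a query could resolve a name to an access level. It follows the
logic of kdbus_policy_lookup() and kdbus_policy_query_unlocked() in this
patch (exact entries win over wildcard entries, a wildcard covers exactly
one trailing element, and the highest matching access is returned), but
uses a flat array instead of a hash table and skips supplementary groups.
All toy_* and TOY_* names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum toy_access { TOY_SEE = 0, TOY_TALK = 1, TOY_OWN = 2 };
enum toy_type { TOY_USER, TOY_GROUP, TOY_WORLD };

struct toy_rule {
	enum toy_type type;
	enum toy_access access;
	unsigned int id;	/* uid or gid; unused for TOY_WORLD */
};

struct toy_entry {
	const char *name;	/* for wildcards: the prefix without ".*" */
	bool wildcard;
	const struct toy_rule *rules;
	size_t n_rules;
};

static bool toy_entry_matches(const struct toy_entry *e, const char *name)
{
	const char *dot;
	size_t len;

	if (!e->wildcard)
		return strcmp(e->name, name) == 0;

	/* a wildcard matches everything up to the last dot of the name */
	dot = strrchr(name, '.');
	if (!dot)
		return false;

	len = (size_t)(dot - name);
	return strncmp(e->name, name, len) == 0 && e->name[len] == '\0';
}

/* Returns the highest access granted, or -EPERM if none. */
static int toy_policy_query(const struct toy_entry *db, size_t n,
			    const char *name, unsigned int uid,
			    unsigned int gid)
{
	const struct toy_entry *e = NULL;
	int highest = -EPERM;
	size_t i;

	/* exact entries take precedence over wildcard entries */
	for (i = 0; i < n && !e; i++)
		if (!db[i].wildcard && toy_entry_matches(&db[i], name))
			e = &db[i];
	for (i = 0; i < n && !e; i++)
		if (db[i].wildcard && toy_entry_matches(&db[i], name))
			e = &db[i];
	if (!e)
		return -EPERM;

	for (i = 0; i < e->n_rules; i++) {
		const struct toy_rule *r = &e->rules[i];
		bool hit = r->type == TOY_WORLD ||
			   (r->type == TOY_USER && r->id == uid) ||
			   (r->type == TOY_GROUP && r->id == gid);

		if (hit && (int)r->access > highest)
			highest = r->access;
	}

	return highest;
}
```

Note that a wildcard entry stored as "com.example" matches
"com.example.other" but not "com.example.a.b", because only the part of
the name before its last dot is compared against the entry.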

By default, buses have an empty policy database that is augmented on
demand when a policy holder connection is instantiated.

Policies are set through KDBUS_CMD_HELLO (when creating a policy
holder connection), KDBUS_CMD_CONN_UPDATE (when updating a policy
holder connection), KDBUS_CMD_EP_MAKE (creating a custom endpoint)
or KDBUS_CMD_EP_UPDATE (updating a custom endpoint). In all cases,
the name and policy access information is stored in items of type
KDBUS_ITEM_NAME and KDBUS_ITEM_POLICY_ACCESS.

See Documentation/kdbus.txt for more details.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 ipc/kdbus/policy.c | 481 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 ipc/kdbus/policy.h |  51 ++++++
 2 files changed, 532 insertions(+)
 create mode 100644 ipc/kdbus/policy.c
 create mode 100644 ipc/kdbus/policy.h

diff --git a/ipc/kdbus/policy.c b/ipc/kdbus/policy.c
new file mode 100644
index 000000000000..0b60f5f381bf
--- /dev/null
+++ b/ipc/kdbus/policy.c
@@ -0,0 +1,481 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "item.h"
+#include "names.h"
+#include "policy.h"
+
+#define KDBUS_POLICY_HASH_SIZE	64
+
+/**
+ * struct kdbus_policy_db_entry_access - a database entry access item
+ * @type:		One of KDBUS_POLICY_ACCESS_* types
+ * @access:		Access to grant. One of KDBUS_POLICY_*
+ * @uid:		For KDBUS_POLICY_ACCESS_USER, the global uid
+ * @gid:		For KDBUS_POLICY_ACCESS_GROUP, the global gid
+ * @list:		List entry item for the entry's list
+ *
+ * This is the internal version of struct kdbus_policy_db_access.
+ */
+struct kdbus_policy_db_entry_access {
+	u8 type;		/* USER, GROUP, WORLD */
+	u8 access;		/* OWN, TALK, SEE */
+	union {
+		kuid_t uid;	/* global uid */
+		kgid_t gid;	/* global gid */
+	};
+	struct list_head list;
+};
+
+/**
+ * struct kdbus_policy_db_entry - a policy database entry
+ * @name:		The name to match the policy entry against
+ * @hentry:		The hash entry for the database's entries_hash
+ * @access_list:	List head for keeping track of the entry's
+ *			access items.
+ * @owner:		The owner of this entry. Can be a kdbus_conn or
+ *			a kdbus_ep object.
+ * @wildcard:		The name is a wildcard, such as ending on '.*'
+ */
+struct kdbus_policy_db_entry {
+	char *name;
+	struct hlist_node hentry;
+	struct list_head access_list;
+	const void *owner;
+	bool wildcard:1;
+};
+
+static void kdbus_policy_entry_free(struct kdbus_policy_db_entry *e)
+{
+	struct kdbus_policy_db_entry_access *a, *tmp;
+
+	list_for_each_entry_safe(a, tmp, &e->access_list, list) {
+		list_del(&a->list);
+		kfree(a);
+	}
+
+	kfree(e->name);
+	kfree(e);
+}
+
+static const struct kdbus_policy_db_entry *
+kdbus_policy_lookup(struct kdbus_policy_db *db, const char *name, u32 hash)
+{
+	struct kdbus_policy_db_entry *e;
+	const char *dot;
+	size_t len;
+
+	/* find exact match */
+	hash_for_each_possible(db->entries_hash, e, hentry, hash)
+		if (strcmp(e->name, name) == 0 && !e->wildcard)
+			return e;
+
+	/* find wildcard match */
+
+	dot = strrchr(name, '.');
+	if (!dot)
+		return NULL;
+
+	len = dot - name;
+	hash = kdbus_strnhash(name, len);
+
+	hash_for_each_possible(db->entries_hash, e, hentry, hash)
+		if (e->wildcard && !strncmp(e->name, name, len) &&
+		    !e->name[len])
+			return e;
+
+	return NULL;
+}
+
+/**
+ * kdbus_policy_db_clear() - release all memory from a policy db
+ * @db:		The policy database
+ */
+void kdbus_policy_db_clear(struct kdbus_policy_db *db)
+{
+	struct kdbus_policy_db_entry *e;
+	struct hlist_node *tmp;
+	unsigned int i;
+
+	/* purge entries */
+	down_write(&db->entries_rwlock);
+	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry) {
+		hash_del(&e->hentry);
+		kdbus_policy_entry_free(e);
+	}
+	up_write(&db->entries_rwlock);
+}
+
+/**
+ * kdbus_policy_db_init() - initialize a new policy database
+ * @db:		The location of the database
+ *
+ * This initializes a new policy-db. The underlying memory must have been
+ * cleared to zero by the caller.
+ */
+void kdbus_policy_db_init(struct kdbus_policy_db *db)
+{
+	hash_init(db->entries_hash);
+	init_rwsem(&db->entries_rwlock);
+}
+
+/**
+ * kdbus_policy_query_unlocked() - Query the policy database
+ * @db:		Policy database
+ * @cred:	Credentials to test against
+ * @name:	Name to query
+ * @hash:	Hash value of @name
+ *
+ * Same as kdbus_policy_query() but requires the caller to lock the policy
+ * database against concurrent writes.
+ *
+ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
+ */
+int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
+				const struct cred *cred, const char *name,
+				unsigned int hash)
+{
+	struct kdbus_policy_db_entry_access *a;
+	const struct kdbus_policy_db_entry *e;
+	int i, highest = -EPERM;
+
+	e = kdbus_policy_lookup(db, name, hash);
+	if (!e)
+		return -EPERM;
+
+	list_for_each_entry(a, &e->access_list, list) {
+		if ((int)a->access <= highest)
+			continue;
+
+		switch (a->type) {
+		case KDBUS_POLICY_ACCESS_USER:
+			if (uid_eq(cred->euid, a->uid))
+				highest = a->access;
+			break;
+		case KDBUS_POLICY_ACCESS_GROUP:
+			if (gid_eq(cred->egid, a->gid)) {
+				highest = a->access;
+				break;
+			}
+
+			for (i = 0; i < cred->group_info->ngroups; i++) {
+				kgid_t gid = GROUP_AT(cred->group_info, i);
+
+				if (gid_eq(gid, a->gid)) {
+					highest = a->access;
+					break;
+				}
+			}
+
+			break;
+		case KDBUS_POLICY_ACCESS_WORLD:
+			highest = a->access;
+			break;
+		}
+
+		/* OWN is the highest possible policy */
+		if (highest >= KDBUS_POLICY_OWN)
+			break;
+	}
+
+	return highest;
+}
+
+/**
+ * kdbus_policy_query() - Query the policy database
+ * @db:		Policy database
+ * @cred:	Credentials to test against
+ * @name:	Name to query
+ * @hash:	Hash value of @name
+ *
+ * Query the policy database @db for the access rights of @cred to the name
+ * @name. The access rights of @cred are returned, or -EPERM if no access is
+ * granted.
+ *
+ * This call effectively searches for the highest access-right granted to
+ * @cred. The caller should really cache those as policy lookups are rather
+ * expensive.
+ *
+ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
+ */
+int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
+		       const char *name, unsigned int hash)
+{
+	int ret;
+
+	down_read(&db->entries_rwlock);
+	ret = kdbus_policy_query_unlocked(db, cred, name, hash);
+	up_read(&db->entries_rwlock);
+
+	return ret;
+}
+
+static void __kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+					const void *owner)
+{
+	struct kdbus_policy_db_entry *e;
+	struct hlist_node *tmp;
+	int i;
+
+	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
+		if (e->owner == owner) {
+			hash_del(&e->hentry);
+			kdbus_policy_entry_free(e);
+		}
+}
+
+/**
+ * kdbus_policy_remove_owner() - remove all entries related to a connection
+ * @db:		The policy database
+ * @owner:	The connection which items to remove
+ */
+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+			       const void *owner)
+{
+	down_write(&db->entries_rwlock);
+	__kdbus_policy_remove_owner(db, owner);
+	up_write(&db->entries_rwlock);
+}
+
+/*
+ * Convert user provided policy access to internal kdbus policy
+ * access
+ */
+static struct kdbus_policy_db_entry_access *
+kdbus_policy_make_access(const struct kdbus_policy_access *uaccess)
+{
+	int ret;
+	struct kdbus_policy_db_entry_access *a;
+
+	a = kzalloc(sizeof(*a), GFP_KERNEL);
+	if (!a)
+		return ERR_PTR(-ENOMEM);
+
+	ret = -EINVAL;
+	switch (uaccess->access) {
+	case KDBUS_POLICY_SEE:
+	case KDBUS_POLICY_TALK:
+	case KDBUS_POLICY_OWN:
+		a->access = uaccess->access;
+		break;
+	default:
+		goto err;
+	}
+
+	switch (uaccess->type) {
+	case KDBUS_POLICY_ACCESS_USER:
+		a->uid = make_kuid(current_user_ns(), uaccess->id);
+		if (!uid_valid(a->uid))
+			goto err;
+
+		break;
+	case KDBUS_POLICY_ACCESS_GROUP:
+		a->gid = make_kgid(current_user_ns(), uaccess->id);
+		if (!gid_valid(a->gid))
+			goto err;
+
+		break;
+	case KDBUS_POLICY_ACCESS_WORLD:
+		break;
+	default:
+		goto err;
+	}
+
+	a->type = uaccess->type;
+
+	return a;
+
+err:
+	kfree(a);
+	return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_policy_set() - set a connection's policy rules
+ * @db:				The policy database
+ * @items:			A list of kdbus_item elements that contain both
+ *				names and access rules to set.
+ * @items_size:			The total size of the items.
+ * @max_policies:		The maximum number of policy entries to allow.
+ *				Pass 0 for no limit.
+ * @allow_wildcards:		Boolean value whether wildcard entries (such
+ *				as those ending in '.*') should be allowed.
+ * @owner:			The owner of the new policy items.
+ *
+ * This function sets a new set of policies for a given owner. The names and
+ * access rules are gathered by walking the list of items passed in as
+ * argument. An item of type KDBUS_ITEM_NAME is expected before any number of
+ * KDBUS_ITEM_POLICY_ACCESS items. If there are more repetitions of this
+ * pattern than denoted in @max_policies, -E2BIG is returned.
+ *
+ * In order to allow atomic replacement of rules, the function first removes
+ * all entries that have been created for the given owner previously.
+ *
+ * Callers to this function must make sure that the owner is a custom
+ * endpoint, or if the endpoint is a default endpoint, then it must be
+ * either a policy holder or an activator.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_policy_set(struct kdbus_policy_db *db,
+		     const struct kdbus_item *items,
+		     size_t items_size,
+		     size_t max_policies,
+		     bool allow_wildcards,
+		     const void *owner)
+{
+	struct kdbus_policy_db_entry_access *a;
+	struct kdbus_policy_db_entry *e, *p;
+	const struct kdbus_item *item;
+	struct hlist_node *tmp;
+	HLIST_HEAD(entries);
+	HLIST_HEAD(restore);
+	size_t count = 0;
+	int i, ret = 0;
+	u32 hash;
+
+	if (items_size > KDBUS_POLICY_MAX_SIZE)
+		return -E2BIG;
+
+	/* Walk the list of items and look for new policies */
+	e = NULL;
+	KDBUS_ITEMS_FOREACH(item, items, items_size) {
+		switch (item->type) {
+		case KDBUS_ITEM_NAME: {
+			size_t len;
+
+			if (max_policies && ++count > max_policies) {
+				ret = -E2BIG;
+				goto exit;
+			}
+
+			if (!kdbus_name_is_valid(item->str, true)) {
+				ret = -EINVAL;
+				goto exit;
+			}
+
+			e = kzalloc(sizeof(*e), GFP_KERNEL);
+			if (!e) {
+				ret = -ENOMEM;
+				goto exit;
+			}
+
+			INIT_LIST_HEAD(&e->access_list);
+			e->owner = owner;
+			hlist_add_head(&e->hentry, &entries);
+
+			e->name = kstrdup(item->str, GFP_KERNEL);
+			if (!e->name) {
+				ret = -ENOMEM;
+				goto exit;
+			}
+
+			/*
+			 * If a supplied name ends with an '.*', cut off that
+			 * part, only store anything before it, and mark the
+			 * entry as wildcard.
+			 */
+			len = strlen(e->name);
+			if (len > 2 &&
+			    e->name[len - 2] == '.' &&
+			    e->name[len - 1] == '*') {
+				if (!allow_wildcards) {
+					ret = -EINVAL;
+					goto exit;
+				}
+
+				e->name[len - 2] = '\0';
+				e->wildcard = true;
+			}
+
+			break;
+		}
+
+		case KDBUS_ITEM_POLICY_ACCESS:
+			if (!e) {
+				ret = -EINVAL;
+				goto exit;
+			}
+
+			a = kdbus_policy_make_access(&item->policy_access);
+			if (IS_ERR(a)) {
+				ret = PTR_ERR(a);
+				goto exit;
+			}
+
+			list_add_tail(&a->list, &e->access_list);
+			break;
+		}
+	}
+
+	down_write(&db->entries_rwlock);
+
+	/* remember previous entries to restore in case of failure */
+	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
+		if (e->owner == owner) {
+			hash_del(&e->hentry);
+			hlist_add_head(&e->hentry, &restore);
+		}
+
+	hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
+		/* prevent duplicates */
+		hash = kdbus_strhash(e->name);
+		hash_for_each_possible(db->entries_hash, p, hentry, hash)
+			if (strcmp(e->name, p->name) == 0 &&
+			    e->wildcard == p->wildcard) {
+				ret = -EEXIST;
+				goto restore;
+			}
+
+		hlist_del(&e->hentry);
+		hash_add(db->entries_hash, &e->hentry, hash);
+	}
+
+restore:
+	/* if we failed, flush all entries we added so far */
+	if (ret < 0)
+		__kdbus_policy_remove_owner(db, owner);
+
+	/* if we failed, restore entries, otherwise release them */
+	hlist_for_each_entry_safe(e, tmp, &restore, hentry) {
+		hlist_del(&e->hentry);
+		if (ret < 0) {
+			hash = kdbus_strhash(e->name);
+			hash_add(db->entries_hash, &e->hentry, hash);
+		} else {
+			kdbus_policy_entry_free(e);
+		}
+	}
+
+	up_write(&db->entries_rwlock);
+
+exit:
+	hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
+		hlist_del(&e->hentry);
+		kdbus_policy_entry_free(e);
+	}
+
+	return ret;
+}
diff --git a/ipc/kdbus/policy.h b/ipc/kdbus/policy.h
new file mode 100644
index 000000000000..c137dd996861
--- /dev/null
+++ b/ipc/kdbus/policy.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_POLICY_H
+#define __KDBUS_POLICY_H
+
+#include <linux/hashtable.h>
+#include <linux/rwsem.h>
+
+struct kdbus_conn;
+struct kdbus_item;
+
+/**
+ * struct kdbus_policy_db - policy database
+ * @entries_hash:	Hashtable of entries
+ * @entries_rwlock:	Read/write semaphore protecting the database's entries
+ */
+struct kdbus_policy_db {
+	DECLARE_HASHTABLE(entries_hash, 6);
+	struct rw_semaphore entries_rwlock;
+};
+
+void kdbus_policy_db_init(struct kdbus_policy_db *db);
+void kdbus_policy_db_clear(struct kdbus_policy_db *db);
+
+int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
+				const struct cred *cred, const char *name,
+				unsigned int hash);
+int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
+		       const char *name, unsigned int hash);
+
+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+			       const void *owner);
+int kdbus_policy_set(struct kdbus_policy_db *db,
+		     const struct kdbus_item *items,
+		     size_t items_size,
+		     size_t max_policies,
+		     bool allow_wildcards,
+		     const void *owner);
+
+#endif
-- 
2.2.1


* [PATCH 11/13] kdbus: add policy database implementation
@ 2015-01-16 19:16   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

This patch adds the policy database implementation.

A policy database restricts which connections may own, see and talk to
well-known names. It can be associated with a bus (through a policy
holder connection) or a custom endpoint.
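
As a rough illustration of the own/see/talk hierarchy, the userspace
sketch below picks the highest access level that any matching rule
grants. The enum names, their numeric values and the rule structure are
stand-ins made up for this example and are not the kdbus definitions;
supplementary groups are left out for brevity.

```c
#include <stddef.h>

/* Illustrative stand-ins; names and numeric values are assumptions. */
enum access_level {
	LEVEL_NONE = -1,
	LEVEL_SEE = 0,
	LEVEL_TALK = 1,
	LEVEL_OWN = 2
};

enum rule_type { RULE_USER, RULE_GROUP, RULE_WORLD };

struct rule {
	enum rule_type type;
	enum access_level access;
	unsigned int id;	/* uid or gid; ignored for RULE_WORLD */
};

/* Return the highest level any matching rule grants, or LEVEL_NONE. */
static int highest_access(const struct rule *rules, size_t n,
			  unsigned int uid, unsigned int gid)
{
	int highest = LEVEL_NONE;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct rule *r = &rules[i];
		int match;

		/* A rule that cannot raise the result is skipped. */
		if ((int)r->access <= highest)
			continue;

		match = r->type == RULE_WORLD ||
			(r->type == RULE_USER && r->id == uid) ||
			(r->type == RULE_GROUP && r->id == gid);
		if (match)
			highest = r->access;

		/* Nothing ranks above OWN, so stop early. */
		if (highest >= LEVEL_OWN)
			break;
	}

	return highest;
}
```

With a world rule granting SEE and a group rule granting TALK, a caller
in that group would resolve to TALK, while a caller matching neither
the user nor the group rule would still get SEE.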

By default, buses have an empty policy database that is augmented on
demand when a policy holder connection is instantiated.

Policies are set through KDBUS_CMD_HELLO (when creating a policy
holder connection), KDBUS_CMD_CONN_UPDATE (when updating a policy
holder connection), KDBUS_CMD_EP_MAKE (creating a custom endpoint)
or KDBUS_CMD_EP_UPDATE (updating a custom endpoint). In all cases,
the name and policy access information is stored in items of type
KDBUS_ITEM_NAME and KDBUS_ITEM_POLICY_ACCESS.
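
Names carried in KDBUS_ITEM_NAME items may end in '.*' to form a
wildcard entry. A minimal userspace sketch of how such names could be
stored and matched follows: the suffix is cut off and the entry is
flagged, and a query tries an exact match first before stripping its
last dot-separated component and trying the wildcard entries. The flat
array, the fixed-size name buffer and the function names are
assumptions made for this example; the kernel code keeps its entries in
a hashtable. The suffix check assumes a NUL-terminated string measured
with strlen(), so the last two characters sit at len - 2 and len - 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flat-table entry, used only for this illustration. */
struct entry {
	char name[64];
	bool wildcard;
};

/* Store @name; a trailing ".*" is removed and recorded as a flag. */
static void entry_init(struct entry *e, const char *name)
{
	size_t len;

	snprintf(e->name, sizeof(e->name), "%s", name);
	e->wildcard = false;

	len = strlen(e->name);
	if (len > 2 && e->name[len - 2] == '.' && e->name[len - 1] == '*') {
		e->name[len - 2] = '\0';
		e->wildcard = true;
	}
}

/* Exact entries win; otherwise match a wildcard on the parent name. */
static const struct entry *lookup(const struct entry *tab, size_t n,
				  const char *name)
{
	const char *dot;
	size_t i, len;

	for (i = 0; i < n; i++)
		if (!tab[i].wildcard && strcmp(tab[i].name, name) == 0)
			return &tab[i];

	dot = strrchr(name, '.');
	if (!dot)
		return NULL;

	/* Compare only the part of @name before its last dot. */
	len = (size_t)(dot - name);
	for (i = 0; i < n; i++)
		if (tab[i].wildcard &&
		    strncmp(tab[i].name, name, len) == 0 &&
		    tab[i].name[len] == '\0')
			return &tab[i];

	return NULL;
}
```

In this sketch a wildcard covers exactly one additional name component:
an entry created from "org.example.*" matches "org.example.foo" but
neither "org.example" itself nor "org.example.foo.bar".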

See Documentation/kdbus.txt for more details.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 ipc/kdbus/policy.c | 481 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 ipc/kdbus/policy.h |  51 ++++++
 2 files changed, 532 insertions(+)
 create mode 100644 ipc/kdbus/policy.c
 create mode 100644 ipc/kdbus/policy.h

diff --git a/ipc/kdbus/policy.c b/ipc/kdbus/policy.c
new file mode 100644
index 000000000000..0b60f5f381bf
--- /dev/null
+++ b/ipc/kdbus/policy.c
@@ -0,0 +1,481 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "item.h"
+#include "names.h"
+#include "policy.h"
+
+#define KDBUS_POLICY_HASH_SIZE	64
+
+/**
+ * struct kdbus_policy_db_entry_access - a database entry access item
+ * @type:		One of KDBUS_POLICY_ACCESS_* types
+ * @access:		Access to grant. One of KDBUS_POLICY_*
+ * @uid:		For KDBUS_POLICY_ACCESS_USER, the global uid
+ * @gid:		For KDBUS_POLICY_ACCESS_GROUP, the global gid
+ * @list:		List entry item for the entry's list
+ *
+ * This is the internal version of struct kdbus_policy_access.
+ */
+struct kdbus_policy_db_entry_access {
+	u8 type;		/* USER, GROUP, WORLD */
+	u8 access;		/* OWN, TALK, SEE */
+	union {
+		kuid_t uid;	/* global uid */
+		kgid_t gid;	/* global gid */
+	};
+	struct list_head list;
+};
+
+/**
+ * struct kdbus_policy_db_entry - a policy database entry
+ * @name:		The name to match the policy entry against
+ * @hentry:		The hash entry for the database's entries_hash
+ * @access_list:	List head for keeping track of the entry's
+ *			access items.
+ * @owner:		The owner of this entry. Can be a kdbus_conn or
+ *			a kdbus_ep object.
+ * @wildcard:		The name was given with a trailing '.*' wildcard
+ */
+struct kdbus_policy_db_entry {
+	char *name;
+	struct hlist_node hentry;
+	struct list_head access_list;
+	const void *owner;
+	bool wildcard:1;
+};
+
+static void kdbus_policy_entry_free(struct kdbus_policy_db_entry *e)
+{
+	struct kdbus_policy_db_entry_access *a, *tmp;
+
+	list_for_each_entry_safe(a, tmp, &e->access_list, list) {
+		list_del(&a->list);
+		kfree(a);
+	}
+
+	kfree(e->name);
+	kfree(e);
+}
+
+static const struct kdbus_policy_db_entry *
+kdbus_policy_lookup(struct kdbus_policy_db *db, const char *name, u32 hash)
+{
+	struct kdbus_policy_db_entry *e;
+	const char *dot;
+	size_t len;
+
+	/* find exact match */
+	hash_for_each_possible(db->entries_hash, e, hentry, hash)
+		if (strcmp(e->name, name) == 0 && !e->wildcard)
+			return e;
+
+	/* find wildcard match */
+
+	dot = strrchr(name, '.');
+	if (!dot)
+		return NULL;
+
+	len = dot - name;
+	hash = kdbus_strnhash(name, len);
+
+	hash_for_each_possible(db->entries_hash, e, hentry, hash)
+		if (e->wildcard && !strncmp(e->name, name, len) &&
+		    !e->name[len])
+			return e;
+
+	return NULL;
+}
+
+/**
+ * kdbus_policy_db_clear() - release all memory from a policy db
+ * @db:		The policy database
+ */
+void kdbus_policy_db_clear(struct kdbus_policy_db *db)
+{
+	struct kdbus_policy_db_entry *e;
+	struct hlist_node *tmp;
+	unsigned int i;
+
+	/* purge entries */
+	down_write(&db->entries_rwlock);
+	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry) {
+		hash_del(&e->hentry);
+		kdbus_policy_entry_free(e);
+	}
+	up_write(&db->entries_rwlock);
+}
+
+/**
+ * kdbus_policy_db_init() - initialize a new policy database
+ * @db:		The location of the database
+ *
+ * This initializes a new policy-db. The underlying memory must have been
+ * cleared to zero by the caller.
+ */
+void kdbus_policy_db_init(struct kdbus_policy_db *db)
+{
+	hash_init(db->entries_hash);
+	init_rwsem(&db->entries_rwlock);
+}
+
+/**
+ * kdbus_policy_query_unlocked() - Query the policy database
+ * @db:		Policy database
+ * @cred:	Credentials to test against
+ * @name:	Name to query
+ * @hash:	Hash value of @name
+ *
+ * Same as kdbus_policy_query() but requires the caller to lock the policy
+ * database against concurrent writes.
+ *
+ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
+ */
+int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
+				const struct cred *cred, const char *name,
+				unsigned int hash)
+{
+	struct kdbus_policy_db_entry_access *a;
+	const struct kdbus_policy_db_entry *e;
+	int i, highest = -EPERM;
+
+	e = kdbus_policy_lookup(db, name, hash);
+	if (!e)
+		return -EPERM;
+
+	list_for_each_entry(a, &e->access_list, list) {
+		if ((int)a->access <= highest)
+			continue;
+
+		switch (a->type) {
+		case KDBUS_POLICY_ACCESS_USER:
+			if (uid_eq(cred->euid, a->uid))
+				highest = a->access;
+			break;
+		case KDBUS_POLICY_ACCESS_GROUP:
+			if (gid_eq(cred->egid, a->gid)) {
+				highest = a->access;
+				break;
+			}
+
+			for (i = 0; i < cred->group_info->ngroups; i++) {
+				kgid_t gid = GROUP_AT(cred->group_info, i);
+
+				if (gid_eq(gid, a->gid)) {
+					highest = a->access;
+					break;
+				}
+			}
+
+			break;
+		case KDBUS_POLICY_ACCESS_WORLD:
+			highest = a->access;
+			break;
+		}
+
+		/* OWN is the highest possible policy */
+		if (highest >= KDBUS_POLICY_OWN)
+			break;
+	}
+
+	return highest;
+}
+
+/**
+ * kdbus_policy_query() - Query the policy database
+ * @db:		Policy database
+ * @cred:	Credentials to test against
+ * @name:	Name to query
+ * @hash:	Hash value of @name
+ *
+ * Query the policy database @db for the access rights of @cred to the name
+ * @name. The access rights of @cred are returned, or -EPERM if no access is
+ * granted.
+ *
+ * This call effectively searches for the highest access-right granted to
+ * @cred. The caller should really cache those as policy lookups are rather
+ * expensive.
+ *
+ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
+ */
+int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
+		       const char *name, unsigned int hash)
+{
+	int ret;
+
+	down_read(&db->entries_rwlock);
+	ret = kdbus_policy_query_unlocked(db, cred, name, hash);
+	up_read(&db->entries_rwlock);
+
+	return ret;
+}
+
+static void __kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+					const void *owner)
+{
+	struct kdbus_policy_db_entry *e;
+	struct hlist_node *tmp;
+	int i;
+
+	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
+		if (e->owner == owner) {
+			hash_del(&e->hentry);
+			kdbus_policy_entry_free(e);
+		}
+}
+
+/**
+ * kdbus_policy_remove_owner() - remove all entries related to a connection
+ * @db:		The policy database
+ * @owner:	The connection which items to remove
+ */
+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+			       const void *owner)
+{
+	down_write(&db->entries_rwlock);
+	__kdbus_policy_remove_owner(db, owner);
+	up_write(&db->entries_rwlock);
+}
+
+/*
+ * Convert user provided policy access to internal kdbus policy
+ * access
+ */
+static struct kdbus_policy_db_entry_access *
+kdbus_policy_make_access(const struct kdbus_policy_access *uaccess)
+{
+	int ret;
+	struct kdbus_policy_db_entry_access *a;
+
+	a = kzalloc(sizeof(*a), GFP_KERNEL);
+	if (!a)
+		return ERR_PTR(-ENOMEM);
+
+	ret = -EINVAL;
+	switch (uaccess->access) {
+	case KDBUS_POLICY_SEE:
+	case KDBUS_POLICY_TALK:
+	case KDBUS_POLICY_OWN:
+		a->access = uaccess->access;
+		break;
+	default:
+		goto err;
+	}
+
+	switch (uaccess->type) {
+	case KDBUS_POLICY_ACCESS_USER:
+		a->uid = make_kuid(current_user_ns(), uaccess->id);
+		if (!uid_valid(a->uid))
+			goto err;
+
+		break;
+	case KDBUS_POLICY_ACCESS_GROUP:
+		a->gid = make_kgid(current_user_ns(), uaccess->id);
+		if (!gid_valid(a->gid))
+			goto err;
+
+		break;
+	case KDBUS_POLICY_ACCESS_WORLD:
+		break;
+	default:
+		goto err;
+	}
+
+	a->type = uaccess->type;
+
+	return a;
+
+err:
+	kfree(a);
+	return ERR_PTR(ret);
+}
+
+/**
+ * kdbus_policy_set() - set a connection's policy rules
+ * @db:				The policy database
+ * @items:			A list of kdbus_item elements that contain both
+ *				names and access rules to set.
+ * @items_size:			The total size of the items.
+ * @max_policies:		The maximum number of policy entries to allow.
+ *				Pass 0 for no limit.
+ * @allow_wildcards:		Boolean value whether wildcard entries (such
+ *				as those ending in '.*') should be allowed.
+ * @owner:			The owner of the new policy items.
+ *
+ * This function sets a new set of policies for a given owner. The names and
+ * access rules are gathered by walking the list of items passed in as
+ * argument. An item of type KDBUS_ITEM_NAME is expected before any number of
+ * KDBUS_ITEM_POLICY_ACCESS items. If there are more repetitions of this
+ * pattern than denoted in @max_policies, -E2BIG is returned.
+ *
+ * In order to allow atomic replacement of rules, the function first removes
+ * all entries that have been created for the given owner previously.
+ *
+ * Callers to this function must make sure that the owner is a custom
+ * endpoint, or if the endpoint is a default endpoint, then it must be
+ * either a policy holder or an activator.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_policy_set(struct kdbus_policy_db *db,
+		     const struct kdbus_item *items,
+		     size_t items_size,
+		     size_t max_policies,
+		     bool allow_wildcards,
+		     const void *owner)
+{
+	struct kdbus_policy_db_entry_access *a;
+	struct kdbus_policy_db_entry *e, *p;
+	const struct kdbus_item *item;
+	struct hlist_node *tmp;
+	HLIST_HEAD(entries);
+	HLIST_HEAD(restore);
+	size_t count = 0;
+	int i, ret = 0;
+	u32 hash;
+
+	if (items_size > KDBUS_POLICY_MAX_SIZE)
+		return -E2BIG;
+
+	/* Walk the list of items and look for new policies */
+	e = NULL;
+	KDBUS_ITEMS_FOREACH(item, items, items_size) {
+		switch (item->type) {
+		case KDBUS_ITEM_NAME: {
+			size_t len;
+
+			if (max_policies && ++count > max_policies) {
+				ret = -E2BIG;
+				goto exit;
+			}
+
+			if (!kdbus_name_is_valid(item->str, true)) {
+				ret = -EINVAL;
+				goto exit;
+			}
+
+			e = kzalloc(sizeof(*e), GFP_KERNEL);
+			if (!e) {
+				ret = -ENOMEM;
+				goto exit;
+			}
+
+			INIT_LIST_HEAD(&e->access_list);
+			e->owner = owner;
+			hlist_add_head(&e->hentry, &entries);
+
+			e->name = kstrdup(item->str, GFP_KERNEL);
+			if (!e->name) {
+				ret = -ENOMEM;
+				goto exit;
+			}
+
+			/*
+			 * If a supplied name ends with an '.*', cut off that
+			 * part, only store anything before it, and mark the
+			 * entry as wildcard.
+			 */
+			len = strlen(e->name);
+			if (len > 2 &&
+			    e->name[len - 2] == '.' &&
+			    e->name[len - 1] == '*') {
+				if (!allow_wildcards) {
+					ret = -EINVAL;
+					goto exit;
+				}
+
+				e->name[len - 2] = '\0';
+				e->wildcard = true;
+			}
+
+			break;
+		}
+
+		case KDBUS_ITEM_POLICY_ACCESS:
+			if (!e) {
+				ret = -EINVAL;
+				goto exit;
+			}
+
+			a = kdbus_policy_make_access(&item->policy_access);
+			if (IS_ERR(a)) {
+				ret = PTR_ERR(a);
+				goto exit;
+			}
+
+			list_add_tail(&a->list, &e->access_list);
+			break;
+		}
+	}
+
+	down_write(&db->entries_rwlock);
+
+	/* remember previous entries to restore in case of failure */
+	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
+		if (e->owner == owner) {
+			hash_del(&e->hentry);
+			hlist_add_head(&e->hentry, &restore);
+		}
+
+	hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
+		/* prevent duplicates */
+		hash = kdbus_strhash(e->name);
+		hash_for_each_possible(db->entries_hash, p, hentry, hash)
+			if (strcmp(e->name, p->name) == 0 &&
+			    e->wildcard == p->wildcard) {
+				ret = -EEXIST;
+				goto restore;
+			}
+
+		hlist_del(&e->hentry);
+		hash_add(db->entries_hash, &e->hentry, hash);
+	}
+
+restore:
+	/* if we failed, flush all entries we added so far */
+	if (ret < 0)
+		__kdbus_policy_remove_owner(db, owner);
+
+	/* if we failed, restore entries, otherwise release them */
+	hlist_for_each_entry_safe(e, tmp, &restore, hentry) {
+		hlist_del(&e->hentry);
+		if (ret < 0) {
+			hash = kdbus_strhash(e->name);
+			hash_add(db->entries_hash, &e->hentry, hash);
+		} else {
+			kdbus_policy_entry_free(e);
+		}
+	}
+
+	up_write(&db->entries_rwlock);
+
+exit:
+	hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
+		hlist_del(&e->hentry);
+		kdbus_policy_entry_free(e);
+	}
+
+	return ret;
+}
diff --git a/ipc/kdbus/policy.h b/ipc/kdbus/policy.h
new file mode 100644
index 000000000000..c137dd996861
--- /dev/null
+++ b/ipc/kdbus/policy.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <daniel@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <dh.herrmann@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_POLICY_H
+#define __KDBUS_POLICY_H
+
+#include <linux/hashtable.h>
+#include <linux/rwsem.h>
+
+struct kdbus_conn;
+struct kdbus_item;
+
+/**
+ * struct kdbus_policy_db - policy database
+ * @entries_hash:	Hashtable of entries
+ * @entries_rwlock:	Read/write semaphore protecting the database's entries
+ */
+struct kdbus_policy_db {
+	DECLARE_HASHTABLE(entries_hash, 6);
+	struct rw_semaphore entries_rwlock;
+};
+
+void kdbus_policy_db_init(struct kdbus_policy_db *db);
+void kdbus_policy_db_clear(struct kdbus_policy_db *db);
+
+int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
+				const struct cred *cred, const char *name,
+				unsigned int hash);
+int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
+		       const char *name, unsigned int hash);
+
+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+			       const void *owner);
+int kdbus_policy_set(struct kdbus_policy_db *db,
+		     const struct kdbus_item *items,
+		     size_t items_size,
+		     size_t max_policies,
+		     bool allow_wildcards,
+		     const void *owner);
+
+#endif
-- 
2.2.1

* [PATCH 12/13] kdbus: add Makefile, Kconfig and MAINTAINERS entry
@ 2015-01-16 19:16   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

This patch hooks up the build system to actually compile the files
added by previous patches. It also adds an entry to MAINTAINERS to
direct people to Greg KH, David Herrmann, Djalal Harouni and me for
questions and patches.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 MAINTAINERS        | 12 ++++++++++++
 init/Kconfig       | 12 ++++++++++++
 ipc/Makefile       |  2 +-
 ipc/kdbus/Makefile | 22 ++++++++++++++++++++++
 4 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 ipc/kdbus/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 3589d67437f8..638f78ea6fb6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5410,6 +5410,18 @@ S:	Maintained
 F:	Documentation/kbuild/kconfig-language.txt
 F:	scripts/kconfig/
 
+KDBUS
+M:	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+M:	Daniel Mack <daniel@zonque.org>
+M:	David Herrmann <dh.herrmann@googlemail.com>
+M:	Djalal Harouni <tixxdz@opendz.org>
+L:	linux-kernel@vger.kernel.org
+S:	Maintained
+F:	ipc/kdbus/*
+F:	Documentation/kdbus.txt
+F:	include/uapi/linux/kdbus.h
+F:	tools/testing/selftests/kdbus/
+
 KDUMP
 M:	Vivek Goyal <vgoyal@redhat.com>
 M:	Haren Myneni <hbabu@us.ibm.com>
diff --git a/init/Kconfig b/init/Kconfig
index 9afb971497f4..4263ef30dbcc 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -261,6 +261,18 @@ config POSIX_MQUEUE_SYSCTL
 	depends on SYSCTL
 	default y
 
+config KDBUS
+	tristate "kdbus interprocess communication"
+	depends on TMPFS
+	help
+	  D-Bus is a system for low-latency, low-overhead, easy to use
+	  interprocess communication (IPC).
+
+	  See Documentation/kdbus.txt
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called kdbus.
+
 config CROSS_MEMORY_ATTACH
 	bool "Enable process_vm_readv/writev syscalls"
 	depends on MMU
diff --git a/ipc/Makefile b/ipc/Makefile
index 86c7300ecdf5..68ec4167d11b 100644
--- a/ipc/Makefile
+++ b/ipc/Makefile
@@ -9,4 +9,4 @@ obj_mq-$(CONFIG_COMPAT) += compat_mq.o
 obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
 obj-$(CONFIG_IPC_NS) += namespace.o
 obj-$(CONFIG_POSIX_MQUEUE_SYSCTL) += mq_sysctl.o
-
+obj-$(CONFIG_KDBUS) += kdbus/
diff --git a/ipc/kdbus/Makefile b/ipc/kdbus/Makefile
new file mode 100644
index 000000000000..7ee9271e1449
--- /dev/null
+++ b/ipc/kdbus/Makefile
@@ -0,0 +1,22 @@
+kdbus-y := \
+	bus.o \
+	connection.o \
+	endpoint.o \
+	fs.o \
+	handle.o \
+	item.o \
+	main.o \
+	match.o \
+	message.o \
+	metadata.o \
+	names.o \
+	node.o \
+	notify.o \
+	domain.o \
+	policy.o \
+	pool.o \
+	reply.o \
+	queue.o \
+	util.o
+
+obj-$(CONFIG_KDBUS) += kdbus.o
-- 
2.2.1


* [PATCH 12/13] kdbus: add Makefile, Kconfig and MAINTAINERS entry
@ 2015-01-16 19:16   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

This patch hooks up the build system to actually compile the files
added by previous patches. It also adds an entry to MAINTAINERS to
direct people to Greg KH, David Herrmann, Djalal Harouni and me for
questions and patches.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 MAINTAINERS        | 12 ++++++++++++
 init/Kconfig       | 12 ++++++++++++
 ipc/Makefile       |  2 +-
 ipc/kdbus/Makefile | 22 ++++++++++++++++++++++
 4 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 ipc/kdbus/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 3589d67437f8..638f78ea6fb6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5410,6 +5410,18 @@ S:	Maintained
 F:	Documentation/kbuild/kconfig-language.txt
 F:	scripts/kconfig/
 
+KDBUS
+M:	Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
+M:	Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
+M:	David Herrmann <dh.herrmann-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
+M:	Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org>
+L:	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+S:	Maintained
+F:	ipc/kdbus/*
+F:	Documentation/kdbus.txt
+F:	include/uapi/linux/kdbus.h
+F:	tools/testing/selftests/kdbus/
+
 KDUMP
 M:	Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
 M:	Haren Myneni <hbabu-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
diff --git a/init/Kconfig b/init/Kconfig
index 9afb971497f4..4263ef30dbcc 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -261,6 +261,18 @@ config POSIX_MQUEUE_SYSCTL
 	depends on SYSCTL
 	default y
 
+config KDBUS
+	tristate "kdbus interprocess communication"
+	depends on TMPFS
+	help
+	  D-Bus is a system for low-latency, low-overhead, easy to use
+	  interprocess communication (IPC).
+
+	  See Documentation/kdbus.txt
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called kdbus.
+
 config CROSS_MEMORY_ATTACH
 	bool "Enable process_vm_readv/writev syscalls"
 	depends on MMU
diff --git a/ipc/Makefile b/ipc/Makefile
index 86c7300ecdf5..68ec4167d11b 100644
--- a/ipc/Makefile
+++ b/ipc/Makefile
@@ -9,4 +9,4 @@ obj_mq-$(CONFIG_COMPAT) += compat_mq.o
 obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
 obj-$(CONFIG_IPC_NS) += namespace.o
 obj-$(CONFIG_POSIX_MQUEUE_SYSCTL) += mq_sysctl.o
-
+obj-$(CONFIG_KDBUS) += kdbus/
diff --git a/ipc/kdbus/Makefile b/ipc/kdbus/Makefile
new file mode 100644
index 000000000000..7ee9271e1449
--- /dev/null
+++ b/ipc/kdbus/Makefile
@@ -0,0 +1,22 @@
+kdbus-y := \
+	bus.o \
+	connection.o \
+	endpoint.o \
+	fs.o \
+	handle.o \
+	item.o \
+	main.o \
+	match.o \
+	message.o \
+	metadata.o \
+	names.o \
+	node.o \
+	notify.o \
+	domain.o \
+	policy.o \
+	pool.o \
+	reply.o \
+	queue.o \
+	util.o
+
+obj-$(CONFIG_KDBUS) += kdbus.o
-- 
2.2.1


* [PATCH 13/13] kdbus: add selftests
  2015-01-16 19:16 ` Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  (?)
@ 2015-01-16 19:16 ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 19:16 UTC (permalink / raw)
  To: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Daniel Mack, Greg Kroah-Hartman

From: Daniel Mack <daniel@zonque.org>

This patch adds a fairly extensive test suite for kdbus that exercises
the most important code paths in the driver. The idea is to extend
the test suite over time.
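
As a rough illustration of how the suite is meant to grow: kdbus-test.c
keeps its tests in a static table of name/description/function entries
and walks that table, so adding a test comes down to appending one
entry. The standalone sketch below mirrors that pattern and the --tap
line format. struct demo_test, the demo_* functions and the numeric
TEST_* values are made up for this example and are not part of the
patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Result codes; the numeric values are assumed for this sketch. */
enum { TEST_OK = 0, TEST_SKIP = 1, TEST_ERR = 2 };

/* Hypothetical, simplified stand-in for struct kdbus_test. */
struct demo_test {
	const char *name;
	const char *desc;
	int (*func)(void);
};

static int demo_pass(void) { return TEST_OK; }
static int demo_skip(void) { return TEST_SKIP; }

/* Adding a test means appending one entry to this table. */
static const struct demo_test demo_tests[] = {
	{ .name = "demo-pass", .desc = "a passing check", .func = demo_pass },
	{ .name = "demo-skip", .desc = "a skipped check", .func = demo_skip },
};

#define N_DEMO_TESTS \
	((unsigned int)(sizeof(demo_tests) / sizeof(demo_tests[0])))

/* Format one result line in the same shape start_all_tests() prints. */
static int demo_tap_line(char *buf, size_t len, unsigned int idx,
			 const struct demo_test *t, int ret)
{
	assert(buf && t);
	return snprintf(buf, len, "%sok %u - %s%s (%s)",
			(ret != TEST_OK) ? "not " : "", idx,
			(ret == TEST_SKIP) ? "# SKIP " : "",
			t->desc, t->name);
}

/* Print the plan and one line per table entry; return the failure count. */
static unsigned int demo_run_all(void)
{
	unsigned int i, failed = 0;
	char line[128];

	printf("1..%u\n", N_DEMO_TESTS);
	for (i = 0; i < N_DEMO_TESTS; i++) {
		int ret = demo_tests[i].func();

		if (ret == TEST_ERR)
			failed++;
		demo_tap_line(line, sizeof(line), i + 1, &demo_tests[i], ret);
		puts(line);
	}
	return failed;
}
```

Calling demo_run_all() from a main() prints the plan line "1..2"
followed by one result line per table entry.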

Also, this code can serve as an example implementation to show how to
use the kernel API from userspace.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
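Note for readers: in a full run, every test is executed in a forked
child (test_run_forked() in kdbus-test.c), so a crashing test is
reported as an error rather than taking the whole run down. A minimal
standalone sketch of that pattern follows. The demo_* names and the
numeric TEST_* values are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Result codes; the numeric values are assumed for this sketch. */
enum { TEST_OK = 0, TEST_SKIP = 1, TEST_ERR = 2 };

static int demo_ok(void) { return TEST_OK; }
static int demo_skip(void) { return TEST_SKIP; }
/* Simulates a test that dies from a signal instead of returning. */
static int demo_crash(void) { abort(); }

/*
 * Run func() in a child process and translate how the child ended
 * into a result code. Anything other than a normal exit is an error.
 */
static int demo_run_forked(int (*func)(void))
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return TEST_ERR;
	if (pid == 0)
		_exit(func());	/* child: the exit status carries the result */

	if (waitpid(pid, &status, 0) != pid)
		return TEST_ERR;
	if (!WIFEXITED(status))
		return TEST_ERR;	/* killed by a signal, e.g. SIGABRT */
	return WEXITSTATUS(status);
}
```

With these helpers, demo_run_forked(demo_crash) yields TEST_ERR while
the parent keeps running.
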
 tools/testing/selftests/Makefile                  |    1 +
 tools/testing/selftests/kdbus/.gitignore          |   11 +
 tools/testing/selftests/kdbus/Makefile            |   46 +
 tools/testing/selftests/kdbus/kdbus-enum.c        |   95 ++
 tools/testing/selftests/kdbus/kdbus-enum.h        |   14 +
 tools/testing/selftests/kdbus/kdbus-test.c        |  920 ++++++++++++
 tools/testing/selftests/kdbus/kdbus-test.h        |   85 ++
 tools/testing/selftests/kdbus/kdbus-util.c        | 1646 +++++++++++++++++++++
 tools/testing/selftests/kdbus/kdbus-util.h        |  216 +++
 tools/testing/selftests/kdbus/test-activator.c    |  319 ++++
 tools/testing/selftests/kdbus/test-attach-flags.c |  751 ++++++++++
 tools/testing/selftests/kdbus/test-benchmark.c    |  427 ++++++
 tools/testing/selftests/kdbus/test-bus.c          |  174 +++
 tools/testing/selftests/kdbus/test-chat.c         |  123 ++
 tools/testing/selftests/kdbus/test-connection.c   |  611 ++++++++
 tools/testing/selftests/kdbus/test-daemon.c       |   66 +
 tools/testing/selftests/kdbus/test-endpoint.c     |  344 +++++
 tools/testing/selftests/kdbus/test-fd.c           |  710 +++++++++
 tools/testing/selftests/kdbus/test-free.c         |   36 +
 tools/testing/selftests/kdbus/test-match.c        |  442 ++++++
 tools/testing/selftests/kdbus/test-message.c      |  658 ++++++++
 tools/testing/selftests/kdbus/test-metadata-ns.c  |  507 +++++++
 tools/testing/selftests/kdbus/test-monitor.c      |  158 ++
 tools/testing/selftests/kdbus/test-names.c        |  184 +++
 tools/testing/selftests/kdbus/test-policy-ns.c    |  633 ++++++++
 tools/testing/selftests/kdbus/test-policy-priv.c  | 1270 ++++++++++++++++
 tools/testing/selftests/kdbus/test-policy.c       |   81 +
 tools/testing/selftests/kdbus/test-race.c         |  313 ++++
 tools/testing/selftests/kdbus/test-sync.c         |  368 +++++
 tools/testing/selftests/kdbus/test-timeout.c      |   99 ++
 30 files changed, 11308 insertions(+)
 create mode 100644 tools/testing/selftests/kdbus/.gitignore
 create mode 100644 tools/testing/selftests/kdbus/Makefile
 create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.h
 create mode 100644 tools/testing/selftests/kdbus/kdbus-test.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-test.h
 create mode 100644 tools/testing/selftests/kdbus/kdbus-util.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-util.h
 create mode 100644 tools/testing/selftests/kdbus/test-activator.c
 create mode 100644 tools/testing/selftests/kdbus/test-attach-flags.c
 create mode 100644 tools/testing/selftests/kdbus/test-benchmark.c
 create mode 100644 tools/testing/selftests/kdbus/test-bus.c
 create mode 100644 tools/testing/selftests/kdbus/test-chat.c
 create mode 100644 tools/testing/selftests/kdbus/test-connection.c
 create mode 100644 tools/testing/selftests/kdbus/test-daemon.c
 create mode 100644 tools/testing/selftests/kdbus/test-endpoint.c
 create mode 100644 tools/testing/selftests/kdbus/test-fd.c
 create mode 100644 tools/testing/selftests/kdbus/test-free.c
 create mode 100644 tools/testing/selftests/kdbus/test-match.c
 create mode 100644 tools/testing/selftests/kdbus/test-message.c
 create mode 100644 tools/testing/selftests/kdbus/test-metadata-ns.c
 create mode 100644 tools/testing/selftests/kdbus/test-monitor.c
 create mode 100644 tools/testing/selftests/kdbus/test-names.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy-ns.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy-priv.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy.c
 create mode 100644 tools/testing/selftests/kdbus/test-race.c
 create mode 100644 tools/testing/selftests/kdbus/test-sync.c
 create mode 100644 tools/testing/selftests/kdbus/test-timeout.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 4e511221a0c1..7b51cceae9dd 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -5,6 +5,7 @@ TARGETS += exec
 TARGETS += firmware
 TARGETS += ftrace
 TARGETS += kcmp
+TARGETS += kdbus
 TARGETS += memfd
 TARGETS += memory-hotplug
 TARGETS += mount
diff --git a/tools/testing/selftests/kdbus/.gitignore b/tools/testing/selftests/kdbus/.gitignore
new file mode 100644
index 000000000000..4b97beee5f80
--- /dev/null
+++ b/tools/testing/selftests/kdbus/.gitignore
@@ -0,0 +1,11 @@
+*.cmd
+*.ko
+*.mod.c
+modules.order
+Module.symvers
+*.o
+*.swp
+.tmp_versions
+tags
+tools/kdbus-monitor
+test/kdbus-test
diff --git a/tools/testing/selftests/kdbus/Makefile b/tools/testing/selftests/kdbus/Makefile
new file mode 100644
index 000000000000..dc2376a015eb
--- /dev/null
+++ b/tools/testing/selftests/kdbus/Makefile
@@ -0,0 +1,46 @@
+CFLAGS += -I../../../../usr/include/
+CFLAGS += -I../../../../include/uapi/
+CFLAGS += -std=gnu99
+CFLAGS += -DKBUILD_MODNAME=\"kdbus\" -D_GNU_SOURCE
+LDLIBS = -pthread -lcap
+
+OBJS= \
+	kdbus-enum.o		\
+	kdbus-util.o		\
+	kdbus-test.o		\
+	kdbus-test.o		\
+	test-activator.o	\
+	test-attach-flags.o	\
+	test-benchmark.o	\
+	test-bus.o		\
+	test-chat.o		\
+	test-connection.o	\
+	test-daemon.o		\
+	test-endpoint.o		\
+	test-fd.o		\
+	test-free.o		\
+	test-match.o		\
+	test-message.o		\
+	test-metadata-ns.o	\
+	test-monitor.o		\
+	test-names.o		\
+	test-policy.o		\
+	test-policy-ns.o	\
+	test-policy-priv.o	\
+	test-race.o		\
+	test-sync.o		\
+	test-timeout.o
+
+all: kdbus-test
+
+%.o: %.c
+	gcc $(CFLAGS) -c $< -o $@
+
+kdbus-test: $(OBJS)
+	gcc $(CFLAGS) $^ $(LDLIBS) -o $@
+
+run_tests:
+	./kdbus-test --tap
+
+clean:
+	rm -f *.o kdbus-test
diff --git a/tools/testing/selftests/kdbus/kdbus-enum.c b/tools/testing/selftests/kdbus/kdbus-enum.c
new file mode 100644
index 000000000000..256a991b4f1e
--- /dev/null
+++ b/tools/testing/selftests/kdbus/kdbus-enum.c
@@ -0,0 +1,95 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <sys/ioctl.h>
+
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+struct kdbus_enum_table {
+	long long id;
+	const char *name;
+};
+
+#define TABLE(what) static struct kdbus_enum_table kdbus_table_##what[]
+#define ENUM(_id) { .id = _id, .name = STRINGIFY(_id) }
+#define LOOKUP(what)							\
+	const char *enum_##what(long long id)				\
+	{								\
+		for (size_t i = 0; i < ELEMENTSOF(kdbus_table_##what); i++) \
+			if (id == kdbus_table_##what[i].id)		\
+				return kdbus_table_##what[i].name;	\
+		return "UNKNOWN";					\
+	}
+
+TABLE(CMD) = {
+	ENUM(KDBUS_CMD_BUS_MAKE),
+	ENUM(KDBUS_CMD_ENDPOINT_MAKE),
+	ENUM(KDBUS_CMD_HELLO),
+	ENUM(KDBUS_CMD_SEND),
+	ENUM(KDBUS_CMD_RECV),
+	ENUM(KDBUS_CMD_NAME_LIST),
+	ENUM(KDBUS_CMD_NAME_RELEASE),
+	ENUM(KDBUS_CMD_CONN_INFO),
+	ENUM(KDBUS_CMD_MATCH_ADD),
+	ENUM(KDBUS_CMD_MATCH_REMOVE),
+};
+LOOKUP(CMD);
+
+TABLE(MSG) = {
+	ENUM(_KDBUS_ITEM_NULL),
+	ENUM(KDBUS_ITEM_PAYLOAD_VEC),
+	ENUM(KDBUS_ITEM_PAYLOAD_OFF),
+	ENUM(KDBUS_ITEM_PAYLOAD_MEMFD),
+	ENUM(KDBUS_ITEM_FDS),
+	ENUM(KDBUS_ITEM_BLOOM_PARAMETER),
+	ENUM(KDBUS_ITEM_BLOOM_FILTER),
+	ENUM(KDBUS_ITEM_DST_NAME),
+	ENUM(KDBUS_ITEM_MAKE_NAME),
+	ENUM(KDBUS_ITEM_ATTACH_FLAGS_SEND),
+	ENUM(KDBUS_ITEM_ATTACH_FLAGS_RECV),
+	ENUM(KDBUS_ITEM_ID),
+	ENUM(KDBUS_ITEM_NAME),
+	ENUM(KDBUS_ITEM_TIMESTAMP),
+	ENUM(KDBUS_ITEM_CREDS),
+	ENUM(KDBUS_ITEM_PIDS),
+	ENUM(KDBUS_ITEM_AUXGROUPS),
+	ENUM(KDBUS_ITEM_OWNED_NAME),
+	ENUM(KDBUS_ITEM_TID_COMM),
+	ENUM(KDBUS_ITEM_PID_COMM),
+	ENUM(KDBUS_ITEM_EXE),
+	ENUM(KDBUS_ITEM_CMDLINE),
+	ENUM(KDBUS_ITEM_CGROUP),
+	ENUM(KDBUS_ITEM_CAPS),
+	ENUM(KDBUS_ITEM_SECLABEL),
+	ENUM(KDBUS_ITEM_AUDIT),
+	ENUM(KDBUS_ITEM_CONN_DESCRIPTION),
+	ENUM(KDBUS_ITEM_NAME_ADD),
+	ENUM(KDBUS_ITEM_NAME_REMOVE),
+	ENUM(KDBUS_ITEM_NAME_CHANGE),
+	ENUM(KDBUS_ITEM_ID_ADD),
+	ENUM(KDBUS_ITEM_ID_REMOVE),
+	ENUM(KDBUS_ITEM_REPLY_TIMEOUT),
+	ENUM(KDBUS_ITEM_REPLY_DEAD),
+};
+LOOKUP(MSG);
+
+TABLE(PAYLOAD) = {
+	ENUM(KDBUS_PAYLOAD_KERNEL),
+	ENUM(KDBUS_PAYLOAD_DBUS),
+};
+LOOKUP(PAYLOAD);
diff --git a/tools/testing/selftests/kdbus/kdbus-enum.h b/tools/testing/selftests/kdbus/kdbus-enum.h
new file mode 100644
index 000000000000..110bfd332859
--- /dev/null
+++ b/tools/testing/selftests/kdbus/kdbus-enum.h
@@ -0,0 +1,14 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+#pragma once
+
+const char *enum_CMD(long long id);
+const char *enum_MSG(long long id);
+const char *enum_MATCH(long long id);
+const char *enum_PAYLOAD(long long id);
diff --git a/tools/testing/selftests/kdbus/kdbus-test.c b/tools/testing/selftests/kdbus/kdbus-test.c
new file mode 100644
index 000000000000..2d27c5567450
--- /dev/null
+++ b/tools/testing/selftests/kdbus/kdbus-test.c
@@ -0,0 +1,920 @@
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <time.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <assert.h>
+#include <getopt.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <sys/mount.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <sys/syscall.h>
+#include <sys/eventfd.h>
+#include <linux/sched.h>
+
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+#include "kdbus-test.h"
+
+enum {
+	TEST_CREATE_BUS		= 1 << 0,
+	TEST_CREATE_CONN	= 1 << 1,
+};
+
+struct kdbus_test {
+	const char *name;
+	const char *desc;
+	int (*func)(struct kdbus_test_env *env);
+	unsigned int flags;
+};
+
+struct kdbus_test_args {
+	bool mntns;
+	bool pidns;
+	bool userns;
+	char *uid_map;
+	char *gid_map;
+	int loop;
+	int wait;
+	int fork;
+	int tap_output;
+	char *module;
+	char *root;
+	char *test;
+	char *busname;
+	char *mask_param_path;
+};
+
+static const struct kdbus_test tests[] = {
+	{
+		.name	= "bus-make",
+		.desc	= "bus make functions",
+		.func	= kdbus_test_bus_make,
+		.flags	= 0,
+	},
+	{
+		.name	= "hello",
+		.desc	= "the HELLO command",
+		.func	= kdbus_test_hello,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "byebye",
+		.desc	= "the BYEBYE command",
+		.func	= kdbus_test_byebye,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "chat",
+		.desc	= "a chat pattern",
+		.func	= kdbus_test_chat,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "daemon",
+		.desc	= "a simple daemon",
+		.func	= kdbus_test_daemon,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "fd-passing",
+		.desc	= "file descriptor passing",
+		.func	= kdbus_test_fd_passing,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "endpoint",
+		.desc	= "custom endpoint",
+		.func	= kdbus_test_custom_endpoint,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "monitor",
+		.desc	= "monitor functionality",
+		.func	= kdbus_test_monitor,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "name-basics",
+		.desc	= "basic name registry functions",
+		.func	= kdbus_test_name_basic,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "name-conflict",
+		.desc	= "name registry conflict details",
+		.func	= kdbus_test_name_conflict,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "name-queue",
+		.desc	= "queuing of names",
+		.func	= kdbus_test_name_queue,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "message-basic",
+		.desc	= "basic message handling",
+		.func	= kdbus_test_message_basic,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "message-prio",
+		.desc	= "handling of messages with priority",
+		.func	= kdbus_test_message_prio,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "message-quota",
+		.desc	= "message quotas are enforced",
+		.func	= kdbus_test_message_quota,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "timeout",
+		.desc	= "timeout",
+		.func	= kdbus_test_timeout,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "sync-byebye",
+		.desc	= "synchronous replies vs. BYEBYE",
+		.func	= kdbus_test_sync_byebye,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "sync-reply",
+		.desc	= "synchronous replies",
+		.func	= kdbus_test_sync_reply,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "message-free",
+		.desc	= "freeing of memory",
+		.func	= kdbus_test_free,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "connection-info",
+		.desc	= "retrieving connection information",
+		.func	= kdbus_test_conn_info,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "connection-update",
+		.desc	= "updating connection information",
+		.func	= kdbus_test_conn_update,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "writable-pool",
+		.desc	= "verifying pools are never writable",
+		.func	= kdbus_test_writable_pool,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "policy",
+		.desc	= "policy",
+		.func	= kdbus_test_policy,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "policy-priv",
+		.desc	= "unprivileged bus access",
+		.func	= kdbus_test_policy_priv,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "policy-ns",
+		.desc	= "policy in user namespaces",
+		.func	= kdbus_test_policy_ns,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "metadata-ns",
+		.desc	= "metadata in different namespaces",
+		.func	= kdbus_test_metadata_ns,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "match-id-add",
+		.desc	= "adding of matches by id",
+		.func	= kdbus_test_match_id_add,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "match-id-remove",
+		.desc	= "removing of matches by id",
+		.func	= kdbus_test_match_id_remove,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "match-replace",
+		.desc	= "replace of matches with the same cookie",
+		.func	= kdbus_test_match_replace,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "match-name-add",
+		.desc	= "adding of matches by name",
+		.func	= kdbus_test_match_name_add,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "match-name-remove",
+		.desc	= "removing of matches by name",
+		.func	= kdbus_test_match_name_remove,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "match-name-change",
+		.desc	= "matching for name changes",
+		.func	= kdbus_test_match_name_change,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "match-bloom",
+		.desc	= "matching with bloom filters",
+		.func	= kdbus_test_match_bloom,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "activator",
+		.desc	= "activator connections",
+		.func	= kdbus_test_activator,
+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
+	},
+	{
+		.name	= "benchmark",
+		.desc	= "benchmark",
+		.func	= kdbus_test_benchmark,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "benchmark-nomemfds",
+		.desc	= "benchmark without using memfds",
+		.func	= kdbus_test_benchmark_nomemfds,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "race-byebye",
+		.desc	= "race multiple byebyes",
+		.func	= kdbus_test_race_byebye,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		.name	= "race-byebye-match",
+		.desc	= "race byebye vs match removal",
+		.func	= kdbus_test_race_byebye_match,
+		.flags	= TEST_CREATE_BUS,
+	},
+	{
+		/* Last test */
+		.name	= "attach-flags",
+		.desc	= "attach flags mask",
+		.func	= kdbus_test_attach_flags,
+		.flags	= 0,
+	},
+};
+
+#define N_TESTS ((int) (sizeof(tests) / sizeof(tests[0])))
+
+static int test_prepare_env(const struct kdbus_test *t,
+			    const struct kdbus_test_args *args,
+			    struct kdbus_test_env *env)
+{
+	if (t->flags & TEST_CREATE_BUS) {
+		char *s;
+		char *n = NULL;
+		int ret;
+
+		asprintf(&s, "%s/control", args->root);
+
+		env->control_fd = open(s, O_RDWR);
+		free(s);
+		ASSERT_RETURN(env->control_fd >= 0);
+
+		if (!args->busname) {
+			n = unique_name("test-bus");
+			ASSERT_RETURN(n);
+		}
+
+		ret = kdbus_create_bus(env->control_fd,
+				       args->busname ?: n,
+				       _KDBUS_ATTACH_ALL,
+				       _KDBUS_ATTACH_ALL, &s);
+		free(n);
+		ASSERT_RETURN(ret == 0);
+
+		asprintf(&env->buspath, "%s/%s/bus", args->root, s);
+		free(s);
+	}
+
+	if (t->flags & TEST_CREATE_CONN) {
+		env->conn = kdbus_hello(env->buspath, 0, NULL, 0);
+		ASSERT_RETURN(env->conn);
+	}
+
+	env->root = args->root;
+	env->module = args->module;
+	env->mask_param_path = args->mask_param_path;
+
+	return 0;
+}
+
+void test_unprepare_env(const struct kdbus_test *t, struct kdbus_test_env *env)
+{
+	if (env->conn) {
+		kdbus_conn_free(env->conn);
+		env->conn = NULL;
+	}
+
+	if (env->control_fd >= 0) {
+		close(env->control_fd);
+		env->control_fd = -1;
+	}
+
+	if (env->buspath) {
+		free(env->buspath);
+		env->buspath = NULL;
+	}
+}
+
+static int test_run(const struct kdbus_test *t,
+		    const struct kdbus_test_args *kdbus_args,
+		    int wait)
+{
+	int ret;
+	struct kdbus_test_env env = {};
+
+	ret = test_prepare_env(t, kdbus_args, &env);
+	if (ret != TEST_OK)
+		return ret;
+
+	if (wait > 0) {
+		printf("Sleeping %d seconds before running test ...\n", wait);
+		sleep(wait);
+	}
+
+	ret = t->func(&env);
+	test_unprepare_env(t, &env);
+	return ret;
+}
+
+static int test_run_forked(const struct kdbus_test *t,
+			   const struct kdbus_test_args *kdbus_args,
+			   int wait)
+{
+	int ret;
+	pid_t pid;
+
+	pid = fork();
+	if (pid < 0) {
+		return TEST_ERR;
+	} else if (pid == 0) {
+		ret = test_run(t, kdbus_args, wait);
+		_exit(ret);
+	}
+
+	pid = waitpid(pid, &ret, 0);
+	if (pid <= 0)
+		return TEST_ERR;
+	else if (!WIFEXITED(ret))
+		return TEST_ERR;
+	else
+		return WEXITSTATUS(ret);
+}
+
+static void print_test_result(int ret)
+{
+	switch (ret) {
+	case TEST_OK:
+		printf("OK");
+		break;
+	case TEST_SKIP:
+		printf("SKIPPED");
+		break;
+	case TEST_ERR:
+		printf("ERROR");
+		break;
+	}
+}
+
+static int start_all_tests(struct kdbus_test_args *kdbus_args)
+{
+	int ret;
+	unsigned int fail_cnt = 0;
+	unsigned int skip_cnt = 0;
+	unsigned int ok_cnt = 0;
+	unsigned int i;
+
+	if (kdbus_args->tap_output)
+		printf("1..%d\n", N_TESTS);
+
+	kdbus_util_verbose = false;
+
+	for (i = 0; i < N_TESTS; i++) {
+		const struct kdbus_test *t = tests + i;
+
+		if (!kdbus_args->tap_output) {
+			unsigned int n;
+
+			printf("Testing %s (%s) ", t->desc, t->name);
+			for (n = 0; n < 60 - strlen(t->desc) - strlen(t->name); n++)
+				printf(".");
+			printf(" ");
+		}
+
+		ret = test_run_forked(t, kdbus_args, 0);
+		switch (ret) {
+		case TEST_OK:
+			ok_cnt++;
+			break;
+		case TEST_SKIP:
+			skip_cnt++;
+			break;
+		case TEST_ERR:
+			fail_cnt++;
+			break;
+		}
+
+		if (kdbus_args->tap_output) {
+			printf("%sok %u - %s%s (%s)\n",
+			       (ret != TEST_OK) ? "not " : "", i + 1,
+			       (ret == TEST_SKIP) ? "# SKIP " : "",
+			       t->desc, t->name);
+		} else {
+			print_test_result(ret);
+			printf("\n");
+		}
+	}
+
+	if (kdbus_args->tap_output)
+		printf("Failed %u/%d tests, %.2f%% okay\n", fail_cnt, N_TESTS,
+		       100.0 - (fail_cnt * 100.0) / ((float) N_TESTS));
+	else
+		printf("\nSUMMARY: %u tests passed, %u skipped, %u failed\n",
+		       ok_cnt, skip_cnt, fail_cnt);
+
+	return fail_cnt > 0 ? TEST_ERR : TEST_OK;
+}
+
+static int start_one_test(struct kdbus_test_args *kdbus_args)
+{
+	int i, ret;
+	bool test_found = false;
+
+	for (i = 0; i < N_TESTS; i++) {
+		const struct kdbus_test *t = tests + i;
+
+		if (strcmp(t->name, kdbus_args->test))
+			continue;
+
+		do {
+			test_found = true;
+			if (kdbus_args->fork)
+				ret = test_run_forked(t, kdbus_args,
+						      kdbus_args->wait);
+			else
+				ret = test_run(t, kdbus_args,
+					       kdbus_args->wait);
+
+			printf("Testing %s: ", t->desc);
+			print_test_result(ret);
+			printf("\n");
+
+			if (ret != TEST_OK)
+				break;
+		} while (kdbus_args->loop);
+
+		return ret;
+	}
+
+	if (!test_found) {
+		printf("Unknown test-id '%s'\n", kdbus_args->test);
+		return TEST_ERR;
+	}
+
+	return TEST_OK;
+}
+
+static void usage(const char *argv0)
+{
+	unsigned int i, j;
+
+	printf("Usage: %s [options]\n"
+	       "Options:\n"
+	       "\t-a, --tap		Output test results in TAP format\n"
+	       "\t-m, --module <module>	Kdbus module name\n"
+	       "\t-x, --loop		Run in a loop\n"
+	       "\t-f, --fork		Fork before running a test\n"
+	       "\t-h, --help		Print this help\n"
+	       "\t-r, --root <root>	Toplevel of the kdbus hierarchy\n"
+	       "\t-t, --test <test-id>	Run one specific test only, in verbose mode\n"
+	       "\t-b, --bus <busname>	Instead of generating a random bus name, take <busname>.\n"
+	       "\t-w, --wait <secs>	Wait <secs> before actually starting test\n"
+	       "\t    --mntns		New mount namespace\n"
+	       "\t    --pidns		New PID namespace\n"
+	       "\t    --userns		New user namespace\n"
+	       "\t    --uidmap uid_map	UID map for user namespace\n"
+	       "\t    --gidmap gid_map	GID map for user namespace\n"
+	       "\n", argv0);
+
+	printf("By default, all tests are run once, and a summary is printed.\n"
+	       "Available tests for --test:\n\n");
+
+	for (i = 0; i < N_TESTS; i++) {
+		const struct kdbus_test *t = tests + i;
+
+		printf("\t%s", t->name);
+
+		for (j = 0; j < 24 - strlen(t->name); j++)
+			printf(" ");
+
+		printf("Test %s\n", t->desc);
+	}
+
+	printf("\n");
+	printf("Note that some tests may, if run specifically by --test, "
+	       "behave differently, and not terminate by themselves.\n");
+
+	exit(EXIT_FAILURE);
+}
+
+void print_kdbus_test_args(struct kdbus_test_args *args)
+{
+	if (args->userns || args->pidns || args->mntns)
+		printf("# Starting tests in new %s%s%s namespaces%s\n",
+			args->mntns ? "MOUNT " : "",
+			args->pidns ? "PID " : "",
+			args->userns ? "USER " : "",
+			args->mntns ? ", kdbusfs will be remounted" : "");
+	else
+		printf("# Starting tests in the same namespaces\n");
+}
+
+void print_metadata_support(void)
+{
+	bool no_meta_audit, no_meta_cgroups, no_meta_seclabel;
+
+	/*
+	 * KDBUS_ATTACH_CGROUP, KDBUS_ATTACH_AUDIT and
+	 * KDBUS_ATTACH_SECLABEL
+	 */
+	no_meta_audit = !config_auditsyscall_is_enabled();
+	no_meta_cgroups = !config_cgroups_is_enabled();
+	no_meta_seclabel = !config_security_is_enabled();
+
+	if (no_meta_audit | no_meta_cgroups | no_meta_seclabel)
+		printf("# Starting tests without %s%s%s metadata support\n",
+		       no_meta_audit ? "AUDIT " : "",
+		       no_meta_cgroups ? "CGROUP " : "",
+		       no_meta_seclabel ? "SECLABEL " : "");
+	else
+		printf("# Starting tests with full metadata support\n");
+}
+
+int run_tests(struct kdbus_test_args *kdbus_args)
+{
+	int ret;
+	static char control[4096];
+
+	snprintf(control, sizeof(control), "%s/control", kdbus_args->root);
+
+	if (access(control, W_OK) < 0) {
+		printf("Unable to locate control node at '%s'.\n",
+			control);
+		return TEST_ERR;
+	}
+
+	if (kdbus_args->test) {
+		ret = start_one_test(kdbus_args);
+	} else {
+		do {
+			ret = start_all_tests(kdbus_args);
+			if (ret != TEST_OK)
+				break;
+		} while (kdbus_args->loop);
+	}
+
+	return ret;
+}
+
+static void nop_handler(int sig) {}
+
+static int test_prepare_mounts(struct kdbus_test_args *kdbus_args)
+{
+	int ret;
+	char kdbusfs[64] = {'\0'};
+
+	snprintf(kdbusfs, sizeof(kdbusfs), "%sfs", kdbus_args->module);
+
+	/* make current mount slave */
+	ret = mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL);
+	if (ret < 0) {
+		ret = -errno;
+		printf("error mount() root: %d (%m)\n", ret);
+		return ret;
+	}
+
+	/* Remount procfs since we need it in our tests */
+	if (kdbus_args->pidns) {
+		ret = mount("proc", "/proc", "proc",
+			    MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL);
+		if (ret < 0) {
+			ret = -errno;
+			printf("error mount() /proc : %d (%m)\n", ret);
+			return ret;
+		}
+	}
+
+	/* Remount kdbusfs */
+	ret = mount(kdbusfs, kdbus_args->root, kdbusfs,
+		    MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL);
+	if (ret < 0) {
+		ret = -errno;
+		printf("error mount() %s :%d (%m)\n", kdbusfs, ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+int run_tests_in_namespaces(struct kdbus_test_args *kdbus_args)
+{
+	int ret;
+	int efd = -1;
+	int status;
+	pid_t pid, rpid;
+	struct sigaction oldsa;
+	struct sigaction sa = {
+		.sa_handler = nop_handler,
+		.sa_flags = SA_NOCLDSTOP,
+	};
+
+	efd = eventfd(0, EFD_CLOEXEC);
+	if (efd < 0) {
+		ret = -errno;
+		printf("eventfd() failed: %d (%m)\n", ret);
+		return TEST_ERR;
+	}
+
+	ret = sigaction(SIGCHLD, &sa, &oldsa);
+	if (ret < 0) {
+		ret = -errno;
+		printf("sigaction() failed: %d (%m)\n", ret);
+		return TEST_ERR;
+	}
+
+	/* setup namespaces */
+	pid = syscall(__NR_clone, SIGCHLD|
+		      (kdbus_args->userns ? CLONE_NEWUSER : 0) |
+		      (kdbus_args->mntns ? CLONE_NEWNS : 0) |
+		      (kdbus_args->pidns ? CLONE_NEWPID : 0), NULL);
+	if (pid < 0) {
+		printf("clone() failed: %d (%m)\n", -errno);
+		return TEST_ERR;
+	}
+
+	if (pid == 0) {
+		eventfd_t event_status = 0;
+
+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
+		if (ret < 0) {
+			ret = -errno;
+			printf("error prctl(): %d (%m)\n", ret);
+			_exit(TEST_ERR);
+		}
+
+		/* restore the original SIGCHLD handler in the child */
+		ret = sigaction(SIGCHLD, &oldsa, NULL);
+		if (ret < 0) {
+			ret = -errno;
+			printf("sigaction() failed: %d (%m)\n", ret);
+			_exit(TEST_ERR);
+		}
+
+		ret = eventfd_read(efd, &event_status);
+		if (ret < 0 || event_status != 1) {
+			printf("error eventfd_read()\n");
+			_exit(TEST_ERR);
+		}
+
+		if (kdbus_args->mntns) {
+			ret = test_prepare_mounts(kdbus_args);
+			if (ret < 0) {
+				printf("error preparing mounts\n");
+				_exit(TEST_ERR);
+			}
+		}
+
+		ret = run_tests(kdbus_args);
+		_exit(ret);
+	}
+
+	/* Setup userns mapping */
+	if (kdbus_args->userns) {
+		ret = userns_map_uid_gid(pid, kdbus_args->uid_map,
+					 kdbus_args->gid_map);
+		if (ret < 0) {
+			printf("error mapping uid and gid in userns\n");
+			eventfd_write(efd, 2);
+			return TEST_ERR;
+		}
+	}
+
+	ret = eventfd_write(efd, 1);
+	if (ret < 0) {
+		ret = -errno;
+		printf("error eventfd_write(): %d (%m)\n", ret);
+		return TEST_ERR;
+	}
+
+	rpid = waitpid(pid, &status, 0);
+	ASSERT_RETURN_VAL(rpid == pid, TEST_ERR);
+
+	close(efd);
+
+	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
+		return TEST_ERR;
+
+	return TEST_OK;
+}
+
+int start_tests(struct kdbus_test_args *kdbus_args)
+{
+	int ret;
+	bool namespaces;
+	uint64_t kdbus_param_mask;
+	static char fspath[4096], parampath[4096];
+
+	namespaces = (kdbus_args->mntns || kdbus_args->pidns ||
+		      kdbus_args->userns);
+
+	/* for pidns we need mntns set */
+	if (kdbus_args->pidns && !kdbus_args->mntns) {
+		printf("Failed: please set both pid and mnt namespaces\n");
+		return TEST_ERR;
+	}
+
+	if (kdbus_args->userns) {
+		if (!config_user_ns_is_enabled()) {
+			printf("User namespace not supported\n");
+			return TEST_ERR;
+		}
+
+		if (!kdbus_args->uid_map || !kdbus_args->gid_map) {
+			printf("Failed: please specify both uid and gid mappings\n");
+			return TEST_ERR;
+		}
+	}
+
+	print_kdbus_test_args(kdbus_args);
+	print_metadata_support();
+
+	/* setup kdbus paths */
+	if (!kdbus_args->module)
+		kdbus_args->module = "kdbus";
+
+	if (!kdbus_args->root) {
+		snprintf(fspath, sizeof(fspath), "/sys/fs/%s",
+			 kdbus_args->module);
+		kdbus_args->root = fspath;
+	}
+
+	snprintf(parampath, sizeof(parampath),
+		 "/sys/module/%s/parameters/attach_flags_mask",
+		 kdbus_args->module);
+	kdbus_args->mask_param_path = parampath;
+
+	ret = kdbus_sysfs_get_parameter_mask(kdbus_args->mask_param_path,
+					     &kdbus_param_mask);
+	if (ret < 0)
+		return TEST_ERR;
+
+	printf("# Starting tests with an attach_flags_mask=0x%llx\n",
+		(unsigned long long)kdbus_param_mask);
+
+	/* Start tests */
+	if (namespaces)
+		ret = run_tests_in_namespaces(kdbus_args);
+	else
+		ret = run_tests(kdbus_args);
+
+	return ret;
+}
+
+int main(int argc, char *argv[])
+{
+	int t, ret = 0;
+	struct kdbus_test_args *kdbus_args;
+	enum {
+		ARG_MNTNS = 0x100,
+		ARG_PIDNS,
+		ARG_USERNS,
+		ARG_UIDMAP,
+		ARG_GIDMAP,
+	};
+
+	kdbus_args = malloc(sizeof(*kdbus_args));
+	if (!kdbus_args) {
+		printf("unable to malloc() kdbus_args\n");
+		return EXIT_FAILURE;
+	}
+
+	memset(kdbus_args, 0, sizeof(*kdbus_args));
+
+	static const struct option options[] = {
+		{ "loop",	no_argument,		NULL, 'x' },
+		{ "help",	no_argument,		NULL, 'h' },
+		{ "root",	required_argument,	NULL, 'r' },
+		{ "test",	required_argument,	NULL, 't' },
+		{ "bus",	required_argument,	NULL, 'b' },
+		{ "wait",	required_argument,	NULL, 'w' },
+		{ "fork",	no_argument,		NULL, 'f' },
+		{ "module",	required_argument,	NULL, 'm' },
+		{ "tap",	no_argument,		NULL, 'a' },
+		{ "mntns",	no_argument,		NULL, ARG_MNTNS },
+		{ "pidns",	no_argument,		NULL, ARG_PIDNS },
+		{ "userns",	no_argument,		NULL, ARG_USERNS },
+		{ "uidmap",	required_argument,	NULL, ARG_UIDMAP },
+		{ "gidmap",	required_argument,	NULL, ARG_GIDMAP },
+		{}
+	};
+
+	srand(time(NULL));
+
+	while ((t = getopt_long(argc, argv, "hxfm:r:t:b:w:a", options, NULL)) >= 0) {
+		switch (t) {
+		case 'x':
+			kdbus_args->loop = 1;
+			break;
+
+		case 'm':
+			kdbus_args->module = optarg;
+			break;
+
+		case 'r':
+			kdbus_args->root = optarg;
+			break;
+
+		case 't':
+			kdbus_args->test = optarg;
+			break;
+
+		case 'b':
+			kdbus_args->busname = optarg;
+			break;
+
+		case 'w':
+			kdbus_args->wait = strtol(optarg, NULL, 10);
+			break;
+
+		case 'f':
+			kdbus_args->fork = 1;
+			break;
+
+		case 'a':
+			kdbus_args->tap_output = 1;
+			break;
+
+		case ARG_MNTNS:
+			kdbus_args->mntns = true;
+			break;
+
+		case ARG_PIDNS:
+			kdbus_args->pidns = true;
+			break;
+
+		case ARG_USERNS:
+			kdbus_args->userns = true;
+			break;
+
+		case ARG_UIDMAP:
+			kdbus_args->uid_map = optarg;
+			break;
+
+		case ARG_GIDMAP:
+			kdbus_args->gid_map = optarg;
+			break;
+
+		default:
+		case 'h':
+			usage(argv[0]);
+		}
+	}
+
+	ret = start_tests(kdbus_args);
+	if (ret == TEST_ERR)
+		return EXIT_FAILURE;
+
+	free(kdbus_args);
+
+	return 0;
+}
diff --git a/tools/testing/selftests/kdbus/kdbus-test.h b/tools/testing/selftests/kdbus/kdbus-test.h
new file mode 100644
index 000000000000..ce8c5836f65e
--- /dev/null
+++ b/tools/testing/selftests/kdbus/kdbus-test.h
@@ -0,0 +1,85 @@
+#ifndef _TEST_KDBUS_H_
+#define _TEST_KDBUS_H_
+
+struct kdbus_test_env {
+	char *buspath;
+	const char *root;
+	const char *module;
+	const char *mask_param_path;
+	int control_fd;
+	struct kdbus_conn *conn;
+};
+
+enum {
+	TEST_OK,
+	TEST_SKIP,
+	TEST_ERR,
+};
+
+#define ASSERT_RETURN_VAL(cond, val)		\
+	if (!(cond)) {			\
+		fprintf(stderr,	"Assertion '%s' failed in %s(), %s:%d\n", \
+			#cond, __func__, __FILE__, __LINE__);	\
+		return val;	\
+	}
+
+#define ASSERT_EXIT_VAL(cond, val)		\
+	if (!(cond)) {			\
+		fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
+			#cond, __func__, __FILE__, __LINE__);	\
+		_exit(val);	\
+	}
+
+#define ASSERT_BREAK(cond)		\
+	if (!(cond)) {			\
+		fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
+			#cond, __func__, __FILE__, __LINE__);	\
+		break; \
+	}
+
+#define ASSERT_RETURN(cond)		\
+	ASSERT_RETURN_VAL(cond, TEST_ERR)
+
+#define ASSERT_EXIT(cond)		\
+	ASSERT_EXIT_VAL(cond, EXIT_FAILURE)
+
+int kdbus_test_activator(struct kdbus_test_env *env);
+int kdbus_test_attach_flags(struct kdbus_test_env *env);
+int kdbus_test_benchmark(struct kdbus_test_env *env);
+int kdbus_test_benchmark_nomemfds(struct kdbus_test_env *env);
+int kdbus_test_bus_make(struct kdbus_test_env *env);
+int kdbus_test_byebye(struct kdbus_test_env *env);
+int kdbus_test_chat(struct kdbus_test_env *env);
+int kdbus_test_conn_info(struct kdbus_test_env *env);
+int kdbus_test_conn_update(struct kdbus_test_env *env);
+int kdbus_test_daemon(struct kdbus_test_env *env);
+int kdbus_test_custom_endpoint(struct kdbus_test_env *env);
+int kdbus_test_fd_passing(struct kdbus_test_env *env);
+int kdbus_test_free(struct kdbus_test_env *env);
+int kdbus_test_hello(struct kdbus_test_env *env);
+int kdbus_test_match_bloom(struct kdbus_test_env *env);
+int kdbus_test_match_id_add(struct kdbus_test_env *env);
+int kdbus_test_match_id_remove(struct kdbus_test_env *env);
+int kdbus_test_match_replace(struct kdbus_test_env *env);
+int kdbus_test_match_name_add(struct kdbus_test_env *env);
+int kdbus_test_match_name_change(struct kdbus_test_env *env);
+int kdbus_test_match_name_remove(struct kdbus_test_env *env);
+int kdbus_test_message_basic(struct kdbus_test_env *env);
+int kdbus_test_message_prio(struct kdbus_test_env *env);
+int kdbus_test_message_quota(struct kdbus_test_env *env);
+int kdbus_test_metadata_ns(struct kdbus_test_env *env);
+int kdbus_test_monitor(struct kdbus_test_env *env);
+int kdbus_test_name_basic(struct kdbus_test_env *env);
+int kdbus_test_name_conflict(struct kdbus_test_env *env);
+int kdbus_test_name_queue(struct kdbus_test_env *env);
+int kdbus_test_policy(struct kdbus_test_env *env);
+int kdbus_test_policy_ns(struct kdbus_test_env *env);
+int kdbus_test_policy_priv(struct kdbus_test_env *env);
+int kdbus_test_race_byebye(struct kdbus_test_env *env);
+int kdbus_test_race_byebye_match(struct kdbus_test_env *env);
+int kdbus_test_sync_byebye(struct kdbus_test_env *env);
+int kdbus_test_sync_reply(struct kdbus_test_env *env);
+int kdbus_test_timeout(struct kdbus_test_env *env);
+int kdbus_test_writable_pool(struct kdbus_test_env *env);
+
+#endif /* _TEST_KDBUS_H_ */
diff --git a/tools/testing/selftests/kdbus/kdbus-util.c b/tools/testing/selftests/kdbus/kdbus-util.c
new file mode 100644
index 000000000000..8c69d2e651ce
--- /dev/null
+++ b/tools/testing/selftests/kdbus/kdbus-util.c
@@ -0,0 +1,1646 @@
+/*
+ * Copyright (C) 2013-2014 Daniel Mack
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <stdio.h>
+#include <stdarg.h>
+#include <string.h>
+#include <time.h>
+#include <inttypes.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <assert.h>
+#include <poll.h>
+#include <grp.h>
+#include <sys/capability.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <linux/unistd.h>
+#include <linux/memfd.h>
+
+#ifndef __NR_memfd_create
+  #ifdef __x86_64__
+    #define __NR_memfd_create 319
+  #elif defined __arm__
+    #define __NR_memfd_create 385
+  #else
+    #define __NR_memfd_create 356
+  #endif
+#endif
+
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+#ifndef F_ADD_SEALS
+#define F_LINUX_SPECIFIC_BASE	1024
+#define F_ADD_SEALS     (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS     (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL     0x0001  /* prevent further seals from being set */
+#define F_SEAL_SHRINK   0x0002  /* prevent file from shrinking */
+#define F_SEAL_GROW     0x0004  /* prevent file from growing */
+#define F_SEAL_WRITE    0x0008  /* prevent writes */
+#endif
+
+int kdbus_util_verbose = true;
+
+int kdbus_sysfs_get_parameter_mask(const char *path, uint64_t *mask)
+{
+	int ret;
+	FILE *file;
+	unsigned long long value;
+
+	file = fopen(path, "r");
+	if (!file) {
+		ret = -errno;
+		kdbus_printf("--- error fopen(): %d (%m)\n", ret);
+		return ret;
+	}
+
+	ret = fscanf(file, "%llu", &value);
+	if (ret != 1) {
+		if (ferror(file))
+			ret = -errno;
+		else
+			ret = -EIO;
+
+		kdbus_printf("--- error fscanf(): %d\n", ret);
+		fclose(file);
+		return ret;
+	}
+
+	*mask = (uint64_t)value;
+
+	fclose(file);
+
+	return 0;
+}
+
+int kdbus_sysfs_set_parameter_mask(const char *path, uint64_t mask)
+{
+	int ret;
+	FILE *file;
+
+	file = fopen(path, "w");
+	if (!file) {
+		ret = -errno;
+		kdbus_printf("--- error fopen(): %d (%m)\n", ret);
+		return ret;
+	}
+
+	ret = fprintf(file, "%llu", (unsigned long long)mask);
+	if (ret <= 0) {
+		ret = -EIO;
+		kdbus_printf("--- error fprintf(): %d\n", ret);
+	}
+
+	fclose(file);
+
+	return ret > 0 ? 0 : ret;
+}
+
+int kdbus_create_bus(int control_fd, const char *name,
+		     uint64_t req_meta, uint64_t owner_meta,
+		     char **path)
+{
+	struct {
+		struct kdbus_cmd_make head;
+
+		/* bloom size item */
+		struct {
+			uint64_t size;
+			uint64_t type;
+			struct kdbus_bloom_parameter bloom;
+		} bp;
+
+		/* required and owner metadata items */
+		struct {
+			uint64_t size;
+			uint64_t type;
+			uint64_t flags;
+		} attach[2];
+
+		/* name item */
+		struct {
+			uint64_t size;
+			uint64_t type;
+			char str[64];
+		} name;
+	} bus_make;
+	int ret;
+
+	memset(&bus_make, 0, sizeof(bus_make));
+	bus_make.bp.size = sizeof(bus_make.bp);
+	bus_make.bp.type = KDBUS_ITEM_BLOOM_PARAMETER;
+	bus_make.bp.bloom.size = 64;
+	bus_make.bp.bloom.n_hash = 1;
+
+	snprintf(bus_make.name.str, sizeof(bus_make.name.str),
+		 "%u-%s", getuid(), name);
+
+	bus_make.attach[0].type = KDBUS_ITEM_ATTACH_FLAGS_RECV;
+	bus_make.attach[0].size = sizeof(bus_make.attach[0]);
+	bus_make.attach[0].flags = req_meta;
+
+	bus_make.attach[1].type = KDBUS_ITEM_ATTACH_FLAGS_SEND;
+	bus_make.attach[1].size = sizeof(bus_make.attach[1]);
+	bus_make.attach[1].flags = owner_meta;
+
+	bus_make.name.type = KDBUS_ITEM_MAKE_NAME;
+	bus_make.name.size = KDBUS_ITEM_HEADER_SIZE +
+			     strlen(bus_make.name.str) + 1;
+
+	bus_make.head.flags = KDBUS_MAKE_ACCESS_WORLD;
+	bus_make.head.size = sizeof(bus_make.head) +
+			     bus_make.bp.size +
+			     bus_make.attach[0].size +
+			     bus_make.attach[1].size +
+			     bus_make.name.size;
+
+	kdbus_printf("Creating bus with name >%s< on control fd %d ...\n",
+		     name, control_fd);
+
+	ret = ioctl(control_fd, KDBUS_CMD_BUS_MAKE, &bus_make);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("--- error when making bus: %d (%m)\n", ret);
+		return ret;
+	}
+
+	if (ret == 0 && path)
+		*path = strdup(bus_make.name.str);
+
+	return ret;
+}
+
+struct kdbus_conn *
+kdbus_hello(const char *path, uint64_t flags,
+	    const struct kdbus_item *item, size_t item_size)
+{
+	struct kdbus_cmd_free cmd_free = {};
+	int fd, ret;
+	struct {
+		struct kdbus_cmd_hello hello;
+
+		struct {
+			uint64_t size;
+			uint64_t type;
+			char str[16];
+		} conn_name;
+
+		uint8_t extra_items[item_size];
+	} h;
+	struct kdbus_conn *conn;
+
+	memset(&h, 0, sizeof(h));
+
+	if (item_size > 0)
+		memcpy(h.extra_items, item, item_size);
+
+	kdbus_printf("-- opening bus connection %s\n", path);
+	fd = open(path, O_RDWR|O_CLOEXEC);
+	if (fd < 0) {
+		kdbus_printf("--- error %d (%m)\n", fd);
+		return NULL;
+	}
+
+	h.hello.flags = flags | KDBUS_HELLO_ACCEPT_FD;
+	h.hello.attach_flags_send = _KDBUS_ATTACH_ALL;
+	h.hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
+	h.conn_name.type = KDBUS_ITEM_CONN_DESCRIPTION;
+	strcpy(h.conn_name.str, "this-is-my-name");
+	h.conn_name.size = KDBUS_ITEM_HEADER_SIZE + strlen(h.conn_name.str) + 1;
+
+	h.hello.size = sizeof(h);
+	h.hello.pool_size = POOL_SIZE;
+
+	ret = ioctl(fd, KDBUS_CMD_HELLO, &h.hello);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("--- error when saying hello: %d (%m)\n", ret);
+		return NULL;
+	}
+	kdbus_printf("-- Our peer ID for %s: %llu -- bus uuid: '%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x'\n",
+		     path, (unsigned long long)h.hello.id,
+		     h.hello.id128[0],  h.hello.id128[1],  h.hello.id128[2],
+		     h.hello.id128[3],  h.hello.id128[4],  h.hello.id128[5],
+		     h.hello.id128[6],  h.hello.id128[7],  h.hello.id128[8],
+		     h.hello.id128[9],  h.hello.id128[10], h.hello.id128[11],
+		     h.hello.id128[12], h.hello.id128[13], h.hello.id128[14],
+		     h.hello.id128[15]);
+
+	cmd_free.size = sizeof(cmd_free);
+	cmd_free.offset = h.hello.offset;
+	ioctl(fd, KDBUS_CMD_FREE, &cmd_free);
+
+	conn = malloc(sizeof(*conn));
+	if (!conn) {
+		kdbus_printf("unable to malloc()!?\n");
+		return NULL;
+	}
+
+	conn->buf = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, fd, 0);
+	if (conn->buf == MAP_FAILED) {
+		free(conn);
+		close(fd);
+		kdbus_printf("--- error mmap (%m)\n");
+		return NULL;
+	}
+
+	conn->fd = fd;
+	conn->id = h.hello.id;
+	return conn;
+}
+
+struct kdbus_conn *
+kdbus_hello_registrar(const char *path, const char *name,
+		      const struct kdbus_policy_access *access,
+		      size_t num_access, uint64_t flags)
+{
+	struct kdbus_item *item, *items;
+	size_t i, size;
+
+	size = KDBUS_ITEM_SIZE(strlen(name) + 1) +
+		num_access * KDBUS_ITEM_SIZE(sizeof(*access));
+
+	items = alloca(size);
+
+	item = items;
+	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
+	item->type = KDBUS_ITEM_NAME;
+	strcpy(item->str, name);
+	item = KDBUS_ITEM_NEXT(item);
+
+	for (i = 0; i < num_access; i++) {
+		item->size = KDBUS_ITEM_HEADER_SIZE +
+			     sizeof(struct kdbus_policy_access);
+		item->type = KDBUS_ITEM_POLICY_ACCESS;
+
+		item->policy_access.type = access[i].type;
+		item->policy_access.access = access[i].access;
+		item->policy_access.id = access[i].id;
+
+		item = KDBUS_ITEM_NEXT(item);
+	}
+
+	return kdbus_hello(path, flags, items, size);
+}
+
+struct kdbus_conn *kdbus_hello_activator(const char *path, const char *name,
+				   const struct kdbus_policy_access *access,
+				   size_t num_access)
+{
+	return kdbus_hello_registrar(path, name, access, num_access,
+				     KDBUS_HELLO_ACTIVATOR);
+}
+
+bool kdbus_item_in_message(struct kdbus_msg *msg, uint64_t type)
+{
+	const struct kdbus_item *item;
+
+	KDBUS_ITEM_FOREACH(item, msg, items)
+		if (item->type == type)
+			return true;
+
+	return false;
+}
+
+int kdbus_bus_creator_info(struct kdbus_conn *conn,
+			   uint64_t flags,
+			   uint64_t *offset)
+{
+	struct kdbus_cmd_info *cmd;
+	size_t size = sizeof(*cmd);
+	int ret;
+
+	cmd = alloca(size);
+	memset(cmd, 0, size);
+	cmd->size = size;
+	cmd->flags = flags;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_BUS_CREATOR_INFO, cmd);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("--- error when requesting info: %d (%m)\n", ret);
+		return ret;
+	}
+
+	if (offset)
+		*offset = cmd->offset;
+	else
+		kdbus_free(conn, cmd->offset);
+
+	return 0;
+}
+
+int kdbus_conn_info(struct kdbus_conn *conn, uint64_t id,
+		    const char *name, uint64_t flags,
+		    uint64_t *offset)
+{
+	struct kdbus_cmd_info *cmd;
+	size_t size = sizeof(*cmd);
+	struct kdbus_info *info;
+	int ret;
+
+	if (name)
+		size += KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
+
+	cmd = alloca(size);
+	memset(cmd, 0, size);
+	cmd->size = size;
+	cmd->flags = flags;
+
+	if (name) {
+		cmd->items[0].size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
+		cmd->items[0].type = KDBUS_ITEM_NAME;
+		strcpy(cmd->items[0].str, name);
+	} else {
+		cmd->id = id;
+	}
+
+	ret = ioctl(conn->fd, KDBUS_CMD_CONN_INFO, cmd);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("--- error when requesting info: %d (%m)\n", ret);
+		return ret;
+	}
+
+	info = (struct kdbus_info *) (conn->buf + cmd->offset);
+	if (info->size != cmd->info_size) {
+		kdbus_printf("%s(): size mismatch: %d != %d\n", __func__,
+				(int) info->size, (int) cmd->info_size);
+		return -EIO;
+	}
+
+	if (offset)
+		*offset = cmd->offset;
+	else
+		kdbus_free(conn, cmd->offset);
+
+	return 0;
+}
+
+void kdbus_conn_free(struct kdbus_conn *conn)
+{
+	if (!conn)
+		return;
+
+	if (conn->buf)
+		munmap(conn->buf, POOL_SIZE);
+
+	if (conn->fd >= 0)
+		close(conn->fd);
+
+	free(conn);
+}
+
+int sys_memfd_create(const char *name, __u64 size)
+{
+	int ret, fd;
+
+	ret = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
+	if (ret < 0)
+		return ret;
+
+	fd = ret;
+
+	ret = ftruncate(fd, size);
+	if (ret < 0) {
+		close(fd);
+		return ret;
+	}
+
+	return fd;
+}
+
+int sys_memfd_seal_set(int fd)
+{
+	return fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK |
+			 F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL);
+}
+
+off_t sys_memfd_get_size(int fd, off_t *size)
+{
+	struct stat stat;
+	int ret;
+
+	ret = fstat(fd, &stat);
+	if (ret < 0) {
+		kdbus_printf("fstat() failed: %m\n");
+		return ret;
+	}
+
+	*size = stat.st_size;
+	return 0;
+}
+
+static int __kdbus_msg_send(const struct kdbus_conn *conn,
+			    const char *name,
+			    uint64_t cookie,
+			    uint64_t flags,
+			    uint64_t timeout,
+			    int64_t priority,
+			    uint64_t dst_id,
+			    uint64_t cmd_flags,
+			    int cancel_fd)
+{
+	struct kdbus_cmd_send *cmd;
+	struct kdbus_msg *msg;
+	const char ref1[1024 * 128 + 3] = "0123456789_0";
+	const char ref2[] = "0123456789_1";
+	struct kdbus_item *item;
+	struct timespec now;
+	uint64_t size;
+	int memfd = -1;
+	int ret;
+
+	size = sizeof(*msg);
+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
+
+	if (dst_id == KDBUS_DST_ID_BROADCAST)
+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
+	else {
+		memfd = sys_memfd_create("my-name-is-nice", 1024 * 1024);
+		if (memfd < 0) {
+			kdbus_printf("failed to create memfd: %m\n");
+			return memfd;
+		}
+
+		if (write(memfd, "kdbus memfd 1234567", 19) != 19) {
+			ret = -errno;
+			kdbus_printf("writing to memfd failed: %m\n");
+			return ret;
+		}
+
+		ret = sys_memfd_seal_set(memfd);
+		if (ret < 0) {
+			ret = -errno;
+			kdbus_printf("memfd sealing failed: %m\n");
+			return ret;
+		}
+
+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
+	}
+
+	if (name)
+		size += KDBUS_ITEM_SIZE(strlen(name) + 1);
+
+	msg = malloc(size);
+	if (!msg) {
+		ret = -errno;
+		kdbus_printf("unable to malloc()!?\n");
+		return ret;
+	}
+
+	if (dst_id == KDBUS_DST_ID_BROADCAST)
+		flags |= KDBUS_MSG_SIGNAL;
+
+	memset(msg, 0, size);
+	msg->flags = flags;
+	msg->priority = priority;
+	msg->size = size;
+	msg->src_id = conn->id;
+	msg->dst_id = name ? 0 : dst_id;
+	msg->cookie = cookie;
+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
+
+	if (timeout) {
+		ret = clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
+		if (ret < 0)
+			return ret;
+
+		msg->timeout_ns = now.tv_sec * 1000000000ULL +
+				  now.tv_nsec + timeout;
+	}
+
+	item = msg->items;
+
+	if (name) {
+		item->type = KDBUS_ITEM_DST_NAME;
+		item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
+		strcpy(item->str, name);
+		item = KDBUS_ITEM_NEXT(item);
+	}
+
+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
+	item->vec.address = (uintptr_t)&ref1;
+	item->vec.size = sizeof(ref1);
+	item = KDBUS_ITEM_NEXT(item);
+
+	/* data padding for ref1 */
+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
+	item->vec.address = (uintptr_t)NULL;
+	item->vec.size = KDBUS_ALIGN8(sizeof(ref1)) - sizeof(ref1);
+	item = KDBUS_ITEM_NEXT(item);
+
+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
+	item->vec.address = (uintptr_t)&ref2;
+	item->vec.size = sizeof(ref2);
+	item = KDBUS_ITEM_NEXT(item);
+
+	if (dst_id == KDBUS_DST_ID_BROADCAST) {
+		item->type = KDBUS_ITEM_BLOOM_FILTER;
+		item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
+		item->bloom_filter.generation = 0;
+	} else {
+		item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
+		item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_memfd);
+		item->memfd.size = 16;
+		item->memfd.fd = memfd;
+	}
+	item = KDBUS_ITEM_NEXT(item);
+
+	size = sizeof(*cmd);
+	if (cancel_fd != -1)
+		size += KDBUS_ITEM_SIZE(sizeof(cancel_fd));
+
+	cmd = calloc(1, size);
+	cmd->size = size;
+	cmd->flags = cmd_flags;
+	cmd->msg_address = (uintptr_t)msg;
+
+	item = cmd->items;
+
+	if (cancel_fd != -1) {
+		item->type = KDBUS_ITEM_CANCEL_FD;
+		item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(cancel_fd);
+		item->fds[0] = cancel_fd;
+		item = KDBUS_ITEM_NEXT(item);
+	}
+
+	ret = ioctl(conn->fd, KDBUS_CMD_SEND, cmd);
+	if (memfd >= 0)
+		close(memfd);
+
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error sending message: %d (%m)\n", ret);
+		return ret;
+	}
+
+	if (cmd_flags & KDBUS_SEND_SYNC_REPLY) {
+		struct kdbus_msg *reply;
+
+		kdbus_printf("SYNC REPLY @offset %llu:\n", cmd->reply.offset);
+		reply = (struct kdbus_msg *)(conn->buf + cmd->reply.offset);
+		kdbus_msg_dump(conn, reply);
+
+		kdbus_msg_free(reply);
+
+		ret = kdbus_free(conn, cmd->reply.offset);
+		if (ret < 0)
+			return ret;
+	}
+
+	free(msg);
+	free(cmd);
+
+	return 0;
+}
+
+int kdbus_msg_send(const struct kdbus_conn *conn, const char *name,
+		   uint64_t cookie, uint64_t flags, uint64_t timeout,
+		   int64_t priority, uint64_t dst_id)
+{
+	return __kdbus_msg_send(conn, name, cookie, flags, timeout, priority,
+				dst_id, 0, -1);
+}
+
+int kdbus_msg_send_sync(const struct kdbus_conn *conn, const char *name,
+			uint64_t cookie, uint64_t flags, uint64_t timeout,
+			int64_t priority, uint64_t dst_id, int cancel_fd)
+{
+	return __kdbus_msg_send(conn, name, cookie, flags, timeout, priority,
+				dst_id, KDBUS_SEND_SYNC_REPLY, cancel_fd);
+}
+
+int kdbus_msg_send_reply(const struct kdbus_conn *conn,
+			 uint64_t reply_cookie,
+			 uint64_t dst_id)
+{
+	struct kdbus_cmd_send cmd = {};
+	struct kdbus_msg *msg;
+	const char ref1[1024 * 128 + 3] = "0123456789_0";
+	struct kdbus_item *item;
+	uint64_t size;
+	int ret;
+
+	size = sizeof(struct kdbus_msg);
+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
+
+	msg = malloc(size);
+	if (!msg) {
+		ret = -errno;
+		kdbus_printf("unable to malloc()!?\n");
+		return ret;
+	}
+
+	memset(msg, 0, size);
+	msg->size = size;
+	msg->src_id = conn->id;
+	msg->dst_id = dst_id;
+	msg->cookie_reply = reply_cookie;
+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
+
+	item = msg->items;
+
+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
+	item->vec.address = (uintptr_t)&ref1;
+	item->vec.size = sizeof(ref1);
+	item = KDBUS_ITEM_NEXT(item);
+
+	cmd.size = sizeof(cmd);
+	cmd.msg_address = (uintptr_t)msg;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_SEND, &cmd);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error sending message: %d (%m)\n", ret);
+	}
+
+	free(msg);
+
+	return ret;
+}
+static char *msg_id(uint64_t id, char *buf)
+{
+	if (id == 0)
+		return "KERNEL";
+	if (id == ~0ULL)
+		return "BROADCAST";
+	sprintf(buf, "%llu", (unsigned long long)id);
+	return buf;
+}
+
+int kdbus_msg_dump(const struct kdbus_conn *conn, const struct kdbus_msg *msg)
+{
+	const struct kdbus_item *item = msg->items;
+	char buf_src[32];
+	char buf_dst[32];
+	uint64_t timeout = 0;
+	uint64_t cookie_reply = 0;
+	int ret = 0;
+
+	if (msg->flags & KDBUS_MSG_EXPECT_REPLY)
+		timeout = msg->timeout_ns;
+	else
+		cookie_reply = msg->cookie_reply;
+
+	kdbus_printf("MESSAGE: %s (%llu bytes) flags=0x%08llx, %s → %s, "
+		     "cookie=%llu, timeout=%llu cookie_reply=%llu priority=%lli\n",
+		enum_PAYLOAD(msg->payload_type), (unsigned long long)msg->size,
+		(unsigned long long)msg->flags,
+		msg_id(msg->src_id, buf_src), msg_id(msg->dst_id, buf_dst),
+		(unsigned long long)msg->cookie, (unsigned long long)timeout,
+		(unsigned long long)cookie_reply, (long long)msg->priority);
+
+	KDBUS_ITEM_FOREACH(item, msg, items) {
+		if (item->size < KDBUS_ITEM_HEADER_SIZE) {
+			kdbus_printf("  +%s (%llu bytes) invalid data record\n",
+				     enum_MSG(item->type), item->size);
+			ret = -EINVAL;
+			break;
+		}
+
+		switch (item->type) {
+		case KDBUS_ITEM_PAYLOAD_OFF: {
+			char *s;
+
+			if (item->vec.offset == ~0ULL)
+				s = "[\\0-bytes]";
+			else
+				s = (char *)conn->buf + item->vec.offset;
+
+			kdbus_printf("  +%s (%llu bytes) off=%llu size=%llu '%s'\n",
+			       enum_MSG(item->type), item->size,
+			       (unsigned long long)item->vec.offset,
+			       (unsigned long long)item->vec.size, s);
+			break;
+		}
+
+		case KDBUS_ITEM_FDS: {
+			int i, n = (item->size - KDBUS_ITEM_HEADER_SIZE) /
+					sizeof(int);
+
+			kdbus_printf("  +%s (%llu bytes, %d fds)\n",
+			       enum_MSG(item->type), item->size, n);
+
+			for (i = 0; i < n; i++)
+				kdbus_printf("    fd[%d] = %d\n",
+					     i, item->fds[i]);
+
+			break;
+		}
+
+		case KDBUS_ITEM_PAYLOAD_MEMFD: {
+			char *buf;
+			off_t size;
+
+			buf = mmap(NULL, item->memfd.size, PROT_READ,
+				   MAP_SHARED, item->memfd.fd, 0);
+			if (buf == MAP_FAILED) {
+				kdbus_printf("mmap() fd=%i size=%llu failed: %m\n",
+					     item->memfd.fd, item->memfd.size);
+				break;
+			}
+
+			if (sys_memfd_get_size(item->memfd.fd, &size) < 0) {
+				kdbus_printf("sys_memfd_get_size() failed: %m\n");
+				break;
+			}
+
+			kdbus_printf("  +%s (%llu bytes) fd=%i size=%llu filesize=%llu '%s'\n",
+			       enum_MSG(item->type), item->size, item->memfd.fd,
+			       (unsigned long long)item->memfd.size,
+			       (unsigned long long)size, buf);
+			munmap(buf, item->memfd.size);
+			break;
+		}
+
+		case KDBUS_ITEM_CREDS:
+			kdbus_printf("  +%s (%llu bytes) uid=%d, euid=%d, suid=%d, fsuid=%d, "
+							"gid=%d, egid=%d, sgid=%d, fsgid=%d\n",
+				enum_MSG(item->type), item->size,
+				item->creds.uid, item->creds.euid,
+				item->creds.suid, item->creds.fsuid,
+				item->creds.gid, item->creds.egid,
+				item->creds.sgid, item->creds.fsgid);
+			break;
+
+		case KDBUS_ITEM_PIDS:
+			kdbus_printf("  +%s (%llu bytes) pid=%lld, tid=%lld, ppid=%lld\n",
+				enum_MSG(item->type), item->size,
+				item->pids.pid, item->pids.tid,
+				item->pids.ppid);
+			break;
+
+		case KDBUS_ITEM_AUXGROUPS: {
+			int i, n;
+
+			kdbus_printf("  +%s (%llu bytes)\n",
+				     enum_MSG(item->type), item->size);
+			n = (item->size - KDBUS_ITEM_HEADER_SIZE) /
+				sizeof(uint32_t);
+
+			for (i = 0; i < n; i++)
+				kdbus_printf("    gid[%d] = %d\n",
+					     i, item->data32[i]);
+			break;
+		}
+
+		case KDBUS_ITEM_NAME:
+		case KDBUS_ITEM_PID_COMM:
+		case KDBUS_ITEM_TID_COMM:
+		case KDBUS_ITEM_EXE:
+		case KDBUS_ITEM_CGROUP:
+		case KDBUS_ITEM_SECLABEL:
+		case KDBUS_ITEM_DST_NAME:
+		case KDBUS_ITEM_CONN_DESCRIPTION:
+			kdbus_printf("  +%s (%llu bytes) '%s' (%zu)\n",
+				     enum_MSG(item->type), item->size,
+				     item->str, strlen(item->str));
+			break;
+
+		case KDBUS_ITEM_OWNED_NAME: {
+			kdbus_printf("  +%s (%llu bytes) '%s' (%zu) flags=0x%08llx\n",
+				     enum_MSG(item->type), item->size,
+				     item->name.name, strlen(item->name.name),
+				     item->name.flags);
+			break;
+		}
+
+		case KDBUS_ITEM_CMDLINE: {
+			size_t size = item->size - KDBUS_ITEM_HEADER_SIZE;
+			const char *str = item->str;
+			int count = 0;
+
+			kdbus_printf("  +%s (%llu bytes) ",
+				     enum_MSG(item->type), item->size);
+			while (size) {
+				kdbus_printf("'%s' ", str);
+				size -= strlen(str) + 1;
+				str += strlen(str) + 1;
+				count++;
+			}
+
+			kdbus_printf("(%d string%s)\n",
+				     count, (count == 1) ? "" : "s");
+			break;
+		}
+
+		case KDBUS_ITEM_AUDIT:
+			kdbus_printf("  +%s (%llu bytes) loginuid=%u sessionid=%u\n",
+			       enum_MSG(item->type), item->size,
+			       item->audit.loginuid, item->audit.sessionid);
+			break;
+
+		case KDBUS_ITEM_CAPS: {
+			const uint32_t *cap;
+			int n, i;
+
+			kdbus_printf("  +%s (%llu bytes) len=%llu bytes, last_cap %d\n",
+				     enum_MSG(item->type), item->size,
+				     (unsigned long long)item->size -
+					KDBUS_ITEM_HEADER_SIZE,
+				     (int) item->caps.last_cap);
+
+			cap = item->caps.caps;
+			n = (item->size - offsetof(struct kdbus_item, caps.caps))
+				/ 4 / sizeof(uint32_t);
+
+			kdbus_printf("    CapInh=");
+			for (i = 0; i < n; i++)
+				kdbus_printf("%08x", cap[(0 * n) + (n - i - 1)]);
+
+			kdbus_printf(" CapPrm=");
+			for (i = 0; i < n; i++)
+				kdbus_printf("%08x", cap[(1 * n) + (n - i - 1)]);
+
+			kdbus_printf(" CapEff=");
+			for (i = 0; i < n; i++)
+				kdbus_printf("%08x", cap[(2 * n) + (n - i - 1)]);
+
+			kdbus_printf(" CapBnd=");
+			for (i = 0; i < n; i++)
+				kdbus_printf("%08x", cap[(3 * n) + (n - i - 1)]);
+			kdbus_printf("\n");
+			break;
+		}
+
+		case KDBUS_ITEM_TIMESTAMP:
+			kdbus_printf("  +%s (%llu bytes) seq=%llu realtime=%lluns monotonic=%lluns\n",
+			       enum_MSG(item->type), item->size,
+			       (unsigned long long)item->timestamp.seqnum,
+			       (unsigned long long)item->timestamp.realtime_ns,
+			       (unsigned long long)item->timestamp.monotonic_ns);
+			break;
+
+		case KDBUS_ITEM_REPLY_TIMEOUT:
+			kdbus_printf("  +%s (%llu bytes) cookie=%llu\n",
+			       enum_MSG(item->type), item->size,
+			       msg->cookie_reply);
+			break;
+
+		case KDBUS_ITEM_NAME_ADD:
+		case KDBUS_ITEM_NAME_REMOVE:
+		case KDBUS_ITEM_NAME_CHANGE:
+			kdbus_printf("  +%s (%llu bytes) '%s', old id=%lld, now id=%lld, old_flags=0x%llx new_flags=0x%llx\n",
+				enum_MSG(item->type),
+				(unsigned long long) item->size,
+				item->name_change.name,
+				item->name_change.old_id.id,
+				item->name_change.new_id.id,
+				item->name_change.old_id.flags,
+				item->name_change.new_id.flags);
+			break;
+
+		case KDBUS_ITEM_ID_ADD:
+		case KDBUS_ITEM_ID_REMOVE:
+			kdbus_printf("  +%s (%llu bytes) id=%llu flags=%llu\n",
+			       enum_MSG(item->type),
+			       (unsigned long long) item->size,
+			       (unsigned long long) item->id_change.id,
+			       (unsigned long long) item->id_change.flags);
+			break;
+
+		default:
+			kdbus_printf("  +%s (%llu bytes)\n",
+				     enum_MSG(item->type), item->size);
+			break;
+		}
+	}
+
+	if ((char *)item - ((char *)msg + msg->size) >= 8) {
+		kdbus_printf("invalid padding at end of message\n");
+		ret = -EINVAL;
+	}
+
+	kdbus_printf("\n");
+
+	return ret;
+}
+
+void kdbus_msg_free(struct kdbus_msg *msg)
+{
+	const struct kdbus_item *item;
+	int nfds, i;
+
+	if (!msg)
+		return;
+
+	KDBUS_ITEM_FOREACH(item, msg, items) {
+		switch (item->type) {
+		/* close all memfds */
+		case KDBUS_ITEM_PAYLOAD_MEMFD:
+			close(item->memfd.fd);
+			break;
+		case KDBUS_ITEM_FDS:
+			nfds = (item->size - KDBUS_ITEM_HEADER_SIZE) /
+				sizeof(int);
+
+			for (i = 0; i < nfds; i++)
+				close(item->fds[i]);
+
+			break;
+		}
+	}
+}
+
+int kdbus_msg_recv(struct kdbus_conn *conn,
+		   struct kdbus_msg **msg_out,
+		   uint64_t *offset)
+{
+	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
+	struct kdbus_msg *msg;
+	int ret;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_RECV, &recv);
+	if (ret < 0) {
+		ret = -errno;
+		/* store the number of dropped messages */
+		if (ret == -EOVERFLOW && offset)
+			*offset = recv.dropped_msgs;
+
+		return ret;
+	}
+
+	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
+	ret = kdbus_msg_dump(conn, msg);
+	if (ret < 0) {
+		kdbus_msg_free(msg);
+		return ret;
+	}
+
+	if (msg_out) {
+		*msg_out = msg;
+
+		if (offset)
+			*offset = recv.msg.offset;
+	} else {
+		kdbus_msg_free(msg);
+
+		ret = kdbus_free(conn, recv.msg.offset);
+		if (ret < 0)
+			return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * Returns: 0 on success, negative errno on failure.
+ *
+ * We must return -ETIMEDOUT, -ECONNRESET, -EAGAIN and other errors.
+ * We must return the result of kdbus_msg_recv()
+ */
+int kdbus_msg_recv_poll(struct kdbus_conn *conn,
+			int timeout_ms,
+			struct kdbus_msg **msg_out,
+			uint64_t *offset)
+{
+	int ret;
+
+	do {
+		struct timeval before, after, diff;
+		struct pollfd fd;
+
+		fd.fd = conn->fd;
+		fd.events = POLLIN | POLLPRI | POLLHUP;
+		fd.revents = 0;
+
+		gettimeofday(&before, NULL);
+		ret = poll(&fd, 1, timeout_ms);
+		gettimeofday(&after, NULL);
+
+		if (ret == 0) {
+			ret = -ETIMEDOUT;
+			break;
+		}
+
+		if (ret > 0) {
+			if (fd.revents & POLLIN)
+				ret = kdbus_msg_recv(conn, msg_out, offset);
+
+			if (fd.revents & (POLLHUP | POLLERR))
+				ret = -ECONNRESET;
+		}
+
+		if (ret == 0 || ret != -EAGAIN)
+			break;
+
+		timersub(&after, &before, &diff);
+		timeout_ms -= diff.tv_sec * 1000UL +
+			      diff.tv_usec / 1000UL;
+	} while (timeout_ms > 0);
+
+	return ret;
+}
+
+int kdbus_free(const struct kdbus_conn *conn, uint64_t offset)
+{
+	struct kdbus_cmd_free cmd_free = {};
+	int ret;
+
+	cmd_free.size = sizeof(cmd_free);
+	cmd_free.offset = offset;
+	cmd_free.flags = 0;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_FREE, &cmd_free);
+	if (ret < 0) {
+		kdbus_printf("KDBUS_CMD_FREE failed: %d (%m)\n", ret);
+		return -errno;
+	}
+
+	return 0;
+}
+
+int kdbus_name_acquire(struct kdbus_conn *conn,
+		       const char *name, uint64_t *flags)
+{
+	struct kdbus_cmd_name *cmd_name;
+	size_t name_len = strlen(name) + 1;
+	uint64_t size = sizeof(*cmd_name) + KDBUS_ITEM_SIZE(name_len);
+	struct kdbus_item *item;
+	int ret;
+
+	cmd_name = alloca(size);
+
+	memset(cmd_name, 0, size);
+
+	item = cmd_name->items;
+	item->size = KDBUS_ITEM_HEADER_SIZE + name_len;
+	item->type = KDBUS_ITEM_NAME;
+	strcpy(item->str, name);
+
+	cmd_name->size = size;
+	if (flags)
+		cmd_name->flags = *flags;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_NAME_ACQUIRE, cmd_name);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error acquiring name: %s\n", strerror(-ret));
+		return ret;
+	}
+
+	kdbus_printf("%s(): flags after call: 0x%llx\n", __func__,
+		     cmd_name->flags);
+
+	if (flags)
+		*flags = cmd_name->flags;
+
+	return 0;
+}
+
+int kdbus_name_release(struct kdbus_conn *conn, const char *name)
+{
+	struct kdbus_cmd_name *cmd_name;
+	size_t name_len = strlen(name) + 1;
+	uint64_t size = sizeof(*cmd_name) + KDBUS_ITEM_SIZE(name_len);
+	struct kdbus_item *item;
+	int ret;
+
+	cmd_name = alloca(size);
+
+	memset(cmd_name, 0, size);
+
+	item = cmd_name->items;
+	item->size = KDBUS_ITEM_HEADER_SIZE + name_len;
+	item->type = KDBUS_ITEM_NAME;
+	strcpy(item->str, name);
+
+	cmd_name->size = size;
+
+	kdbus_printf("conn %lld giving up name '%s'\n",
+		     (unsigned long long) conn->id, name);
+
+	ret = ioctl(conn->fd, KDBUS_CMD_NAME_RELEASE, cmd_name);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error releasing name: %s\n", strerror(-ret));
+		return ret;
+	}
+
+	return 0;
+}
+
+int kdbus_name_list(struct kdbus_conn *conn, uint64_t flags)
+{
+	struct kdbus_cmd_name_list cmd_list = { .flags = flags };
+	struct kdbus_name_list *list;
+	struct kdbus_name_info *name;
+	int ret;
+
+	cmd_list.size = sizeof(cmd_list);
+
+	ret = ioctl(conn->fd, KDBUS_CMD_NAME_LIST, &cmd_list);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error listing names: %d (%m)\n", ret);
+		return ret;
+	}
+
+	kdbus_printf("REGISTRY:\n");
+	list = (struct kdbus_name_list *)(conn->buf + cmd_list.offset);
+	if (list->size != cmd_list.list_size) {
+		kdbus_printf("%s(): size mismatch: %d != %d\n", __func__,
+				(int) list->size, (int) cmd_list.list_size);
+		return -EIO;
+	}
+
+	KDBUS_ITEM_FOREACH(name, list, names) {
+		uint64_t flags = 0;
+		struct kdbus_item *item;
+		const char *n = "MISSING-NAME";
+
+		if (name->size == sizeof(struct kdbus_cmd_name))
+			continue;
+
+		KDBUS_ITEM_FOREACH(item, name, items)
+			if (item->type == KDBUS_ITEM_OWNED_NAME) {
+				n = item->name.name;
+				flags = item->name.flags;
+			}
+
+		kdbus_printf("%8llu flags=0x%08llx conn=0x%08llx '%s'\n",
+			     name->owner_id, (unsigned long long)flags,
+			     name->conn_flags, n);
+	}
+	kdbus_printf("\n");
+
+	ret = kdbus_free(conn, cmd_list.offset);
+
+	return ret;
+}
+
+int kdbus_conn_update_attach_flags(struct kdbus_conn *conn,
+				   uint64_t attach_flags_send,
+				   uint64_t attach_flags_recv)
+{
+	int ret;
+	size_t size;
+	struct kdbus_cmd_update *update;
+	struct kdbus_item *item;
+
+	size = sizeof(struct kdbus_cmd_update);
+	size += KDBUS_ITEM_SIZE(sizeof(uint64_t)) * 2;
+
+	update = malloc(size);
+	if (!update) {
+		ret = -errno;
+		kdbus_printf("error malloc: %d (%m)\n", ret);
+		return ret;
+	}
+
+	memset(update, 0, size);
+	update->size = size;
+
+	item = update->items;
+
+	item->type = KDBUS_ITEM_ATTACH_FLAGS_SEND;
+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(uint64_t);
+	item->data64[0] = attach_flags_send;
+	item = KDBUS_ITEM_NEXT(item);
+
+	item->type = KDBUS_ITEM_ATTACH_FLAGS_RECV;
+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(uint64_t);
+	item->data64[0] = attach_flags_recv;
+	item = KDBUS_ITEM_NEXT(item);
+
+	ret = ioctl(conn->fd, KDBUS_CMD_CONN_UPDATE, update);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error conn update: %d (%m)\n", ret);
+	}
+
+	free(update);
+
+	return ret;
+}
+
+int kdbus_conn_update_policy(struct kdbus_conn *conn, const char *name,
+			     const struct kdbus_policy_access *access,
+			     size_t num_access)
+{
+	struct kdbus_cmd_update *update;
+	struct kdbus_item *item;
+	size_t i, size;
+	int ret;
+
+	size = sizeof(struct kdbus_cmd_update);
+	size += KDBUS_ITEM_SIZE(strlen(name) + 1);
+	size += num_access * KDBUS_ITEM_SIZE(sizeof(struct kdbus_policy_access));
+
+	update = malloc(size);
+	if (!update) {
+		ret = -errno;
+		kdbus_printf("error malloc: %d (%m)\n", ret);
+		return ret;
+	}
+
+	memset(update, 0, size);
+	update->size = size;
+
+	item = update->items;
+
+	item->type = KDBUS_ITEM_NAME;
+	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
+	strcpy(item->str, name);
+	item = KDBUS_ITEM_NEXT(item);
+
+	for (i = 0; i < num_access; i++) {
+		item->size = KDBUS_ITEM_HEADER_SIZE +
+			     sizeof(struct kdbus_policy_access);
+		item->type = KDBUS_ITEM_POLICY_ACCESS;
+
+		item->policy_access.type = access[i].type;
+		item->policy_access.access = access[i].access;
+		item->policy_access.id = access[i].id;
+
+		item = KDBUS_ITEM_NEXT(item);
+	}
+
+	ret = ioctl(conn->fd, KDBUS_CMD_CONN_UPDATE, update);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error conn update: %d (%m)\n", ret);
+	}
+
+	free(update);
+
+	return ret;
+}
+
+int kdbus_add_match_id(struct kdbus_conn *conn, uint64_t cookie,
+		       uint64_t type, uint64_t id)
+{
+	struct {
+		struct kdbus_cmd_match cmd;
+		struct {
+			uint64_t size;
+			uint64_t type;
+			struct kdbus_notify_id_change chg;
+		} item;
+	} buf;
+	int ret;
+
+	memset(&buf, 0, sizeof(buf));
+
+	buf.cmd.size = sizeof(buf);
+	buf.cmd.cookie = cookie;
+	buf.item.size = sizeof(buf.item);
+	buf.item.type = type;
+	buf.item.chg.id = id;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_MATCH_ADD, &buf);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("--- error adding conn match: %d (%m)\n", ret);
+	}
+
+	return ret;
+}
+
+int kdbus_add_match_empty(struct kdbus_conn *conn)
+{
+	struct {
+		struct kdbus_cmd_match cmd;
+		struct kdbus_item item;
+	} buf;
+	int ret;
+
+	memset(&buf, 0, sizeof(buf));
+
+	buf.item.size = sizeof(uint64_t) * 3;
+	buf.item.type = KDBUS_ITEM_ID;
+	buf.item.id = KDBUS_MATCH_ID_ANY;
+
+	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_MATCH_ADD, &buf);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("--- error adding conn match: %d (%m)\n", ret);
+	}
+
+	return ret;
+}
+
+static int all_ids_are_mapped(const char *path)
+{
+	int ret;
+	FILE *file;
+	uint32_t inside_id, length;
+
+	file = fopen(path, "r");
+	if (!file) {
+		ret = -errno;
+		kdbus_printf("error fopen() %s: %d (%m)\n",
+			     path, ret);
+		return ret;
+	}
+
+	ret = fscanf(file, "%u\t%*u\t%u", &inside_id, &length);
+	if (ret != 2) {
+		if (ferror(file))
+			ret = -errno;
+		else
+			ret = -EIO;
+
+		kdbus_printf("--- error fscanf(): %d\n", ret);
+		fclose(file);
+		return ret;
+	}
+
+	fclose(file);
+
+	/*
+	 * A mapping that starts at ID 0 and spans 4294967295 IDs, i.e.
+	 * (uid_t) -1, covers every valid uid/gid, so all IDs are mapped.
+	 */
+	if (inside_id == 0 && length == (uid_t) -1)
+		return 1;
+
+	return 0;
+}
+
+int all_uids_gids_are_mapped(void)
+{
+	int ret;
+
+	ret = all_ids_are_mapped("/proc/self/uid_map");
+	if (ret <= 0) {
+		kdbus_printf("--- error not all uids are mapped\n");
+		return 0;
+	}
+
+	ret = all_ids_are_mapped("/proc/self/gid_map");
+	if (ret <= 0) {
+		kdbus_printf("--- error not all gids are mapped\n");
+		return 0;
+	}
+
+	return 1;
+}
+
+int drop_privileges(uid_t uid, gid_t gid)
+{
+	int ret;
+
+	ret = setgroups(0, NULL);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error setgroups: %d (%m)\n", ret);
+		return ret;
+	}
+
+	ret = setresgid(gid, gid, gid);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error setresgid: %d (%m)\n", ret);
+		return ret;
+	}
+
+	ret = setresuid(uid, uid, uid);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error setresuid: %d (%m)\n", ret);
+		return ret;
+	}
+
+	return ret;
+}
+
+uint64_t now(clockid_t clock)
+{
+	struct timespec spec;
+
+	clock_gettime(clock, &spec);
+	return spec.tv_sec * 1000ULL * 1000ULL * 1000ULL + spec.tv_nsec;
+}
+
+char *unique_name(const char *prefix)
+{
+	unsigned int i;
+	uint64_t u_now;
+	char n[17];
+	char *str;
+	int r;
+
+	/*
+	 * This returns a random string which is guaranteed to be
+	 * globally unique across all calls to unique_name(). We
+	 * compose the string as:
+	 *   <prefix>-<random>-<time>
+	 * With:
+	 *   <prefix>: string provided by the caller
+	 *   <random>: a random alpha string of 16 characters
+	 *   <time>: the current time in nanoseconds since last boot
+	 *
+	 * The <random> part makes the string always look vastly different,
+	 * the <time> part makes sure no two calls return the same string.
+	 */
+
+	u_now = now(CLOCK_MONOTONIC);
+
+	for (i = 0; i < sizeof(n) - 1; ++i)
+		n[i] = 'a' + (rand() % ('z' - 'a' + 1));
+	n[sizeof(n) - 1] = 0;
+
+	r = asprintf(&str, "%s-%s-%" PRIu64, prefix, n, u_now);
+	if (r < 0)
+		return NULL;
+
+	return str;
+}
+
+static int do_userns_map_id(pid_t pid,
+			    const char *map_file,
+			    const char *map_id)
+{
+	int ret;
+	int fd;
+	char *map;
+	unsigned int i;
+
+	map = strndupa(map_id, strlen(map_id));
+	if (!map) {
+		ret = -errno;
+		kdbus_printf("error strndupa %s: %d (%m)\n",
+			map_file, ret);
+		return ret;
+	}
+
+	for (i = 0; i < strlen(map); i++)
+		if (map[i] == ',')
+			map[i] = '\n';
+
+	fd = open(map_file, O_RDWR);
+	if (fd < 0) {
+		ret = -errno;
+		kdbus_printf("error open %s: %d (%m)\n",
+			map_file, ret);
+		return ret;
+	}
+
+	ret = write(fd, map, strlen(map));
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error write to %s: %d (%m)\n",
+			     map_file, ret);
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	close(fd);
+	return ret;
+}
+
+int userns_map_uid_gid(pid_t pid,
+		       const char *map_uid,
+		       const char *map_gid)
+{
+	int fd, ret;
+	char file_id[128] = {'\0'};
+
+	snprintf(file_id, sizeof(file_id), "/proc/%ld/uid_map",
+		 (long) pid);
+
+	ret = do_userns_map_id(pid, file_id, map_uid);
+	if (ret < 0)
+		return ret;
+
+	snprintf(file_id, sizeof(file_id), "/proc/%ld/setgroups",
+		 (long) pid);
+
+	fd = open(file_id, O_WRONLY);
+	if (fd >= 0) {
+		write(fd, "deny\n", 5);
+		close(fd);
+	}
+
+	snprintf(file_id, sizeof(file_id), "/proc/%ld/gid_map",
+		 (long) pid);
+
+	return do_userns_map_id(pid, file_id, map_gid);
+}
+
+static int do_cap_get_flag(cap_t caps, cap_value_t cap)
+{
+	int ret;
+	cap_flag_value_t flag_set;
+
+	ret = cap_get_flag(caps, cap, CAP_EFFECTIVE, &flag_set);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error cap_get_flag(): %d (%m)\n", ret);
+		return ret;
+	}
+
+	return (flag_set == CAP_SET);
+}
+
+/*
+ * Returns:
+ *  1 in case all the requested effective capabilities are set.
+ *  0 in case we do not have the requested capabilities. This value
+ *    will be used to abort tests with TEST_SKIP
+ *  Negative errno on failure.
+ *
+ *  Terminate args with a negative value.
+ */
+int test_is_capable(int cap, ...)
+{
+	int ret;
+	va_list ap;
+	cap_t caps;
+
+	caps = cap_get_proc();
+	if (!caps) {
+		ret = -errno;
+		kdbus_printf("error cap_get_proc(): %d (%m)\n", ret);
+		return ret;
+	}
+
+	ret = do_cap_get_flag(caps, (cap_value_t)cap);
+	if (ret <= 0)
+		goto out;
+
+	va_start(ap, cap);
+	while ((cap = va_arg(ap, int)) >= 0) {
+		ret = do_cap_get_flag(caps, (cap_value_t)cap);
+		if (ret <= 0)
+			break;
+	}
+	va_end(ap);
+
+out:
+	cap_free(caps);
+	return ret;
+}
+
+int config_user_ns_is_enabled(void)
+{
+	return (access("/proc/self/uid_map", F_OK) == 0);
+}
+
+int config_auditsyscall_is_enabled(void)
+{
+	return (access("/proc/self/loginuid", F_OK) == 0);
+}
+
+int config_cgroups_is_enabled(void)
+{
+	return (access("/proc/self/cgroup", F_OK) == 0);
+}
+
+int config_security_is_enabled(void)
+{
+	int fd;
+	int ret;
+	char buf[128];
+
+	/* CONFIG_SECURITY is disabled */
+	if (access("/proc/self/attr/current", F_OK) != 0)
+		return 0;
+
+	/*
+	 * Only if read() fails with EINVAL do we assume that
+	 * SECLABEL and LSM are disabled.
+	 */
+	fd = open("/proc/self/attr/current", O_RDONLY|O_CLOEXEC);
+	if (fd < 0)
+		return 1;
+
+	ret = read(fd, buf, sizeof(buf));
+	if (ret == -1 && errno == EINVAL)
+		ret = 0;
+	else
+		ret = 1;
+
+	close(fd);
+
+	return ret;
+}
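kdbus_msg_recv_poll() above retries on -EAGAIN and deducts the time spent in poll() from the remaining timeout. A minimal standalone sketch of that pattern follows. It is not part of the patch: poll_retry() and read_byte() are hypothetical names, a plain pipe stands in for the kdbus connection fd, and clock_gettime(CLOCK_MONOTONIC) is swapped in for gettimeofday() so that wall-clock adjustments cannot distort the remaining budget. Unlike the helper above, the sketch returns -errno when poll() itself fails and -ETIMEDOUT once the budget is used up.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical reader: consume one byte, negative errno on failure. */
static int read_byte(int fd)
{
	char c;
	ssize_t n = read(fd, &c, 1);

	if (n == 1)
		return 0;
	if (n == 0)
		return -ECONNRESET;	/* peer closed the other end */
	return -errno;
}

/*
 * Hypothetical helper: wait for fd to become readable, call try_read(),
 * and retry for as long as it reports -EAGAIN and time is left.
 */
static int poll_retry(int fd, int timeout_ms, int (*try_read)(int fd))
{
	int ret;

	assert(try_read != NULL);

	do {
		struct timespec before, after;
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		long elapsed_ms;

		clock_gettime(CLOCK_MONOTONIC, &before);
		ret = poll(&pfd, 1, timeout_ms);
		clock_gettime(CLOCK_MONOTONIC, &after);

		if (ret < 0)
			return -errno;
		if (ret == 0)
			return -ETIMEDOUT;

		if (pfd.revents & POLLIN)
			ret = try_read(fd);
		else if (pfd.revents & (POLLHUP | POLLERR))
			ret = -ECONNRESET;
		else
			ret = -EBADF;	/* POLLNVAL */

		if (ret != -EAGAIN)
			return ret;

		/* Charge the time spent waiting against the budget. */
		elapsed_ms = (after.tv_sec - before.tv_sec) * 1000L +
			     (after.tv_nsec - before.tv_nsec) / 1000000L;
		timeout_ms -= (int)elapsed_ms;
	} while (timeout_ms > 0);

	return -ETIMEDOUT;
}
```

Calling poll_retry(fd, 100, read_byte) on an idle descriptor is expected to give -ETIMEDOUT, and 0 once a byte is queued.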
diff --git a/tools/testing/selftests/kdbus/kdbus-util.h b/tools/testing/selftests/kdbus/kdbus-util.h
new file mode 100644
index 000000000000..02ba8ef5d030
--- /dev/null
+++ b/tools/testing/selftests/kdbus/kdbus-util.h
@@ -0,0 +1,216 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Daniel Mack
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+#pragma once
+
+#define BIT(X) (1 << (X))
+
+#include <time.h>
+#include <stdbool.h>
+#include <linux/kdbus.h>
+
+#define _STRINGIFY(x) #x
+#define STRINGIFY(x) _STRINGIFY(x)
+#define ELEMENTSOF(x) (sizeof(x)/sizeof((x)[0]))
+
+#define KDBUS_PTR(addr) ((void *)(uintptr_t)(addr))
+
+#define KDBUS_ALIGN8(l) (((l) + 7) & ~7)
+#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
+#define KDBUS_ITEM_SIZE(s) KDBUS_ALIGN8((s) + KDBUS_ITEM_HEADER_SIZE)
+
+#define KDBUS_ITEM_NEXT(item) \
+	(typeof(item))(((uint8_t *)(item)) + KDBUS_ALIGN8((item)->size))
+#define KDBUS_ITEM_FOREACH(item, head, first)				\
+	for (item = (head)->first;					\
+	     (uint8_t *)(item) < (uint8_t *)(head) + (head)->size;	\
+	     item = KDBUS_ITEM_NEXT(item))
+
+
+#define _KDBUS_ATTACH_BITS_SET_NR  (__builtin_popcountll(_KDBUS_ATTACH_ALL))
+
+/* Sum of KDBUS_ITEM_* that reflects _KDBUS_ATTACH_ALL */
+#define KDBUS_ATTACH_ITEMS_TYPE_SUM \
+	((((_KDBUS_ATTACH_BITS_SET_NR - 1) * \
+	((_KDBUS_ATTACH_BITS_SET_NR - 1) + 1)) / 2 ) + \
+	(_KDBUS_ITEM_ATTACH_BASE * _KDBUS_ATTACH_BITS_SET_NR))
+
+
+#define POOL_SIZE (16 * 1024LU * 1024LU)
+
+#define UNPRIV_UID 65534
+#define UNPRIV_GID 65534
+
+/* Dump as user of process, useful for user namespace testing */
+#define SUID_DUMP_USER	1
+
+extern int kdbus_util_verbose;
+
+#define kdbus_printf(X...) \
+	if (kdbus_util_verbose) \
+		printf(X)
+
+#define RUN_UNPRIVILEGED(child_uid, child_gid, _child_, _parent_) ({	\
+		pid_t pid, rpid;					\
+		int ret;						\
+									\
+		pid = fork();						\
+		if (pid == 0) {						\
+			ret = drop_privileges(child_uid, child_gid);	\
+			ASSERT_EXIT_VAL(ret == 0, ret);			\
+									\
+			_child_;					\
+			_exit(0);					\
+		} else if (pid > 0) {					\
+			_parent_;					\
+			rpid = waitpid(pid, &ret, 0);			\
+			ASSERT_RETURN(rpid == pid);			\
+			ASSERT_RETURN(WIFEXITED(ret));			\
+			ASSERT_RETURN(WEXITSTATUS(ret) == 0);		\
+			ret = TEST_OK;					\
+		} else {						\
+			ret = pid;					\
+		}							\
+									\
+		ret;							\
+	})
+
+#define RUN_UNPRIVILEGED_CONN(_var_, _bus_, _code_)			\
+	RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({			\
+		struct kdbus_conn *_var_;				\
+		_var_ = kdbus_hello(_bus_, 0, NULL, 0);			\
+		ASSERT_EXIT(_var_);					\
+		_code_;							\
+		kdbus_conn_free(_var_);					\
+	}), ({ 0; }))
+
+#define RUN_CLONE_CHILD(clone_ret, flags, _setup_, _child_body_,	\
+			_parent_setup_, _parent_body_) ({		\
+	pid_t pid, rpid;						\
+	int ret;							\
+	int efd = -1;							\
+									\
+	_setup_;							\
+	efd = eventfd(0, EFD_CLOEXEC);					\
+	ASSERT_RETURN(efd >= 0);					\
+	*clone_ret = 0;							\
+	pid = syscall(__NR_clone, flags, NULL);				\
+	if (pid == 0) {							\
+		eventfd_t event_status = 0;				\
+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);			\
+		ASSERT_EXIT(ret == 0);					\
+		ret = eventfd_read(efd, &event_status);			\
+		if (ret < 0 || event_status != 1) {			\
+			kdbus_printf("error eventfd_read()\n");		\
+			_exit(EXIT_FAILURE);				\
+		}							\
+		_child_body_;						\
+		_exit(0);						\
+	} else if (pid > 0) {						\
+		_parent_setup_;						\
+		ret = eventfd_write(efd, 1);				\
+		ASSERT_RETURN(ret >= 0);				\
+		_parent_body_;						\
+		rpid = waitpid(pid, &ret, 0);				\
+		ASSERT_RETURN(rpid == pid);				\
+		ASSERT_RETURN(WIFEXITED(ret));				\
+		ASSERT_RETURN(WEXITSTATUS(ret) == 0);			\
+		ret = TEST_OK;						\
+	} else {							\
+		ret = -errno;						\
+		*clone_ret = -errno;					\
+	}								\
+	close(efd);							\
+	ret;								\
+})
+
+/* Whether, and how, the parent should drop privileges */
+enum kdbus_drop_parent {
+	DO_NOT_DROP,
+	DROP_SAME_UNPRIV,
+	DROP_OTHER_UNPRIV,
+};
+
+struct kdbus_conn {
+	int fd;
+	uint64_t id;
+	unsigned char *buf;
+};
+
+int kdbus_sysfs_get_parameter_mask(const char *path, uint64_t *mask);
+int kdbus_sysfs_set_parameter_mask(const char *path, uint64_t mask);
+
+int sys_memfd_create(const char *name, __u64 size);
+int sys_memfd_seal_set(int fd);
+off_t sys_memfd_get_size(int fd, off_t *size);
+
+int kdbus_name_list(struct kdbus_conn *conn, uint64_t flags);
+int kdbus_name_release(struct kdbus_conn *conn, const char *name);
+int kdbus_name_acquire(struct kdbus_conn *conn, const char *name,
+		       uint64_t *flags);
+void kdbus_msg_free(struct kdbus_msg *msg);
+int kdbus_msg_recv(struct kdbus_conn *conn,
+		   struct kdbus_msg **msg, uint64_t *offset);
+int kdbus_msg_recv_poll(struct kdbus_conn *conn, int timeout_ms,
+			struct kdbus_msg **msg_out, uint64_t *offset);
+int kdbus_free(const struct kdbus_conn *conn, uint64_t offset);
+int kdbus_msg_dump(const struct kdbus_conn *conn,
+		   const struct kdbus_msg *msg);
+int kdbus_create_bus(int control_fd, const char *name,
+		     uint64_t req_meta, uint64_t owner_meta,
+		     char **path);
+int kdbus_msg_send(const struct kdbus_conn *conn, const char *name,
+		   uint64_t cookie, uint64_t flags, uint64_t timeout,
+		   int64_t priority, uint64_t dst_id);
+int kdbus_msg_send_sync(const struct kdbus_conn *conn, const char *name,
+			uint64_t cookie, uint64_t flags, uint64_t timeout,
+			int64_t priority, uint64_t dst_id, int cancel_fd);
+int kdbus_msg_send_reply(const struct kdbus_conn *conn,
+			 uint64_t reply_cookie,
+			 uint64_t dst_id);
+struct kdbus_conn *kdbus_hello(const char *path, uint64_t hello_flags,
+			       const struct kdbus_item *item,
+			       size_t item_size);
+struct kdbus_conn *kdbus_hello_registrar(const char *path, const char *name,
+					 const struct kdbus_policy_access *access,
+					 size_t num_access, uint64_t flags);
+struct kdbus_conn *kdbus_hello_activator(const char *path, const char *name,
+					 const struct kdbus_policy_access *access,
+					 size_t num_access);
+bool kdbus_item_in_message(struct kdbus_msg *msg, uint64_t type);
+int kdbus_bus_creator_info(struct kdbus_conn *conn,
+			   uint64_t flags,
+			   uint64_t *offset);
+int kdbus_conn_info(struct kdbus_conn *conn, uint64_t id,
+		    const char *name, uint64_t flags, uint64_t *offset);
+void kdbus_conn_free(struct kdbus_conn *conn);
+int kdbus_conn_update_attach_flags(struct kdbus_conn *conn,
+				   uint64_t attach_flags_send,
+				   uint64_t attach_flags_recv);
+int kdbus_conn_update_policy(struct kdbus_conn *conn, const char *name,
+			     const struct kdbus_policy_access *access,
+			     size_t num_access);
+
+int kdbus_add_match_id(struct kdbus_conn *conn, uint64_t cookie,
+		       uint64_t type, uint64_t id);
+int kdbus_add_match_empty(struct kdbus_conn *conn);
+
+int all_uids_gids_are_mapped(void);
+int drop_privileges(uid_t uid, gid_t gid);
+uint64_t now(clockid_t clock);
+char *unique_name(const char *prefix);
+
+int userns_map_uid_gid(pid_t pid,
+		       const char *map_uid,
+		       const char *map_gid);
+int test_is_capable(int cap, ...);
+int config_user_ns_is_enabled(void);
+int config_auditsyscall_is_enabled(void);
+int config_cgroups_is_enabled(void);
+int config_security_is_enabled(void);
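The item macros in this header separate two sizes: an item's size field records header plus payload without padding, while the cursor advances by that size rounded up to 8 bytes. The sketch below illustrates the arithmetic in isolation. Everything in it is an assumption made for illustration: struct demo_item is a stand-in for struct kdbus_item (two 64-bit fields followed by the payload), and the DEMO_* macros and demo_append_str() are hypothetical names that mirror, but are not, the definitions above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kdbus_item: size, type, payload. */
struct demo_item {
	uint64_t size;
	uint64_t type;
	char str[];
};

#define DEMO_ALIGN8(l)		(((l) + 7) & ~UINT64_C(7))
#define DEMO_ITEM_HEADER_SIZE	offsetof(struct demo_item, str)
#define DEMO_ITEM_SIZE(s)	DEMO_ALIGN8((s) + DEMO_ITEM_HEADER_SIZE)
#define DEMO_ITEM_NEXT(item) \
	((struct demo_item *)((uint8_t *)(item) + DEMO_ALIGN8((item)->size)))

/*
 * Write a string item at 'item' and return the position of the next
 * item. 'end' marks the end of the caller's buffer.
 */
static struct demo_item *demo_append_str(struct demo_item *item,
					 const uint8_t *end,
					 uint64_t type, const char *str)
{
	size_t len = strlen(str) + 1;	/* payload includes the NUL */

	/* The padded item must fit into the buffer. */
	assert((uint8_t *)item + DEMO_ITEM_SIZE(len) <= end);

	item->size = DEMO_ITEM_HEADER_SIZE + len;	/* unpadded */
	item->type = type;
	memcpy(item->str, str, len);

	return DEMO_ITEM_NEXT(item);			/* padded step */
}
```

For the 12-byte payload "com.example" (including the NUL), the size field holds 16 + 12 = 28 while the next item starts 32 bytes further on. This is why the helpers in kdbus-util.c size their buffers with KDBUS_ITEM_SIZE() but store KDBUS_ITEM_HEADER_SIZE plus the payload length in item->size.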
diff --git a/tools/testing/selftests/kdbus/test-activator.c b/tools/testing/selftests/kdbus/test-activator.c
new file mode 100644
index 000000000000..d6f683d7a629
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-activator.c
@@ -0,0 +1,319 @@
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <poll.h>
+#include <sys/capability.h>
+#include <sys/types.h>
+#include <sys/ioctl.h>
+#include <sys/wait.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+static int kdbus_starter_poll(struct kdbus_conn *conn)
+{
+	int ret;
+	struct pollfd fd;
+
+	fd.fd = conn->fd;
+	fd.events = POLLIN | POLLPRI | POLLHUP;
+	fd.revents = 0;
+
+	ret = poll(&fd, 1, 100);
+	if (ret == 0)
+		return -ETIMEDOUT;
+	else if (ret > 0) {
+		if (fd.revents & POLLIN)
+			return 0;
+
+		if (fd.revents & (POLLHUP | POLLERR))
+			ret = -ECONNRESET;
+	}
+
+	return ret;
+}
+
+/* Ensure that kdbus activator logic is safe */
+static int kdbus_priv_activator(struct kdbus_test_env *env)
+{
+	int ret;
+	struct kdbus_msg *msg = NULL;
+	uint64_t cookie = 0xdeadbeef;
+	uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
+	struct kdbus_conn *activator;
+	struct kdbus_conn *service;
+	struct kdbus_conn *client;
+	struct kdbus_conn *holder;
+	struct kdbus_policy_access *access;
+
+	access = (struct kdbus_policy_access[]){
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = getuid(),
+			.access = KDBUS_POLICY_OWN,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = getuid(),
+			.access = KDBUS_POLICY_TALK,
+		},
+	};
+
+	activator = kdbus_hello_activator(env->buspath, "foo.priv.activator",
+					  access, 2);
+	ASSERT_RETURN(activator);
+
+	service = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(service);
+
+	client = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(client);
+
+	/*
+	 * Make sure that other users can't TALK to the activator
+	 */
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		/* Try to talk using the ID */
+		ret = kdbus_msg_send(unpriv, NULL, 0xdeadbeef, 0, 0,
+				     0, activator->id);
+		ASSERT_EXIT(ret == -ENXIO);
+
+		/* Try to talk to the name */
+		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
+				     0xdeadbeef, 0, 0, 0,
+				     KDBUS_DST_ID_NAME);
+		ASSERT_EXIT(ret == -EPERM);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * Make sure that we did not receive anything, so the
+	 * service will not be started automatically
+	 */
+
+	ret = kdbus_starter_poll(activator);
+	ASSERT_RETURN(ret == -ETIMEDOUT);
+
+	/*
+	 * Now try to emulate the starter/service logic and
+	 * acquire the name.
+	 */
+
+	cookie++;
+	ret = kdbus_msg_send(service, "foo.priv.activator", cookie,
+			     0, 0, 0, KDBUS_DST_ID_NAME);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_starter_poll(activator);
+	ASSERT_RETURN(ret == 0);
+
+	/* Policies are still checked, access denied */
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "foo.priv.activator",
+					 &flags);
+		ASSERT_EXIT(ret == -EPERM);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	ret = kdbus_name_acquire(service, "foo.priv.activator",
+				 &flags);
+	ASSERT_RETURN(ret == 0);
+
+	/* We read our previous starter message */
+
+	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* Try to talk, we still fail */
+
+	cookie++;
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		/* Try to talk to the name */
+		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
+				     cookie, 0, 0, 0,
+				     KDBUS_DST_ID_NAME);
+		ASSERT_EXIT(ret == -EPERM);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/* Still nothing to read */
+
+	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
+	ASSERT_RETURN(ret == -ETIMEDOUT);
+
+	/* We receive everything now */
+
+	cookie++;
+	ret = kdbus_msg_send(client, "foo.priv.activator", cookie,
+			     0, 0, 0, KDBUS_DST_ID_NAME);
+	ASSERT_RETURN(ret == 0);
+	ret = kdbus_msg_recv_poll(service, 100, &msg, NULL);
+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
+
+	kdbus_msg_free(msg);
+
+	/* Policies default to deny TALK now */
+	kdbus_conn_free(activator);
+
+	cookie++;
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		/* Try to talk to the name */
+		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
+				     cookie, 0, 0, 0,
+				     KDBUS_DST_ID_NAME);
+		ASSERT_EXIT(ret == -EPERM);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
+	ASSERT_RETURN(ret == -ETIMEDOUT);
+
+	/* Same user is able to TALK */
+	cookie++;
+	ret = kdbus_msg_send(client, "foo.priv.activator", cookie,
+			     0, 0, 0, KDBUS_DST_ID_NAME);
+	ASSERT_RETURN(ret == 0);
+	ret = kdbus_msg_recv_poll(service, 100, &msg, NULL);
+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
+
+	kdbus_msg_free(msg);
+
+	access = (struct kdbus_policy_access []){
+		{
+			.type = KDBUS_POLICY_ACCESS_WORLD,
+			.id = getuid(),
+			.access = KDBUS_POLICY_TALK,
+		},
+	};
+
+	holder = kdbus_hello_registrar(env->buspath, "foo.priv.activator",
+				       access, 1, KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(holder);
+
+	/* Now we are able to TALK to the name */
+
+	cookie++;
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		/* Try to talk to the name */
+		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
+				     cookie, 0, 0, 0,
+				     KDBUS_DST_ID_NAME);
+		ASSERT_EXIT(ret == 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "foo.priv.activator",
+					 &flags);
+		ASSERT_EXIT(ret == -EPERM);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	kdbus_conn_free(service);
+	kdbus_conn_free(client);
+	kdbus_conn_free(holder);
+
+	return 0;
+}
+
+int kdbus_test_activator(struct kdbus_test_env *env)
+{
+	int ret;
+	struct kdbus_conn *activator;
+	struct pollfd fds[2];
+	bool activator_done = false;
+	struct kdbus_policy_access access[2] = {};
+
+	access[0].type = KDBUS_POLICY_ACCESS_USER;
+	access[0].id = getuid();
+	access[0].access = KDBUS_POLICY_OWN;
+
+	access[1].type = KDBUS_POLICY_ACCESS_WORLD;
+	access[1].access = KDBUS_POLICY_TALK;
+
+	activator = kdbus_hello_activator(env->buspath, "foo.test.activator",
+					  access, 2);
+	ASSERT_RETURN(activator);
+
+	ret = kdbus_add_match_empty(env->conn);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_name_list(env->conn, KDBUS_NAME_LIST_NAMES |
+					 KDBUS_NAME_LIST_UNIQUE |
+					 KDBUS_NAME_LIST_ACTIVATORS |
+					 KDBUS_NAME_LIST_QUEUED);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_send(env->conn, "foo.test.activator", 0xdeafbeef,
+			     0, 0, 0, KDBUS_DST_ID_NAME);
+	ASSERT_RETURN(ret == 0);
+
+	fds[0].fd = activator->fd;
+	fds[1].fd = env->conn->fd;
+
+	kdbus_printf("-- entering poll loop ...\n");
+
+	for (;;) {
+		int i, nfds = sizeof(fds) / sizeof(fds[0]);
+
+		for (i = 0; i < nfds; i++) {
+			fds[i].events = POLLIN | POLLPRI;
+			fds[i].revents = 0;
+		}
+
+		ret = poll(fds, nfds, 3000);
+		ASSERT_RETURN(ret >= 0);
+
+		ret = kdbus_name_list(env->conn, KDBUS_NAME_LIST_NAMES);
+		ASSERT_RETURN(ret == 0);
+
+		if ((fds[0].revents & POLLIN) && !activator_done) {
+			uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
+
+			kdbus_printf("Starter was called back!\n");
+
+			ret = kdbus_name_acquire(env->conn,
+						 "foo.test.activator", &flags);
+			ASSERT_RETURN(ret == 0);
+
+			activator_done = true;
+		}
+
+		if (fds[1].revents & POLLIN) {
+			kdbus_msg_recv(env->conn, NULL, NULL);
+			break;
+		}
+	}
+
+	/* Check if all uids/gids are mapped */
+	if (!all_uids_gids_are_mapped())
+		return TEST_SKIP;
+
+	/* Check capabilities only now, so that the tests above always run */
+	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
+	ASSERT_RETURN(ret >= 0);
+
+	if (!ret)
+		return TEST_SKIP;
+
+	ret = kdbus_priv_activator(env);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_conn_free(activator);
+
+	return TEST_OK;
+}
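The file that follows seeds KDBUS_TEST_ITEMS_SUM from KDBUS_ATTACH_ITEMS_TYPE_SUM in kdbus-util.h, which is the closed form for summing n consecutive integers starting at a base value: base * n + n * (n - 1) / 2. The sketch below checks that identity against a bit-by-bit loop. It rests on two assumptions that should be verified against linux/kdbus.h: that the attach mask is a contiguous run of low bits, and that the item type for attach bit i is base + i. The function names and the sample values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Closed form: n consecutive item types starting at 'base' sum to
 * base * n + (0 + 1 + ... + (n - 1)) = base * n + n * (n - 1) / 2.
 */
static uint64_t demo_type_sum_closed(uint64_t base, uint64_t attach_mask)
{
	uint64_t n = (uint64_t)__builtin_popcountll(attach_mask);

	/* Only valid for masks of the form 2^n - 1 (contiguous low bits). */
	assert((attach_mask & (attach_mask + 1)) == 0);

	return base * n + (n * (n - 1)) / 2;
}

/* Reference: walk the mask and add base + bit index for each set bit. */
static uint64_t demo_type_sum_loop(uint64_t base, uint64_t attach_mask)
{
	uint64_t sum = 0;
	unsigned int bit;

	for (bit = 0; bit < 64; bit++)
		if (attach_mask & (UINT64_C(1) << bit))
			sum += base + bit;

	return sum;
}
```

With the made-up values base = 0x1000 and mask = 0x7, both functions give 0x1000 + 0x1001 + 0x1002 = 12291. If a future attach flag left a hole in the mask, the closed form would no longer match the loop, which is the case the assert() guards against.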
diff --git a/tools/testing/selftests/kdbus/test-attach-flags.c b/tools/testing/selftests/kdbus/test-attach-flags.c
new file mode 100644
index 000000000000..8bf6bc8722bd
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-attach-flags.c
@@ -0,0 +1,751 @@
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <sys/capability.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <linux/unistd.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+/*
+ * Should be the sum of the currently supported and compiled-in
+ * KDBUS_ITEM_* types that reflect the KDBUS_ATTACH_* flags.
+ */
+static unsigned int KDBUS_TEST_ITEMS_SUM = KDBUS_ATTACH_ITEMS_TYPE_SUM;
+
+static struct kdbus_conn *__kdbus_hello(const char *path, uint64_t flags,
+					uint64_t attach_flags_send,
+					uint64_t attach_flags_recv)
+{
+	struct kdbus_cmd_free cmd_free = {};
+	int ret, fd;
+	struct kdbus_conn *conn;
+	struct {
+		struct kdbus_cmd_hello hello;
+
+		struct {
+			uint64_t size;
+			uint64_t type;
+			char str[16];
+		} conn_name;
+
+		uint8_t extra_items[0];
+	} h;
+
+	memset(&h, 0, sizeof(h));
+
+	kdbus_printf("-- opening bus connection %s\n", path);
+	fd = open(path, O_RDWR|O_CLOEXEC);
+	if (fd < 0) {
+		kdbus_printf("--- error %d (%m)\n", fd);
+		return NULL;
+	}
+
+	h.hello.flags = flags | KDBUS_HELLO_ACCEPT_FD;
+	h.hello.attach_flags_send = attach_flags_send;
+	h.hello.attach_flags_recv = attach_flags_recv;
+	h.conn_name.type = KDBUS_ITEM_CONN_DESCRIPTION;
+	strcpy(h.conn_name.str, "this-is-my-name");
+	h.conn_name.size = KDBUS_ITEM_HEADER_SIZE + strlen(h.conn_name.str) + 1;
+
+	h.hello.size = sizeof(h);
+	h.hello.pool_size = POOL_SIZE;
+
+	ret = ioctl(fd, KDBUS_CMD_HELLO, &h.hello);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("--- error when saying hello: %d (%m)\n", ret);
+		return NULL;
+	}
+
+	kdbus_printf("-- New connection ID : %llu\n",
+		     (unsigned long long)h.hello.id);
+
+	cmd_free.size = sizeof(cmd_free);
+	cmd_free.offset = h.hello.offset;
+	ret = ioctl(fd, KDBUS_CMD_FREE, &cmd_free);
+	if (ret < 0)
+		return NULL;
+
+	conn = malloc(sizeof(*conn));
+	if (!conn) {
+		kdbus_printf("unable to malloc()!?\n");
+		return NULL;
+	}
+
+	conn->buf = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, fd, 0);
+	if (conn->buf == MAP_FAILED) {
+		ret = -errno;
+		free(conn);
+		close(fd);
+		kdbus_printf("--- error mmap: %d (%m)\n", ret);
+		return NULL;
+	}
+
+	conn->fd = fd;
+	conn->id = h.hello.id;
+	return conn;
+}
+
+static int kdbus_test_peers_creation(struct kdbus_test_env *env)
+{
+	int ret;
+	int control_fd;
+	char *path;
+	char *busname;
+	char buspath[2048];
+	char control_path[2048];
+	uint64_t attach_flags_mask;
+	struct kdbus_conn *conn;
+
+	snprintf(control_path, sizeof(control_path),
+		 "%s/control", env->root);
+
+	/*
+	 * Set the kdbus system-wide mask to 0. This has no bearing
+	 * on the tests below (bus creation, connection creation and
+	 * connection update), but we do it to make sure that
+	 * everything works as expected.
+	 */
+
+	attach_flags_mask = 0;
+	ret = kdbus_sysfs_set_parameter_mask(env->mask_param_path,
+					     attach_flags_mask);
+	ASSERT_RETURN(ret == 0);
+
+
+	/*
+	 * Create bus with a full set of ATTACH flags
+	 */
+
+	control_fd = open(control_path, O_RDWR);
+	ASSERT_RETURN(control_fd >= 0);
+
+	busname = unique_name("test-peers-creation-bus");
+	ASSERT_RETURN(busname);
+
+	ret = kdbus_create_bus(control_fd, busname, _KDBUS_ATTACH_ALL,
+			       0, &path);
+	ASSERT_RETURN(ret == 0);
+
+	snprintf(buspath, sizeof(buspath), "%s/%s/bus", env->root, path);
+
+	/*
+	 * Create a connection with empty send attach flags, or
+	 * with just KDBUS_ATTACH_CREDS. Both attempts should fail.
+	 */
+	conn = __kdbus_hello(buspath, 0, 0, 0);
+	ASSERT_RETURN(conn == NULL);
+	ASSERT_RETURN(errno == ECONNREFUSED);
+
+	conn = __kdbus_hello(buspath, 0, KDBUS_ATTACH_CREDS,
+			     _KDBUS_ATTACH_ALL);
+	ASSERT_RETURN(conn == NULL);
+	ASSERT_RETURN(errno == ECONNREFUSED);
+
+	conn = __kdbus_hello(buspath, 0, _KDBUS_ATTACH_ALL, 0);
+	ASSERT_RETURN(conn);
+
+	/* Try to cut back some send attach flags */
+	ret = kdbus_conn_update_attach_flags(conn,
+					     KDBUS_ATTACH_CREDS|
+					     KDBUS_ATTACH_PIDS,
+					     _KDBUS_ATTACH_ALL);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	ret = kdbus_conn_update_attach_flags(conn,
+					     _KDBUS_ATTACH_ALL, 0);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_conn_free(conn);
+	free(path);
+	free(busname);
+	close(control_fd);
+
+
+	/* Test a new bus with KDBUS_ATTACH_PIDS */
+
+	control_fd = open(control_path, O_RDWR);
+	ASSERT_RETURN(control_fd >= 0);
+
+	busname = unique_name("test-peer-flags-bus");
+	ASSERT_RETURN(busname);
+
+	ret = kdbus_create_bus(control_fd, busname, KDBUS_ATTACH_PIDS,
+			       0, &path);
+	ASSERT_RETURN(ret == 0);
+
+	snprintf(buspath, sizeof(buspath), "%s/%s/bus", env->root, path);
+
+	/*
+	 * Create a connection with empty send attach flags, or
+	 * with all flags except KDBUS_ATTACH_PIDS; both should fail
+	 */
+	conn = __kdbus_hello(buspath, 0, 0, 0);
+	ASSERT_RETURN(conn == NULL);
+	ASSERT_RETURN(errno == ECONNREFUSED);
+
+	conn = __kdbus_hello(buspath, 0,
+			     _KDBUS_ATTACH_ALL & ~KDBUS_ATTACH_PIDS,
+			     _KDBUS_ATTACH_ALL);
+	ASSERT_RETURN(conn == NULL);
+	ASSERT_RETURN(errno == ECONNREFUSED);
+
+	/* The following should succeed */
+	conn = __kdbus_hello(buspath, 0, KDBUS_ATTACH_PIDS, 0);
+	ASSERT_RETURN(conn);
+	kdbus_conn_free(conn);
+
+	conn = __kdbus_hello(buspath, 0, _KDBUS_ATTACH_ALL, 0);
+	ASSERT_RETURN(conn);
+
+	ret = kdbus_conn_update_attach_flags(conn,
+					     _KDBUS_ATTACH_ALL &
+					     ~KDBUS_ATTACH_PIDS,
+					     _KDBUS_ATTACH_ALL);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	ret = kdbus_conn_update_attach_flags(conn, 0,
+					     _KDBUS_ATTACH_ALL);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	/* Now we want only KDBUS_ATTACH_PIDS */
+	ret = kdbus_conn_update_attach_flags(conn,
+					     KDBUS_ATTACH_PIDS, 0);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_conn_free(conn);
+	free(path);
+	free(busname);
+	close(control_fd);
+
+
+	/*
+	 * Create bus with 0 as ATTACH flags, the bus does not
+	 * require any attach flags
+	 */
+
+	control_fd = open(control_path, O_RDWR);
+	ASSERT_RETURN(control_fd >= 0);
+
+	busname = unique_name("test-peer-flags-bus");
+	ASSERT_RETURN(busname);
+
+	ret = kdbus_create_bus(control_fd, busname, 0, 0, &path);
+	ASSERT_RETURN(ret == 0);
+
+	snprintf(buspath, sizeof(buspath), "%s/%s/bus", env->root, path);
+
+	/* Bus is open: it does not require any send attach flags */
+	conn = __kdbus_hello(buspath, 0, 0, 0);
+	ASSERT_RETURN(conn);
+	kdbus_conn_free(conn);
+
+	conn = __kdbus_hello(buspath, 0, _KDBUS_ATTACH_ALL, 0);
+	ASSERT_RETURN(conn);
+
+	ret = kdbus_conn_update_attach_flags(conn, 0, 0);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_conn_update_attach_flags(conn, KDBUS_ATTACH_CREDS, 0);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_conn_free(conn);
+	free(path);
+	free(busname);
+	close(control_fd);
+
+	return 0;
+}
+
+static int kdbus_test_peers_info(struct kdbus_test_env *env)
+{
+	int ret;
+	int control_fd;
+	char *path;
+	char *busname;
+	unsigned int i = 0;
+	uint64_t offset = 0;
+	char buspath[2048];
+	char control_path[2048];
+	uint64_t attach_flags_mask;
+	struct kdbus_item *item;
+	struct kdbus_info *info;
+	struct kdbus_conn *conn;
+	struct kdbus_conn *reader;
+	unsigned long long attach_count = 0;
+
+	snprintf(control_path, sizeof(control_path),
+		 "%s/control", env->root);
+
+	attach_flags_mask = 0;
+	ret = kdbus_sysfs_set_parameter_mask(env->mask_param_path,
+					     attach_flags_mask);
+	ASSERT_RETURN(ret == 0);
+
+	control_fd = open(control_path, O_RDWR);
+	ASSERT_RETURN(control_fd >= 0);
+
+	busname = unique_name("test-peers-info-bus");
+	ASSERT_RETURN(busname);
+
+	ret = kdbus_create_bus(control_fd, busname, _KDBUS_ATTACH_ALL,
+			       0, &path);
+	ASSERT_RETURN(ret == 0);
+
+	snprintf(buspath, sizeof(buspath), "%s/%s/bus", env->root, path);
+
+	/* Create connections with the appropriate flags */
+	conn = __kdbus_hello(buspath, 0, _KDBUS_ATTACH_ALL, 0);
+	ASSERT_RETURN(conn);
+
+	reader = __kdbus_hello(buspath, 0, _KDBUS_ATTACH_ALL, 0);
+	ASSERT_RETURN(reader);
+
+	ret = kdbus_conn_info(reader, conn->id, NULL,
+			      _KDBUS_ATTACH_ALL, &offset);
+	ASSERT_RETURN(ret == 0);
+
+	info = (struct kdbus_info *)(reader->buf + offset);
+	ASSERT_RETURN(info->id == conn->id);
+
+	/* all attach flags are masked, no metadata */
+	KDBUS_ITEM_FOREACH(item, info, items)
+		i++;
+
+	ASSERT_RETURN(i == 0);
+
+	kdbus_free(reader, offset);
+
+	/* Set the mask to _KDBUS_ATTACH_ANY */
+	attach_flags_mask = _KDBUS_ATTACH_ANY;
+	ret = kdbus_sysfs_set_parameter_mask(env->mask_param_path,
+					     attach_flags_mask);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_conn_info(reader, conn->id, NULL,
+			      _KDBUS_ATTACH_ALL, &offset);
+	ASSERT_RETURN(ret == 0);
+
+	info = (struct kdbus_info *)(reader->buf + offset);
+	ASSERT_RETURN(info->id == conn->id);
+
+	attach_count = 0;
+	KDBUS_ITEM_FOREACH(item, info, items)
+		attach_count += item->type;
+
+	/*
+	 * All items have been returned except for
+	 * KDBUS_ITEM_TIMESTAMP and KDBUS_ITEM_OWNED_NAME
+	 * (we do not own any name).
+	 */
+	ASSERT_RETURN(attach_count == (KDBUS_TEST_ITEMS_SUM -
+				       KDBUS_ITEM_OWNED_NAME -
+				       KDBUS_ITEM_TIMESTAMP));
+
+	kdbus_free(reader, offset);
+
+	/* Request only OWNED names */
+	ret = kdbus_conn_info(reader, conn->id, NULL,
+			      KDBUS_ATTACH_NAMES, &offset);
+	ASSERT_RETURN(ret == 0);
+
+	info = (struct kdbus_info *)(reader->buf + offset);
+	ASSERT_RETURN(info->id == conn->id);
+
+	attach_count = 0;
+	KDBUS_ITEM_FOREACH(item, info, items)
+		attach_count += item->type;
+
+	/* we should not get any metadata since we do not own names */
+	ASSERT_RETURN(attach_count == 0);
+
+	kdbus_free(reader, offset);
+
+	kdbus_conn_free(conn);
+	kdbus_conn_free(reader);
+
+	return 0;
+}
+
+/**
+ * @kdbus_mask_param:	kdbus module mask parameter (system-wide)
+ * @requested_meta:	The bus owner metadata that we want
+ * @expected_items:	The returned KDBUS_ITEMS_* sum. Used to
+ *			validate the returned metadata items
+ */
+static int kdbus_cmp_bus_creator_metadata(struct kdbus_test_env *env,
+					  struct kdbus_conn *conn,
+					  uint64_t kdbus_mask_param,
+					  uint64_t requested_meta,
+					  unsigned long expected_items)
+{
+	int ret;
+	uint64_t offset = 0;
+	struct kdbus_info *info;
+	struct kdbus_item *item;
+	unsigned long attach_count = 0;
+
+	ret = kdbus_sysfs_set_parameter_mask(env->mask_param_path,
+					     kdbus_mask_param);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_bus_creator_info(conn, requested_meta, &offset);
+	ASSERT_RETURN(ret == 0);
+
+	info = (struct kdbus_info *)(conn->buf + offset);
+
+	KDBUS_ITEM_FOREACH(item, info, items)
+		attach_count += item->type;
+
+	ASSERT_RETURN(attach_count == expected_items);
+
+	ret = kdbus_free(conn, offset);
+	ASSERT_RETURN(ret == 0);
+
+	return 0;
+}
+
+static int kdbus_test_bus_creator_info(struct kdbus_test_env *env)
+{
+	int ret;
+	int control_fd;
+	char *path;
+	char *busname;
+	char buspath[2048];
+	char control_path[2048];
+	uint64_t attach_flags_mask;
+	struct kdbus_conn *conn;
+	unsigned long expected_items = 0;
+
+	snprintf(control_path, sizeof(control_path),
+		 "%s/control", env->root);
+
+	control_fd = open(control_path, O_RDWR);
+	ASSERT_RETURN(control_fd >= 0);
+
+	busname = unique_name("test-peers-info-bus");
+	ASSERT_RETURN(busname);
+
+	/*
+	 * Now the bus allows us to see all its KDBUS_ATTACH_*
+	 * items
+	 */
+	ret = kdbus_create_bus(control_fd, busname, 0,
+			       _KDBUS_ATTACH_ALL, &path);
+	ASSERT_RETURN(ret == 0);
+
+	snprintf(buspath, sizeof(buspath), "%s/%s/bus", env->root, path);
+
+	conn = __kdbus_hello(buspath, 0, 0, 0);
+	ASSERT_RETURN(conn);
+
+	/*
+	 * Start with a kdbus module mask set to _KDBUS_ATTACH_ANY
+	 */
+	attach_flags_mask = _KDBUS_ATTACH_ANY;
+
+	/*
+	 * All flags will be returned except for:
+	 * KDBUS_ITEM_TIMESTAMP
+	 * KDBUS_ITEM_OWNED_NAME
+	 * KDBUS_ITEM_CONN_DESCRIPTION
+	 *
+	 * An extra item is always returned: KDBUS_ITEM_MAKE_NAME,
+	 * which contains the bus name
+	 */
+	expected_items = KDBUS_TEST_ITEMS_SUM + KDBUS_ITEM_MAKE_NAME;
+	expected_items -= KDBUS_ITEM_TIMESTAMP +
+			  KDBUS_ITEM_OWNED_NAME +
+			  KDBUS_ITEM_CONN_DESCRIPTION;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     _KDBUS_ATTACH_ALL,
+					     expected_items);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * We should have:
+	 * KDBUS_ITEM_PIDS + KDBUS_ITEM_CREDS + KDBUS_ITEM_MAKE_NAME
+	 */
+	expected_items = KDBUS_ITEM_PIDS + KDBUS_ITEM_CREDS +
+			 KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     KDBUS_ATTACH_PIDS |
+					     KDBUS_ATTACH_CREDS,
+					     expected_items);
+	ASSERT_RETURN(ret == 0);
+
+	/* KDBUS_ITEM_MAKE_NAME is always returned */
+	expected_items = KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     0, expected_items);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Restrict kdbus system-wide mask to KDBUS_ATTACH_PIDS
+	 */
+
+	attach_flags_mask = KDBUS_ATTACH_PIDS;
+
+	/*
+	 * We should have:
+	 * KDBUS_ITEM_PIDS + KDBUS_ITEM_MAKE_NAME
+	 */
+	expected_items = KDBUS_ITEM_PIDS + KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     _KDBUS_ATTACH_ALL,
+					     expected_items);
+	ASSERT_RETURN(ret == 0);
+
+
+	/* system-wide mask to 0 */
+	attach_flags_mask = 0;
+
+	/* we should only see: KDBUS_ITEM_MAKE_NAME */
+	expected_items = KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     _KDBUS_ATTACH_ALL,
+					     expected_items);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_conn_free(conn);
+	free(path);
+	free(busname);
+	close(control_fd);
+
+
+	/*
+	 * A new bus that hides all its owner metadata
+	 */
+
+	control_fd = open(control_path, O_RDWR);
+	ASSERT_RETURN(control_fd >= 0);
+
+	busname = unique_name("test-peers-info-bus");
+	ASSERT_RETURN(busname);
+
+	ret = kdbus_create_bus(control_fd, busname, 0, 0, &path);
+	ASSERT_RETURN(ret == 0);
+
+	snprintf(buspath, sizeof(buspath), "%s/%s/bus", env->root, path);
+
+	conn = __kdbus_hello(buspath, 0, 0, 0);
+	ASSERT_RETURN(conn);
+
+	/*
+	 * Start with a kdbus module mask set to _KDBUS_ATTACH_ANY
+	 */
+	attach_flags_mask = _KDBUS_ATTACH_ANY;
+
+	/*
+	 * We only get the KDBUS_ITEM_MAKE_NAME
+	 */
+	expected_items = KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     _KDBUS_ATTACH_ALL,
+					     expected_items);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * We still get only KDBUS_ITEM_MAKE_NAME
+	 */
+	attach_flags_mask = 0;
+	expected_items = KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     _KDBUS_ATTACH_ALL,
+					     expected_items);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_conn_free(conn);
+	free(path);
+	free(busname);
+	close(control_fd);
+
+
+	/*
+	 * A new bus that shows only the PID and CREDS metadata
+	 * of the bus owner.
+	 */
+	control_fd = open(control_path, O_RDWR);
+	ASSERT_RETURN(control_fd >= 0);
+
+	busname = unique_name("test-peers-info-bus");
+	ASSERT_RETURN(busname);
+
+	ret = kdbus_create_bus(control_fd, busname, 0,
+			       KDBUS_ATTACH_PIDS|
+			       KDBUS_ATTACH_CREDS, &path);
+	ASSERT_RETURN(ret == 0);
+
+	snprintf(buspath, sizeof(buspath), "%s/%s/bus", env->root, path);
+
+	conn = __kdbus_hello(buspath, 0, 0, 0);
+	ASSERT_RETURN(conn);
+
+	/*
+	 * Start with a kdbus module mask set to _KDBUS_ATTACH_ANY
+	 */
+	attach_flags_mask = _KDBUS_ATTACH_ANY;
+
+	/*
+	 * We should have:
+	 * KDBUS_ITEM_PIDS + KDBUS_ITEM_CREDS + KDBUS_ITEM_MAKE_NAME
+	 */
+	expected_items = KDBUS_ITEM_PIDS + KDBUS_ITEM_CREDS +
+			 KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     _KDBUS_ATTACH_ALL,
+					     expected_items);
+	ASSERT_RETURN(ret == 0);
+
+	expected_items = KDBUS_ITEM_CREDS + KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     KDBUS_ATTACH_CREDS,
+					     expected_items);
+	ASSERT_RETURN(ret == 0);
+
+	/* KDBUS_ITEM_MAKE_NAME is always returned */
+	expected_items = KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     0, expected_items);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Restrict kdbus system-wide mask to KDBUS_ATTACH_PIDS
+	 */
+
+	attach_flags_mask = KDBUS_ATTACH_PIDS;
+	/*
+	 * We should have:
+	 * KDBUS_ITEM_PIDS + KDBUS_ITEM_MAKE_NAME
+	 */
+	expected_items = KDBUS_ITEM_PIDS + KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     _KDBUS_ATTACH_ALL,
+					     expected_items);
+	ASSERT_RETURN(ret == 0);
+
+	/* No KDBUS_ATTACH_CREDS */
+	expected_items = KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     KDBUS_ATTACH_CREDS,
+					     expected_items);
+	ASSERT_RETURN(ret == 0);
+
+	/* system-wide mask to 0 */
+	attach_flags_mask = 0;
+
+	/* we should only see: KDBUS_ITEM_MAKE_NAME */
+	expected_items = KDBUS_ITEM_MAKE_NAME;
+	ret = kdbus_cmp_bus_creator_metadata(env, conn,
+					     attach_flags_mask,
+					     _KDBUS_ATTACH_ALL,
+					     expected_items);
+	ASSERT_RETURN(ret == 0);
+
+
+	kdbus_conn_free(conn);
+	free(path);
+	free(busname);
+	close(control_fd);
+
+	return 0;
+}
+
+int kdbus_test_attach_flags(struct kdbus_test_env *env)
+{
+	int ret;
+	uint64_t flags_mask;
+	uint64_t old_kdbus_flags_mask;
+
+	/* We need CAP_DAC_OVERRIDE to overwrite the kdbus mask */
+	ret = test_is_capable(CAP_DAC_OVERRIDE, -1);
+	ASSERT_RETURN(ret >= 0);
+
+	/* not enough privileges, SKIP test */
+	if (!ret)
+		return TEST_SKIP;
+
+	/*
+	 * We need to be able to write to
+	 * "/sys/module/kdbus/parameters/attach_flags_mask"
+	 * perhaps we are unprivileged/privileged in its userns
+	 */
+	ret = access(env->mask_param_path, W_OK);
+	if (ret < 0) {
+		kdbus_printf("--- access() '%s' failed: %d (%m)\n",
+			     env->mask_param_path, -errno);
+		return TEST_SKIP;
+	}
+
+	ret = kdbus_sysfs_get_parameter_mask(env->mask_param_path,
+					     &old_kdbus_flags_mask);
+	ASSERT_RETURN(ret == 0);
+
+	/* setup the right KDBUS_TEST_ITEMS_SUM */
+	if (!config_auditsyscall_is_enabled())
+		KDBUS_TEST_ITEMS_SUM -= KDBUS_ITEM_AUDIT;
+
+	if (!config_cgroups_is_enabled())
+		KDBUS_TEST_ITEMS_SUM -= KDBUS_ITEM_CGROUP;
+
+	if (!config_security_is_enabled())
+		KDBUS_TEST_ITEMS_SUM -= KDBUS_ITEM_SECLABEL;
+
+	/*
+	 * Test the connection creation attach flags
+	 */
+	ret = kdbus_test_peers_creation(env);
+	/* Restore previous kdbus mask */
+	kdbus_sysfs_set_parameter_mask(env->mask_param_path,
+				       old_kdbus_flags_mask);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Test the CONN_INFO ioctl attach flags
+	 */
+	ret = kdbus_test_peers_info(env);
+	/* Restore previous kdbus mask */
+	kdbus_sysfs_set_parameter_mask(env->mask_param_path,
+				       old_kdbus_flags_mask);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Test the Bus creator info and its attach flags
+	 */
+	ret = kdbus_test_bus_creator_info(env);
+	/* Restore previous kdbus mask */
+	kdbus_sysfs_set_parameter_mask(env->mask_param_path,
+				       old_kdbus_flags_mask);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_sysfs_get_parameter_mask(env->mask_param_path,
+					     &flags_mask);
+	ASSERT_RETURN(ret == 0 && old_kdbus_flags_mask == flags_mask);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-benchmark.c b/tools/testing/selftests/kdbus/test-benchmark.c
new file mode 100644
index 000000000000..6cedd3f45fbd
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-benchmark.c
@@ -0,0 +1,427 @@
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+#include <fcntl.h>
+#include <locale.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <assert.h>
+#include <poll.h>
+#include <sys/time.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+#define SERVICE_NAME "foo.bar.echo"
+
+/*
+ * To have a benchmark comparison with unix sockets, set:
+ * use_memfd	= false;
+ * compare_uds	= true;
+ * attach_none	= true;		do not attach metadata
+ */
+
+static bool use_memfd = true;		/* transmit memfd? */
+static bool compare_uds = false;		/* unix-socket comparison? */
+static bool attach_none = false;		/* clear attach-flags? */
+static char stress_payload[8192];
+
+struct stats {
+	uint64_t count;
+	uint64_t latency_acc;
+	uint64_t latency_low;
+	uint64_t latency_high;
+};
+
+static struct stats stats;
+
+static void reset_stats(void)
+{
+	stats.count = 0;
+	stats.latency_acc = 0;
+	stats.latency_low = UINT64_MAX;
+	stats.latency_high = 0;
+}
+
+static void dump_stats(bool is_uds)
+{
+	if (stats.count > 0) {
+		kdbus_printf("stats %s: %'llu packets processed, latency (nsecs) min/max/avg %'7llu // %'7llu // %'7llu\n",
+			     is_uds ? " (UNIX)" : "(KDBUS)",
+			     (unsigned long long) stats.count,
+			     (unsigned long long) stats.latency_low,
+			     (unsigned long long) stats.latency_high,
+			     (unsigned long long) (stats.latency_acc / stats.count));
+	} else {
+		kdbus_printf("*** no packets received. bus stuck?\n");
+	}
+}
+
+static void add_stats(uint64_t prev)
+{
+	uint64_t diff;
+
+	diff = now(CLOCK_THREAD_CPUTIME_ID) - prev;
+
+	stats.count++;
+	stats.latency_acc += diff;
+	if (stats.latency_low > diff)
+		stats.latency_low = diff;
+
+	if (stats.latency_high < diff)
+		stats.latency_high = diff;
+}
+
+static int setup_simple_kdbus_msg(struct kdbus_conn *conn,
+				  uint64_t dst_id,
+				  struct kdbus_msg **msg_out)
+{
+	struct kdbus_msg *msg;
+	struct kdbus_item *item;
+	uint64_t size;
+
+	size = sizeof(struct kdbus_msg);
+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
+
+	msg = malloc(size);
+	ASSERT_RETURN_VAL(msg, -ENOMEM);
+
+	memset(msg, 0, size);
+	msg->size = size;
+	msg->src_id = conn->id;
+	msg->dst_id = dst_id;
+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
+
+	item = msg->items;
+
+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
+	item->vec.address = (uintptr_t) stress_payload;
+	item->vec.size = sizeof(stress_payload);
+	item = KDBUS_ITEM_NEXT(item);
+
+	*msg_out = msg;
+
+	return 0;
+}
+
+static int setup_memfd_kdbus_msg(struct kdbus_conn *conn,
+				 uint64_t dst_id,
+				 off_t *memfd_item_offset,
+				 struct kdbus_msg **msg_out)
+{
+	struct kdbus_msg *msg;
+	struct kdbus_item *item;
+	uint64_t size;
+
+	size = sizeof(struct kdbus_msg);
+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
+
+	msg = malloc(size);
+	ASSERT_RETURN_VAL(msg, -ENOMEM);
+
+	memset(msg, 0, size);
+	msg->size = size;
+	msg->src_id = conn->id;
+	msg->dst_id = dst_id;
+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
+
+	item = msg->items;
+
+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
+	item->vec.address = (uintptr_t) stress_payload;
+	item->vec.size = sizeof(stress_payload);
+	item = KDBUS_ITEM_NEXT(item);
+
+	item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_memfd);
+	item->memfd.size = sizeof(uint64_t);
+
+	*memfd_item_offset = (unsigned char *)item - (unsigned char *)msg;
+	*msg_out = msg;
+
+	return 0;
+}
+
+static int
+send_echo_request(struct kdbus_conn *conn, uint64_t dst_id,
+		  void *kdbus_msg, off_t memfd_item_offset)
+{
+	struct kdbus_cmd_send cmd = {};
+	int memfd = -1;
+	int ret;
+
+	if (use_memfd) {
+		uint64_t now_ns = now(CLOCK_THREAD_CPUTIME_ID);
+		struct kdbus_item *item = memfd_item_offset + kdbus_msg;
+		memfd = sys_memfd_create("memfd-name", 0);
+		ASSERT_RETURN_VAL(memfd >= 0, memfd);
+
+		ret = write(memfd, &now_ns, sizeof(now_ns));
+		ASSERT_RETURN_VAL(ret == sizeof(now_ns), -EAGAIN);
+
+		ret = sys_memfd_seal_set(memfd);
+		ASSERT_RETURN_VAL(ret == 0, -errno);
+
+		item->memfd.fd = memfd;
+	}
+
+	cmd.size = sizeof(cmd);
+	cmd.msg_address = (uintptr_t)kdbus_msg;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_SEND, &cmd);
+	ASSERT_RETURN_VAL(ret == 0, -errno);
+
+	close(memfd);
+
+	return 0;
+}
+
+static int
+handle_echo_reply(struct kdbus_conn *conn, uint64_t send_ns)
+{
+	int ret;
+	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
+	struct kdbus_msg *msg;
+	const struct kdbus_item *item;
+	bool has_memfd = false;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_RECV, &recv);
+	if (ret < 0 && errno == EAGAIN)
+		return -EAGAIN;
+
+	ASSERT_RETURN_VAL(ret == 0, -errno);
+
+	if (!use_memfd)
+		goto out;
+
+	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
+
+	KDBUS_ITEM_FOREACH(item, msg, items) {
+		switch (item->type) {
+		case KDBUS_ITEM_PAYLOAD_MEMFD: {
+			char *buf;
+
+			buf = mmap(NULL, item->memfd.size, PROT_READ,
+				   MAP_PRIVATE, item->memfd.fd, 0);
+			ASSERT_RETURN_VAL(buf != MAP_FAILED, -EINVAL);
+			ASSERT_RETURN_VAL(item->memfd.size == sizeof(uint64_t),
+					  -EINVAL);
+
+			add_stats(*(uint64_t*)buf);
+			munmap(buf, item->memfd.size);
+			close(item->memfd.fd);
+			has_memfd = true;
+			break;
+		}
+
+		case KDBUS_ITEM_PAYLOAD_OFF:
+			/* ignore */
+			break;
+		}
+	}
+
+out:
+	if (!has_memfd)
+		add_stats(send_ns);
+
+	ret = kdbus_free(conn, recv.msg.offset);
+	ASSERT_RETURN_VAL(ret == 0, -errno);
+
+	return 0;
+}
+
+static int benchmark(struct kdbus_test_env *env)
+{
+	static char buf[sizeof(stress_payload)];
+	struct kdbus_msg *kdbus_msg = NULL;
+	off_t memfd_cached_offset = 0;
+	int ret;
+	struct kdbus_conn *conn_a, *conn_b;
+	struct pollfd fds[2];
+	uint64_t start, send_ns, now_ns, diff;
+	unsigned int i;
+	int uds[2];
+
+	setlocale(LC_ALL, "");
+
+	for (i = 0; i < sizeof(stress_payload); i++)
+		stress_payload[i] = i;
+
+	/* setup kdbus pair */
+
+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn_a && conn_b);
+
+	ret = kdbus_add_match_empty(conn_a);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_add_match_empty(conn_b);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_name_acquire(conn_a, SERVICE_NAME, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	if (attach_none) {
+		ret = kdbus_conn_update_attach_flags(conn_a,
+						     _KDBUS_ATTACH_ALL,
+						     0);
+		ASSERT_RETURN(ret == 0);
+	}
+
+	/* setup UDS pair */
+
+	ret = socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_NONBLOCK, 0, uds);
+	ASSERT_RETURN(ret == 0);
+
+	/* setup a kdbus msg now */
+	if (use_memfd) {
+		ret = setup_memfd_kdbus_msg(conn_b, conn_a->id,
+					    &memfd_cached_offset,
+					    &kdbus_msg);
+		ASSERT_RETURN(ret == 0);
+	} else {
+		ret = setup_simple_kdbus_msg(conn_b, conn_a->id, &kdbus_msg);
+		ASSERT_RETURN(ret == 0);
+	}
+
+	/* start benchmark */
+
+	kdbus_printf("-- entering poll loop ...\n");
+
+	do {
+		/* run kdbus benchmark */
+		fds[0].fd = conn_a->fd;
+		fds[1].fd = conn_b->fd;
+
+		/* cancel any pending message */
+		handle_echo_reply(conn_a, 0);
+
+		start = now(CLOCK_THREAD_CPUTIME_ID);
+		reset_stats();
+
+		send_ns = now(CLOCK_THREAD_CPUTIME_ID);
+		ret = send_echo_request(conn_b, conn_a->id,
+					kdbus_msg, memfd_cached_offset);
+		ASSERT_RETURN(ret == 0);
+
+		while (1) {
+			unsigned int nfds = sizeof(fds) / sizeof(fds[0]);
+			unsigned int i;
+
+			for (i = 0; i < nfds; i++) {
+				fds[i].events = POLLIN | POLLPRI | POLLHUP;
+				fds[i].revents = 0;
+			}
+
+			ret = poll(fds, nfds, 10);
+			if (ret < 0)
+				break;
+
+			if (fds[0].revents & POLLIN) {
+				ret = handle_echo_reply(conn_a, send_ns);
+				ASSERT_RETURN(ret == 0);
+
+				send_ns = now(CLOCK_THREAD_CPUTIME_ID);
+				ret = send_echo_request(conn_b, conn_a->id,
+							kdbus_msg,
+							memfd_cached_offset);
+				ASSERT_RETURN(ret == 0);
+			}
+
+			now_ns = now(CLOCK_THREAD_CPUTIME_ID);
+			diff = now_ns - start;
+			if (diff > 1000000000ULL) {
+				start = now_ns;
+
+				dump_stats(false);
+				break;
+			}
+		}
+
+		if (!compare_uds)
+			continue;
+
+		/* run unix-socket benchmark as comparison */
+
+		fds[0].fd = uds[0];
+		fds[1].fd = uds[1];
+
+		/* cancel any pending message */
+		read(uds[1], buf, sizeof(buf));
+
+		start = now(CLOCK_THREAD_CPUTIME_ID);
+		reset_stats();
+
+		send_ns = now(CLOCK_THREAD_CPUTIME_ID);
+		ret = write(uds[0], stress_payload, sizeof(stress_payload));
+		ASSERT_RETURN(ret == sizeof(stress_payload));
+
+		while (1) {
+			unsigned int nfds = sizeof(fds) / sizeof(fds[0]);
+			unsigned int i;
+
+			for (i = 0; i < nfds; i++) {
+				fds[i].events = POLLIN | POLLPRI | POLLHUP;
+				fds[i].revents = 0;
+			}
+
+			ret = poll(fds, nfds, 10);
+			if (ret < 0)
+				break;
+
+			if (fds[1].revents & POLLIN) {
+				ret = read(uds[1], buf, sizeof(buf));
+				ASSERT_RETURN(ret == sizeof(buf));
+
+				add_stats(send_ns);
+
+				send_ns = now(CLOCK_THREAD_CPUTIME_ID);
+				ret = write(uds[0], buf, sizeof(buf));
+				ASSERT_RETURN(ret == sizeof(buf));
+			}
+
+			now_ns = now(CLOCK_THREAD_CPUTIME_ID);
+			diff = now_ns - start;
+			if (diff > 1000000000ULL) {
+				start = now_ns;
+
+				dump_stats(true);
+				break;
+			}
+		}
+
+	} while (kdbus_util_verbose);
+
+	kdbus_printf("-- closing bus connections\n");
+
+	free(kdbus_msg);
+
+	kdbus_conn_free(conn_a);
+	kdbus_conn_free(conn_b);
+
+	return (stats.count > 1) ? TEST_OK : TEST_ERR;
+}
+
+int kdbus_test_benchmark(struct kdbus_test_env *env)
+{
+	use_memfd = true;
+	return benchmark(env);
+}
+
+int kdbus_test_benchmark_nomemfds(struct kdbus_test_env *env)
+{
+	use_memfd = false;
+	return benchmark(env);
+}
diff --git a/tools/testing/selftests/kdbus/test-bus.c b/tools/testing/selftests/kdbus/test-bus.c
new file mode 100644
index 000000000000..1b9ce0b011bd
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-bus.c
@@ -0,0 +1,174 @@
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <limits.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <stdbool.h>
+
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+#include "kdbus-test.h"
+
+static struct kdbus_item *kdbus_get_item(struct kdbus_info *info,
+					 uint64_t type)
+{
+	struct kdbus_item *item;
+
+	KDBUS_ITEM_FOREACH(item, info, items)
+		if (item->type == type)
+			return item;
+
+	return NULL;
+}
+
+static int test_bus_creator_info(const char *bus_path)
+{
+	int ret;
+	uint64_t offset;
+	struct kdbus_conn *conn;
+	struct kdbus_info *info;
+	struct kdbus_item *item;
+	char *tmp, *busname;
+
+	/* extract the bus-name from @bus_path */
+	tmp = strdup(bus_path);
+	ASSERT_RETURN(tmp);
+	busname = strrchr(tmp, '/');
+	ASSERT_RETURN(busname);
+	*busname = 0;
+	busname = strrchr(tmp, '/');
+	ASSERT_RETURN(busname);
+	++busname;
+
+	conn = kdbus_hello(bus_path, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	ret = kdbus_bus_creator_info(conn, _KDBUS_ATTACH_ALL, &offset);
+	ASSERT_RETURN(ret == 0);
+
+	info = (struct kdbus_info *)(conn->buf + offset);
+
+	item = kdbus_get_item(info, KDBUS_ITEM_MAKE_NAME);
+	ASSERT_RETURN(item);
+	ASSERT_RETURN(!strcmp(item->str, busname));
+
+	ret = kdbus_free(conn, offset);
+	ASSERT_RETURN_VAL(ret == 0, ret);
+
+	free(tmp);
+	return 0;
+}
+
+int kdbus_test_bus_make(struct kdbus_test_env *env)
+{
+	struct {
+		struct kdbus_cmd_make head;
+
+		/* bloom size item */
+		struct {
+			uint64_t size;
+			uint64_t type;
+			struct kdbus_bloom_parameter bloom;
+		} bs;
+
+		/* name item */
+		uint64_t n_size;
+		uint64_t n_type;
+		char name[64];
+	} bus_make;
+	char s[PATH_MAX], *name;
+	int ret, control_fd2;
+	uid_t uid;
+
+	name = unique_name("");
+	ASSERT_RETURN(name);
+
+	snprintf(s, sizeof(s), "%s/control", env->root);
+	env->control_fd = open(s, O_RDWR|O_CLOEXEC);
+	ASSERT_RETURN(env->control_fd >= 0);
+
+	control_fd2 = open(s, O_RDWR|O_CLOEXEC);
+	ASSERT_RETURN(control_fd2 >= 0);
+
+	memset(&bus_make, 0, sizeof(bus_make));
+
+	bus_make.bs.size = sizeof(bus_make.bs);
+	bus_make.bs.type = KDBUS_ITEM_BLOOM_PARAMETER;
+	bus_make.bs.bloom.size = 64;
+	bus_make.bs.bloom.n_hash = 1;
+
+	bus_make.n_type = KDBUS_ITEM_MAKE_NAME;
+
+	uid = getuid();
+
+	/* missing uid prefix */
+	snprintf(bus_make.name, sizeof(bus_make.name), "foo");
+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
+	bus_make.head.size = sizeof(struct kdbus_cmd_make) +
+			     sizeof(bus_make.bs) + bus_make.n_size;
+	ret = ioctl(env->control_fd, KDBUS_CMD_BUS_MAKE, &bus_make);
+	ASSERT_RETURN(ret == -1 && errno == EINVAL);
+
+	/* non alphanumeric character */
+	snprintf(bus_make.name, sizeof(bus_make.name), "%u-blah@123", uid);
+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
+	bus_make.head.size = sizeof(struct kdbus_cmd_make) +
+			     sizeof(bus_make.bs) + bus_make.n_size;
+	ret = ioctl(env->control_fd, KDBUS_CMD_BUS_MAKE, &bus_make);
+	ASSERT_RETURN(ret == -1 && errno == EINVAL);
+
+	/* '-' at the end */
+	snprintf(bus_make.name, sizeof(bus_make.name), "%u-blah-", uid);
+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
+	bus_make.head.size = sizeof(struct kdbus_cmd_make) +
+			     sizeof(bus_make.bs) + bus_make.n_size;
+	ret = ioctl(env->control_fd, KDBUS_CMD_BUS_MAKE, &bus_make);
+	ASSERT_RETURN(ret == -1 && errno == EINVAL);
+
+	/* create a new bus */
+	snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-1", uid, name);
+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
+	bus_make.head.size = sizeof(struct kdbus_cmd_make) +
+			     sizeof(bus_make.bs) + bus_make.n_size;
+	ret = ioctl(env->control_fd, KDBUS_CMD_BUS_MAKE, &bus_make);
+	ASSERT_RETURN(ret == 0);
+
+	ret = ioctl(control_fd2, KDBUS_CMD_BUS_MAKE, &bus_make);
+	ASSERT_RETURN(ret == -1 && errno == EEXIST);
+
+	snprintf(s, sizeof(s), "%s/%u-%s-1/bus", env->root, uid, name);
+	ASSERT_RETURN(access(s, F_OK) == 0);
+
+	ret = test_bus_creator_info(s);
+	ASSERT_RETURN(ret == 0);
+
+	/* can't use the same fd for bus make twice, even though a different
+	 * bus name is used
+	 */
+	snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-2", uid, name);
+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
+	bus_make.head.size = sizeof(struct kdbus_cmd_make) +
+			     sizeof(bus_make.bs) + bus_make.n_size;
+	ret = ioctl(env->control_fd, KDBUS_CMD_BUS_MAKE, &bus_make);
+	ASSERT_RETURN(ret == -1 && errno == EBADFD);
+
+	/* create a new bus, with different fd and different bus name */
+	snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-2", uid, name);
+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
+	bus_make.head.size = sizeof(struct kdbus_cmd_make) +
+			     sizeof(bus_make.bs) + bus_make.n_size;
+	ret = ioctl(control_fd2, KDBUS_CMD_BUS_MAKE, &bus_make);
+	ASSERT_RETURN(ret == 0);
+
+	close(control_fd2);
+	free(name);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-chat.c b/tools/testing/selftests/kdbus/test-chat.c
new file mode 100644
index 000000000000..6a0efbcc3846
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-chat.c
@@ -0,0 +1,123 @@
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <poll.h>
+#include <sys/ioctl.h>
+#include <stdbool.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+int kdbus_test_chat(struct kdbus_test_env *env)
+{
+	int ret, cookie;
+	struct kdbus_conn *conn_a, *conn_b;
+	struct pollfd fds[2];
+	uint64_t flags;
+	int count;
+
+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn_a && conn_b);
+
+	flags = KDBUS_NAME_ALLOW_REPLACEMENT;
+	ret = kdbus_name_acquire(conn_a, "foo.bar.test", &flags);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_name_acquire(conn_a, "foo.bar.baz", NULL);
+	ASSERT_RETURN(ret == 0);
+
+	flags = KDBUS_NAME_QUEUE;
+	ret = kdbus_name_acquire(conn_b, "foo.bar.baz", &flags);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_name_acquire(conn_a, "foo.bar.double", NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_name_acquire(conn_a, "foo.bar.double", NULL);
+	ASSERT_RETURN(ret == -EALREADY);
+
+	ret = kdbus_name_release(conn_a, "foo.bar.double");
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_name_release(conn_a, "foo.bar.double");
+	ASSERT_RETURN(ret == -ESRCH);
+
+	ret = kdbus_name_list(conn_b, KDBUS_NAME_LIST_UNIQUE |
+				      KDBUS_NAME_LIST_NAMES  |
+				      KDBUS_NAME_LIST_QUEUED |
+				      KDBUS_NAME_LIST_ACTIVATORS);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_add_match_empty(conn_a);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_add_match_empty(conn_b);
+	ASSERT_RETURN(ret == 0);
+
+	cookie = 0;
+	ret = kdbus_msg_send(conn_b, NULL, 0xc0000000 | cookie, 0, 0, 0,
+			     KDBUS_DST_ID_BROADCAST);
+	ASSERT_RETURN(ret == 0);
+
+	fds[0].fd = conn_a->fd;
+	fds[1].fd = conn_b->fd;
+
+	kdbus_printf("-- entering poll loop ...\n");
+
+	for (count = 0;; count++) {
+		int i, nfds = sizeof(fds) / sizeof(fds[0]);
+
+		for (i = 0; i < nfds; i++) {
+			fds[i].events = POLLIN | POLLPRI | POLLHUP;
+			fds[i].revents = 0;
+		}
+
+		ret = poll(fds, nfds, 3000);
+		ASSERT_RETURN(ret >= 0);
+
+		if (fds[0].revents & POLLIN) {
+			if (count > 2)
+				kdbus_name_release(conn_a, "foo.bar.baz");
+
+			ret = kdbus_msg_recv(conn_a, NULL, NULL);
+			ASSERT_RETURN(ret == 0);
+			ret = kdbus_msg_send(conn_a, NULL,
+					     0xc0000000 | cookie++,
+					     0, 0, 0, conn_b->id);
+			ASSERT_RETURN(ret == 0);
+		}
+
+		if (fds[1].revents & POLLIN) {
+			ret = kdbus_msg_recv(conn_b, NULL, NULL);
+			ASSERT_RETURN(ret == 0);
+			ret = kdbus_msg_send(conn_b, NULL,
+					     0xc0000000 | cookie++,
+					     0, 0, 0, conn_a->id);
+			ASSERT_RETURN(ret == 0);
+		}
+
+		ret = kdbus_name_list(conn_b, KDBUS_NAME_LIST_UNIQUE |
+					      KDBUS_NAME_LIST_NAMES  |
+					      KDBUS_NAME_LIST_QUEUED |
+					      KDBUS_NAME_LIST_ACTIVATORS);
+		ASSERT_RETURN(ret == 0);
+
+		if (count > 10)
+			break;
+	}
+
+	kdbus_printf("-- closing bus connections\n");
+	kdbus_conn_free(conn_a);
+	kdbus_conn_free(conn_b);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-connection.c b/tools/testing/selftests/kdbus/test-connection.c
new file mode 100644
index 000000000000..db19b8163535
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-connection.c
@@ -0,0 +1,611 @@
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <limits.h>
+#include <sys/types.h>
+#include <sys/capability.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <sys/wait.h>
+#include <stdbool.h>
+
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+#include "kdbus-test.h"
+
+int kdbus_test_hello(struct kdbus_test_env *env)
+{
+	struct kdbus_cmd_free cmd_free = {};
+	struct kdbus_cmd_hello hello;
+	int fd, ret;
+
+	memset(&hello, 0, sizeof(hello));
+
+	fd = open(env->buspath, O_RDWR|O_CLOEXEC);
+	ASSERT_RETURN(fd >= 0);
+
+	hello.flags = KDBUS_HELLO_ACCEPT_FD;
+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
+	hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
+	hello.size = sizeof(struct kdbus_cmd_hello);
+	hello.pool_size = POOL_SIZE;
+
+	/* an unaligned hello must result in -EFAULT */
+	ret = ioctl(fd, KDBUS_CMD_HELLO, (char *) &hello + 1);
+	ASSERT_RETURN(ret == -1 && errno == EFAULT);
+
+	/* a size smaller than the struct must result in -EINVAL */
+	hello.size = 1;
+	hello.flags = KDBUS_HELLO_ACCEPT_FD;
+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
+	ret = ioctl(fd, KDBUS_CMD_HELLO, &hello);
+	ASSERT_RETURN(ret == -1 && errno == EINVAL);
+
+	hello.size = sizeof(struct kdbus_cmd_hello);
+
+	/* check faulty flags */
+	hello.flags = 1ULL << 32;
+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
+	ret = ioctl(fd, KDBUS_CMD_HELLO, &hello);
+	ASSERT_RETURN(ret == -1 && errno == EINVAL);
+
+	/* kernel must have set its bit in the ioctl buffer */
+	ASSERT_RETURN(hello.kernel_flags & KDBUS_FLAG_KERNEL);
+
+	/* check for faulty pool sizes */
+	hello.pool_size = 0;
+	hello.flags = KDBUS_HELLO_ACCEPT_FD;
+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
+	ret = ioctl(fd, KDBUS_CMD_HELLO, &hello);
+	ASSERT_RETURN(ret == -1 && errno == EFAULT);
+
+	hello.pool_size = 4097;
+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
+	ret = ioctl(fd, KDBUS_CMD_HELLO, &hello);
+	ASSERT_RETURN(ret == -1 && errno == EFAULT);
+
+	hello.pool_size = POOL_SIZE;
+
+	/*
+	 * The connection created by the core requires ALL meta flags
+	 * to be sent. An attempt to send less than that should result
+	 * in -ECONNREFUSED.
+	 */
+	hello.attach_flags_send = _KDBUS_ATTACH_ALL & ~KDBUS_ATTACH_TIMESTAMP;
+	ret = ioctl(fd, KDBUS_CMD_HELLO, &hello);
+	ASSERT_RETURN(ret == -1 && errno == ECONNREFUSED);
+
+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
+	hello.offset = (__u64)-1;
+
+	/* success test */
+	ret = ioctl(fd, KDBUS_CMD_HELLO, &hello);
+	ASSERT_RETURN(ret == 0);
+
+	/* The kernel should have set KDBUS_FLAG_KERNEL */
+	ASSERT_RETURN(hello.attach_flags_send & KDBUS_FLAG_KERNEL);
+
+	/* The kernel should have returned some items */
+	ASSERT_RETURN(hello.offset != (__u64)-1);
+	cmd_free.size = sizeof(cmd_free);
+	cmd_free.offset = hello.offset;
+	ret = ioctl(fd, KDBUS_CMD_FREE, &cmd_free);
+	ASSERT_RETURN(ret >= 0);
+
+	close(fd);
+
+	fd = open(env->buspath, O_RDWR|O_CLOEXEC);
+	ASSERT_RETURN(fd >= 0);
+
+	/* no ACTIVATOR flag without a name */
+	hello.flags = KDBUS_HELLO_ACTIVATOR;
+	ret = ioctl(fd, KDBUS_CMD_HELLO, &hello);
+	ASSERT_RETURN(ret == -1 && errno == EINVAL);
+
+	close(fd);
+
+	return TEST_OK;
+}
+
+int kdbus_test_byebye(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *conn;
+	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
+	int ret;
+
+	/* create a 2nd connection */
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn != NULL);
+
+	ret = kdbus_add_match_empty(conn);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_add_match_empty(env->conn);
+	ASSERT_RETURN(ret == 0);
+
+	/* send over 1st connection */
+	ret = kdbus_msg_send(env->conn, NULL, 0, 0, 0, 0,
+			     KDBUS_DST_ID_BROADCAST);
+	ASSERT_RETURN(ret == 0);
+
+	/* say byebye on the 2nd, which must fail */
+	ret = ioctl(conn->fd, KDBUS_CMD_BYEBYE, 0);
+	ASSERT_RETURN(ret == -1 && errno == EBUSY);
+
+	/* receive the message */
+	ret = ioctl(conn->fd, KDBUS_CMD_RECV, &recv);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_free(conn, recv.msg.offset);
+	ASSERT_RETURN(ret == 0);
+
+	/* and try again */
+	ret = ioctl(conn->fd, KDBUS_CMD_BYEBYE, 0);
+	ASSERT_RETURN(ret == 0);
+
+	/* a 2nd try should result in -EALREADY */
+	ret = ioctl(conn->fd, KDBUS_CMD_BYEBYE, 0);
+	ASSERT_RETURN(ret == -1 && errno == EALREADY);
+
+	kdbus_conn_free(conn);
+
+	return TEST_OK;
+}
+
+/* Get only the first item */
+static struct kdbus_item *kdbus_get_item(struct kdbus_info *info,
+					 uint64_t type)
+{
+	struct kdbus_item *item;
+
+	KDBUS_ITEM_FOREACH(item, info, items)
+		if (item->type == type)
+			return item;
+
+	return NULL;
+}
+
+static unsigned int kdbus_count_item(struct kdbus_info *info,
+				     uint64_t type)
+{
+	unsigned int i = 0;
+	const struct kdbus_item *item;
+
+	KDBUS_ITEM_FOREACH(item, info, items)
+		if (item->type == type)
+			i++;
+
+	return i;
+}
+
+static int kdbus_fuzz_conn_info(struct kdbus_test_env *env, int capable)
+{
+	int ret;
+	unsigned int cnt = 0;
+	uint64_t offset = 0;
+	uint64_t kdbus_flags_mask;
+	struct kdbus_info *info;
+	struct kdbus_conn *conn;
+	struct kdbus_conn *privileged;
+	const struct kdbus_item *item;
+	uint64_t valid_flags_set;
+	uint64_t invalid_flags_set;
+	uint64_t valid_flags = KDBUS_ATTACH_NAMES |
+			       KDBUS_ATTACH_CREDS |
+			       KDBUS_ATTACH_PIDS |
+			       KDBUS_ATTACH_CONN_DESCRIPTION;
+
+	uint64_t invalid_flags = KDBUS_ATTACH_NAMES	|
+				 KDBUS_ATTACH_CREDS	|
+				 KDBUS_ATTACH_PIDS	|
+				 KDBUS_ATTACH_CAPS	|
+				 KDBUS_ATTACH_CGROUP	|
+				 KDBUS_ATTACH_CONN_DESCRIPTION;
+
+	struct kdbus_creds cached_creds;
+
+	getresuid(&cached_creds.uid, &cached_creds.euid, &cached_creds.suid);
+	getresgid(&cached_creds.gid, &cached_creds.egid, &cached_creds.sgid);
+
+	cached_creds.fsuid = cached_creds.uid;
+	cached_creds.fsgid = cached_creds.gid;
+
+	struct kdbus_pids cached_pids = {
+		.pid	= getpid(),
+		.tid	= syscall(SYS_gettid),
+		.ppid	= getppid(),
+	};
+
+	ret = kdbus_sysfs_get_parameter_mask(env->mask_param_path,
+					     &kdbus_flags_mask);
+	ASSERT_RETURN(ret == 0);
+
+	valid_flags_set = valid_flags & kdbus_flags_mask;
+	invalid_flags_set = invalid_flags & kdbus_flags_mask;
+
+	ret = kdbus_conn_info(env->conn, env->conn->id, NULL,
+			      valid_flags, &offset);
+	ASSERT_RETURN(ret == 0);
+
+	info = (struct kdbus_info *)(env->conn->buf + offset);
+	ASSERT_RETURN(info->id == env->conn->id);
+
+	/* We do not have any well-known name */
+	item = kdbus_get_item(info, KDBUS_ITEM_NAME);
+	ASSERT_RETURN(item == NULL);
+
+	item = kdbus_get_item(info, KDBUS_ITEM_CONN_DESCRIPTION);
+	if (valid_flags_set & KDBUS_ATTACH_CONN_DESCRIPTION) {
+		ASSERT_RETURN(item);
+	} else {
+		ASSERT_RETURN(item == NULL);
+	}
+
+	kdbus_free(env->conn, offset);
+
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	privileged = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(privileged);
+
+	ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
+	ASSERT_RETURN(ret == 0);
+
+	info = (struct kdbus_info *)(conn->buf + offset);
+	ASSERT_RETURN(info->id == conn->id);
+
+	/* We do not have any well-known name */
+	item = kdbus_get_item(info, KDBUS_ITEM_NAME);
+	ASSERT_RETURN(item == NULL);
+
+	cnt = kdbus_count_item(info, KDBUS_ITEM_CREDS);
+	if (valid_flags_set & KDBUS_ATTACH_CREDS) {
+		ASSERT_RETURN(cnt == 1);
+
+		item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
+		ASSERT_RETURN(item);
+
+		/* Compare received items with cached creds */
+		ASSERT_RETURN(memcmp(&item->creds, &cached_creds,
+				      sizeof(struct kdbus_creds)) == 0);
+	} else {
+		ASSERT_RETURN(cnt == 0);
+	}
+
+	item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
+	if (valid_flags_set & KDBUS_ATTACH_PIDS) {
+		ASSERT_RETURN(item);
+
+		/* Compare item->pids with cached PIDs */
+		ASSERT_RETURN(item->pids.pid == cached_pids.pid &&
+			      item->pids.tid == cached_pids.tid &&
+			      item->pids.ppid == cached_pids.ppid);
+	} else {
+		ASSERT_RETURN(item == NULL);
+	}
+
+	/* We did not request KDBUS_ITEM_CAPS */
+	item = kdbus_get_item(info, KDBUS_ITEM_CAPS);
+	ASSERT_RETURN(item == NULL);
+
+	kdbus_free(conn, offset);
+
+	ret = kdbus_name_acquire(conn, "com.example.a", NULL);
+	ASSERT_RETURN(ret >= 0);
+
+	ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
+	ASSERT_RETURN(ret == 0);
+
+	info = (struct kdbus_info *)(conn->buf + offset);
+	ASSERT_RETURN(info->id == conn->id);
+
+	item = kdbus_get_item(info, KDBUS_ITEM_OWNED_NAME);
+	if (valid_flags_set & KDBUS_ATTACH_NAMES) {
+		ASSERT_RETURN(item && !strcmp(item->name.name, "com.example.a"));
+	} else {
+		ASSERT_RETURN(item == NULL);
+	}
+
+	kdbus_free(conn, offset);
+
+	ret = kdbus_conn_info(conn, 0, "com.example.a", valid_flags, &offset);
+	ASSERT_RETURN(ret == 0);
+
+	info = (struct kdbus_info *)(conn->buf + offset);
+	ASSERT_RETURN(info->id == conn->id);
+
+	kdbus_free(conn, offset);
+
+	/* we do not have the necessary caps to drop to unprivileged */
+	if (!capable)
+		goto continue_test;
+
+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
+		ret = kdbus_conn_info(conn, conn->id, NULL,
+				      valid_flags, &offset);
+		ASSERT_EXIT(ret == 0);
+
+		info = (struct kdbus_info *)(conn->buf + offset);
+		ASSERT_EXIT(info->id == conn->id);
+
+		if (valid_flags_set & KDBUS_ATTACH_NAMES) {
+			item = kdbus_get_item(info, KDBUS_ITEM_OWNED_NAME);
+			ASSERT_EXIT(item &&
+				    strcmp(item->name.name,
+				           "com.example.a") == 0);
+		}
+
+		if (valid_flags_set & KDBUS_ATTACH_CREDS) {
+			item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
+			ASSERT_EXIT(item);
+
+			/* Compare received items with cached creds */
+			ASSERT_EXIT(memcmp(&item->creds, &cached_creds,
+				    sizeof(struct kdbus_creds)) == 0);
+		}
+
+		if (valid_flags_set & KDBUS_ATTACH_PIDS) {
+			item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
+			ASSERT_EXIT(item);
+
+			/*
+			 * Compare item->pids with cached pids of
+			 * privileged one.
+			 *
+			 * cmd_info will always return cached pids.
+			 */
+			ASSERT_EXIT(item->pids.pid == cached_pids.pid &&
+				    item->pids.tid == cached_pids.tid);
+		}
+
+		kdbus_free(conn, offset);
+
+		/*
+		 * Use invalid_flags and make sure that userspace
+		 * does not play with us.
+		 */
+		ret = kdbus_conn_info(conn, conn->id, NULL,
+				      invalid_flags, &offset);
+		ASSERT_EXIT(ret == 0);
+		info = (struct kdbus_info *)(conn->buf + offset);
+		/*
+		 * Make sure that we return only one creds item and
+		 * it points to the cached creds.
+		 */
+		cnt = kdbus_count_item(info, KDBUS_ITEM_CREDS);
+		if (invalid_flags_set & KDBUS_ATTACH_CREDS) {
+			ASSERT_EXIT(cnt == 1);
+
+			item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
+			ASSERT_EXIT(item);
+
+			/* Compare received items with cached creds */
+			ASSERT_EXIT(memcmp(&item->creds, &cached_creds,
+				    sizeof(struct kdbus_creds)) == 0);
+		} else {
+			ASSERT_EXIT(cnt == 0);
+		}
+
+		if (invalid_flags_set & KDBUS_ATTACH_PIDS) {
+			cnt = kdbus_count_item(info, KDBUS_ITEM_PIDS);
+			ASSERT_EXIT(cnt == 1);
+
+			item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
+			ASSERT_EXIT(item);
+
+			/* Compare item->pids with cached pids */
+			ASSERT_EXIT(item->pids.pid == cached_pids.pid &&
+				    item->pids.tid == cached_pids.tid);
+		}
+
+		cnt = kdbus_count_item(info, KDBUS_ITEM_CGROUP);
+		if (invalid_flags_set & KDBUS_ATTACH_CGROUP) {
+			ASSERT_EXIT(cnt == 1);
+		} else {
+			ASSERT_EXIT(cnt == 0);
+		}
+
+		cnt = kdbus_count_item(info, KDBUS_ITEM_CAPS);
+		if (invalid_flags_set & KDBUS_ATTACH_CAPS) {
+			ASSERT_EXIT(cnt == 1);
+		} else {
+			ASSERT_EXIT(cnt == 0);
+		}
+
+		kdbus_free(conn, offset);
+	}),
+	({ 0; }));
+	ASSERT_RETURN(ret == 0);
+
+continue_test:
+
+	/* A second name */
+	ret = kdbus_name_acquire(conn, "com.example.b", NULL);
+	ASSERT_RETURN(ret >= 0);
+
+	ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
+	ASSERT_RETURN(ret == 0);
+
+	info = (struct kdbus_info *)(conn->buf + offset);
+	ASSERT_RETURN(info->id == conn->id);
+
+	cnt = kdbus_count_item(info, KDBUS_ITEM_OWNED_NAME);
+	if (valid_flags_set & KDBUS_ATTACH_NAMES) {
+		ASSERT_RETURN(cnt == 2);
+	} else {
+		ASSERT_RETURN(cnt == 0);
+	}
+
+	kdbus_free(conn, offset);
+
+	ASSERT_RETURN(ret == 0);
+
+	return 0;
+}
+
+int kdbus_test_conn_info(struct kdbus_test_env *env)
+{
+	int ret;
+	int have_caps;
+	struct {
+		struct kdbus_cmd_info cmd_info;
+
+		struct {
+			uint64_t size;
+			uint64_t type;
+			char str[64];
+		} name;
+	} buf;
+
+	buf.cmd_info.size = sizeof(struct kdbus_cmd_info);
+	buf.cmd_info.flags = 0;
+	buf.cmd_info.id = env->conn->id;
+
+	ret = kdbus_conn_info(env->conn, env->conn->id, NULL, 0, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* try to pass a name that is longer than the buffer's size */
+	buf.name.size = KDBUS_ITEM_HEADER_SIZE + 1;
+	buf.name.type = KDBUS_ITEM_NAME;
+	strcpy(buf.name.str, "foo.bar.bla");
+
+	buf.cmd_info.id = 0;
+	buf.cmd_info.size = sizeof(buf.cmd_info) + buf.name.size;
+	ret = ioctl(env->conn->fd, KDBUS_CMD_CONN_INFO, &buf);
+	ASSERT_RETURN(ret == -1 && errno == EINVAL);
+
+	/* Pass a non-existent name */
+	ret = kdbus_conn_info(env->conn, 0, "non.existent.name", 0, NULL);
+	ASSERT_RETURN(ret == -ESRCH);
+
+	if (!all_uids_gids_are_mapped())
+		return TEST_SKIP;
+
+	/* Test for caps here, so we run the previous test */
+	have_caps = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
+	ASSERT_RETURN(have_caps >= 0);
+
+	ret = kdbus_fuzz_conn_info(env, have_caps);
+	ASSERT_RETURN(ret == 0);
+
+	/* Now if we have skipped some tests then let the user know */
+	if (!have_caps)
+		return TEST_SKIP;
+
+	return TEST_OK;
+}
+
+int kdbus_test_conn_update(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *conn;
+	struct kdbus_msg *msg;
+	int found = 0;
+	int ret;
+
+	/*
+	 * kdbus_hello() sets all attach flags. Receive a message by this
+	 * connection, and make sure a timestamp item (just to pick one) is
+	 * present.
+	 */
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	ret = kdbus_msg_send(env->conn, NULL, 0x12345678, 0, 0, 0, conn->id);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	found = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
+	ASSERT_RETURN(found == 1);
+
+	kdbus_msg_free(msg);
+
+	/*
+	 * Now, modify the attach flags and repeat the action. The item must
+	 * now be missing.
+	 */
+	found = 0;
+
+	ret = kdbus_conn_update_attach_flags(conn,
+					     _KDBUS_ATTACH_ALL,
+					     _KDBUS_ATTACH_ALL &
+					     ~KDBUS_ATTACH_TIMESTAMP);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_send(env->conn, NULL, 0x12345678, 0, 0, 0, conn->id);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	found = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
+	ASSERT_RETURN(found == 0);
+
+	/* Provide a bogus attach_flags value */
+	ret = kdbus_conn_update_attach_flags(conn,
+					     _KDBUS_ATTACH_ALL + 1,
+					     _KDBUS_ATTACH_ALL);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	kdbus_msg_free(msg);
+
+	kdbus_conn_free(conn);
+
+	return TEST_OK;
+}
+
+int kdbus_test_writable_pool(struct kdbus_test_env *env)
+{
+	struct kdbus_cmd_free cmd_free = {};
+	struct kdbus_cmd_hello hello;
+	int fd, ret;
+	void *map;
+
+	fd = open(env->buspath, O_RDWR | O_CLOEXEC);
+	ASSERT_RETURN(fd >= 0);
+
+	memset(&hello, 0, sizeof(hello));
+	hello.flags = KDBUS_HELLO_ACCEPT_FD;
+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
+	hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
+	hello.size = sizeof(struct kdbus_cmd_hello);
+	hello.pool_size = POOL_SIZE;
+	hello.offset = (__u64)-1;
+
+	/* success test */
+	ret = ioctl(fd, KDBUS_CMD_HELLO, &hello);
+	ASSERT_RETURN(ret == 0);
+
+	/* The kernel should have returned some items */
+	ASSERT_RETURN(hello.offset != (__u64)-1);
+	cmd_free.size = sizeof(cmd_free);
+	cmd_free.offset = hello.offset;
+	ret = ioctl(fd, KDBUS_CMD_FREE, &cmd_free);
+	ASSERT_RETURN(ret >= 0);
+
+	/* pools cannot be mapped writable */
+	map = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	ASSERT_RETURN(map == MAP_FAILED);
+
+	/* pools can always be mapped readable */
+	map = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, fd, 0);
+	ASSERT_RETURN(map != MAP_FAILED);
+
+	/* make sure we cannot change protection masks to writable */
+	ret = mprotect(map, POOL_SIZE, PROT_READ | PROT_WRITE);
+	ASSERT_RETURN(ret < 0);
+
+	munmap(map, POOL_SIZE);
+	close(fd);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-daemon.c b/tools/testing/selftests/kdbus/test-daemon.c
new file mode 100644
index 000000000000..9007e38d6a7a
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-daemon.c
@@ -0,0 +1,66 @@
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <poll.h>
+#include <sys/ioctl.h>
+#include <stdbool.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+int kdbus_test_daemon(struct kdbus_test_env *env)
+{
+	struct pollfd fds[2];
+	int count;
+	int ret;
+
+	/* This test doesn't make any sense in non-interactive mode */
+	if (!kdbus_util_verbose)
+		return TEST_OK;
+
+	printf("Created connection %llu on bus '%s'\n",
+		(unsigned long long) env->conn->id, env->buspath);
+
+	ret = kdbus_name_acquire(env->conn, "com.example.kdbus-test", NULL);
+	ASSERT_RETURN(ret == 0);
+	printf("  Acquired name: com.example.kdbus-test\n");
+
+	fds[0].fd = env->conn->fd;
+	fds[1].fd = STDIN_FILENO;
+
+	printf("Monitoring connections:\n");
+
+	for (count = 0;; count++) {
+		int i, nfds = sizeof(fds) / sizeof(fds[0]);
+
+		for (i = 0; i < nfds; i++) {
+			fds[i].events = POLLIN | POLLPRI | POLLHUP;
+			fds[i].revents = 0;
+		}
+
+		ret = poll(fds, nfds, -1);
+		if (ret <= 0)
+			break;
+
+		if (fds[0].revents & POLLIN) {
+			ret = kdbus_msg_recv(env->conn, NULL, NULL);
+			ASSERT_RETURN(ret == 0);
+		}
+
+		/* stdin */
+		if (fds[1].revents & POLLIN)
+			break;
+	}
+
+	printf("Closing bus connection\n");
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-endpoint.c b/tools/testing/selftests/kdbus/test-endpoint.c
new file mode 100644
index 000000000000..b9662a0a8f4a
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-endpoint.c
@@ -0,0 +1,344 @@
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <libgen.h>
+#include <sys/capability.h>
+#include <sys/ioctl.h>
+#include <sys/wait.h>
+#include <stdbool.h>
+
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+#include "kdbus-test.h"
+
+#define KDBUS_SYSNAME_MAX_LEN			63
+
+static int install_name_add_match(struct kdbus_conn *conn, const char *name)
+{
+	struct {
+		struct kdbus_cmd_match cmd;
+		struct {
+			uint64_t size;
+			uint64_t type;
+			struct kdbus_notify_name_change chg;
+		} item;
+		char name[64];
+	} buf;
+	int ret;
+
+	/* install the match rule */
+	memset(&buf, 0, sizeof(buf));
+	buf.item.type = KDBUS_ITEM_NAME_ADD;
+	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
+	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
+	strncpy(buf.name, name, sizeof(buf.name) - 1);
+	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
+	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_MATCH_ADD, &buf);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static int create_endpoint(const char *buspath, uid_t uid, const char *name,
+			   uint64_t flags)
+{
+	struct {
+		struct kdbus_cmd_make head;
+
+		/* name item */
+		struct {
+			uint64_t size;
+			uint64_t type;
+			/* max should be KDBUS_SYSNAME_MAX_LEN */
+			char str[128];
+		} name;
+	} ep_make;
+	int fd, ret;
+
+	fd = open(buspath, O_RDWR);
+	if (fd < 0)
+		return fd;
+
+	memset(&ep_make, 0, sizeof(ep_make));
+
+	snprintf(ep_make.name.str,
+		 /* Use the KDBUS_SYSNAME_MAX_LEN or sizeof(str) */
+		 KDBUS_SYSNAME_MAX_LEN > strlen(name) ?
+		 KDBUS_SYSNAME_MAX_LEN : sizeof(ep_make.name.str),
+		 "%u-%s", uid, name);
+
+	ep_make.name.type = KDBUS_ITEM_MAKE_NAME;
+	ep_make.name.size = KDBUS_ITEM_HEADER_SIZE +
+			    strlen(ep_make.name.str) + 1;
+
+	ep_make.head.flags = flags;
+	ep_make.head.size = sizeof(ep_make.head) +
+			    ep_make.name.size;
+
+	ret = ioctl(fd, KDBUS_CMD_ENDPOINT_MAKE, &ep_make);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error creating endpoint: %d (%m)\n", ret);
+		return ret;
+	}
+
+	return fd;
+}
+
+static int unpriv_test_custom_ep(const char *buspath)
+{
+	int ret, ep_fd1, ep_fd2;
+	char *ep1, *ep2, *tmp1, *tmp2;
+
+	tmp1 = strdup(buspath);
+	tmp2 = strdup(buspath);
+	ASSERT_RETURN(tmp1 && tmp2);
+
+	ret = asprintf(&ep1, "%s/%u-%s", dirname(tmp1), getuid(), "apps1");
+	ASSERT_RETURN(ret >= 0);
+
+	ret = asprintf(&ep2, "%s/%u-%s", dirname(tmp2), getuid(), "apps2");
+	ASSERT_RETURN(ret >= 0);
+
+	free(tmp1);
+	free(tmp2);
+
+	/* endpoint only accessible to current uid */
+	ep_fd1 = create_endpoint(buspath, getuid(), "apps1", 0);
+	ASSERT_RETURN(ep_fd1 >= 0);
+
+	/* endpoint world accessible */
+	ep_fd2 = create_endpoint(buspath, getuid(), "apps2",
+				  KDBUS_MAKE_ACCESS_WORLD);
+	ASSERT_RETURN(ep_fd2 >= 0);
+
+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
+		int ep_fd;
+		struct kdbus_conn *ep_conn;
+
+		/*
+		 * Make sure that we are not able to create custom
+		 * endpoints
+		 */
+		ep_fd = create_endpoint(buspath, getuid(),
+					"unpriv_costum_ep", 0);
+		ASSERT_EXIT(ep_fd == -EPERM);
+
+		/*
+		 * Endpoint "apps1" is only accessible to the user that
+		 * owns the endpoint; access is denied by the VFS.
+		 */
+		ep_conn = kdbus_hello(ep1, 0, NULL, 0);
+		ASSERT_EXIT(!ep_conn && errno == EACCES);
+
+		/* Endpoint "apps2" world accessible */
+		ep_conn = kdbus_hello(ep2, 0, NULL, 0);
+		ASSERT_EXIT(ep_conn);
+
+		kdbus_conn_free(ep_conn);
+
+		_exit(EXIT_SUCCESS);
+	}),
+	({ 0; }));
+	ASSERT_RETURN(ret == 0);
+
+	close(ep_fd1);
+	close(ep_fd2);
+	free(ep1);
+	free(ep2);
+
+	return 0;
+}
+
+static int update_endpoint(int fd, const char *name)
+{
+	int len = strlen(name) + 1;
+	struct {
+		struct kdbus_cmd_update head;
+
+		/* name item */
+		struct {
+			uint64_t size;
+			uint64_t type;
+			char str[KDBUS_ALIGN8(len)];
+		} name;
+
+		struct {
+			uint64_t size;
+			uint64_t type;
+			struct kdbus_policy_access access;
+		} access;
+	} ep_update;
+	int ret;
+
+	memset(&ep_update, 0, sizeof(ep_update));
+
+	ep_update.name.size = KDBUS_ITEM_HEADER_SIZE + len;
+	ep_update.name.type = KDBUS_ITEM_NAME;
+	strncpy(ep_update.name.str, name, sizeof(ep_update.name.str) - 1);
+
+	ep_update.access.size = sizeof(ep_update.access);
+	ep_update.access.type = KDBUS_ITEM_POLICY_ACCESS;
+	ep_update.access.access.type = KDBUS_POLICY_ACCESS_WORLD;
+	ep_update.access.access.access = KDBUS_POLICY_SEE;
+
+	ep_update.head.size = sizeof(ep_update);
+
+	ret = ioctl(fd, KDBUS_CMD_ENDPOINT_UPDATE, &ep_update);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error updating endpoint: %d (%m)\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+int kdbus_test_custom_endpoint(struct kdbus_test_env *env)
+{
+	char *ep, *tmp;
+	int ret, ep_fd;
+	struct kdbus_msg *msg;
+	struct kdbus_conn *ep_conn;
+	struct kdbus_conn *reader;
+	const char *name = "foo.bar.baz";
+	const char *epname = "foo";
+	char fake_ep[KDBUS_SYSNAME_MAX_LEN + 1] = {'\0'};
+
+	memset(fake_ep, 'X', sizeof(fake_ep) - 1);
+
+	/* Try to create a custom endpoint with a long name */
+	ret = create_endpoint(env->buspath, getuid(), fake_ep, 0);
+	ASSERT_RETURN(ret == -ENAMETOOLONG);
+
+	/* Try to create a custom endpoint with a different uid */
+	ret = create_endpoint(env->buspath, getuid() + 1, "foobar", 0);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	/* create a custom endpoint, and open a connection on it */
+	ep_fd = create_endpoint(env->buspath, getuid(), "foo", 0);
+	ASSERT_RETURN(ep_fd >= 0);
+
+	tmp = strdup(env->buspath);
+	ASSERT_RETURN(tmp);
+
+	ret = asprintf(&ep, "%s/%u-%s", dirname(tmp), getuid(), epname);
+	free(tmp);
+	ASSERT_RETURN(ret >= 0);
+
+	/* Register a connection that listens to broadcasts */
+	reader = kdbus_hello(ep, 0, NULL, 0);
+	ASSERT_RETURN(reader);
+
+	/* Register for kernel signals */
+	ret = kdbus_add_match_id(reader, 0x1, KDBUS_ITEM_ID_ADD,
+				 KDBUS_MATCH_ID_ANY);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_add_match_id(reader, 0x2, KDBUS_ITEM_ID_REMOVE,
+				 KDBUS_MATCH_ID_ANY);
+	ASSERT_RETURN(ret == 0);
+
+	ret = install_name_add_match(reader, name);
+	ASSERT_RETURN(ret == 0);
+
+	/* Monitor connections are not supported on custom endpoints */
+	ep_conn = kdbus_hello(ep, KDBUS_HELLO_MONITOR, NULL, 0);
+	ASSERT_RETURN(!ep_conn && errno == EOPNOTSUPP);
+
+	ep_conn = kdbus_hello(ep, 0, NULL, 0);
+	ASSERT_RETURN(ep_conn);
+
+	/*
+	 * Add a name add match on the endpoint connection, acquire name from
+	 * the unfiltered connection, and make sure the filtered connection
+	 * did not get the notification on the name owner change. Also, the
+	 * endpoint connection must not be able to call conn_info, either on
+	 * the name or on the ID.
+	 */
+	ret = install_name_add_match(ep_conn, name);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_name_acquire(env->conn, name, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(ep_conn, NULL, NULL);
+	ASSERT_RETURN(ret == -EAGAIN);
+
+	ret = kdbus_conn_info(ep_conn, 0, name, 0, NULL);
+	ASSERT_RETURN(ret == -ESRCH);
+
+	ret = kdbus_conn_info(ep_conn, 0, "random.crappy.name", 0, NULL);
+	ASSERT_RETURN(ret == -ESRCH);
+
+	ret = kdbus_conn_info(ep_conn, env->conn->id, NULL, 0, NULL);
+	ASSERT_RETURN(ret == -ENXIO);
+
+	ret = kdbus_conn_info(ep_conn, 0x0fffffffffffffffULL, NULL, 0, NULL);
+	ASSERT_RETURN(ret == -ENXIO);
+
+	/* Check that the reader did not receive anything */
+	ret = kdbus_msg_recv(reader, NULL, NULL);
+	ASSERT_RETURN(ret == -EAGAIN);
+
+	/*
+	 * Release the name again, update the custom endpoint policy,
+	 * and try again. This time, the connection on the custom endpoint
+	 * should have gotten it.
+	 */
+	ret = kdbus_name_release(env->conn, name);
+	ASSERT_RETURN(ret == 0);
+
+	ret = update_endpoint(ep_fd, name);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_name_acquire(env->conn, name, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(ep_conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_ADD);
+	ASSERT_RETURN(msg->items[0].name_change.old_id.id == 0);
+	ASSERT_RETURN(msg->items[0].name_change.new_id.id == env->conn->id);
+	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
+	kdbus_msg_free(msg);
+
+	ret = kdbus_msg_recv(reader, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
+
+	kdbus_msg_free(msg);
+
+	ret = kdbus_conn_info(ep_conn, 0, name, 0, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_conn_info(ep_conn, env->conn->id, NULL, 0, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* If we have privileges, test custom endpoints */
+	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * All uids/gids are mapped and we have the necessary caps
+	 */
+	if (ret && all_uids_gids_are_mapped()) {
+		ret = unpriv_test_custom_ep(env->buspath);
+		ASSERT_RETURN(ret == 0);
+	}
+
+	kdbus_conn_free(reader);
+	kdbus_conn_free(ep_conn);
+	close(ep_fd);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-fd.c b/tools/testing/selftests/kdbus/test-fd.c
new file mode 100644
index 000000000000..261cfc8aee6b
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-fd.c
@@ -0,0 +1,710 @@
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <sys/types.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <sys/wait.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+#define KDBUS_MSG_MAX_ITEMS     128
+#define KDBUS_MSG_MAX_FDS       253
+#define KDBUS_USER_MAX_CONN	256
+
+static int make_msg_payload_dbus(uint64_t src_id, uint64_t dst_id,
+				 uint64_t msg_size,
+				 struct kdbus_msg **msg_dbus)
+{
+	struct kdbus_msg *msg;
+
+	msg = malloc(msg_size);
+	ASSERT_RETURN_VAL(msg, -ENOMEM);
+
+	memset(msg, 0, msg_size);
+	msg->size = msg_size;
+	msg->src_id = src_id;
+	msg->dst_id = dst_id;
+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
+
+	*msg_dbus = msg;
+
+	return 0;
+}
+
+static void make_item_memfds(struct kdbus_item *item,
+			     int *memfds, size_t memfd_size)
+{
+	size_t i;
+
+	for (i = 0; i < memfd_size; i++) {
+		item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
+		item->size = KDBUS_ITEM_HEADER_SIZE +
+			     sizeof(struct kdbus_memfd);
+		item->memfd.fd = memfds[i];
+		item->memfd.size = sizeof(uint64_t); /* const size */
+		item = KDBUS_ITEM_NEXT(item);
+	}
+}
+
+static void make_item_fds(struct kdbus_item *item,
+			  int *fd_array, size_t fd_size)
+{
+	size_t i;
+	item->type = KDBUS_ITEM_FDS;
+	item->size = KDBUS_ITEM_HEADER_SIZE + (sizeof(int) * fd_size);
+
+	for (i = 0; i < fd_size; i++)
+		item->fds[i] = fd_array[i];
+}
+
+static int memfd_write(const char *name, void *buf, size_t bufsize)
+{
+	ssize_t ret;
+	int memfd;
+
+	memfd = sys_memfd_create(name, 0);
+	ASSERT_RETURN_VAL(memfd >= 0, memfd);
+
+	ret = write(memfd, buf, bufsize);
+	ASSERT_RETURN_VAL(ret == (ssize_t)bufsize, -EAGAIN);
+
+	ret = sys_memfd_seal_set(memfd);
+	ASSERT_RETURN_VAL(ret == 0, -errno);
+
+	return memfd;
+}
+
+static int send_memfds(struct kdbus_conn *conn, uint64_t dst_id,
+		       int *memfds_array, size_t memfd_count)
+{
+	struct kdbus_cmd_send cmd = {};
+	struct kdbus_item *item;
+	struct kdbus_msg *msg;
+	uint64_t size;
+	int ret;
+
+	size = sizeof(struct kdbus_msg);
+	size += memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
+
+	if (dst_id == KDBUS_DST_ID_BROADCAST)
+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
+
+	ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
+	ASSERT_RETURN_VAL(ret == 0, ret);
+
+	item = msg->items;
+
+	if (dst_id == KDBUS_DST_ID_BROADCAST) {
+		item->type = KDBUS_ITEM_BLOOM_FILTER;
+		item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
+		item = KDBUS_ITEM_NEXT(item);
+
+		msg->flags |= KDBUS_MSG_SIGNAL;
+	}
+
+	make_item_memfds(item, memfds_array, memfd_count);
+
+	cmd.size = sizeof(cmd);
+	cmd.msg_address = (uintptr_t)msg;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_SEND, &cmd);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error sending message: %d (%m)\n", ret);
+	}
+
+	/* release the message on both the success and the error path */
+	free(msg);
+	return ret;
+}
+
+static int send_fds(struct kdbus_conn *conn, uint64_t dst_id,
+		    int *fd_array, size_t fd_count)
+{
+	struct kdbus_cmd_send cmd = {};
+	struct kdbus_item *item;
+	struct kdbus_msg *msg;
+	uint64_t size;
+	int ret;
+
+	size = sizeof(struct kdbus_msg);
+	size += KDBUS_ITEM_SIZE(sizeof(int) * fd_count);
+
+	if (dst_id == KDBUS_DST_ID_BROADCAST)
+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
+
+	ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
+	ASSERT_RETURN_VAL(ret == 0, ret);
+
+	item = msg->items;
+
+	if (dst_id == KDBUS_DST_ID_BROADCAST) {
+		item->type = KDBUS_ITEM_BLOOM_FILTER;
+		item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
+		item = KDBUS_ITEM_NEXT(item);
+
+		msg->flags |= KDBUS_MSG_SIGNAL;
+	}
+
+	make_item_fds(item, fd_array, fd_count);
+
+	cmd.size = sizeof(cmd);
+	cmd.msg_address = (uintptr_t)msg;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_SEND, &cmd);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error sending message: %d (%m)\n", ret);
+	}
+
+	/* release the message on both the success and the error path */
+	free(msg);
+	return ret;
+}
+
+static int send_fds_memfds(struct kdbus_conn *conn, uint64_t dst_id,
+			   int *fds_array, size_t fd_count,
+			   int *memfds_array, size_t memfd_count)
+{
+	struct kdbus_cmd_send cmd = {};
+	struct kdbus_item *item;
+	struct kdbus_msg *msg;
+	uint64_t size;
+	int ret;
+
+	size = sizeof(struct kdbus_msg);
+	size += memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
+	size += KDBUS_ITEM_SIZE(sizeof(int) * fd_count);
+
+	ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
+	ASSERT_RETURN_VAL(ret == 0, ret);
+
+	item = msg->items;
+
+	make_item_fds(item, fds_array, fd_count);
+	item = KDBUS_ITEM_NEXT(item);
+	make_item_memfds(item, memfds_array, memfd_count);
+
+	cmd.size = sizeof(cmd);
+	cmd.msg_address = (uintptr_t)msg;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_SEND, &cmd);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error sending message: %d (%m)\n", ret);
+	}
+
+	/* release the message on both the success and the error path */
+	free(msg);
+	return ret;
+}
+
+/* Return the number of received fds */
+static unsigned int kdbus_item_get_nfds(struct kdbus_msg *msg)
+{
+	unsigned int fds = 0;
+	const struct kdbus_item *item;
+
+	KDBUS_ITEM_FOREACH(item, msg, items) {
+		switch (item->type) {
+		case KDBUS_ITEM_FDS: {
+			fds += (item->size - KDBUS_ITEM_HEADER_SIZE) /
+				sizeof(int);
+			break;
+		}
+
+		case KDBUS_ITEM_PAYLOAD_MEMFD:
+			fds++;
+			break;
+
+		default:
+			break;
+		}
+	}
+
+	return fds;
+}
+
+static struct kdbus_msg *
+get_kdbus_msg_with_fd(struct kdbus_conn *conn_src,
+		      uint64_t dst_id, uint64_t cookie, int fd)
+{
+	int ret;
+	uint64_t size;
+	struct kdbus_item *item;
+	struct kdbus_msg *msg;
+
+	size = sizeof(struct kdbus_msg);
+	if (fd >= 0)
+		size += KDBUS_ITEM_SIZE(sizeof(int));
+
+	ret = make_msg_payload_dbus(conn_src->id, dst_id, size, &msg);
+	ASSERT_RETURN_VAL(ret == 0, NULL);
+
+	msg->cookie = cookie;
+
+	if (fd >= 0) {
+		item = msg->items;
+
+		make_item_fds(item, (int *)&fd, 1);
+	}
+
+	return msg;
+}
+
+static int kdbus_test_no_fds(struct kdbus_test_env *env,
+			     int *fds, int *memfd)
+{
+	pid_t pid;
+	int ret, status;
+	uint64_t cookie;
+	int connfd1, connfd2;
+	struct kdbus_msg *msg, *msg_sync_reply;
+	struct kdbus_cmd_hello hello;
+	struct kdbus_conn *conn_src, *conn_dst, *conn_dummy;
+	struct kdbus_cmd_send cmd = {};
+	struct kdbus_cmd_free cmd_free = {};
+
+	conn_src = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn_src);
+
+	connfd1 = open(env->buspath, O_RDWR|O_CLOEXEC);
+	ASSERT_RETURN(connfd1 >= 0);
+
+	connfd2 = open(env->buspath, O_RDWR|O_CLOEXEC);
+	ASSERT_RETURN(connfd2 >= 0);
+
+	/*
+	 * Create connections without KDBUS_HELLO_ACCEPT_FD
+	 * to test if send fd operations are blocked
+	 */
+	conn_dst = malloc(sizeof(*conn_dst));
+	ASSERT_RETURN(conn_dst);
+
+	conn_dummy = malloc(sizeof(*conn_dummy));
+	ASSERT_RETURN(conn_dummy);
+
+	memset(&hello, 0, sizeof(hello));
+	hello.size = sizeof(struct kdbus_cmd_hello);
+	hello.pool_size = POOL_SIZE;
+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
+
+	ret = ioctl(connfd1, KDBUS_CMD_HELLO, &hello);
+	ASSERT_RETURN(ret == 0);
+
+	cmd_free.size = sizeof(cmd_free);
+	cmd_free.offset = hello.offset;
+	ret = ioctl(connfd1, KDBUS_CMD_FREE, &cmd_free);
+	ASSERT_RETURN(ret >= 0);
+
+	conn_dst->fd = connfd1;
+	conn_dst->id = hello.id;
+
+	memset(&hello, 0, sizeof(hello));
+	hello.size = sizeof(struct kdbus_cmd_hello);
+	hello.pool_size = POOL_SIZE;
+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
+
+	ret = ioctl(connfd2, KDBUS_CMD_HELLO, &hello);
+	ASSERT_RETURN(ret == 0);
+
+	cmd_free.size = sizeof(cmd_free);
+	cmd_free.offset = hello.offset;
+	ret = ioctl(connfd2, KDBUS_CMD_FREE, &cmd_free);
+	ASSERT_RETURN(ret >= 0);
+
+	conn_dummy->fd = connfd2;
+	conn_dummy->id = hello.id;
+
+	conn_dst->buf = mmap(NULL, POOL_SIZE, PROT_READ,
+			     MAP_SHARED, connfd1, 0);
+	ASSERT_RETURN(conn_dst->buf != MAP_FAILED);
+
+	conn_dummy->buf = mmap(NULL, POOL_SIZE, PROT_READ,
+			       MAP_SHARED, connfd2, 0);
+	ASSERT_RETURN(conn_dummy->buf != MAP_FAILED);
+
+	/*
+	 * Send fds to a connection that does not accept fd passing
+	 */
+	ret = send_fds(conn_src, conn_dst->id, fds, 1);
+	ASSERT_RETURN(ret == -ECOMM);
+
+	/*
+	 * memfds are kdbus payload, so sending them must succeed
+	 */
+	ret = send_memfds(conn_src, conn_dst->id, memfd, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv_poll(conn_dst, 100, NULL, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	cookie = time(NULL);
+
+	pid = fork();
+	ASSERT_RETURN_VAL(pid >= 0, pid);
+
+	if (pid == 0) {
+		struct timespec now;
+
+		/*
+		 * A sync send/reply to a connection that does not
+		 * accept fds should fail if it contains an fd
+		 */
+		msg_sync_reply = get_kdbus_msg_with_fd(conn_dst,
+						       conn_dummy->id,
+						       cookie, fds[0]);
+		ASSERT_EXIT(msg_sync_reply);
+
+		ret = clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
+		ASSERT_EXIT(ret == 0);
+
+		msg_sync_reply->timeout_ns = now.tv_sec * 1000000000ULL +
+					     now.tv_nsec + 100000000ULL;
+		msg_sync_reply->flags = KDBUS_MSG_EXPECT_REPLY;
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.size = sizeof(cmd);
+		cmd.msg_address = (uintptr_t)msg_sync_reply;
+		cmd.flags = KDBUS_SEND_SYNC_REPLY;
+
+		ret = ioctl(conn_dst->fd, KDBUS_CMD_SEND, &cmd);
+		ASSERT_EXIT(ret < 0 && -errno == -ECOMM);
+
+		/*
+		 * Now send a normal message, but the sync reply
+		 * will fail since it contains an fd that the
+		 * original sender does not want.
+		 *
+		 * The original sender will fail with -EREMOTEIO
+		 */
+		cookie++;
+		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
+					  KDBUS_MSG_EXPECT_REPLY,
+					  5000000000ULL, 0, conn_src->id, -1);
+		ASSERT_EXIT(ret == -EREMOTEIO);
+
+		cookie++;
+		ret = kdbus_msg_recv_poll(conn_dst, 100, &msg, NULL);
+		ASSERT_EXIT(ret == 0);
+		ASSERT_EXIT(msg->cookie == cookie);
+
+		free(msg_sync_reply);
+		kdbus_msg_free(msg);
+
+		_exit(EXIT_SUCCESS);
+	}
+
+	ret = kdbus_msg_recv_poll(conn_dummy, 100, NULL, NULL);
+	ASSERT_RETURN(ret == -ETIMEDOUT);
+
+	cookie++;
+	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
+
+	kdbus_msg_free(msg);
+
+	/*
+	 * Try to reply with a kdbus connection handle, this should
+	 * fail with -EOPNOTSUPP
+	 */
+	msg_sync_reply = get_kdbus_msg_with_fd(conn_src,
+					       conn_dst->id,
+					       cookie, conn_dst->fd);
+	ASSERT_RETURN(msg_sync_reply);
+
+	msg_sync_reply->cookie_reply = cookie;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.size = sizeof(cmd);
+	cmd.msg_address = (uintptr_t)msg_sync_reply;
+
+	ret = ioctl(conn_src->fd, KDBUS_CMD_SEND, &cmd);
+	ASSERT_RETURN(ret < 0 && -errno == -EOPNOTSUPP);
+
+	free(msg_sync_reply);
+
+	/*
+	 * Try to reply with a normal fd, this should fail even
+	 * if the response is a sync reply
+	 *
+	 * From the sender's view we fail with -ECOMM
+	 */
+	msg_sync_reply = get_kdbus_msg_with_fd(conn_src,
+					       conn_dst->id,
+					       cookie, fds[0]);
+	ASSERT_RETURN(msg_sync_reply);
+
+	msg_sync_reply->cookie_reply = cookie;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.size = sizeof(cmd);
+	cmd.msg_address = (uintptr_t)msg_sync_reply;
+
+	ret = ioctl(conn_src->fd, KDBUS_CMD_SEND, &cmd);
+	ASSERT_RETURN(ret < 0 && -errno == -ECOMM);
+
+	free(msg_sync_reply);
+
+	/*
+	 * Resend another normal message and check if the queue
+	 * is clear
+	 */
+	cookie++;
+	ret = kdbus_msg_send(conn_src, NULL, cookie, 0, 0, 0,
+			     conn_dst->id);
+	ASSERT_RETURN(ret == 0);
+
+	ret = waitpid(pid, &status, 0);
+	ASSERT_RETURN_VAL(ret >= 0, ret);
+
+	kdbus_conn_free(conn_dummy);
+	kdbus_conn_free(conn_dst);
+	kdbus_conn_free(conn_src);
+
+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
+}
+
+static int kdbus_send_multiple_fds(struct kdbus_conn *conn_src,
+				   struct kdbus_conn *conn_dst)
+{
+	int ret, i;
+	unsigned int nfds;
+	int fds[KDBUS_MSG_MAX_FDS + 1];
+	int memfds[KDBUS_MSG_MAX_ITEMS + 1];
+	struct kdbus_msg *msg;
+	uint64_t dummy_value;
+
+	dummy_value = time(NULL);
+
+	for (i = 0; i < KDBUS_MSG_MAX_FDS + 1; i++) {
+		fds[i] = open("/dev/null", O_RDWR|O_CLOEXEC);
+		ASSERT_RETURN_VAL(fds[i] >= 0, -errno);
+	}
+
+	/* Send KDBUS_MSG_MAX_FDS with one more fd */
+	ret = send_fds(conn_src, conn_dst->id, fds, KDBUS_MSG_MAX_FDS + 1);
+	ASSERT_RETURN(ret == -EMFILE);
+
+	/* Retry with the correct KDBUS_MSG_MAX_FDS */
+	ret = send_fds(conn_src, conn_dst->id, fds, KDBUS_MSG_MAX_FDS);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* Check we got the right number of fds */
+	nfds = kdbus_item_get_nfds(msg);
+	ASSERT_RETURN(nfds == KDBUS_MSG_MAX_FDS);
+
+	kdbus_msg_free(msg);
+
+	for (i = 0; i < KDBUS_MSG_MAX_ITEMS + 1; i++, dummy_value++) {
+		memfds[i] = memfd_write("memfd-name",
+					&dummy_value,
+					sizeof(dummy_value));
+		ASSERT_RETURN_VAL(memfds[i] >= 0, memfds[i]);
+	}
+
+	/* Send KDBUS_MSG_MAX_ITEMS with one more memfd */
+	ret = send_memfds(conn_src, conn_dst->id,
+			  memfds, KDBUS_MSG_MAX_ITEMS + 1);
+	ASSERT_RETURN(ret == -E2BIG);
+
+	/* Retry with the correct KDBUS_MSG_MAX_ITEMS */
+	ret = send_memfds(conn_src, conn_dst->id,
+			  memfds, KDBUS_MSG_MAX_ITEMS);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* Check we got the right number of fds */
+	nfds = kdbus_item_get_nfds(msg);
+	ASSERT_RETURN(nfds == KDBUS_MSG_MAX_ITEMS);
+
+	kdbus_msg_free(msg);
+
+
+	/* Combine 254 fds and 100 memfds */
+	ret = send_fds_memfds(conn_src, conn_dst->id,
+			      fds, KDBUS_MSG_MAX_FDS + 1,
+			      memfds, 100);
+	ASSERT_RETURN(ret == -EMFILE);
+
+	/* Combine 253 fds and 128 + 1 memfds */
+	ret = send_fds_memfds(conn_src, conn_dst->id,
+			      fds, KDBUS_MSG_MAX_FDS,
+			      memfds, KDBUS_MSG_MAX_ITEMS + 1);
+	ASSERT_RETURN(ret == -E2BIG);
+
+	ret = send_fds_memfds(conn_src, conn_dst->id,
+			      fds, 153, memfds, 100);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* Check we got the right number of fds */
+	nfds = kdbus_item_get_nfds(msg);
+	ASSERT_RETURN(nfds == 253);
+
+	kdbus_msg_free(msg);
+
+	for (i = 0; i < KDBUS_MSG_MAX_FDS + 1; i++)
+		close(fds[i]);
+
+	for (i = 0; i < KDBUS_MSG_MAX_ITEMS + 1; i++)
+		close(memfds[i]);
+
+	return 0;
+}
+
+int kdbus_test_fd_passing(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *conn_src, *conn_dst;
+	const char *str = "stackenblocken";
+	const struct kdbus_item *item;
+	struct kdbus_msg *msg;
+	unsigned int i;
+	uint64_t now;
+	int fds_conn[2];
+	int sock_pair[2];
+	int fds[2];
+	int memfd;
+	int ret;
+
+	now = (uint64_t) time(NULL);
+
+	/* create two connections */
+	conn_src = kdbus_hello(env->buspath, 0, NULL, 0);
+	conn_dst = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn_src && conn_dst);
+
+	fds_conn[0] = conn_src->fd;
+	fds_conn[1] = conn_dst->fd;
+
+	ret = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_pair);
+	ASSERT_RETURN(ret == 0);
+
+	/* Setup memfd */
+	memfd = memfd_write("memfd-name", &now, sizeof(now));
+	ASSERT_RETURN(memfd >= 0);
+
+	/* Setup pipes */
+	ret = pipe(fds);
+	ASSERT_RETURN(ret == 0);
+
+	i = write(fds[1], str, strlen(str));
+	ASSERT_RETURN(i == strlen(str));
+
+	/*
+	 * Try to pass the handle of a connection as message payload.
+	 * This must fail.
+	 */
+	ret = send_fds(conn_src, conn_dst->id, fds_conn, 2);
+	ASSERT_RETURN(ret == -ENOTSUP);
+
+	ret = send_fds(conn_dst, conn_src->id, fds_conn, 2);
+	ASSERT_RETURN(ret == -ENOTSUP);
+
+	ret = send_fds(conn_src, conn_dst->id, sock_pair, 2);
+	ASSERT_RETURN(ret == -ENOTSUP);
+
+	/*
+	 * Send fds and memfds to a connection that does not accept fds
+	 */
+	ret = kdbus_test_no_fds(env, fds, (int *)&memfd);
+	ASSERT_RETURN(ret == 0);
+
+	/* Try to broadcast file descriptors. This must fail. */
+	ret = send_fds(conn_src, KDBUS_DST_ID_BROADCAST, fds, 1);
+	ASSERT_RETURN(ret == -ENOTUNIQ);
+
+	/* Try to broadcast memfd. This must succeed. */
+	ret = send_memfds(conn_src, KDBUS_DST_ID_BROADCAST, (int *)&memfd, 1);
+	ASSERT_RETURN(ret == 0);
+
+	/* Open code this loop */
+loop_send_fds:
+
+	/*
+	 * Send the read end of the pipe and close it.
+	 */
+	ret = send_fds(conn_src, conn_dst->id, fds, 1);
+	ASSERT_RETURN(ret == 0);
+	close(fds[0]);
+
+	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	KDBUS_ITEM_FOREACH(item, msg, items) {
+		if (item->type == KDBUS_ITEM_FDS) {
+			char tmp[14];
+			int nfds = (item->size - KDBUS_ITEM_HEADER_SIZE) /
+					sizeof(int);
+			ASSERT_RETURN(nfds == 1);
+
+			i = read(item->fds[0], tmp, sizeof(tmp));
+			if (i != 0) {
+				ASSERT_RETURN(i == sizeof(tmp));
+				ASSERT_RETURN(memcmp(tmp, str, sizeof(tmp)) == 0);
+
+				/* Write EOF */
+				close(fds[1]);
+
+				/*
+				 * Resend the read end of the pipe,
+				 * the receiver still holds a reference
+				 * to it...
+				 */
+				goto loop_send_fds;
+			}
+
+			/* Got EOF */
+
+			/*
+			 * Close the last reference to the read end
+			 * of the pipe, other references are
+			 * automatically closed just after send.
+			 */
+			close(item->fds[0]);
+		}
+	}
+
+	/*
+	 * Try to resend the read end of the pipe. Must fail with
+	 * -EBADF since both the sender and receiver closed their
+	 * references to it. We assume the above since sender and
+	 * receiver are in the same process.
+	 */
+	ret = send_fds(conn_src, conn_dst->id, fds, 1);
+	ASSERT_RETURN(ret == -EBADF);
+
+	/* Then we clear out any received data... */
+	kdbus_msg_free(msg);
+
+	ret = kdbus_send_multiple_fds(conn_src, conn_dst);
+	ASSERT_RETURN(ret == 0);
+
+	close(sock_pair[0]);
+	close(sock_pair[1]);
+	close(memfd);
+
+	kdbus_conn_free(conn_src);
+	kdbus_conn_free(conn_dst);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-free.c b/tools/testing/selftests/kdbus/test-free.c
new file mode 100644
index 000000000000..e3a280a6daf2
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-free.c
@@ -0,0 +1,36 @@
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <sys/ioctl.h>
+#include <stdbool.h>
+
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+#include "kdbus-test.h"
+
+int kdbus_test_free(struct kdbus_test_env *env)
+{
+	int ret;
+	struct kdbus_cmd_free cmd_free = {};
+
+	/* free an unallocated buffer */
+	cmd_free.size = sizeof(cmd_free);
+	cmd_free.flags = 0;
+	cmd_free.offset = 0;
+	ret = ioctl(env->conn->fd, KDBUS_CMD_FREE, &cmd_free);
+	ASSERT_RETURN(ret == -1 && errno == ENXIO);
+
+	/* free a buffer out of the pool's bounds */
+	cmd_free.size = sizeof(cmd_free);
+	cmd_free.offset = POOL_SIZE + 1;
+	ret = ioctl(env->conn->fd, KDBUS_CMD_FREE, &cmd_free);
+	ASSERT_RETURN(ret == -1 && errno == ENXIO);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-match.c b/tools/testing/selftests/kdbus/test-match.c
new file mode 100644
index 000000000000..d40c3388500a
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-match.c
@@ -0,0 +1,442 @@
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <sys/ioctl.h>
+#include <stdbool.h>
+
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+#include "kdbus-test.h"
+
+int kdbus_test_match_id_add(struct kdbus_test_env *env)
+{
+	struct {
+		struct kdbus_cmd_match cmd;
+		struct {
+			uint64_t size;
+			uint64_t type;
+			struct kdbus_notify_id_change chg;
+		} item;
+	} buf;
+	struct kdbus_conn *conn;
+	struct kdbus_msg *msg;
+	int ret;
+
+	memset(&buf, 0, sizeof(buf));
+
+	buf.cmd.size = sizeof(buf);
+	buf.cmd.cookie = 0xdeafbeefdeaddead;
+	buf.item.size = sizeof(buf.item);
+	buf.item.type = KDBUS_ITEM_ID_ADD;
+	buf.item.chg.id = KDBUS_MATCH_ID_ANY;
+
+	/* match on id add */
+	ret = ioctl(env->conn->fd, KDBUS_CMD_MATCH_ADD, &buf);
+	ASSERT_RETURN(ret == 0);
+
+	/* create 2nd connection */
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn != NULL);
+
+	/* 1st connection should have received a notification */
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_ADD);
+	ASSERT_RETURN(msg->items[0].id_change.id == conn->id);
+
+	kdbus_conn_free(conn);
+
+	return TEST_OK;
+}
+
+int kdbus_test_match_id_remove(struct kdbus_test_env *env)
+{
+	struct {
+		struct kdbus_cmd_match cmd;
+		struct {
+			uint64_t size;
+			uint64_t type;
+			struct kdbus_notify_id_change chg;
+		} item;
+	} buf;
+	struct kdbus_conn *conn;
+	struct kdbus_msg *msg;
+	uint64_t id;
+	int ret;
+
+	/* create 2nd connection */
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn != NULL);
+	id = conn->id;
+
+	memset(&buf, 0, sizeof(buf));
+	buf.cmd.size = sizeof(buf);
+	buf.cmd.cookie = 0xdeafbeefdeaddead;
+	buf.item.size = sizeof(buf.item);
+	buf.item.type = KDBUS_ITEM_ID_REMOVE;
+	buf.item.chg.id = id;
+
+	/* register match on 2nd connection */
+	ret = ioctl(env->conn->fd, KDBUS_CMD_MATCH_ADD, &buf);
+	ASSERT_RETURN(ret == 0);
+
+	/* remove 2nd connection again */
+	kdbus_conn_free(conn);
+
+	/* 1st connection should have received a notification */
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_REMOVE);
+	ASSERT_RETURN(msg->items[0].id_change.id == id);
+
+	return TEST_OK;
+}
+
+int kdbus_test_match_replace(struct kdbus_test_env *env)
+{
+	struct {
+		struct kdbus_cmd_match cmd;
+		struct {
+			uint64_t size;
+			uint64_t type;
+			struct kdbus_notify_id_change chg;
+		} item;
+	} buf;
+	struct kdbus_conn *conn;
+	struct kdbus_msg *msg;
+	uint64_t id;
+	int ret;
+
+	/* add a match to id_add */
+	ASSERT_RETURN(kdbus_test_match_id_add(env) == TEST_OK);
+
+	/* do a replace of the match from id_add to id_remove */
+	memset(&buf, 0, sizeof(buf));
+
+	buf.cmd.size = sizeof(buf);
+	buf.cmd.cookie = 0xdeafbeefdeaddead;
+	buf.cmd.flags = KDBUS_MATCH_REPLACE;
+	buf.item.size = sizeof(buf.item);
+	buf.item.type = KDBUS_ITEM_ID_REMOVE;
+	buf.item.chg.id = KDBUS_MATCH_ID_ANY;
+
+	ret = ioctl(env->conn->fd, KDBUS_CMD_MATCH_ADD, &buf);
+	ASSERT_RETURN(ret == 0);
+	/* create 2nd connection */
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn != NULL);
+	id = conn->id;
+
+	/* 1st connection should _not_ have received a notification */
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret != 0);
+
+	/* remove 2nd connection */
+	kdbus_conn_free(conn);
+
+	/* 1st connection should _now_ have received a notification */
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_REMOVE);
+	ASSERT_RETURN(msg->items[0].id_change.id == id);
+
+	return TEST_OK;
+}
+
+int kdbus_test_match_name_add(struct kdbus_test_env *env)
+{
+	struct {
+		struct kdbus_cmd_match cmd;
+		struct {
+			uint64_t size;
+			uint64_t type;
+			struct kdbus_notify_name_change chg;
+		} item;
+		char name[64];
+	} buf;
+	struct kdbus_msg *msg;
+	char *name;
+	int ret;
+
+	name = "foo.bla.blaz";
+
+	/* install the match rule */
+	memset(&buf, 0, sizeof(buf));
+	buf.item.type = KDBUS_ITEM_NAME_ADD;
+	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
+	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
+	strncpy(buf.name, name, sizeof(buf.name) - 1);
+	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
+	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
+
+	ret = ioctl(env->conn->fd, KDBUS_CMD_MATCH_ADD, &buf);
+	ASSERT_RETURN(ret == 0);
+
+	/* acquire the name */
+	ret = kdbus_name_acquire(env->conn, name, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* we should have received a notification */
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_ADD);
+	ASSERT_RETURN(msg->items[0].name_change.old_id.id == 0);
+	ASSERT_RETURN(msg->items[0].name_change.new_id.id == env->conn->id);
+	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
+
+	return TEST_OK;
+}
+
+int kdbus_test_match_name_remove(struct kdbus_test_env *env)
+{
+	struct {
+		struct kdbus_cmd_match cmd;
+		struct {
+			uint64_t size;
+			uint64_t type;
+			struct kdbus_notify_name_change chg;
+		} item;
+		char name[64];
+	} buf;
+	struct kdbus_msg *msg;
+	char *name;
+	int ret;
+
+	name = "foo.bla.blaz";
+
+	/* acquire the name */
+	ret = kdbus_name_acquire(env->conn, name, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* install the match rule */
+	memset(&buf, 0, sizeof(buf));
+	buf.item.type = KDBUS_ITEM_NAME_REMOVE;
+	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
+	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
+	strncpy(buf.name, name, sizeof(buf.name) - 1);
+	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
+	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
+
+	ret = ioctl(env->conn->fd, KDBUS_CMD_MATCH_ADD, &buf);
+	ASSERT_RETURN(ret == 0);
+
+	/* release the name again */
+	ret = kdbus_name_release(env->conn, name);
+	ASSERT_RETURN(ret == 0);
+
+	/* we should have received a notification */
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_REMOVE);
+	ASSERT_RETURN(msg->items[0].name_change.old_id.id == env->conn->id);
+	ASSERT_RETURN(msg->items[0].name_change.new_id.id == 0);
+	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
+
+	return TEST_OK;
+}
+
+int kdbus_test_match_name_change(struct kdbus_test_env *env)
+{
+	struct {
+		struct kdbus_cmd_match cmd;
+		struct {
+			uint64_t size;
+			uint64_t type;
+			struct kdbus_notify_name_change chg;
+		} item;
+		char name[64];
+	} buf;
+	struct kdbus_conn *conn;
+	struct kdbus_msg *msg;
+	uint64_t flags;
+	char *name = "foo.bla.baz";
+	int ret;
+
+	/* acquire the name */
+	ret = kdbus_name_acquire(env->conn, name, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* install the match rule */
+	memset(&buf, 0, sizeof(buf));
+	buf.item.type = KDBUS_ITEM_NAME_CHANGE;
+	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
+	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
+	strncpy(buf.name, name, sizeof(buf.name) - 1);
+	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
+	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
+
+	ret = ioctl(env->conn->fd, KDBUS_CMD_MATCH_ADD, &buf);
+	ASSERT_RETURN(ret == 0);
+
+	/* create a 2nd connection */
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn != NULL);
+
+	/* allow the new connection to own the same name */
+	/* queue the 2nd connection as waiting owner */
+	flags = KDBUS_NAME_QUEUE;
+	ret = kdbus_name_acquire(conn, name, &flags);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(flags & KDBUS_NAME_IN_QUEUE);
+
+	/* release name from 1st connection */
+	ret = kdbus_name_release(env->conn, name);
+	ASSERT_RETURN(ret == 0);
+
+	/* we should have received a notification */
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_CHANGE);
+	ASSERT_RETURN(msg->items[0].name_change.old_id.id == env->conn->id);
+	ASSERT_RETURN(msg->items[0].name_change.new_id.id == conn->id);
+	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
+
+	kdbus_conn_free(conn);
+
+	return TEST_OK;
+}
+
+static int send_bloom_filter(const struct kdbus_conn *conn,
+			     uint64_t cookie,
+			     const uint8_t *filter,
+			     size_t filter_size,
+			     uint64_t filter_generation)
+{
+	struct kdbus_cmd_send cmd = {};
+	struct kdbus_msg *msg;
+	struct kdbus_item *item;
+	uint64_t size;
+	int ret;
+
+	size = sizeof(struct kdbus_msg);
+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + filter_size;
+
+	msg = alloca(size);
+
+	memset(msg, 0, size);
+	msg->size = size;
+	msg->src_id = conn->id;
+	msg->dst_id = KDBUS_DST_ID_BROADCAST;
+	msg->flags = KDBUS_MSG_SIGNAL;
+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
+	msg->cookie = cookie;
+
+	item = msg->items;
+	item->type = KDBUS_ITEM_BLOOM_FILTER;
+	item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) +
+				filter_size;
+
+	item->bloom_filter.generation = filter_generation;
+	memcpy(item->bloom_filter.data, filter, filter_size);
+
+	cmd.size = sizeof(cmd);
+	cmd.msg_address = (uintptr_t)msg;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_SEND, &cmd);
+	if (ret < 0) {
+		ret = -errno;
+		kdbus_printf("error sending message: %d (%m)\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+int kdbus_test_match_bloom(struct kdbus_test_env *env)
+{
+	struct {
+		struct kdbus_cmd_match cmd;
+		struct {
+			uint64_t size;
+			uint64_t type;
+			uint8_t data_gen0[64];
+			uint8_t data_gen1[64];
+		} item;
+	} buf;
+	struct kdbus_conn *conn;
+	struct kdbus_msg *msg;
+	uint64_t cookie = 0xf000f00f;
+	uint8_t filter[64];
+	int ret;
+
+	/* install the match rule */
+	memset(&buf, 0, sizeof(buf));
+	buf.cmd.size = sizeof(buf);
+
+	buf.item.size = sizeof(buf.item);
+	buf.item.type = KDBUS_ITEM_BLOOM_MASK;
+	buf.item.data_gen0[0] = 0x55;
+	buf.item.data_gen0[63] = 0x80;
+
+	buf.item.data_gen1[1] = 0xaa;
+	buf.item.data_gen1[9] = 0x02;
+
+	ret = ioctl(env->conn->fd, KDBUS_CMD_MATCH_ADD, &buf);
+	ASSERT_RETURN(ret == 0);
+
+	/* create a 2nd connection */
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn != NULL);
+
+	/* a message with a 0'ed out filter must not reach the other peer */
+	memset(filter, 0, sizeof(filter));
+	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret == -EAGAIN);
+
+	/* now set the filter to the connection's mask and expect success */
+	filter[0] = 0x55;
+	filter[63] = 0x80;
+	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(msg->cookie == cookie);
+
+	/* broaden the filter and try again. this should also succeed. */
+	filter[0] = 0xff;
+	filter[8] = 0xff;
+	filter[63] = 0xff;
+	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(msg->cookie == cookie);
+
+	/* the same filter must not match against bloom generation 1 */
+	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret == -EAGAIN);
+
+	/* set a different filter and try again */
+	filter[1] = 0xaa;
+	filter[9] = 0x02;
+	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(msg->cookie == cookie);
+
+	kdbus_conn_free(conn);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-message.c b/tools/testing/selftests/kdbus/test-message.c
new file mode 100644
index 000000000000..049e56786b0c
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-message.c
@@ -0,0 +1,658 @@
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <time.h>
+#include <sys/ioctl.h>
+#include <stdbool.h>
+#include <sys/eventfd.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+#include "kdbus-test.h"
+
+/*
+ * maximum number of queued messages which will not be user accounted.
+ * after this value is reached each user will have an individual limit.
+ */
+#define KDBUS_CONN_MAX_MSGS_UNACCOUNTED		16
+
+/*
+ * maximum number of queued messages from the same individual user after the
+ * un-accounted value has been hit
+ */
+#define KDBUS_CONN_MAX_MSGS_PER_USER		16
+
+#define MAX_USER_TOTAL_MSGS  (KDBUS_CONN_MAX_MSGS_UNACCOUNTED + \
+				KDBUS_CONN_MAX_MSGS_PER_USER)
+/* maximum number of queued messages in a connection */
+#define KDBUS_CONN_MAX_MSGS			256
+
+/* maximum number of queued requests waiting for a reply */
+#define KDBUS_CONN_MAX_REQUESTS_PENDING		128
+
+int kdbus_test_message_basic(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *conn;
+	struct kdbus_conn *sender;
+	struct kdbus_msg *msg;
+	uint64_t cookie = 0x1234abcd5678eeff;
+	uint64_t offset;
+	int ret;
+
+	sender = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(sender != NULL);
+
+	/* create a 2nd connection */
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn != NULL);
+
+	ret = kdbus_add_match_empty(conn);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_add_match_empty(sender);
+	ASSERT_RETURN(ret == 0);
+
+	/* send over 1st connection */
+	ret = kdbus_msg_send(sender, NULL, cookie, 0, 0, 0,
+			     KDBUS_DST_ID_BROADCAST);
+	ASSERT_RETURN(ret == 0);
+
+	/* Make sure that we do not get our own broadcasts */
+	ret = kdbus_msg_recv(sender, NULL, NULL);
+	ASSERT_RETURN(ret == -EAGAIN);
+
+	/* ... and receive on the 2nd */
+	ret = kdbus_msg_recv_poll(conn, 100, &msg, &offset);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(msg->cookie == cookie);
+
+	kdbus_msg_free(msg);
+
+	/* Msgs that expect a reply must have timeout and cookie */
+	ret = kdbus_msg_send(sender, NULL, 0, KDBUS_MSG_EXPECT_REPLY,
+			     0, 0, conn->id);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	/* Faked replies with a valid reply cookie are rejected */
+	ret = kdbus_msg_send_reply(conn, time(NULL) ^ cookie, sender->id);
+	ASSERT_RETURN(ret == -EPERM);
+
+	ret = kdbus_free(conn, offset);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_conn_free(sender);
+	kdbus_conn_free(conn);
+
+	return TEST_OK;
+}
+
+static int msg_recv_prio(struct kdbus_conn *conn,
+			 int64_t requested_prio,
+			 int64_t expected_prio)
+{
+	struct kdbus_cmd_recv recv = {
+		.size = sizeof(recv),
+		.flags = KDBUS_RECV_USE_PRIORITY,
+		.priority = requested_prio,
+	};
+	struct kdbus_msg *msg;
+	int ret;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_RECV, &recv);
+	if (ret < 0) {
+		kdbus_printf("error receiving message: %d (%m)\n", -errno);
+		return -errno;
+	}
+
+	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
+	kdbus_msg_dump(conn, msg);
+
+	if (msg->priority != expected_prio) {
+		kdbus_printf("expected message prio %lld, got %lld\n",
+			     (long long) expected_prio,
+			     (long long) msg->priority);
+		return -EINVAL;
+	}
+
+	kdbus_msg_free(msg);
+	ret = kdbus_free(conn, recv.msg.offset);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+int kdbus_test_message_prio(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *a, *b;
+	uint64_t cookie = 0;
+
+	a = kdbus_hello(env->buspath, 0, NULL, 0);
+	b = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(a && b);
+
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   25, a->id) == 0);
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -600, a->id) == 0);
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   10, a->id) == 0);
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,  -35, a->id) == 0);
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -100, a->id) == 0);
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   20, a->id) == 0);
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,  -15, a->id) == 0);
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -800, a->id) == 0);
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -150, a->id) == 0);
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   10, a->id) == 0);
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -800, a->id) == 0);
+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,  -10, a->id) == 0);
+
+	ASSERT_RETURN(msg_recv_prio(a, -200, -800) == 0);
+	ASSERT_RETURN(msg_recv_prio(a, -100, -800) == 0);
+	ASSERT_RETURN(msg_recv_prio(a, -400, -600) == 0);
+	ASSERT_RETURN(msg_recv_prio(a, -400, -600) == -ENOMSG);
+	ASSERT_RETURN(msg_recv_prio(a, 10, -150) == 0);
+	ASSERT_RETURN(msg_recv_prio(a, 10, -100) == 0);
+
+	kdbus_printf("--- get priority (all)\n");
+	ASSERT_RETURN(kdbus_msg_recv(a, NULL, NULL) == 0);
+
+	kdbus_conn_free(a);
+	kdbus_conn_free(b);
+
+	return TEST_OK;
+}
+
+static int kdbus_test_notify_kernel_quota(struct kdbus_test_env *env)
+{
+	int ret;
+	unsigned int i;
+	uint64_t offset;
+	struct kdbus_conn *conn;
+	struct kdbus_conn *reader;
+	struct kdbus_msg *msg = NULL;
+
+	reader = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(reader);
+
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	/* Register for ID signals */
+	ret = kdbus_add_match_id(reader, 0x1, KDBUS_ITEM_ID_ADD,
+				 KDBUS_MATCH_ID_ANY);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_add_match_id(reader, 0x2, KDBUS_ITEM_ID_REMOVE,
+				 KDBUS_MATCH_ID_ANY);
+	ASSERT_RETURN(ret == 0);
+
+	/* Each iteration generates two notifications: ID add and ID remove */
+	for (i = 0; i < KDBUS_CONN_MAX_MSGS / 2; i++) {
+		struct kdbus_conn *notifier;
+
+		notifier = kdbus_hello(env->buspath, 0, NULL, 0);
+		ASSERT_RETURN(notifier);
+
+		kdbus_conn_free(notifier);
+
+	}
+
+	/*
+	 * The reader queue is now full; this message will be lost,
+	 * but it will not be counted among the dropped messages
+	 */
+	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0, 0, reader->id);
+	ASSERT_RETURN(ret == -ENOBUFS);
+
+	/* More ID kernel notifications that will be lost */
+	kdbus_conn_free(conn);
+
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	kdbus_conn_free(conn);
+
+	ret = kdbus_msg_recv(reader, &msg, &offset);
+	ASSERT_RETURN(ret == -EOVERFLOW);
+
+	/*
+	 * We lost only 3 packets, since only broadcast messages are
+	 * accounted: the connection ID remove/add/remove notifications
+	 */
+	ASSERT_RETURN(offset == 3);
+
+	kdbus_msg_free(msg);
+
+	/* Read our queue */
+	for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
+		ret = kdbus_msg_recv_poll(reader, 100, &msg, NULL);
+		ASSERT_RETURN(ret == 0);
+
+		kdbus_msg_free(msg);
+	}
+
+	ret = kdbus_msg_recv(reader, NULL, NULL);
+	ASSERT_RETURN(ret == -EAGAIN);
+
+	kdbus_conn_free(reader);
+
+	return 0;
+}
+
+/* Return the number of messages successfully sent */
+static int kdbus_fill_conn_queue(struct kdbus_conn *conn_src,
+				 uint64_t dst_id,
+				 unsigned int max_msgs)
+{
+	unsigned int i;
+	uint64_t cookie = 0;
+	int ret;
+
+	for (i = 0; i < max_msgs; i++) {
+		ret = kdbus_msg_send(conn_src, NULL, ++cookie,
+				     0, 0, 0, dst_id);
+		if (ret < 0)
+			break;
+	}
+
+	return i;
+}
+
+static int kdbus_test_broadcast_quota(struct kdbus_test_env *env)
+{
+	int ret;
+	uint64_t offset;
+	unsigned int i;
+	struct kdbus_msg *msg;
+	struct kdbus_conn *privileged_a;
+	struct kdbus_conn *privileged_b;
+	struct kdbus_conn *holder;
+	struct kdbus_policy_access access = {
+		.type = KDBUS_POLICY_ACCESS_WORLD,
+		.id = getuid(),
+		.access = KDBUS_POLICY_TALK,
+	};
+	uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
+
+	holder = kdbus_hello_registrar(env->buspath, "com.example.a",
+				       &access, 1,
+				       KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(holder);
+
+	privileged_a = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(privileged_a);
+
+	privileged_b = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(privileged_b);
+
+	/* Acquire name with access world so they can talk to us */
+	ret = kdbus_name_acquire(privileged_a, "com.example.a", NULL);
+	ASSERT_RETURN(ret >= 0);
+
+	/* Broadcast matches for privileged connections */
+	ret = kdbus_add_match_empty(privileged_a);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_add_match_empty(privileged_b);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * We start accounting after KDBUS_CONN_MAX_MSGS_UNACCOUNTED
+	 * so the first sender will at least send
+	 * KDBUS_CONN_MAX_MSGS_UNACCOUNTED + KDBUS_CONN_MAX_MSGS_PER_USER
+	 */
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		unsigned int cnt;
+
+		cnt = kdbus_fill_conn_queue(unpriv, KDBUS_DST_ID_BROADCAST,
+					    MAX_USER_TOTAL_MSGS);
+		ASSERT_EXIT(cnt == MAX_USER_TOTAL_MSGS);
+
+		/*
+		 * Another message that will trigger the lost count
+		 *
+		 * Broadcasts always succeed
+		 */
+		ret = kdbus_msg_send(unpriv, NULL, 0xdeadbeef, 0, 0,
+				     0, KDBUS_DST_ID_BROADCAST);
+		ASSERT_EXIT(ret == 0);
+	}));
+	ASSERT_RETURN(ret == 0);
+
+	expected_cookie++;
+	/* Now try to send a legitimate message from B to A */
+	ret = kdbus_msg_send(privileged_b, NULL, expected_cookie, 0,
+			     0, 0, privileged_a->id);
+	ASSERT_RETURN(ret == 0);
+
+	expected_cookie++;
+	ret = kdbus_msg_send(privileged_b, NULL, expected_cookie, 0,
+			     0, 0, KDBUS_DST_ID_BROADCAST);
+	ASSERT_RETURN(ret == 0);
+
+	/* Privileged service A tries to read its messages now */
+	ret = kdbus_msg_recv_poll(privileged_a, 100, &msg, &offset);
+	ASSERT_RETURN(ret == -EOVERFLOW);
+
+	/*
+	 * We have lost 1 broadcast message, the one from the unprivileged
+	 * sender; the privileged broadcast was queued, as quota is per user
+	 */
+	ASSERT_RETURN(offset == 1);
+
+	/* Read our queue */
+	for (i = 0; i < MAX_USER_TOTAL_MSGS; i++) {
+		ret = kdbus_msg_recv_poll(privileged_a, 100, &msg, NULL);
+		ASSERT_RETURN(ret == 0);
+
+		ASSERT_RETURN(msg->dst_id == KDBUS_DST_ID_BROADCAST);
+
+		kdbus_msg_free(msg);
+	}
+
+	ret = kdbus_msg_recv_poll(privileged_a, 100, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* Unicast message */
+	ASSERT_RETURN(msg->cookie == expected_cookie - 1);
+	ASSERT_RETURN(msg->src_id == privileged_b->id &&
+		      msg->dst_id == privileged_a->id);
+
+	kdbus_msg_free(msg);
+
+	ret = kdbus_msg_recv_poll(privileged_a, 100, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* Broadcast message */
+	ASSERT_RETURN(msg->cookie == expected_cookie);
+	ASSERT_RETURN(msg->src_id == privileged_b->id &&
+		      msg->dst_id == KDBUS_DST_ID_BROADCAST);
+
+	kdbus_msg_free(msg);
+
+	/* Queue empty */
+	ret = kdbus_msg_recv(privileged_a, NULL, NULL);
+	ASSERT_RETURN(ret == -EAGAIN);
+
+	kdbus_conn_free(holder);
+	kdbus_conn_free(privileged_a);
+	kdbus_conn_free(privileged_b);
+
+	return 0;
+}
+
+static int kdbus_test_expected_reply_quota(struct kdbus_test_env *env)
+{
+	int ret;
+	unsigned int i, n;
+	unsigned int count;
+	uint64_t cookie = 0x1234abcd5678eeff;
+	struct kdbus_conn *conn;
+	struct kdbus_conn *connections[9];
+
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	for (i = 0; i < 9; i++) {
+		connections[i] = kdbus_hello(env->buspath, 0, NULL, 0);
+		ASSERT_RETURN(connections[i]);
+	}
+
+	count = 0;
+	/* Send 16 messages to each of 8 different connections */
+	for (i = 0; i < 8; i++) {
+		for (n = 0; n < KDBUS_CONN_MAX_MSGS_PER_USER; n++, count++) {
+			ret = kdbus_msg_send(conn, NULL, cookie++,
+					     KDBUS_MSG_EXPECT_REPLY,
+					     100000000ULL, 0,
+					     connections[i]->id);
+			ASSERT_RETURN(ret == 0);
+		}
+	}
+
+	ASSERT_RETURN(count == KDBUS_CONN_MAX_REQUESTS_PENDING);
+
+	/*
+	 * Now try to send a message to the last connection,
+	 * if we have reached KDBUS_CONN_MAX_REQUESTS_PENDING
+	 * no further requests are allowed
+	 */
+	ret = kdbus_msg_send(conn, NULL, cookie++, KDBUS_MSG_EXPECT_REPLY,
+			     1000000000ULL, 0, connections[i]->id);
+	ASSERT_RETURN(ret == -EMLINK);
+
+	for (i = 0; i < 9; i++)
+		kdbus_conn_free(connections[i]);
+
+	kdbus_conn_free(conn);
+
+	return 0;
+}
+
+static int kdbus_test_multi_users_quota(struct kdbus_test_env *env)
+{
+	int ret, efd1, efd2;
+	unsigned int cnt, recved_count;
+	struct kdbus_conn *conn;
+	struct kdbus_conn *privileged;
+	struct kdbus_conn *holder;
+	eventfd_t child1_count = 0, child2_count = 0;
+	struct kdbus_policy_access access = {
+		.type = KDBUS_POLICY_ACCESS_WORLD,
+		.id = geteuid(),
+		.access = KDBUS_POLICY_TALK,
+	};
+
+	holder = kdbus_hello_registrar(env->buspath, "com.example.a",
+				       &access, 1,
+				       KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(holder);
+
+	privileged = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(privileged);
+
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	/* Acquire name with access world so they can talk to us */
+	ret = kdbus_name_acquire(conn, "com.example.a", NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	/* Use this to tell parent how many messages have been sent */
+	efd1 = eventfd(0, EFD_CLOEXEC);
+	ASSERT_RETURN_VAL(efd1 >= 0, efd1);
+
+	efd2 = eventfd(0, EFD_CLOEXEC);
+	ASSERT_RETURN_VAL(efd2 >= 0, efd2);
+
+	/*
+	 * Queue multiple messages as different users at the
+	 * same time.
+	 *
+	 * When the receiver queue count is below
+	 * KDBUS_CONN_MAX_MSGS_UNACCOUNTED messages are not accounted.
+	 *
+	 * So we start two threads running under different uids; they
+	 * race and each one will try to send:
+	 * (KDBUS_CONN_MAX_MSGS_UNACCOUNTED + KDBUS_CONN_MAX_MSGS_PER_USER) + 1
+	 * msgs
+	 *
+	 * Both threads will return how many messages were successfully
+	 * queued; later we compute and try to validate the user quota
+	 * checks.
+	 */
+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
+		struct kdbus_conn *unpriv;
+
+		unpriv = kdbus_hello(env->buspath, 0, NULL, 0);
+		ASSERT_EXIT(unpriv);
+
+		cnt = kdbus_fill_conn_queue(unpriv, conn->id,
+					    MAX_USER_TOTAL_MSGS + 1);
+		/* Explicitly check for 0 we can't send it to eventfd */
+		ASSERT_EXIT(cnt > 0);
+
+		ret = eventfd_write(efd1, cnt);
+		ASSERT_EXIT(ret == 0);
+	}),
+	({
+		/* Queue other messages as a different user */
+		ret = RUN_UNPRIVILEGED(UNPRIV_UID - 1, UNPRIV_GID - 1, ({
+			struct kdbus_conn *unpriv;
+
+			unpriv = kdbus_hello(env->buspath, 0, NULL, 0);
+			ASSERT_EXIT(unpriv);
+
+			cnt = kdbus_fill_conn_queue(unpriv, conn->id,
+						    MAX_USER_TOTAL_MSGS + 1);
+			/* Explicitly check for 0 */
+			ASSERT_EXIT(cnt > 0);
+
+			ret = eventfd_write(efd2, cnt);
+			ASSERT_EXIT(ret == 0);
+		}),
+		({ 0; }));
+		ASSERT_RETURN(ret == 0);
+
+	}));
+	ASSERT_RETURN(ret == 0);
+
+	/* Delay reading, so if children die we are not blocked */
+	ret = eventfd_read(efd1, &child1_count);
+	ASSERT_RETURN(ret >= 0);
+
+	ret = eventfd_read(efd2, &child2_count);
+	ASSERT_RETURN(ret >= 0);
+
+	recved_count = child1_count + child2_count;
+
+	/* Validate how many messages have been sent */
+	ASSERT_RETURN(recved_count > 0);
+
+	/*
+	 * We start accounting after KDBUS_CONN_MAX_MSGS_UNACCOUNTED, so
+	 * KDBUS_CONN_MAX_MSGS_UNACCOUNTED messages are not accounted. Given
+	 * that each thread tried to send (KDBUS_CONN_MAX_MSGS_UNACCOUNTED +
+	 * KDBUS_CONN_MAX_MSGS_PER_USER) + 1 messages, recved_count for both
+	 * threads will for sure exceed that value.
+	 *
+	 * 1) Thread 1 msgs + thread 2 msgs exceed
+	 *    KDBUS_CONN_MAX_MSGS_UNACCOUNTED. Accounting is started.
+	 * 2) From then on, each of them can send only its quota,
+	 *    which is KDBUS_CONN_MAX_MSGS_PER_USER
+	 *    (messages sent earlier, in 1), were not accounted)
+	 */
+	ASSERT_RETURN(recved_count > MAX_USER_TOTAL_MSGS + 1);
+
+	/*
+	 * A process should never queue as many as
+	 * (KDBUS_CONN_MAX_MSGS_UNACCOUNTED + KDBUS_CONN_MAX_MSGS_PER_USER) + 1
+	 */
+	ASSERT_RETURN(child1_count < MAX_USER_TOTAL_MSGS + 1);
+
+	/*
+	 * The unaccounted messages of both children should add up
+	 * to KDBUS_CONN_MAX_MSGS_UNACCOUNTED, the point at which
+	 * accounting started.
+	 *
+	 * child1 non accounted + child2 non accounted =
+	 * KDBUS_CONN_MAX_MSGS_UNACCOUNTED
+	 */
+	ASSERT_RETURN(KDBUS_CONN_MAX_MSGS_UNACCOUNTED ==
+		((child1_count - KDBUS_CONN_MAX_MSGS_PER_USER) +
+		 ((recved_count - child1_count) -
+		  KDBUS_CONN_MAX_MSGS_PER_USER)));
+
+	/*
+	 * A process should never queue as many as
+	 * (KDBUS_CONN_MAX_MSGS_UNACCOUNTED + KDBUS_CONN_MAX_MSGS_PER_USER) + 1
+	 */
+	ASSERT_RETURN(child2_count < MAX_USER_TOTAL_MSGS + 1);
+
+	/*
+	 * The unaccounted messages of both children should add up
+	 * to KDBUS_CONN_MAX_MSGS_UNACCOUNTED, the point at which
+	 * accounting started.
+	 *
+	 * child1 non accounted + child2 non accounted =
+	 * KDBUS_CONN_MAX_MSGS_UNACCOUNTED
+	 */
+	ASSERT_RETURN(KDBUS_CONN_MAX_MSGS_UNACCOUNTED ==
+		((child2_count - KDBUS_CONN_MAX_MSGS_PER_USER) +
+		 ((recved_count - child2_count) -
+		  KDBUS_CONN_MAX_MSGS_PER_USER)));
+
+	/* Try to queue up more, but we fail: no space left in the pool */
+	cnt = kdbus_fill_conn_queue(privileged, conn->id, KDBUS_CONN_MAX_MSGS);
+	ASSERT_RETURN(cnt > 0 && cnt < KDBUS_CONN_MAX_MSGS);
+
+	ret = kdbus_msg_send(privileged, NULL, 0xdeadbeef, 0, 0,
+			     0, conn->id);
+	ASSERT_RETURN(ret == -ENOBUFS);
+
+	close(efd1);
+	close(efd2);
+
+	kdbus_conn_free(privileged);
+	kdbus_conn_free(holder);
+	kdbus_conn_free(conn);
+
+	return 0;
+}
+
+int kdbus_test_message_quota(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *a, *b;
+	uint64_t cookie = 0;
+	int ret;
+	int i;
+
+	ret = kdbus_test_notify_kernel_quota(env);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_test_expected_reply_quota(env);
+	ASSERT_RETURN(ret == 0);
+
+	if (geteuid() == 0 && all_uids_gids_are_mapped()) {
+		ret = kdbus_test_multi_users_quota(env);
+		ASSERT_RETURN(ret == 0);
+
+		ret = kdbus_test_broadcast_quota(env);
+		ASSERT_RETURN(ret == 0);
+
+		/* Drop to 'nobody' and continue test */
+		ret = setresuid(UNPRIV_UID, UNPRIV_UID, UNPRIV_UID);
+		ASSERT_RETURN(ret == 0);
+	}
+
+	a = kdbus_hello(env->buspath, 0, NULL, 0);
+	b = kdbus_hello(env->buspath, 0, NULL, 0);
+
+	ret = kdbus_fill_conn_queue(b, a->id, MAX_USER_TOTAL_MSGS);
+	ASSERT_RETURN(ret == MAX_USER_TOTAL_MSGS);
+
+	ret = kdbus_msg_send(b, NULL, ++cookie, 0, 0, 0, a->id);
+	ASSERT_RETURN(ret == -ENOBUFS);
+
+	for (i = 0; i < MAX_USER_TOTAL_MSGS; ++i) {
+		ret = kdbus_msg_recv(a, NULL, NULL);
+		ASSERT_RETURN(ret == 0);
+	}
+
+	ret = kdbus_fill_conn_queue(b, a->id, MAX_USER_TOTAL_MSGS);
+	ASSERT_RETURN(ret == MAX_USER_TOTAL_MSGS);
+
+	ret = kdbus_msg_send(b, NULL, ++cookie, 0, 0, 0, a->id);
+	ASSERT_RETURN(ret == -ENOBUFS);
+
+	kdbus_conn_free(a);
+	kdbus_conn_free(b);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-metadata-ns.c b/tools/testing/selftests/kdbus/test-metadata-ns.c
new file mode 100644
index 000000000000..0d1e7edf7d84
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-metadata-ns.c
@@ -0,0 +1,507 @@
+/*
+ * Test metadata in new namespaces. Even if our tests can run
+ * in a namespaced setup, this test is necessary so we can inspect
+ * metadata on the same kdbusfs across multiple namespaces.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <sched.h>
+#include <time.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <signal.h>
+#include <sys/wait.h>
+#include <sys/ioctl.h>
+#include <sys/prctl.h>
+#include <sys/eventfd.h>
+#include <sys/syscall.h>
+#include <sys/capability.h>
+#include <linux/sched.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+static const struct kdbus_creds privileged_creds = {};
+
+static const struct kdbus_creds unmapped_creds = {
+	.uid	= UNPRIV_UID,
+	.euid	= UNPRIV_UID,
+	.suid	= UNPRIV_UID,
+	.fsuid	= UNPRIV_UID,
+	.gid	= UNPRIV_GID,
+	.egid	= UNPRIV_GID,
+	.sgid	= UNPRIV_GID,
+	.fsgid	= UNPRIV_GID,
+};
+
+static const struct kdbus_pids unmapped_pids = {};
+
+/* Get only the first item */
+static struct kdbus_item *kdbus_get_item(struct kdbus_msg *msg,
+					 uint64_t type)
+{
+	struct kdbus_item *item;
+
+	KDBUS_ITEM_FOREACH(item, msg, items)
+		if (item->type == type)
+			return item;
+
+	return NULL;
+}
+
+static int kdbus_match_kdbus_creds(struct kdbus_msg *msg,
+				   const struct kdbus_creds *expected_creds)
+{
+	struct kdbus_item *item;
+
+	item = kdbus_get_item(msg, KDBUS_ITEM_CREDS);
+	ASSERT_RETURN(item);
+
+	ASSERT_RETURN(memcmp(&item->creds, expected_creds,
+			     sizeof(struct kdbus_creds)) == 0);
+
+	return 0;
+}
+
+static int kdbus_match_kdbus_pids(struct kdbus_msg *msg,
+				  const struct kdbus_pids *expected_pids)
+{
+	struct kdbus_item *item;
+
+	item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
+	ASSERT_RETURN(item);
+
+	ASSERT_RETURN(memcmp(&item->pids, expected_pids,
+			     sizeof(struct kdbus_pids)) == 0);
+
+	return 0;
+}
+
+static int __kdbus_clone_userns_test(const char *bus,
+				     struct kdbus_conn *conn,
+				     uint64_t grandpa_pid,
+				     int signal_fd)
+{
+	int clone_ret;
+	int ret;
+	struct kdbus_msg *msg = NULL;
+	const struct kdbus_item *item;
+	uint64_t cookie = time(NULL) ^ 0xdeadbeef;
+	struct kdbus_conn *unpriv_conn = NULL;
+	struct kdbus_pids parent_pids = {
+		.pid = getppid(),
+		.tid = getppid(),
+		.ppid = grandpa_pid,
+	};
+
+	ret = drop_privileges(UNPRIV_UID, UNPRIV_GID);
+	ASSERT_EXIT(ret == 0);
+
+	unpriv_conn = kdbus_hello(bus, 0, NULL, 0);
+	ASSERT_EXIT(unpriv_conn);
+
+	ret = kdbus_add_match_empty(unpriv_conn);
+	ASSERT_EXIT(ret == 0);
+
+	/*
+	 * ping privileged connection from this new unprivileged
+	 * one
+	 */
+
+	ret = kdbus_msg_send(unpriv_conn, NULL, cookie, 0, 0,
+			     0, conn->id);
+	ASSERT_EXIT(ret == 0);
+
+	/*
+	 * Since we just dropped privileges, the dumpable flag was
+	 * cleared, which makes /proc/$clone_child/uid_map owned by
+	 * root. Any userns uid mapping would then fail with -EPERM,
+	 * since the mapping is done by uid 65534.
+	 *
+	 * To avoid this, set the dumpable flag again, which makes
+	 * procfs update the /proc/$clone_child/ inode owner to 65534.
+	 *
+	 * This way we are able to write to /proc/$clone_child/uid_map
+	 * as uid 65534 and map uid 65534 to 0 inside the user namespace.
+	 */
+	ret = prctl(PR_SET_DUMPABLE, SUID_DUMP_USER);
+	ASSERT_EXIT(ret == 0);
+
+	/* Make child privileged in its new userns and run tests */
+
+	ret = RUN_CLONE_CHILD(&clone_ret,
+			      SIGCHLD | CLONE_NEWUSER | CLONE_NEWPID,
+	({ 0;  /* Clone setup, nothing */ }),
+	({
+		eventfd_t event_status = 0;
+		struct kdbus_conn *userns_conn;
+
+		/* ping connection from the new user namespace */
+		userns_conn = kdbus_hello(bus, 0, NULL, 0);
+		ASSERT_EXIT(userns_conn);
+
+		ret = kdbus_add_match_empty(userns_conn);
+		ASSERT_EXIT(ret == 0);
+
+		cookie++;
+		ret = kdbus_msg_send(userns_conn, NULL, cookie,
+				     0, 0, 0, conn->id);
+		ASSERT_EXIT(ret == 0);
+
+		/* Parent did send */
+		ret = eventfd_read(signal_fd, &event_status);
+		ASSERT_EXIT(ret >= 0 && event_status == 1);
+
+		/*
+		 * Receive from privileged connection
+		 */
+		kdbus_printf("Privileged → unprivileged/privileged "
+			     "in its userns "
+			     "(different userns and pidns):\n");
+		ret = kdbus_msg_recv_poll(userns_conn, 300, &msg, NULL);
+		ASSERT_EXIT(ret == 0);
+		ASSERT_EXIT(msg->dst_id == userns_conn->id);
+
+		/* Different namespaces no CAPS */
+		item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
+		ASSERT_EXIT(item == NULL);
+
+		/* uid/gid not mapped, so we have unpriv cached creds */
+		ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
+		ASSERT_EXIT(ret == 0);
+
+		/*
+		 * Different pid namespaces. This is the child pidns
+		 * so it should not see its parent kdbus_pids
+		 */
+		ret = kdbus_match_kdbus_pids(msg, &unmapped_pids);
+		ASSERT_EXIT(ret == 0);
+
+		kdbus_msg_free(msg);
+
+
+		/*
+		 * Receive broadcast from privileged connection
+		 */
+		kdbus_printf("Privileged → unprivileged/privileged "
+			     "in its userns "
+			     "(different userns and pidns):\n");
+		ret = kdbus_msg_recv_poll(userns_conn, 300, &msg, NULL);
+		ASSERT_EXIT(ret == 0);
+		ASSERT_EXIT(msg->dst_id == KDBUS_DST_ID_BROADCAST);
+
+		/* Different namespaces no CAPS */
+		item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
+		ASSERT_EXIT(item == NULL);
+
+		/* uid/gid not mapped, so we have unpriv cached creds */
+		ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
+		ASSERT_EXIT(ret == 0);
+
+		/*
+		 * Different pid namespaces. This is the child pidns
+		 * so it should not see its parent kdbus_pids
+		 */
+		ret = kdbus_match_kdbus_pids(msg, &unmapped_pids);
+		ASSERT_EXIT(ret == 0);
+
+		kdbus_msg_free(msg);
+
+		kdbus_conn_free(userns_conn);
+	}),
+	({
+		/* Parent setup map child uid/gid */
+		ret = userns_map_uid_gid(pid, "0 65534 1", "0 65534 1");
+		ASSERT_EXIT(ret == 0);
+	}),
+	({ 0; }));
+	/* Unprivileged was not able to create user namespace */
+	if (clone_ret == -EPERM) {
+		kdbus_printf("-- CLONE_NEWUSER TEST Failed for "
+			     "uid: %u\n -- Make sure that your kernel "
+			     "does not allow CLONE_NEWUSER for "
+			     "unprivileged users\n", UNPRIV_UID);
+		ret = 0;
+		goto out;
+	}
+
+	ASSERT_EXIT(ret == 0);
+
+
+	/*
+	 * Receive from privileged connection
+	 */
+	kdbus_printf("\nPrivileged → unprivileged (same namespaces):\n");
+	ret = kdbus_msg_recv_poll(unpriv_conn, 300, &msg, NULL);
+
+	ASSERT_EXIT(ret == 0);
+	ASSERT_EXIT(msg->dst_id == unpriv_conn->id);
+
+	/* will get the privileged creds */
+	ret = kdbus_match_kdbus_creds(msg, &privileged_creds);
+	ASSERT_EXIT(ret == 0);
+
+	/* Same pidns so will get the kdbus_pids */
+	ret = kdbus_match_kdbus_pids(msg, &parent_pids);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_msg_free(msg);
+
+
+	/*
+	 * Receive broadcast from privileged connection
+	 */
+	kdbus_printf("\nPrivileged → unprivileged (same namespaces):\n");
+	ret = kdbus_msg_recv_poll(unpriv_conn, 300, &msg, NULL);
+
+	ASSERT_EXIT(ret == 0);
+	ASSERT_EXIT(msg->dst_id == KDBUS_DST_ID_BROADCAST);
+
+	/* will get the privileged creds */
+	ret = kdbus_match_kdbus_creds(msg, &privileged_creds);
+	ASSERT_EXIT(ret == 0);
+
+	ret = kdbus_match_kdbus_pids(msg, &parent_pids);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_msg_free(msg);
+
+out:
+	kdbus_conn_free(unpriv_conn);
+
+	return ret;
+}
+
+static int kdbus_clone_userns_test(const char *bus,
+				   struct kdbus_conn *conn)
+{
+	int ret;
+	int status;
+	int efd = -1;
+	pid_t pid, ppid;
+	uint64_t unpriv_conn_id = 0;
+	uint64_t userns_conn_id = 0;
+	struct kdbus_msg *msg;
+	const struct kdbus_item *item;
+	struct kdbus_pids expected_pids;
+	struct kdbus_conn *monitor = NULL;
+
+	kdbus_printf("STARTING TEST 'metadata-ns'.\n");
+
+	monitor = kdbus_hello(bus, KDBUS_HELLO_MONITOR, NULL, 0);
+	ASSERT_EXIT(monitor);
+
+	/*
+	 * parent will signal to child that is in its
+	 * userns to read its queue
+	 */
+	efd = eventfd(0, EFD_CLOEXEC);
+	ASSERT_RETURN_VAL(efd >= 0, efd);
+
+	ppid = getppid();
+
+	pid = fork();
+	ASSERT_RETURN_VAL(pid >= 0, -errno);
+
+	if (pid == 0) {
+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
+		ASSERT_EXIT_VAL(ret == 0, -errno);
+
+		ret = __kdbus_clone_userns_test(bus, conn, ppid, efd);
+		_exit(ret);
+	}
+
+
+	/* Phase 1) privileged receives from unprivileged */
+
+	/*
+	 * Receive from the unprivileged child
+	 */
+	kdbus_printf("\nUnprivileged → privileged (same namespaces):\n");
+	ret = kdbus_msg_recv_poll(conn, 300, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	unpriv_conn_id = msg->src_id;
+
+	/* Unprivileged user */
+	ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
+	ASSERT_RETURN(ret == 0);
+
+	/* Set the expected creds_pids */
+	expected_pids = (struct kdbus_pids) {
+		.pid = pid,
+		.tid = pid,
+		.ppid = getpid(),
+	};
+	ret = kdbus_match_kdbus_pids(msg, &expected_pids);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_msg_free(msg);
+
+
+	/*
+	 * Receive from the unprivileged that is in its own
+	 * userns and pidns
+	 */
+
+	kdbus_printf("\nUnprivileged/privileged in its userns → privileged "
+		     "(different userns and pidns)\n");
+	ret = kdbus_msg_recv_poll(conn, 300, &msg, NULL);
+	if (ret == -ETIMEDOUT)
+		/* perhaps unprivileged userns is not allowed */
+		goto wait;
+
+	ASSERT_RETURN(ret == 0);
+
+	userns_conn_id = msg->src_id;
+
+	/* We do not share the userns, so no KDBUS_ITEM_CAPS */
+	item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
+	ASSERT_RETURN(item == NULL);
+
+	/*
+	 * Compare received items, creds must be translated into
+	 * the receiver user namespace, so the user is unprivileged
+	 */
+	ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * We should have the kdbus_pids since we are in the parent
+	 * pidns
+	 */
+	item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
+	ASSERT_RETURN(item);
+
+	ASSERT_RETURN(memcmp(&item->pids, &unmapped_pids,
+			     sizeof(struct kdbus_pids)) != 0);
+
+	/*
+	 * Parent pid of the unprivileged/privileged in its userns
+	 * is the unprivileged child pid that was forked here.
+	 */
+	ASSERT_RETURN((uint64_t)pid == item->pids.ppid);
+
+	kdbus_msg_free(msg);
+
+
+	/* Phase 2) Privileged connection sends now 3 packets */
+
+	/*
+	 * Sending to unprivileged connections a unicast
+	 */
+	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
+			     0, unpriv_conn_id);
+	ASSERT_RETURN(ret == 0);
+
+	/* signal to child that is in its userns */
+	ret = eventfd_write(efd, 1);
+	ASSERT_EXIT(ret == 0);
+
+	/*
+	 * Sending to unprivileged/privileged in its userns
+	 * connections a unicast
+	 */
+	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
+			     0, userns_conn_id);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Sending to unprivileged connections a broadcast
+	 */
+	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
+			     0, KDBUS_DST_ID_BROADCAST);
+	ASSERT_RETURN(ret == 0);
+
+
+wait:
+	ret = waitpid(pid, &status, 0);
+	ASSERT_RETURN(ret >= 0);
+
+	ASSERT_RETURN(WIFEXITED(status));
+	ASSERT_RETURN(!WEXITSTATUS(status));
+
+	/* Dump monitor queue */
+	kdbus_printf("\n\nMonitor queue:\n");
+	for (;;) {
+		ret = kdbus_msg_recv_poll(monitor, 100, &msg, NULL);
+		if (ret < 0)
+			break;
+
+		if (msg->payload_type == KDBUS_PAYLOAD_DBUS) {
+			/*
+			 * Parent pidns should see all the
+			 * pids
+			 */
+			item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
+			ASSERT_RETURN(item);
+
+			ASSERT_RETURN(item->pids.pid != 0 &&
+				      item->pids.tid != 0 &&
+				      item->pids.ppid != 0);
+		}
+
+		kdbus_msg_free(msg);
+	}
+
+	kdbus_conn_free(monitor);
+	close(efd);
+
+	return 0;
+}
+
+int kdbus_test_metadata_ns(struct kdbus_test_env *env)
+{
+	int ret;
+	struct kdbus_conn *holder, *conn;
+	struct kdbus_policy_access policy_access = {
+		/* Allow world so we can inspect metadata in namespace */
+		.type = KDBUS_POLICY_ACCESS_WORLD,
+		.id = geteuid(),
+		.access = KDBUS_POLICY_TALK,
+	};
+
+	/*
+	 * We require user-namespaces and all uids/gids
+	 * should be mapped (we can just require the necessary ones)
+	 */
+	if (!config_user_ns_is_enabled() ||
+	    !all_uids_gids_are_mapped())
+		return TEST_SKIP;
+
+	ret = test_is_capable(CAP_SETUID, CAP_SETGID, CAP_SYS_ADMIN, -1);
+	ASSERT_RETURN(ret >= 0);
+
+	/* not enough privileges, skip test */
+	if (!ret)
+		return TEST_SKIP;
+
+	holder = kdbus_hello_registrar(env->buspath, "com.example.metadata",
+				       &policy_access, 1,
+				       KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(holder);
+
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	ret = kdbus_add_match_empty(conn);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_name_acquire(conn, "com.example.metadata", NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	ret = kdbus_clone_userns_test(env->buspath, conn);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_conn_free(holder);
+	kdbus_conn_free(conn);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-monitor.c b/tools/testing/selftests/kdbus/test-monitor.c
new file mode 100644
index 000000000000..30e8f6305f8f
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-monitor.c
@@ -0,0 +1,155 @@
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <assert.h>
+#include <signal.h>
+#include <sys/time.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+#include "kdbus-test.h"
+
+int kdbus_test_monitor(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *monitor, *conn;
+	unsigned int cookie = 0xdeadbeef;
+	struct kdbus_msg *msg;
+	uint64_t offset = 0;
+	int ret;
+
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	/* add matches to make sure the monitor does not trigger an item add
+	 * or remove on connect and disconnect, respectively.
+	 */
+	ret = kdbus_add_match_id(conn, 0x1, KDBUS_ITEM_ID_ADD,
+				 KDBUS_MATCH_ID_ANY);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_add_match_id(conn, 0x2, KDBUS_ITEM_ID_REMOVE,
+				 KDBUS_MATCH_ID_ANY);
+	ASSERT_RETURN(ret == 0);
+
+	/* register a monitor */
+	monitor = kdbus_hello(env->buspath, KDBUS_HELLO_MONITOR, NULL, 0);
+	ASSERT_RETURN(monitor);
+
+	/* make sure we did not receive a monitor connect notification */
+	ret = kdbus_msg_recv(conn, &msg, &offset);
+	ASSERT_RETURN(ret == -EAGAIN);
+
+	/* check that a monitor cannot acquire a name */
+	ret = kdbus_name_acquire(monitor, "foo.bar.baz", NULL);
+	ASSERT_RETURN(ret == -EOPNOTSUPP);
+
+	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0,  0, conn->id);
+	ASSERT_RETURN(ret == 0);
+
+	/* the recipient should have gotten the message */
+	ret = kdbus_msg_recv(conn, &msg, &offset);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(msg->cookie == cookie);
+	kdbus_msg_free(msg);
+	kdbus_free(conn, offset);
+
+	/* and so should the monitor */
+	ret = kdbus_msg_recv(monitor, &msg, &offset);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(msg->cookie == cookie);
+
+	kdbus_msg_free(msg);
+	kdbus_free(monitor, offset);
+
+	/* Installing matches for monitors must fail */
+	ret = kdbus_add_match_empty(monitor);
+	ASSERT_RETURN(ret == -EOPNOTSUPP);
+
+	cookie++;
+	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
+			     KDBUS_DST_ID_BROADCAST);
+	ASSERT_RETURN(ret == 0);
+
+	/* The monitor should get the message. */
+	ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(msg->cookie == cookie);
+
+	kdbus_msg_free(msg);
+	kdbus_free(monitor, offset);
+
+	/*
+	 * Since we are the only monitor, update the attach flags
+	 * and say that we are not interested in any attach flags on recv
+	 */
+
+	ret = kdbus_conn_update_attach_flags(monitor,
+					     _KDBUS_ATTACH_ALL,
+					     0);
+	ASSERT_RETURN(ret == 0);
+
+	cookie++;
+	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
+			     KDBUS_DST_ID_BROADCAST);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(msg->cookie == cookie);
+
+	ret = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_msg_free(msg);
+	kdbus_free(monitor, offset);
+
+	/*
+	 * Now we are interested in KDBUS_ITEM_TIMESTAMP and
+	 * KDBUS_ITEM_CREDS
+	 */
+	ret = kdbus_conn_update_attach_flags(monitor,
+					     _KDBUS_ATTACH_ALL,
+					     KDBUS_ATTACH_TIMESTAMP |
+					     KDBUS_ATTACH_CREDS);
+	ASSERT_RETURN(ret == 0);
+
+	cookie++;
+	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
+			     KDBUS_DST_ID_BROADCAST);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(msg->cookie == cookie);
+
+	ret = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
+	ASSERT_RETURN(ret == 1);
+
+	ret = kdbus_item_in_message(msg, KDBUS_ITEM_CREDS);
+	ASSERT_RETURN(ret == 1);
+
+	/* the KDBUS_ITEM_PID_COMM was not requested */
+	ret = kdbus_item_in_message(msg, KDBUS_ITEM_PID_COMM);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_msg_free(msg);
+	kdbus_free(monitor, offset);
+
+	kdbus_conn_free(monitor);
+	/* make sure we did not receive a monitor disconnect notification */
+	ret = kdbus_msg_recv(conn, &msg, &offset);
+	ASSERT_RETURN(ret == -EAGAIN);
+
+	kdbus_conn_free(conn);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-names.c b/tools/testing/selftests/kdbus/test-names.c
new file mode 100644
index 000000000000..968a7cde9afe
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-names.c
@@ -0,0 +1,184 @@
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <limits.h>
+#include <sys/ioctl.h>
+#include <getopt.h>
+#include <stdbool.h>
+
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+#include "kdbus-test.h"
+
+static int conn_is_name_owner(const struct kdbus_conn *conn,
+			      const char *needle)
+{
+	struct kdbus_cmd_name_list cmd_list = { .size = sizeof(cmd_list) };
+	struct kdbus_name_list *list;
+	struct kdbus_name_info *name;
+	bool found = false;
+	int ret;
+
+	cmd_list.flags = KDBUS_NAME_LIST_NAMES;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_NAME_LIST, &cmd_list);
+	ASSERT_RETURN(ret == 0);
+
+	list = (struct kdbus_name_list *)(conn->buf + cmd_list.offset);
+	KDBUS_ITEM_FOREACH(name, list, names) {
+		struct kdbus_item *item;
+		const char *n = NULL;
+
+		KDBUS_ITEM_FOREACH(item, name, items)
+			if (item->type == KDBUS_ITEM_OWNED_NAME)
+				n = item->name.name;
+
+		if (name->owner_id == conn->id &&
+		    n && strcmp(needle, n) == 0) {
+			found = true;
+			break;
+		}
+	}
+
+	ret = kdbus_free(conn, cmd_list.offset);
+	ASSERT_RETURN(ret == 0);
+
+	return found ? 0 : -1;
+}
+
+int kdbus_test_name_basic(struct kdbus_test_env *env)
+{
+	char *name, *dot_name, *invalid_name, *wildcard_name;
+	int ret;
+
+	name = "foo.bla.blaz";
+	dot_name = ".bla.blaz";
+	invalid_name = "foo";
+	wildcard_name = "foo.bla.bl.*";
+
+	/* Name is not valid, must fail */
+	ret = kdbus_name_acquire(env->conn, dot_name, NULL);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	ret = kdbus_name_acquire(env->conn, invalid_name, NULL);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	ret = kdbus_name_acquire(env->conn, wildcard_name, NULL);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	/* check that we can acquire a name */
+	ret = kdbus_name_acquire(env->conn, name, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ret = conn_is_name_owner(env->conn, name);
+	ASSERT_RETURN(ret == 0);
+
+	/* ... and release it again */
+	ret = kdbus_name_release(env->conn, name);
+	ASSERT_RETURN(ret == 0);
+
+	ret = conn_is_name_owner(env->conn, name);
+	ASSERT_RETURN(ret != 0);
+
+	/* check that we can't release it again */
+	ret = kdbus_name_release(env->conn, name);
+	ASSERT_RETURN(ret == -ESRCH);
+
+	/* check that we can't release a name that we don't own */
+	ret = kdbus_name_release(env->conn, "foo.bar.xxx");
+	ASSERT_RETURN(ret == -ESRCH);
+
+	/* Name is not valid, must fail */
+	ret = kdbus_name_release(env->conn, dot_name);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	ret = kdbus_name_release(env->conn, invalid_name);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	ret = kdbus_name_release(env->conn, wildcard_name);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	return TEST_OK;
+}
+
+int kdbus_test_name_conflict(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *conn;
+	char *name;
+	int ret;
+
+	name = "foo.bla.blaz";
+
+	/* create a 2nd connection */
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn != NULL);
+
+	/* allow the new connection to own the same name */
+	/* acquire name from the 1st connection */
+	ret = kdbus_name_acquire(env->conn, name, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ret = conn_is_name_owner(env->conn, name);
+	ASSERT_RETURN(ret == 0);
+
+	/* check that we can't acquire it again from the 1st connection */
+	ret = kdbus_name_acquire(env->conn, name, NULL);
+	ASSERT_RETURN(ret == -EALREADY);
+
+	/* check that we also can't acquire it again from the 2nd connection */
+	ret = kdbus_name_acquire(conn, name, NULL);
+	ASSERT_RETURN(ret == -EEXIST);
+
+	kdbus_conn_free(conn);
+
+	return TEST_OK;
+}
+
+int kdbus_test_name_queue(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *conn;
+	const char *name;
+	uint64_t flags;
+	int ret;
+
+	name = "foo.bla.blaz";
+
+	flags = KDBUS_NAME_ALLOW_REPLACEMENT;
+
+	/* create a 2nd connection */
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn != NULL);
+
+	/* allow the new connection to own the same name */
+	/* acquire name from the 1st connection */
+	ret = kdbus_name_acquire(env->conn, name, &flags);
+	ASSERT_RETURN(ret == 0);
+
+	ret = conn_is_name_owner(env->conn, name);
+	ASSERT_RETURN(ret == 0);
+
+	/* queue the 2nd connection as waiting owner */
+	flags = KDBUS_NAME_QUEUE;
+	ret = kdbus_name_acquire(conn, name, &flags);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(flags & KDBUS_NAME_IN_QUEUE);
+
+	/* release name from 1st connection */
+	ret = kdbus_name_release(env->conn, name);
+	ASSERT_RETURN(ret == 0);
+
+	/* now the name should be owned by the 2nd connection */
+	ret = conn_is_name_owner(conn, name);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_conn_free(conn);
+
+	return TEST_OK;
+}
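
Note on the name checks in kdbus_test_name_basic() above: the test only shows
which names are rejected with -EINVAL (".bla.blaz", "foo", "foo.bla.bl.*") and
which one is accepted ("foo.bla.blaz"). The sketch below is one rule set that
is consistent with those four cases. It is inferred from the test alone, so
the function name sketch_name_is_valid() and the exact character set are
assumptions, and the in-kernel check may well be stricter.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of this patch: accept a well-known name
 * only if it consists of at least two non-empty, dot-separated elements
 * made of [A-Za-z0-9_-]. The rules are guessed from the selftest cases.
 */
bool sketch_name_is_valid(const char *name)
{
	bool at_element_start = true;	/* true at the start and after a '.' */
	bool seen_dot = false;
	const char *p;

	if (name == NULL)
		return false;

	for (p = name; *p != '\0'; p++) {
		if (*p == '.') {
			/* leading dot or two dots in a row: empty element */
			if (at_element_start)
				return false;
			seen_dot = true;
			at_element_start = true;
		} else if (isalnum((unsigned char)*p) ||
			   *p == '_' || *p == '-') {
			at_element_start = false;
		} else {
			/* rejects '*' and any other character */
			return false;
		}
	}

	/* must not be empty or end on a dot, and needs at least one dot */
	return !at_element_start && seen_dot;
}
```

With these rules the three invalid names from the test are rejected before
any ownership lookup, which matches the test expecting -EINVAL for both
acquire and release rather than -ESRCH.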
diff --git a/tools/testing/selftests/kdbus/test-policy-ns.c b/tools/testing/selftests/kdbus/test-policy-ns.c
new file mode 100644
index 000000000000..abd5b922df6a
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-policy-ns.c
@@ -0,0 +1,633 @@
+/*
+ * Test metadata and policies in new namespaces. Even if our tests
+ * can run in a namespaced setup, this test is necessary so we can
+ * inspect policies on the same kdbusfs but between multiple
+ * namespaces.
+ *
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <errno.h>
+#include <signal.h>
+#include <sys/wait.h>
+#include <sys/prctl.h>
+#include <sys/ioctl.h>
+#include <sys/eventfd.h>
+#include <sys/syscall.h>
+#include <sys/capability.h>
+#include <linux/sched.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+#define MAX_CONN	64
+#define POLICY_NAME	"foo.test.policy-test"
+
+#define KDBUS_CONN_MAX_MSGS_PER_USER            16
+
+/**
+ * Note: this test can be used to inspect policy_db->talk_access_hash
+ *
+ * The purpose of these tests:
+ * 1) Check KDBUS_POLICY_TALK
+ * 2) Check the cache state: kdbus_policy_db->talk_access_hash
+ * Should be extended
+ */
+
+/**
+ * Check a list of connections against conn_db[0]
+ * conn_db[0] will own the name "foo.test.policy-test" and the
+ * policy holder connection for this name will update the policy
+ * entries, so different use cases can be tested.
+ */
+static struct kdbus_conn **conn_db;
+
+static void *kdbus_recv_echo(void *ptr)
+{
+	int ret;
+	struct kdbus_conn *conn = ptr;
+
+	ret = kdbus_msg_recv_poll(conn, 200, NULL, NULL);
+
+	return (void *)(long)ret;
+}
+
+/* Trigger kdbus_policy_set() */
+static int kdbus_set_policy_talk(struct kdbus_conn *conn,
+				 const char *name,
+				 uid_t id, unsigned int type)
+{
+	int ret;
+	struct kdbus_policy_access access = {
+		.type = type,
+		.id = id,
+		.access = KDBUS_POLICY_TALK,
+	};
+
+	ret = kdbus_conn_update_policy(conn, name, &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	return TEST_OK;
+}
+
+/* return TEST_OK or TEST_ERR on failure */
+static int kdbus_register_same_activator(char *bus, const char *name,
+					 struct kdbus_conn **c)
+{
+	int ret;
+	struct kdbus_conn *activator;
+
+	activator = kdbus_hello_activator(bus, name, NULL, 0);
+	if (activator) {
+		*c = activator;
+		fprintf(stderr, "--- error was able to register name twice '%s'.\n",
+			name);
+		return TEST_ERR;
+	}
+
+	ret = -errno;
+	/* -EEXIST means test succeeded */
+	if (ret == -EEXIST)
+		return TEST_OK;
+
+	return TEST_ERR;
+}
+
+/* return TEST_OK or TEST_ERR on failure */
+static int kdbus_register_policy_holder(char *bus, const char *name,
+					struct kdbus_conn **conn)
+{
+	struct kdbus_conn *c;
+	struct kdbus_policy_access access[2];
+
+	access[0].type = KDBUS_POLICY_ACCESS_USER;
+	access[0].access = KDBUS_POLICY_OWN;
+	access[0].id = geteuid();
+
+	access[1].type = KDBUS_POLICY_ACCESS_WORLD;
+	access[1].access = KDBUS_POLICY_TALK;
+	access[1].id = geteuid();
+
+	c = kdbus_hello_registrar(bus, name, access, 2,
+				  KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(c);
+
+	*conn = c;
+
+	return TEST_OK;
+}
+
+/**
+ * Create new threads for receiving from multiple senders,
+ * The 'conn_db' will be populated by newly created connections.
+ * Caller should free all allocated connections.
+ *
+ * return 0 on success, negative errno on failure.
+ */
+static int kdbus_recv_in_threads(const char *bus, const char *name,
+				 struct kdbus_conn **conn_db)
+{
+	int ret;
+	bool pool_full = false;
+	unsigned int sent_packets = 0;
+	unsigned int lost_packets = 0;
+	unsigned int i, tid;
+	unsigned long dst_id;
+	unsigned long cookie = 1;
+	unsigned int thread_nr = MAX_CONN - 1;
+	pthread_t thread_id[MAX_CONN - 1] = {'\0'};
+
+	dst_id = name ? KDBUS_DST_ID_NAME : conn_db[0]->id;
+
+	for (tid = 0, i = 1; tid < thread_nr; tid++, i++) {
+		ret = pthread_create(&thread_id[tid], NULL,
+				     kdbus_recv_echo, (void *)conn_db[0]);
+		if (ret != 0) {
+			ret = -ret;
+			kdbus_printf("error pthread_create: %d (%s)\n",
+				      ret, strerror(-ret));
+			break;
+		}
+
+		/* just free before re-using */
+		kdbus_conn_free(conn_db[i]);
+		conn_db[i] = NULL;
+
+		/* We need to create connections here */
+		conn_db[i] = kdbus_hello(bus, 0, NULL, 0);
+		if (!conn_db[i]) {
+			ret = -errno;
+			break;
+		}
+
+		ret = kdbus_add_match_empty(conn_db[i]);
+		if (ret < 0)
+			break;
+
+		ret = kdbus_msg_send(conn_db[i], name, cookie++,
+				     0, 0, 0, dst_id);
+		if (ret < 0) {
+			/*
+			 * Receivers are not reading their messages,
+			 * not scheduled ?!
+			 *
+			 * So set the pool full here, perhaps the
+			 * connection pool or queue was full, later
+			 * recheck receivers errors
+			 */
+			if (ret == -ENOBUFS || ret == -EXFULL)
+				pool_full = true;
+			break;
+		}
+
+		sent_packets++;
+	}
+
+	for (tid = 0; tid < thread_nr; tid++) {
+		void *thread_ret = NULL;
+
+		if (thread_id[tid]) {
+			pthread_join(thread_id[tid], &thread_ret);
+			if ((long)thread_ret < 0) {
+				/* Update only if send did not fail */
+				if (ret == 0)
+					ret = (long)thread_ret;
+
+				lost_packets++;
+			}
+		}
+	}
+
+	/*
+	 * If sending failed with -ENOBUFS or -EXFULL, then
+	 * lost_packets must be non-zero and sent_packets must be
+	 * at least KDBUS_CONN_MAX_MSGS_PER_USER
+	 */
+	if (pool_full) {
+		ASSERT_RETURN(lost_packets > 0);
+
+		/*
+		 * We should at least send KDBUS_CONN_MAX_MSGS_PER_USER
+		 *
+		 * For every send operation we create a thread to
+		 * recv the packet, so we keep the queue clean
+		 */
+		ASSERT_RETURN(sent_packets >= KDBUS_CONN_MAX_MSGS_PER_USER);
+
+		/*
+		 * Set ret to zero since we only failed due to
+		 * the receiving threads that have not been
+		 * scheduled
+		 */
+		ret = 0;
+	}
+
+	return ret;
+}
+
+/* Return: TEST_OK or TEST_ERR on failure */
+static int kdbus_normal_test(const char *bus, const char *name,
+			     struct kdbus_conn **conn_db)
+{
+	int ret;
+
+	ret = kdbus_recv_in_threads(bus, name, conn_db);
+	ASSERT_RETURN(ret >= 0);
+
+	return TEST_OK;
+}
+
+static int kdbus_fork_test_by_id(const char *bus,
+				 struct kdbus_conn **conn_db,
+				 int parent_status, int child_status)
+{
+	int ret;
+	pid_t pid;
+	uint64_t cookie = 0x9876ecba;
+	struct kdbus_msg *msg = NULL;
+	uint64_t offset = 0;
+	int status = 0;
+
+	/*
+	 * If the child_status is not EXIT_SUCCESS, then we expect
+	 * that sending from the child will fail, thus receiving
+	 * from parent must error with -ETIMEDOUT, and vice versa.
+	 */
+	bool parent_timedout = !!child_status;
+	bool child_timedout = !!parent_status;
+
+	pid = fork();
+	ASSERT_RETURN_VAL(pid >= 0, pid);
+
+	if (pid == 0) {
+		struct kdbus_conn *conn_src;
+
+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
+		ASSERT_EXIT(ret == 0);
+
+		ret = drop_privileges(65534, 65534);
+		ASSERT_EXIT(ret == 0);
+
+		conn_src = kdbus_hello(bus, 0, NULL, 0);
+		ASSERT_EXIT(conn_src);
+
+		ret = kdbus_add_match_empty(conn_src);
+		ASSERT_EXIT(ret == 0);
+
+		/*
+		 * child_status is always checked against send
+		 * operations, in case it fails always return
+		 * EXIT_FAILURE.
+		 */
+		ret = kdbus_msg_send(conn_src, NULL, cookie,
+				     0, 0, 0, conn_db[0]->id);
+		ASSERT_EXIT(ret == child_status);
+
+		ret = kdbus_msg_recv_poll(conn_src, 100, NULL, NULL);
+
+		kdbus_conn_free(conn_src);
+
+		/*
+		 * Child kdbus_msg_recv_poll() should timeout since
+		 * the parent_status was set to a non EXIT_SUCCESS
+		 * value.
+		 */
+		if (child_timedout)
+			_exit(ret == -ETIMEDOUT ? EXIT_SUCCESS : EXIT_FAILURE);
+
+		_exit(ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
+	}
+
+	ret = kdbus_msg_recv_poll(conn_db[0], 100, &msg, &offset);
+	/*
+	 * If parent_timedout is set then this should fail with
+	 * -ETIMEDOUT since the child_status was set to a non
+	 * EXIT_SUCCESS value. Otherwise, assume
+	 * that kdbus_msg_recv_poll() has succeeded.
+	 */
+	if (parent_timedout) {
+		ASSERT_RETURN_VAL(ret == -ETIMEDOUT, TEST_ERR);
+
+		/* timed out, no need to continue: we don't have the
+		 * child connection ID, so just terminate. */
+		goto out;
+	} else {
+		ASSERT_RETURN_VAL(ret == 0, ret);
+	}
+
+	ret = kdbus_msg_send(conn_db[0], NULL, ++cookie,
+			     0, 0, 0, msg->src_id);
+	/*
+	 * parent_status is checked against send operations,
+	 * on failures always return TEST_ERR.
+	 */
+	ASSERT_RETURN_VAL(ret == parent_status, TEST_ERR);
+
+	kdbus_msg_free(msg);
+	kdbus_free(conn_db[0], offset);
+
+out:
+	ret = waitpid(pid, &status, 0);
+	ASSERT_RETURN_VAL(ret >= 0, ret);
+
+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
+}
+
+/*
+ * Return: TEST_OK, TEST_ERR or TEST_SKIP
+ * we return TEST_OK only if the children return with the expected
+ * 'expected_status' that is specified as an argument.
+ */
+static int kdbus_fork_test(const char *bus, const char *name,
+			   struct kdbus_conn **conn_db, int expected_status)
+{
+	pid_t pid;
+	int ret = 0;
+	int status = 0;
+
+	pid = fork();
+	ASSERT_RETURN_VAL(pid >= 0, pid);
+
+	if (pid == 0) {
+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
+		ASSERT_EXIT(ret == 0);
+
+		ret = drop_privileges(65534, 65534);
+		ASSERT_EXIT(ret == 0);
+
+		ret = kdbus_recv_in_threads(bus, name, conn_db);
+		_exit(ret == expected_status ? EXIT_SUCCESS : EXIT_FAILURE);
+	}
+
+	ret = waitpid(pid, &status, 0);
+	ASSERT_RETURN(ret >= 0);
+
+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
+}
+
+/* Return EXIT_SUCCESS, EXIT_FAILURE or negative errno */
+static int __kdbus_clone_userns_test(const char *bus,
+				     const char *name,
+				     struct kdbus_conn **conn_db,
+				     int expected_status)
+{
+	int efd;
+	pid_t pid;
+	int ret = 0;
+	unsigned int uid = 65534;
+	int status;
+
+	ret = drop_privileges(uid, uid);
+	ASSERT_RETURN_VAL(ret == 0, ret);
+
+	/*
+	 * Since we just dropped privileges, the dumpable flag was just
+	 * cleared which makes the /proc/$clone_child/uid_map to be
+	 * owned by root, hence any userns uid mapping will fail with
+	 * -EPERM since the mapping will be done by uid 65534.
+	 *
+	 * To avoid this set the dumpable flag again which makes procfs
+	 * update the /proc/$clone_child/ inodes owner to 65534.
+	 *
+	 * Using this we will be able to write to
+	 * /proc/$clone_child/uid_map as uid 65534 and map the uid
+	 * 65534 to 0 inside the user namespace.
+	 */
+	ret = prctl(PR_SET_DUMPABLE, SUID_DUMP_USER);
+	ASSERT_RETURN_VAL(ret == 0, ret);
+
+	/* sync parent/child */
+	efd = eventfd(0, EFD_CLOEXEC);
+	ASSERT_RETURN_VAL(efd >= 0, efd);
+
+	pid = syscall(__NR_clone, SIGCHLD|CLONE_NEWUSER, NULL);
+	if (pid < 0) {
+		ret = -errno;
+		kdbus_printf("error clone: %d (%m)\n", ret);
+		/*
+		 * Normal user not allowed to create userns,
+		 * so nothing to worry about ?
+		 */
+		if (ret == -EPERM) {
+			kdbus_printf("-- CLONE_NEWUSER TEST Failed for uid: %u\n"
+				"-- Make sure that your kernel does not allow "
+				"CLONE_NEWUSER for unprivileged users\n"
+				"-- Upstream Commit: "
+				"https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eaf563e\n",
+				uid);
+			ret = 0;
+		}
+
+		return ret;
+	}
+
+	if (pid == 0) {
+		struct kdbus_conn *conn_src;
+		eventfd_t event_status = 0;
+
+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
+		ASSERT_EXIT(ret == 0);
+
+		ret = eventfd_read(efd, &event_status);
+		ASSERT_EXIT(ret >= 0 && event_status == 1);
+
+		/* ping connection from the new user namespace */
+		conn_src = kdbus_hello(bus, 0, NULL, 0);
+		ASSERT_EXIT(conn_src);
+
+		ret = kdbus_add_match_empty(conn_src);
+		ASSERT_EXIT(ret == 0);
+
+		ret = kdbus_msg_send(conn_src, name, 0xabcd1234,
+				     0, 0, 0, KDBUS_DST_ID_NAME);
+		kdbus_conn_free(conn_src);
+
+		_exit(ret == expected_status ? EXIT_SUCCESS : EXIT_FAILURE);
+	}
+
+	ret = userns_map_uid_gid(pid, "0 65534 1", "0 65534 1");
+	ASSERT_RETURN_VAL(ret == 0, ret);
+
+	/* Tell child we are ready */
+	ret = eventfd_write(efd, 1);
+	ASSERT_RETURN_VAL(ret == 0, ret);
+
+	ret = waitpid(pid, &status, 0);
+	ASSERT_RETURN_VAL(ret >= 0, ret);
+
+	close(efd);
+
+	return status == EXIT_SUCCESS ? TEST_OK : TEST_ERR;
+}
+
+static int kdbus_clone_userns_test(const char *bus,
+				   const char *name,
+				   struct kdbus_conn **conn_db,
+				   int expected_status)
+{
+	pid_t pid;
+	int ret = 0;
+	int status;
+
+	pid = fork();
+	ASSERT_RETURN_VAL(pid >= 0, -errno);
+
+	if (pid == 0) {
+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
+		if (ret < 0)
+			_exit(EXIT_FAILURE);
+
+		ret = __kdbus_clone_userns_test(bus, name, conn_db,
+						expected_status);
+		_exit(ret);
+	}
+
+	/*
+	 * Receive in the original (root privileged) user namespace,
+	 * must fail with -ETIMEDOUT.
+	 */
+	ret = kdbus_msg_recv_poll(conn_db[0], 100, NULL, NULL);
+	ASSERT_RETURN_VAL(ret == -ETIMEDOUT, ret);
+
+	ret = waitpid(pid, &status, 0);
+	ASSERT_RETURN_VAL(ret >= 0, ret);
+
+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
+}
+
+int kdbus_test_policy_ns(struct kdbus_test_env *env)
+{
+	int i;
+	int ret;
+	struct kdbus_conn *activator = NULL;
+	struct kdbus_conn *policy_holder = NULL;
+	char *bus = env->buspath;
+
+	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
+	ASSERT_RETURN(ret >= 0);
+
+	/* not enough privileges, SKIP test */
+	if (!ret)
+		return TEST_SKIP;
+
+	/* we require user-namespaces */
+	if (access("/proc/self/uid_map", F_OK) != 0)
+		return TEST_SKIP;
+
+	/* uids/gids must be mapped */
+	if (!all_uids_gids_are_mapped())
+		return TEST_SKIP;
+
+	conn_db = calloc(MAX_CONN, sizeof(struct kdbus_conn *));
+	ASSERT_RETURN(conn_db);
+
+	memset(conn_db, 0, MAX_CONN * sizeof(struct kdbus_conn *));
+
+	conn_db[0] = kdbus_hello(bus, 0, NULL, 0);
+	ASSERT_RETURN(conn_db[0]);
+
+	ret = kdbus_add_match_empty(conn_db[0]);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_fork_test_by_id(bus, conn_db, -EPERM, -EPERM);
+	ASSERT_EXIT(ret == 0);
+
+	ret = kdbus_register_policy_holder(bus, POLICY_NAME,
+					   &policy_holder);
+	ASSERT_RETURN(ret == 0);
+
+	/* Try to register the same name with an activator */
+	ret = kdbus_register_same_activator(bus, POLICY_NAME,
+					    &activator);
+	ASSERT_RETURN(ret == 0);
+
+	/* Acquire POLICY_NAME */
+	ret = kdbus_name_acquire(conn_db[0], POLICY_NAME, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_normal_test(bus, POLICY_NAME, conn_db);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_name_list(conn_db[0], KDBUS_NAME_LIST_NAMES |
+					  KDBUS_NAME_LIST_UNIQUE |
+					  KDBUS_NAME_LIST_ACTIVATORS |
+					  KDBUS_NAME_LIST_QUEUED);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_fork_test(bus, POLICY_NAME, conn_db, EXIT_SUCCESS);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * children connections are able to talk to conn_db[0] since
+	 * current POLICY_NAME TALK type is KDBUS_POLICY_ACCESS_WORLD,
+	 * so expect EXIT_SUCCESS when sending from child. However,
+	 * since the child's connection does not own any well-known
+	 * name, the parent connection conn_db[0] should fail with
+	 * -EPERM but since it is a privileged bus user the TALK is
+	 *  allowed.
+	 */
+	ret = kdbus_fork_test_by_id(bus, conn_db,
+				    EXIT_SUCCESS, EXIT_SUCCESS);
+	ASSERT_EXIT(ret == 0);
+
+	/*
+	 * Connections that can talk are perhaps being destroyed now.
+	 * Restrict the policy and purge cache entries where the
+	 * conn_db[0] is the destination.
+	 *
+	 * Now only connections with uid == 0 are allowed to talk.
+	 */
+	ret = kdbus_set_policy_talk(policy_holder, POLICY_NAME,
+				    geteuid(), KDBUS_POLICY_ACCESS_USER);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Testing connections (FORK+DROP) again:
+	 * After setting the policy re-check connections
+	 * we expect the children to fail with -EPERM
+	 */
+	ret = kdbus_fork_test(bus, POLICY_NAME, conn_db, -EPERM);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Now expect both parent and child to fail.
+	 *
+	 * Child should fail with -EPERM since we just restricted
+	 * the POLICY_NAME TALK to uid 0 and its uid is 65534.
+	 *
+	 * Since the parent's connection will timeout when receiving
+	 * from the child, we never continue. FWIW just put -EPERM.
+	 */
+	ret = kdbus_fork_test_by_id(bus, conn_db, -EPERM, -EPERM);
+	ASSERT_EXIT(ret == 0);
+
+	/* Check if the name can be reached in a new userns */
+	ret = kdbus_clone_userns_test(bus, POLICY_NAME, conn_db, -EPERM);
+	ASSERT_RETURN(ret == 0);
+
+	for (i = 0; i < MAX_CONN; i++)
+		kdbus_conn_free(conn_db[i]);
+
+	kdbus_conn_free(activator);
+	kdbus_conn_free(policy_holder);
+
+	free(conn_db);
+
+	return ret;
+}
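
Note on the parent/child handshake in __kdbus_clone_userns_test() above: the
child blocks on an eventfd until the parent has written the uid/gid maps, and
the parent then collects the child's result with waitpid(). A minimal,
self-contained sketch of that pattern follows. The helper name
sketch_sync_and_collect() is hypothetical, the kdbus and userns steps are left
out, and the wait status is decoded with WIFEXITED()/WEXITSTATUS() rather than
compared to EXIT_SUCCESS directly, which is a choice of this sketch and not
what the patch does.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of this patch: fork a child that waits on
 * an eventfd until the parent signals it, then exits with child_code.
 * Returns the child's exit code, or -1 on failure or abnormal exit.
 */
int sketch_sync_and_collect(int child_code)
{
	eventfd_t value = 0;
	int status = 0;
	pid_t pid;
	int efd;

	efd = eventfd(0, EFD_CLOEXEC);
	if (efd < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(efd);
		return -1;
	}

	if (pid == 0) {
		/* child: block until the parent has finished its setup */
		if (eventfd_read(efd, &value) < 0 || value != 1)
			_exit(EXIT_FAILURE);
		_exit(child_code);
	}

	/* parent: setup such as writing uid_map would happen here */
	if (eventfd_write(efd, 1) < 0) {
		/* the child would block forever, so stop it explicitly */
		kill(pid, SIGKILL);
		waitpid(pid, &status, 0);
		close(efd);
		return -1;
	}

	close(efd);

	if (waitpid(pid, &status, 0) < 0)
		return -1;

	/* only a normal exit carries a meaningful exit code */
	if (!WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

A caller would compare the returned code against EXIT_SUCCESS; a child killed
by a signal shows up as -1 instead of being mistaken for a regular exit.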
diff --git a/tools/testing/selftests/kdbus/test-policy-priv.c b/tools/testing/selftests/kdbus/test-policy-priv.c
new file mode 100644
index 000000000000..ab515201be2a
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-policy-priv.c
@@ -0,0 +1,1270 @@
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <time.h>
+#include <sys/capability.h>
+#include <sys/eventfd.h>
+#include <sys/ioctl.h>
+#include <sys/wait.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+static int test_policy_priv_by_id(const char *bus,
+				  struct kdbus_conn *conn_dst,
+				  bool drop_second_user,
+				  int parent_status,
+				  int child_status)
+{
+	int ret = 0;
+	uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
+
+	ASSERT_RETURN(conn_dst);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, bus, ({
+		ret = kdbus_msg_send(unpriv, NULL,
+				     expected_cookie, 0, 0, 0,
+				     conn_dst->id);
+		ASSERT_EXIT(ret == child_status);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	ret = kdbus_msg_recv_poll(conn_dst, 300, NULL, NULL);
+	ASSERT_RETURN(ret == parent_status);
+
+	return 0;
+}
+
+static int test_policy_priv_by_broadcast(const char *bus,
+					 struct kdbus_conn *conn_dst,
+					 int drop_second_user,
+					 int parent_status,
+					 int child_status)
+{
+	int efd;
+	int ret = 0;
+	eventfd_t event_status = 0;
+	struct kdbus_msg *msg = NULL;
+	uid_t second_uid = UNPRIV_UID;
+	gid_t second_gid = UNPRIV_GID;
+	struct kdbus_conn *child_2 = conn_dst;
+	uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
+
+	/* Drop to another unprivileged user other than UNPRIV_UID */
+	if (drop_second_user == DROP_OTHER_UNPRIV) {
+		second_uid = UNPRIV_UID - 1;
+		second_gid = UNPRIV_GID - 1;
+	}
+
+	/* child will signal parent to send broadcast */
+	efd = eventfd(0, EFD_CLOEXEC);
+	ASSERT_RETURN_VAL(efd >= 0, efd);
+
+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
+		struct kdbus_conn *child;
+
+		child = kdbus_hello(bus, 0, NULL, 0);
+		ASSERT_EXIT(child);
+
+		ret = kdbus_add_match_empty(child);
+		ASSERT_EXIT(ret == 0);
+
+		/* signal parent */
+		ret = eventfd_write(efd, 1);
+		ASSERT_EXIT(ret == 0);
+
+		/* Use a slightly longer timeout */
+		ret = kdbus_msg_recv_poll(child, 500, &msg, NULL);
+		ASSERT_EXIT(ret == child_status);
+
+		/*
+		 * If we expect the child to get the broadcast
+		 * message, then check the received cookie.
+		 */
+		if (ret == 0) {
+			ASSERT_EXIT(expected_cookie == msg->cookie);
+		}
+
+		/* Use expected_cookie since 'msg' might be NULL */
+		ret = kdbus_msg_send(child, NULL, expected_cookie + 1,
+				     0, 0, 0, KDBUS_DST_ID_BROADCAST);
+		ASSERT_EXIT(ret == 0);
+
+		kdbus_msg_free(msg);
+		kdbus_conn_free(child);
+	}),
+	({
+		if (drop_second_user == DO_NOT_DROP) {
+			ASSERT_RETURN(child_2);
+
+			ret = eventfd_read(efd, &event_status);
+			ASSERT_RETURN(ret >= 0 && event_status == 1);
+
+			ret = kdbus_msg_send(child_2, NULL,
+					     expected_cookie, 0, 0, 0,
+					     KDBUS_DST_ID_BROADCAST);
+			ASSERT_RETURN(ret == 0);
+
+			/* Use a slightly longer timeout */
+			ret = kdbus_msg_recv_poll(child_2, 1000,
+						  &msg, NULL);
+			ASSERT_RETURN(ret == parent_status);
+
+			/*
+			 * Check returned cookie in case we expect
+			 * success.
+			 */
+			if (ret == 0) {
+				ASSERT_RETURN(msg->cookie ==
+					      expected_cookie + 1);
+			}
+
+			kdbus_msg_free(msg);
+		} else {
+			/*
+			 * Two unprivileged users will try to
+			 * communicate using broadcast.
+			 */
+			ret = RUN_UNPRIVILEGED(second_uid, second_gid, ({
+				child_2 = kdbus_hello(bus, 0, NULL, 0);
+				ASSERT_EXIT(child_2);
+
+				ret = kdbus_add_match_empty(child_2);
+				ASSERT_EXIT(ret == 0);
+
+				ret = eventfd_read(efd, &event_status);
+				ASSERT_EXIT(ret >= 0 && event_status == 1);
+
+				ret = kdbus_msg_send(child_2, NULL,
+						expected_cookie, 0, 0, 0,
+						KDBUS_DST_ID_BROADCAST);
+				ASSERT_EXIT(ret == 0);
+
+				/* Use a slightly longer timeout */
+				ret = kdbus_msg_recv_poll(child_2, 1000,
+							  &msg, NULL);
+				ASSERT_EXIT(ret == parent_status);
+
+				/*
+				 * Check returned cookie in case we expect
+				 * success.
+				 */
+				if (ret == 0) {
+					ASSERT_EXIT(msg->cookie ==
+						    expected_cookie + 1);
+				}
+
+				kdbus_msg_free(msg);
+				kdbus_conn_free(child_2);
+			}),
+			({ 0; }));
+			ASSERT_RETURN(ret == 0);
+		}
+	}));
+	ASSERT_RETURN(ret == 0);
+
+	close(efd);
+
+	return ret;
+}
+
+static void nosig(int sig)
+{
+}
+
+static int test_priv_before_policy_upload(struct kdbus_test_env *env)
+{
+	int ret = 0;
+	struct kdbus_conn *conn;
+
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	/*
+	 * Make sure unprivileged bus user cannot acquire names
+	 * before registering any policy holder.
+	 */
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
+		ASSERT_EXIT(ret < 0);
+	}));
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Make sure unprivileged bus users cannot talk by default
+	 * to privileged ones, unless a policy holder that allows
+	 * this was uploaded.
+	 */
+
+	ret = test_policy_priv_by_id(env->buspath, conn, false,
+				     -ETIMEDOUT, -EPERM);
+	ASSERT_RETURN(ret == 0);
+
+	/* Activate matching for a privileged connection */
+	ret = kdbus_add_match_empty(conn);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * First make sure that BROADCAST with msg flag
+	 * KDBUS_MSG_EXPECT_REPLY will fail with -ENOTUNIQ
+	 */
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_msg_send(unpriv, NULL, 0xdeadbeef,
+				     KDBUS_MSG_EXPECT_REPLY,
+				     5000000000ULL, 0,
+				     KDBUS_DST_ID_BROADCAST);
+		ASSERT_EXIT(ret == -ENOTUNIQ);
+	}));
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Test broadcast with a privileged connection.
+	 *
+	 * The first unprivileged receiver should not get the
+	 * broadcast message sent by the privileged connection,
+	 * since there is no TALK policy that allows the
+	 * unprivileged to TALK to the privileged connection. It
+	 * will fail with -ETIMEDOUT
+	 *
+	 * Then second case:
+	 * The privileged connection should get the broadcast
+	 * message from the unprivileged one. Since the receiver is
+	 * a privileged bus user and it has default TALK access to
+	 * all connections it will receive those.
+	 */
+
+	ret = test_policy_priv_by_broadcast(env->buspath, conn,
+					    DO_NOT_DROP,
+					    0, -ETIMEDOUT);
+	ASSERT_RETURN(ret == 0);
+
+
+	/*
+	 * Test broadcast with two unprivileged connections running
+	 * under the same user.
+	 *
+	 * Both connections should succeed.
+	 */
+
+	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
+					    DROP_SAME_UNPRIV, 0, 0);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Test broadcast with two unprivileged connections running
+	 * under different users.
+	 *
+	 * Both connections will fail with -ETIMEDOUT.
+	 */
+
+	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
+					    DROP_OTHER_UNPRIV,
+					    -ETIMEDOUT, -ETIMEDOUT);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_conn_free(conn);
+
+	return ret;
+}
+
+static int test_broadcast_after_policy_upload(struct kdbus_test_env *env)
+{
+	int ret;
+	int efd;
+	eventfd_t event_status = 0;
+	struct kdbus_msg *msg = NULL;
+	struct kdbus_conn *owner_a, *owner_b;
+	struct kdbus_conn *holder_a, *holder_b;
+	struct kdbus_policy_access access = {};
+	uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
+
+	owner_a = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(owner_a);
+
+	ret = kdbus_name_acquire(owner_a, "com.example.broadcastA", NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	/*
+	 * Make sure unprivileged bus users cannot talk by default
+	 * to privileged ones, unless a policy holder that allows
+	 * this was uploaded.
+	 */
+
+	++expected_cookie;
+	ret = test_policy_priv_by_id(env->buspath, owner_a, false,
+				     -ETIMEDOUT, -EPERM);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Make sure that privileged won't receive broadcasts unless
+	 * it installs a match. It will fail with -ETIMEDOUT
+	 *
+	 * At same time check that the unprivileged connection will
+	 * not receive the broadcast message from the privileged one
+	 * since the privileged one owns a name with a restricted
+	 * policy TALK (actually the TALK policy is still not
+	 * registered so we fail by default), thus the unprivileged
+	 * receiver is not able to TALK to that name.
+	 */
+
+	ret = test_policy_priv_by_broadcast(env->buspath, owner_a,
+					    DO_NOT_DROP,
+					    -ETIMEDOUT, -ETIMEDOUT);
+	ASSERT_RETURN(ret == 0);
+
+	/* Activate matching for a privileged connection */
+	ret = kdbus_add_match_empty(owner_a);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Redo the previous test. The privileged conn owner_a is
+	 * able to TALK to any connection so it will receive the
+	 * broadcast message now.
+	 */
+
+	ret = test_policy_priv_by_broadcast(env->buspath, owner_a,
+					    DO_NOT_DROP,
+					    0, -ETIMEDOUT);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Test that broadcasts between two unprivileged connections
+	 * running under the same user still succeed.
+	 */
+
+	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
+					    DROP_SAME_UNPRIV, 0, 0);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Test broadcast with two unprivileged connections running
+	 * under different users.
+	 *
+	 * Both connections will fail with -ETIMEDOUT.
+	 */
+
+	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
+					    DROP_OTHER_UNPRIV,
+					    -ETIMEDOUT, -ETIMEDOUT);
+	ASSERT_RETURN(ret == 0);
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_USER,
+		.id = geteuid(),
+		.access = KDBUS_POLICY_OWN,
+	};
+
+	holder_a = kdbus_hello_registrar(env->buspath,
+					 "com.example.broadcastA",
+					 &access, 1,
+					 KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(holder_a);
+
+	holder_b = kdbus_hello_registrar(env->buspath,
+					 "com.example.broadcastB",
+					 &access, 1,
+					 KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(holder_b);
+
+	/* Free connections and their received messages and restart */
+	kdbus_conn_free(owner_a);
+
+	owner_a = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(owner_a);
+
+	/* Activate matching for a privileged connection */
+	ret = kdbus_add_match_empty(owner_a);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_name_acquire(owner_a, "com.example.broadcastA", NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	owner_b = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(owner_b);
+
+	ret = kdbus_name_acquire(owner_b, "com.example.broadcastB", NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	/* Activate matching for a privileged connection */
+	ret = kdbus_add_match_empty(owner_b);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Test that "com.example.broadcastA" and
+	 * "com.example.broadcastB" are able to signal each other
+	 * using broadcasts even though their policies grant only OWN
+	 * access: they are privileged connections, so they receive
+	 * all broadcasts if the match allows it.
+	 */
+
+	++expected_cookie;
+	ret = kdbus_msg_send(owner_a, NULL, expected_cookie, 0,
+			     0, 0, KDBUS_DST_ID_BROADCAST);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv_poll(owner_b, 100, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+	ASSERT_RETURN(msg->cookie == expected_cookie);
+
+	/* Check src ID */
+	ASSERT_RETURN(msg->src_id == owner_a->id);
+
+	kdbus_msg_free(msg);
+
+	/* Release name "com.example.broadcastB" */
+
+	ret = kdbus_name_release(owner_b, "com.example.broadcastB");
+	ASSERT_EXIT(ret >= 0);
+
+	/* KDBUS_POLICY_OWN for unprivileged connections */
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_WORLD,
+		.id = geteuid(),
+		.access = KDBUS_POLICY_OWN,
+	};
+
+	/* Update the policy so unprivileged connections can own the name */
+
+	ret = kdbus_conn_update_policy(holder_b,
+				       "com.example.broadcastB",
+				       &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Send broadcasts from an unprivileged connection that
+	 * owns the name "com.example.broadcastB".
+	 *
+	 * We'll have four destinations here:
+	 *
+	 * owner_a: privileged connection that owns
+	 * "com.example.broadcastA". It will receive the broadcast
+	 * since it is privileged, has default TALK access to all
+	 * connections, and is subscribed to the match.
+	 * Will succeed.
+	 *
+	 * owner_b: privileged connection (running under a different
+	 * uid) that does not own names, but has an empty broadcast
+	 * match, so it will receive broadcasts since it has default
+	 * TALK access to all connections.
+	 *
+	 * unpriv_a: unprivileged connection that does not own any
+	 * name. It will receive the broadcast since it is running
+	 * under the same user as the one broadcasting and did
+	 * install matches. It should get the message.
+	 *
+	 * unpriv_b: unprivileged connection that is not interested in
+	 * broadcast messages, so it did not install broadcast matches.
+	 * Should fail with -ETIMEDOUT.
+	 */
+
+	++expected_cookie;
+	efd = eventfd(0, EFD_CLOEXEC);
+	ASSERT_RETURN_VAL(efd >= 0, efd);
+
+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
+		struct kdbus_conn *unpriv_owner;
+		struct kdbus_conn *unpriv_a, *unpriv_b;
+
+		unpriv_owner = kdbus_hello(env->buspath, 0, NULL, 0);
+		ASSERT_EXIT(unpriv_owner);
+
+		unpriv_a = kdbus_hello(env->buspath, 0, NULL, 0);
+		ASSERT_EXIT(unpriv_a);
+
+		unpriv_b = kdbus_hello(env->buspath, 0, NULL, 0);
+		ASSERT_EXIT(unpriv_b);
+
+		ret = kdbus_name_acquire(unpriv_owner,
+					 "com.example.broadcastB",
+					 NULL);
+		ASSERT_EXIT(ret >= 0);
+
+		ret = kdbus_add_match_empty(unpriv_a);
+		ASSERT_EXIT(ret == 0);
+
+		/* Signal that we are doing broadcasts */
+		ret = eventfd_write(efd, 1);
+		ASSERT_EXIT(ret == 0);
+
+		/*
+		 * Do broadcast from a connection that owns the
+		 * name "com.example.broadcastB".
+		 */
+		ret = kdbus_msg_send(unpriv_owner, NULL,
+				     expected_cookie,
+				     0, 0, 0,
+				     KDBUS_DST_ID_BROADCAST);
+		ASSERT_EXIT(ret == 0);
+
+		/*
+		 * Unprivileged connection running under the same
+		 * user. It should succeed.
+		 */
+		ret = kdbus_msg_recv_poll(unpriv_a, 300, &msg, NULL);
+		ASSERT_EXIT(ret == 0 && msg->cookie == expected_cookie);
+
+		/*
+		 * Did not install matches, not interested in
+		 * broadcasts
+		 */
+		ret = kdbus_msg_recv_poll(unpriv_b, 300, NULL, NULL);
+		ASSERT_EXIT(ret == -ETIMEDOUT);
+	}),
+	({
+		ret = eventfd_read(efd, &event_status);
+		ASSERT_RETURN(ret >= 0 && event_status == 1);
+
+		/*
+		 * owner_a is privileged, so it receives the broadcast
+		 * even though it owns "com.example.broadcastA", whose
+		 * TALK access is restricted.
+		 */
+		ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
+		ASSERT_RETURN(ret == 0);
+
+		/* confirm the received cookie */
+		ASSERT_RETURN(msg->cookie == expected_cookie);
+
+		kdbus_msg_free(msg);
+
+		/*
+		 * owner_b got the broadcast from an unprivileged
+		 * connection.
+		 */
+		ret = kdbus_msg_recv_poll(owner_b, 300, &msg, NULL);
+		ASSERT_RETURN(ret == 0);
+
+		/* confirm the received cookie */
+		ASSERT_RETURN(msg->cookie == expected_cookie);
+
+		kdbus_msg_free(msg);
+
+	}));
+	ASSERT_RETURN(ret == 0);
+
+	close(efd);
+
+	/*
+	 * Test broadcast with two unprivileged connections running
+	 * under different users.
+	 *
+	 * Both connections will fail with -ETIMEDOUT.
+	 */
+
+	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
+					    DROP_OTHER_UNPRIV,
+					    -ETIMEDOUT, -ETIMEDOUT);
+	ASSERT_RETURN(ret == 0);
+
+	/* Drop broadcasts received by the privileged connections */
+	ret = kdbus_msg_recv_poll(owner_a, 100, NULL, NULL);
+	ret = kdbus_msg_recv_poll(owner_a, 100, NULL, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(owner_a, NULL, NULL);
+	ASSERT_RETURN(ret == -EAGAIN);
+
+	ret = kdbus_msg_recv_poll(owner_b, 100, NULL, NULL);
+	ret = kdbus_msg_recv_poll(owner_b, 100, NULL, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_msg_recv(owner_b, NULL, NULL);
+	ASSERT_RETURN(ret == -EAGAIN);
+
+	/*
+	 * Perform last tests, allow others to talk to name
+	 * "com.example.broadcastA". So now receiving broadcasts
+	 * from it should succeed since the TALK policy allows it.
+	 */
+
+	/* KDBUS_POLICY_TALK for unprivileged connections */
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_WORLD,
+		.id = geteuid(),
+		.access = KDBUS_POLICY_TALK,
+	};
+
+	ret = kdbus_conn_update_policy(holder_a,
+				       "com.example.broadcastA",
+				       &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Unprivileged is able to TALK to "com.example.broadcastA"
+	 * now so it will receive its broadcasts
+	 */
+	ret = test_policy_priv_by_broadcast(env->buspath, owner_a,
+					    DO_NOT_DROP, 0, 0);
+	ASSERT_RETURN(ret == 0);
+
+	++expected_cookie;
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "com.example.broadcastB",
+					 NULL);
+		ASSERT_EXIT(ret >= 0);
+		ret = kdbus_msg_send(unpriv, NULL, expected_cookie,
+				     0, 0, 0, KDBUS_DST_ID_BROADCAST);
+		ASSERT_EXIT(ret == 0);
+	}));
+	ASSERT_RETURN(ret == 0);
+
+	/* owner_a is privileged, so it will get the broadcast now. */
+	ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* confirm the received cookie */
+	ASSERT_RETURN(msg->cookie == expected_cookie);
+
+	kdbus_msg_free(msg);
+
+	/*
+	 * owner_a releases the name "com.example.broadcastA". It should
+	 * still receive broadcasts since it is still privileged and has
+	 * the right match.
+	 *
+	 * Unprivileged connection will own a name and will try to
+	 * signal to the privileged connection.
+	 */
+
+	ret = kdbus_name_release(owner_a, "com.example.broadcastA");
+	ASSERT_EXIT(ret >= 0);
+
+	++expected_cookie;
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "com.example.broadcastB",
+					 NULL);
+		ASSERT_EXIT(ret >= 0);
+		ret = kdbus_msg_send(unpriv, NULL, expected_cookie,
+				     0, 0, 0, KDBUS_DST_ID_BROADCAST);
+		ASSERT_EXIT(ret == 0);
+	}));
+	ASSERT_RETURN(ret == 0);
+
+	/* owner_a will get the broadcast now. */
+	ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
+	ASSERT_RETURN(ret == 0);
+
+	/* confirm the received cookie */
+	ASSERT_RETURN(msg->cookie == expected_cookie);
+
+	kdbus_msg_free(msg);
+
+	kdbus_conn_free(owner_a);
+	kdbus_conn_free(owner_b);
+	kdbus_conn_free(holder_a);
+	kdbus_conn_free(holder_b);
+
+	return 0;
+}
+
+static int test_policy_priv(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *conn_a, *conn_b, *conn, *owner;
+	struct kdbus_policy_access access, *acc;
+	sigset_t sset;
+	size_t num;
+	int ret;
+
+	/*
+	 * Make sure we have CAP_SETUID/SETGID so we can drop privileges
+	 */
+
+	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
+	ASSERT_RETURN(ret >= 0);
+
+	if (!ret)
+		return TEST_SKIP;
+
+	/* make sure that uids and gids are mapped */
+	if (!all_uids_gids_are_mapped())
+		return TEST_SKIP;
+
+	/*
+	 * Setup:
+	 *  conn_a: policy holder for com.example.a
+	 *  conn_b: name holder of com.example.b
+	 */
+
+	signal(SIGUSR1, nosig);
+	sigemptyset(&sset);
+	sigaddset(&sset, SIGUSR1);
+	sigprocmask(SIG_BLOCK, &sset, NULL);
+
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	/*
+	 * Before registering any policy holder, make sure that the
+	 * bus is secure by default. This test is necessary: it catches
+	 * several cases where old D-Bus was vulnerable.
+	 */
+
+	ret = test_priv_before_policy_upload(env);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Make sure unprivileged connections are not able to register
+	 * policy holders
+	 */
+
+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
+		struct kdbus_conn *holder;
+
+		holder = kdbus_hello_registrar(env->buspath,
+					       "com.example.a", NULL, 0,
+					       KDBUS_HELLO_POLICY_HOLDER);
+		ASSERT_EXIT(holder == NULL && errno == EPERM);
+	}),
+	({ 0; }));
+	ASSERT_RETURN(ret == 0);
+
+
+	/* Register policy holder */
+
+	conn_a = kdbus_hello_registrar(env->buspath, "com.example.a",
+				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(conn_a);
+
+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn_b);
+
+	ret = kdbus_name_acquire(conn_b, "com.example.b", NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	/*
+	 * Make sure bus-owners can always acquire names.
+	 */
+	ret = kdbus_name_acquire(conn, "com.example.a", NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	kdbus_conn_free(conn);
+
+	/*
+	 * Make sure unprivileged users cannot acquire names with default
+	 * policy assigned.
+	 */
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
+		ASSERT_EXIT(ret < 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * Make sure unprivileged users can acquire names if we make them
+	 * world-accessible.
+	 */
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_WORLD,
+		.id = 0,
+		.access = KDBUS_POLICY_OWN,
+	};
+
+	/*
+	 * Make sure unprivileged/normal connections are not able
+	 * to update policies
+	 */
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_conn_update_policy(unpriv, "com.example.a",
+					       &access, 1);
+		ASSERT_EXIT(ret == -EOPNOTSUPP);
+	}));
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
+		ASSERT_EXIT(ret >= 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * Make sure unprivileged users can acquire names if we make them
+	 * gid-accessible. But only if the gid matches.
+	 */
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_GROUP,
+		.id = UNPRIV_GID,
+		.access = KDBUS_POLICY_OWN,
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
+		ASSERT_EXIT(ret >= 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_GROUP,
+		.id = 1,
+		.access = KDBUS_POLICY_OWN,
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
+		ASSERT_EXIT(ret < 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * Make sure unprivileged users can acquire names if we make them
+	 * uid-accessible. But only if the uid matches.
+	 */
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_USER,
+		.id = UNPRIV_UID,
+		.access = KDBUS_POLICY_OWN,
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
+		ASSERT_EXIT(ret >= 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_USER,
+		.id = 1,
+		.access = KDBUS_POLICY_OWN,
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
+		ASSERT_EXIT(ret < 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * Make sure unprivileged users cannot acquire names if no owner-policy
+	 * matches, even if SEE/TALK policies match.
+	 */
+
+	num = 4;
+	acc = (struct kdbus_policy_access[]){
+		{
+			.type = KDBUS_POLICY_ACCESS_GROUP,
+			.id = UNPRIV_GID,
+			.access = KDBUS_POLICY_SEE,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = UNPRIV_UID,
+			.access = KDBUS_POLICY_TALK,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_WORLD,
+			.id = 0,
+			.access = KDBUS_POLICY_TALK,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_WORLD,
+			.id = 0,
+			.access = KDBUS_POLICY_SEE,
+		},
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", acc, num);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
+		ASSERT_EXIT(ret < 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * Make sure unprivileged users can acquire names if the only matching
+	 * policy is somewhere in the middle.
+	 */
+
+	num = 5;
+	acc = (struct kdbus_policy_access[]){
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = 1,
+			.access = KDBUS_POLICY_OWN,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = 2,
+			.access = KDBUS_POLICY_OWN,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = UNPRIV_UID,
+			.access = KDBUS_POLICY_OWN,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = 3,
+			.access = KDBUS_POLICY_OWN,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = 4,
+			.access = KDBUS_POLICY_OWN,
+		},
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", acc, num);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
+		ASSERT_EXIT(ret >= 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * Clear policies
+	 */
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", NULL, 0);
+	ASSERT_RETURN(ret == 0);
+
+	/*
+	 * Make sure privileged bus users can _always_ talk to others.
+	 */
+
+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn);
+
+	ret = kdbus_msg_send(conn, "com.example.b", 0xdeadbeef, 0, 0, 0, 0);
+	ASSERT_EXIT(ret >= 0);
+
+	ret = kdbus_msg_recv_poll(conn_b, 300, NULL, NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	kdbus_conn_free(conn);
+
+	/*
+	 * Make sure unprivileged bus users cannot talk by default.
+	 */
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
+				     0, 0);
+		ASSERT_EXIT(ret == -EPERM);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * Make sure unprivileged bus users can talk to equals, even without
+	 * policy.
+	 */
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_USER,
+		.id = UNPRIV_UID,
+		.access = KDBUS_POLICY_OWN,
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.c", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		struct kdbus_conn *owner;
+
+		owner = kdbus_hello(env->buspath, 0, NULL, 0);
+		ASSERT_EXIT(owner);
+
+		ret = kdbus_name_acquire(owner, "com.example.c", NULL);
+		ASSERT_EXIT(ret >= 0);
+
+		ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
+				     0, 0);
+		ASSERT_EXIT(ret >= 0);
+		ret = kdbus_msg_recv_poll(owner, 100, NULL, NULL);
+		ASSERT_EXIT(ret >= 0);
+
+		kdbus_conn_free(owner);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * Make sure unprivileged bus users can talk to privileged users if a
+	 * suitable UID policy is set.
+	 */
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_USER,
+		.id = UNPRIV_UID,
+		.access = KDBUS_POLICY_TALK,
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
+				     0, 0);
+		ASSERT_EXIT(ret >= 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	/*
+	 * Make sure unprivileged bus users can talk to privileged users if a
+	 * suitable GID policy is set.
+	 */
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_GROUP,
+		.id = UNPRIV_GID,
+		.access = KDBUS_POLICY_TALK,
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
+				     0, 0);
+		ASSERT_EXIT(ret >= 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	/*
+	 * Make sure unprivileged bus users can talk to privileged users if a
+	 * suitable WORLD policy is set.
+	 */
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_WORLD,
+		.id = 0,
+		.access = KDBUS_POLICY_TALK,
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
+				     0, 0);
+		ASSERT_EXIT(ret >= 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	/*
+	 * Make sure unprivileged bus users cannot talk to privileged users if
+	 * no suitable policy is set.
+	 */
+
+	num = 5;
+	acc = (struct kdbus_policy_access[]){
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = 0,
+			.access = KDBUS_POLICY_OWN,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = 1,
+			.access = KDBUS_POLICY_TALK,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = UNPRIV_UID,
+			.access = KDBUS_POLICY_SEE,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = 3,
+			.access = KDBUS_POLICY_TALK,
+		},
+		{
+			.type = KDBUS_POLICY_ACCESS_USER,
+			.id = 4,
+			.access = KDBUS_POLICY_TALK,
+		},
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", acc, num);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
+				     0, 0);
+		ASSERT_EXIT(ret == -EPERM);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * Make sure unprivileged bus users can talk to privileged users if a
+	 * suitable OWN privilege overwrites TALK.
+	 */
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_WORLD,
+		.id = 0,
+		.access = KDBUS_POLICY_OWN,
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
+				     0, 0);
+		ASSERT_EXIT(ret >= 0);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
+	ASSERT_EXIT(ret >= 0);
+
+	/*
+	 * Make sure the TALK cache is reset correctly when policies are
+	 * updated.
+	 */
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_WORLD,
+		.id = 0,
+		.access = KDBUS_POLICY_TALK,
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
+				     0, 0);
+		ASSERT_EXIT(ret >= 0);
+
+		ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
+		ASSERT_EXIT(ret >= 0);
+
+		ret = kdbus_conn_update_policy(conn_a, "com.example.b",
+					       NULL, 0);
+		ASSERT_EXIT(ret == 0);
+
+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
+				     0, 0);
+		ASSERT_EXIT(ret == -EPERM);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+	/*
+	 * Make sure the TALK cache is reset correctly when policy holders
+	 * disconnect.
+	 */
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_WORLD,
+		.id = 0,
+		.access = KDBUS_POLICY_OWN,
+	};
+
+	conn = kdbus_hello_registrar(env->buspath, "com.example.c",
+				     NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(conn);
+
+	ret = kdbus_conn_update_policy(conn, "com.example.c", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	owner = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(owner);
+
+	ret = kdbus_name_acquire(owner, "com.example.c", NULL);
+	ASSERT_RETURN(ret >= 0);
+
+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
+		struct kdbus_conn *unpriv;
+
+		/* wait for parent to be finished */
+		sigemptyset(&sset);
+		ret = sigsuspend(&sset);
+		ASSERT_EXIT(ret == -1 && errno == EINTR);
+
+		unpriv = kdbus_hello(env->buspath, 0, NULL, 0);
+		ASSERT_EXIT(unpriv);
+
+		ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
+				     0, 0);
+		ASSERT_EXIT(ret >= 0);
+
+		ret = kdbus_msg_recv_poll(owner, 100, NULL, NULL);
+		ASSERT_EXIT(ret >= 0);
+
+		/* free policy holder */
+		kdbus_conn_free(conn);
+
+		ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
+				     0, 0);
+		ASSERT_EXIT(ret == -EPERM);
+
+		kdbus_conn_free(unpriv);
+	}), ({
+		/* make sure policy holder is only valid in child */
+		kdbus_conn_free(conn);
+		kill(pid, SIGUSR1);
+	}));
+	ASSERT_RETURN(ret >= 0);
+
+
+	/*
+	 * Verify broadcast behavior now that policy holders were uploaded.
+	 */
+
+	ret = test_broadcast_after_policy_upload(env);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_conn_free(owner);
+
+	/*
+	 * cleanup resources
+	 */
+
+	kdbus_conn_free(conn_b);
+	kdbus_conn_free(conn_a);
+
+	return TEST_OK;
+}
+
+int kdbus_test_policy_priv(struct kdbus_test_env *env)
+{
+	pid_t pid;
+	int ret;
+
+	/* make sure to exit() if a child returns from fork() */
+	pid = getpid();
+	ret = test_policy_priv(env);
+	if (pid != getpid())
+		exit(1);
+
+	return ret;
+}
diff --git a/tools/testing/selftests/kdbus/test-policy.c b/tools/testing/selftests/kdbus/test-policy.c
new file mode 100644
index 000000000000..4eb6e65f96d1
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-policy.c
@@ -0,0 +1,81 @@
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+int kdbus_test_policy(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *conn_a, *conn_b;
+	struct kdbus_policy_access access;
+	int ret;
+
+	/* Invalid name */
+	conn_a = kdbus_hello_registrar(env->buspath, ".example.a",
+				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(conn_a == NULL);
+
+	conn_a = kdbus_hello_registrar(env->buspath, "example",
+				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(conn_a == NULL);
+
+	conn_a = kdbus_hello_registrar(env->buspath, "com.example.a",
+				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(conn_a);
+
+	conn_b = kdbus_hello_registrar(env->buspath, "com.example.b",
+				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
+	ASSERT_RETURN(conn_b);
+
+	/*
+	 * Verify there cannot be any duplicate entries, except for specific vs.
+	 * wildcard entries.
+	 */
+
+	access = (struct kdbus_policy_access){
+		.type = KDBUS_POLICY_ACCESS_USER,
+		.id = geteuid(),
+		.access = KDBUS_POLICY_SEE,
+	};
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_conn_update_policy(conn_b, "com.example.a", &access, 1);
+	ASSERT_RETURN(ret == -EEXIST);
+
+	ret = kdbus_conn_update_policy(conn_b, "com.example.a.*", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.a.*", &access, 1);
+	ASSERT_RETURN(ret == -EEXIST);
+
+	ret = kdbus_conn_update_policy(conn_a, "com.example.*", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_conn_update_policy(conn_b, "com.example.a", &access, 1);
+	ASSERT_RETURN(ret == 0);
+
+	ret = kdbus_conn_update_policy(conn_b, "com.example.*", &access, 1);
+	ASSERT_RETURN(ret == -EEXIST);
+
+	/* Invalid name */
+	ret = kdbus_conn_update_policy(conn_b, ".example.*", &access, 1);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	ret = kdbus_conn_update_policy(conn_b, "example", &access, 1);
+	ASSERT_RETURN(ret == -EINVAL);
+
+	kdbus_conn_free(conn_b);
+	kdbus_conn_free(conn_a);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-race.c b/tools/testing/selftests/kdbus/test-race.c
new file mode 100644
index 000000000000..b159711c13c1
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-race.c
@@ -0,0 +1,313 @@
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <sys/ioctl.h>
+#include <pthread.h>
+#include <stdbool.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+struct race_thread {
+	pthread_spinlock_t lock;
+	pthread_t thread;
+	int (*fn) (struct kdbus_test_env *env, void *ctx);
+	struct kdbus_test_env *env;
+	void *ctx;
+	int ret;
+};
+
+static void *race_thread_fn(void *data)
+{
+	struct race_thread *thread = data;
+	int ret;
+
+	ret = -pthread_spin_lock(&thread->lock); /* returns errno > 0 */
+	if (ret < 0)
+		goto error;
+
+	ret = thread->fn(thread->env, thread->ctx);
+	pthread_spin_unlock(&thread->lock);
+
+error:
+	return (void*)(long)ret;
+}
+
+static int race_thread_init(struct race_thread *thread)
+{
+	int ret;
+
+	ret = pthread_spin_init(&thread->lock, PTHREAD_PROCESS_PRIVATE);
+	ASSERT_RETURN(ret == 0);
+
+	ret = pthread_spin_lock(&thread->lock);
+	ASSERT_RETURN(ret == 0);
+
+	ret = pthread_create(&thread->thread, NULL, race_thread_fn, thread);
+	ASSERT_RETURN(ret == 0);
+
+	return TEST_OK;
+}
+
+static void race_thread_run(struct race_thread *thread,
+			    int (*fn)(struct kdbus_test_env *env, void *ctx),
+			    struct kdbus_test_env *env, void *ctx)
+{
+	int ret;
+
+	thread->fn = fn;
+	thread->env = env;
+	thread->ctx = ctx;
+
+	ret = pthread_spin_unlock(&thread->lock);
+	if (ret != 0)
+		abort();
+}
+
+static int race_thread_join(struct race_thread *thread)
+{
+	void *val = (void*)(long)-EFAULT;
+	int ret;
+
+	ret = pthread_join(thread->thread, &val);
+	ASSERT_RETURN(ret == 0);
+
+	thread->ret = (long)val;
+
+	return TEST_OK;
+}
+
+static void shuffle(size_t *array, size_t n)
+{
+	size_t i, j, t;
+
+	if (n <= 1)
+		return;
+
+	for (i = 0; i < n - 1; i++) {
+		j = i + rand() / (RAND_MAX / (n - i) + 1);
+		t = array[j];
+		array[j] = array[i];
+		array[i] = t;
+	}
+}
+
+static int race_thread(int (*init_fn) (struct kdbus_test_env *env, void *ctx),
+		       int (*exit_fn) (struct kdbus_test_env *env, void *ctx,
+		                       int *ret, size_t n_ret),
+		       int (*verify_fn) (struct kdbus_test_env *env, void *ctx),
+		       int (**fns) (struct kdbus_test_env *env, void *ctx),
+		       size_t n_fns, struct kdbus_test_env *env, void *ctx,
+		       size_t runs)
+{
+	struct race_thread *t;
+	size_t i, num, *order;
+	int *ret, r;
+
+	t = calloc(n_fns, sizeof(*t));
+	ASSERT_RETURN(t != NULL);
+
+	ret = calloc(n_fns, sizeof(*ret));
+	ASSERT_RETURN(ret != NULL);
+
+	order = calloc(n_fns, sizeof(*order));
+	ASSERT_RETURN(order != NULL);
+	ASSERT_RETURN(order != NULL);
+
+	for (num = 0; num < runs; ++num) {
+		ASSERT_RETURN(init_fn(env, ctx) == TEST_OK);
+
+		for (i = 0; i < n_fns; ++i) {
+			ASSERT_RETURN(race_thread_init(&t[i]) == TEST_OK);
+			order[i] = i;
+		}
+
+		/* random order */
+		shuffle(order, n_fns);
+		for (i = 0; i < n_fns; ++i)
+			race_thread_run(&t[order[i]], fns[order[i]], env, ctx);
+
+		for (i = 0; i < n_fns; ++i) {
+			ASSERT_RETURN(race_thread_join(&t[i]) == TEST_OK);
+			ret[i] = t[i].ret;
+		}
+
+		ASSERT_RETURN(exit_fn(env, ctx, ret, n_fns) == TEST_OK);
+	}
+
+	r = verify_fn(env, ctx);
+	free(order);
+	free(ret);
+	free(t);
+	return r;
+}
+
+#define ASSERT_RACE(env, ctx, runs, init_fn, exit_fn, verify_fn, ...) ({\
+		int (*fns[])(struct kdbus_test_env*, void*) = {		\
+			__VA_ARGS__					\
+		};							\
+		size_t cnt = sizeof(fns) / sizeof(*fns);		\
+		race_thread(init_fn, exit_fn, verify_fn,		\
+				fns, cnt, env, ctx, runs);		\
+	})
+
+#define TEST_RACE2(_name_, _runs_, _ctx_, _a_, _b_, _init_, _exit_, _verify_)\
+	static int _name_ ## ___a(struct kdbus_test_env *env, void *_ctx)\
+	{								\
+		__attribute__((__unused__)) _ctx_ *ctx = _ctx;		\
+		_a_;							\
+		return TEST_OK;						\
+	}								\
+	static int _name_ ## ___b(struct kdbus_test_env *env, void *_ctx)\
+	{								\
+		__attribute__((__unused__)) _ctx_ *ctx = _ctx;		\
+		_b_;							\
+		return TEST_OK;						\
+	}								\
+	static int _name_ ## ___init(struct kdbus_test_env *env,	\
+				void *_ctx)				\
+	{								\
+		__attribute__((__unused__)) _ctx_ *ctx = _ctx;		\
+		_init_;							\
+		return TEST_OK;						\
+	}								\
+	static int _name_ ## ___exit(struct kdbus_test_env *env,	\
+				void *_ctx, int *ret, size_t n_ret)	\
+	{								\
+		__attribute__((__unused__)) _ctx_ *ctx = _ctx;		\
+		_exit_;							\
+		return TEST_OK;						\
+	}								\
+	static int _name_ ## ___verify(struct kdbus_test_env *env,	\
+				void *_ctx)				\
+	{								\
+		__attribute__((__unused__)) _ctx_ *ctx = _ctx;		\
+		_verify_;						\
+		return TEST_OK;						\
+	}								\
+	int _name_ (struct kdbus_test_env *env) {			\
+		_ctx_ ctx;						\
+		memset(&ctx, 0, sizeof(ctx));				\
+		return ASSERT_RACE(env, &ctx, _runs_,			\
+				_name_ ## ___init,			\
+				_name_ ## ___exit,			\
+				_name_ ## ___verify,			\
+				_name_ ## ___a,				\
+				_name_ ## ___b);			\
+	}
+
+/*
+ * Race Testing
+ * This file provides some rather trivial helpers to run multiple threads in
+ * parallel and test for races. You can define races with TEST_RACEX(), where
+ * 'X' is the number of threads you want. The arguments to this macro should
+ * be code-blocks that are executed in the threads. Each code-block, if it
+ * does not contain a "return" statement, will implicitly return TEST_OK.
+ *
+ * The arguments are:
+ * @arg1: The name of the test to define
+ * @arg2: The number of runs
+ * @arg3: The datatype used as context across all test runs
+ * @arg4-@argN: The code-blocks for the threads to run.
+ * @argN+1: The code-block that is run before each test-run. Use it to
+ *          initialize your contexts.
+ * @argN+2: The code-block that is run after each test-run. Use it to verify
+ *          everything went as expected.
+ * @argN+3: The code-block that is executed after all runs are finished. Use it
+ *          to verify the sum of results.
+ *
+ * Each code-block has "env" and "ctx" as variables implicitly defined.
+ * Furthermore, the code-block run after each test-run (@argN+2) can access
+ * "ret", an array holding the return value of each thread, and "n_ret", the
+ * number of threads.
+ *
+ * Race testing is kinda nasty if you cannot place breakpoints yourself.
+ * Therefore, we run each thread multiple times and allow you to verify the
+ * results of all test-runs after we're finished. Usually, we try to verify all
+ * possible outcomes happened. However, no-one can predict how the scheduler
+ * ran each thread, even if we run 10k times. Furthermore, the execution of all
+ * threads is randomized by us, so we cannot predict how they're run. Therefore,
+ * we only return TEST_SKIP if an outcome was never seen. This is not a hard
+ * failure, but it signals test-runners that something unexpected happened.
+ */
+
+/*
+ * We run BYEBYE in parallel in two threads. Only one of them is allowed to
+ * succeed, the other one *MUST* return -EALREADY.
+ */
+TEST_RACE2(kdbus_test_race_byebye, 100, int,
+	({
+		return ioctl(env->conn->fd, KDBUS_CMD_BYEBYE, 0) ? -errno : 0;
+	}),
+	({
+		return ioctl(env->conn->fd, KDBUS_CMD_BYEBYE, 0) ? -errno : 0;
+	}),
+	({
+		env->conn = kdbus_hello(env->buspath, 0, NULL, 0);
+		ASSERT_RETURN(env->conn);
+	}),
+	({
+		ASSERT_RETURN((ret[0] == 0 && ret[1] == -EALREADY) ||
+			      (ret[1] == 0 && ret[0] == -EALREADY));
+		kdbus_conn_free(env->conn);
+		env->conn = NULL;
+	}),
+	({
+	}))
+
+/*
+ * Run BYEBYE against MATCH_REMOVE. If BYEBYE is first, it returns 0 and
+ * MATCH_REMOVE must fail with ECONNRESET. If BYEBYE is last, both BYEBYE and
+ * MATCH_REMOVE succeed.
+ * Run 10k times; at least on my machine it usually takes about 100 runs to
+ * trigger ECONNRESET races.
+ */
+TEST_RACE2(kdbus_test_race_byebye_match, 10000,
+	struct {
+		bool res1:1;
+		bool res2:1;
+	},
+	({
+		return ioctl(env->conn->fd, KDBUS_CMD_BYEBYE, 0) ? -errno : 0;
+	}),
+	({
+		struct kdbus_cmd_match cmd = {};
+		int ret;
+
+		cmd.size = sizeof(cmd);
+		cmd.cookie = 0xdeadbeef;
+		ret = ioctl(env->conn->fd, KDBUS_CMD_MATCH_REMOVE, &cmd);
+		if (ret == 0 || errno == ENOENT)
+			return 0;
+
+		return -errno;
+	}),
+	({
+		env->conn = kdbus_hello(env->buspath, 0, NULL, 0);
+		ASSERT_RETURN(env->conn);
+	}),
+	({
+		if (ret[0] == 0 && ret[1] == 0) {
+			/* MATCH_REMOVE ran first, then BYEBYE */
+			ctx->res1 = true;
+		} else if (ret[0] == 0 && ret[1] == -ECONNRESET) {
+			/* BYEBYE ran first, then MATCH_REMOVE failed */
+			ctx->res2 = true;
+		} else {
+			ASSERT_RETURN(0);
+		}
+
+		kdbus_conn_free(env->conn);
+		env->conn = NULL;
+	}),
+	({
+		if (!ctx->res1 || !ctx->res2)
+			return TEST_SKIP;
+	}))
diff --git a/tools/testing/selftests/kdbus/test-sync.c b/tools/testing/selftests/kdbus/test-sync.c
new file mode 100644
index 000000000000..464509fe19f1
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-sync.c
@@ -0,0 +1,368 @@
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <sys/ioctl.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <sys/wait.h>
+#include <sys/eventfd.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+static struct kdbus_conn *conn_a, *conn_b;
+static unsigned int cookie = 0xdeadbeef;
+
+static void nop_handler(int sig) {}
+
+static int interrupt_sync(struct kdbus_conn *conn_src,
+			  struct kdbus_conn *conn_dst)
+{
+	pid_t pid;
+	int ret, status;
+	struct kdbus_msg *msg = NULL;
+	struct sigaction sa = {
+		.sa_handler = nop_handler,
+		.sa_flags = SA_NOCLDSTOP|SA_RESTART,
+	};
+
+	cookie++;
+	pid = fork();
+	ASSERT_RETURN_VAL(pid >= 0, pid);
+
+	if (pid == 0) {
+		ret = sigaction(SIGINT, &sa, NULL);
+		ASSERT_EXIT(ret == 0);
+
+		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
+					  KDBUS_MSG_EXPECT_REPLY,
+					  100000000ULL, 0, conn_src->id, -1);
+		ASSERT_EXIT(ret == -ETIMEDOUT);
+
+		_exit(EXIT_SUCCESS);
+	}
+
+	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
+
+	kdbus_msg_free(msg);
+
+	ret = kill(pid, SIGINT);
+	ASSERT_RETURN_VAL(ret == 0, ret);
+
+	ret = waitpid(pid, &status, 0);
+	ASSERT_RETURN_VAL(ret >= 0, ret);
+
+	if (WIFSIGNALED(status))
+		return TEST_ERR;
+
+	ret = kdbus_msg_recv_poll(conn_src, 100, NULL, NULL);
+	ASSERT_RETURN(ret == -ETIMEDOUT);
+
+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
+}
+
+static int close_epipe_sync(const char *bus)
+{
+	pid_t pid;
+	int ret, status;
+	struct kdbus_conn *conn_src;
+	struct kdbus_conn *conn_dst;
+	struct kdbus_msg *msg = NULL;
+
+	conn_src = kdbus_hello(bus, 0, NULL, 0);
+	ASSERT_RETURN(conn_src);
+
+	ret = kdbus_add_match_empty(conn_src);
+	ASSERT_RETURN(ret == 0);
+
+	conn_dst = kdbus_hello(bus, 0, NULL, 0);
+	ASSERT_RETURN(conn_dst);
+
+	cookie++;
+	pid = fork();
+	ASSERT_RETURN_VAL(pid >= 0, pid);
+
+	if (pid == 0) {
+		uint64_t dst_id;
+
+		/* close our reference */
+		dst_id = conn_dst->id;
+		kdbus_conn_free(conn_dst);
+
+		ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
+		ASSERT_EXIT(ret == 0 && msg->cookie == cookie);
+		ASSERT_EXIT(msg->src_id == dst_id);
+
+		cookie++;
+		ret = kdbus_msg_send_sync(conn_src, NULL, cookie,
+					  KDBUS_MSG_EXPECT_REPLY,
+					  100000000ULL, 0, dst_id, -1);
+		ASSERT_EXIT(ret == -EPIPE);
+
+		_exit(EXIT_SUCCESS);
+	}
+
+	ret = kdbus_msg_send(conn_dst, NULL, cookie, 0, 0, 0,
+			     KDBUS_DST_ID_BROADCAST);
+	ASSERT_RETURN(ret == 0);
+
+	cookie++;
+	ret = kdbus_msg_recv_poll(conn_dst, 100, &msg, NULL);
+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
+
+	kdbus_msg_free(msg);
+
+	/* destroy connection */
+	kdbus_conn_free(conn_dst);
+	kdbus_conn_free(conn_src);
+
+	ret = waitpid(pid, &status, 0);
+	ASSERT_RETURN_VAL(ret >= 0, ret);
+
+	if (!WIFEXITED(status))
+		return TEST_ERR;
+
+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
+}
+
+static int cancel_fd_sync(struct kdbus_conn *conn_src,
+			  struct kdbus_conn *conn_dst)
+{
+	pid_t pid;
+	int cancel_fd;
+	int ret, status;
+	uint64_t counter = 1;
+	struct kdbus_msg *msg = NULL;
+
+	cancel_fd = eventfd(0, 0);
+	ASSERT_RETURN_VAL(cancel_fd >= 0, cancel_fd);
+
+	cookie++;
+	pid = fork();
+	ASSERT_RETURN_VAL(pid >= 0, pid);
+
+	if (pid == 0) {
+		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
+					  KDBUS_MSG_EXPECT_REPLY,
+					  100000000ULL, 0, conn_src->id,
+					  cancel_fd);
+		ASSERT_EXIT(ret == -ECANCELED);
+
+		_exit(EXIT_SUCCESS);
+	}
+
+	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
+
+	kdbus_msg_free(msg);
+
+	ret = write(cancel_fd, &counter, sizeof(counter));
+	ASSERT_RETURN(ret == sizeof(counter));
+
+	ret = waitpid(pid, &status, 0);
+	ASSERT_RETURN_VAL(ret >= 0, ret);
+
+	if (WIFSIGNALED(status))
+		return TEST_ERR;
+
+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
+}
+
+static int no_cancel_sync(struct kdbus_conn *conn_src,
+			  struct kdbus_conn *conn_dst)
+{
+	pid_t pid;
+	int cancel_fd;
+	int ret, status;
+	struct kdbus_msg *msg = NULL;
+
+	/* pass eventfd, but never signal it so it shouldn't have any effect */
+
+	cancel_fd = eventfd(0, 0);
+	ASSERT_RETURN_VAL(cancel_fd >= 0, cancel_fd);
+
+	cookie++;
+	pid = fork();
+	ASSERT_RETURN_VAL(pid >= 0, pid);
+
+	if (pid == 0) {
+		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
+					  KDBUS_MSG_EXPECT_REPLY,
+					  100000000ULL, 0, conn_src->id,
+					  cancel_fd);
+		ASSERT_EXIT(ret == 0);
+
+		_exit(EXIT_SUCCESS);
+	}
+
+	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
+	ASSERT_RETURN_VAL(ret == 0 && msg->cookie == cookie, -1);
+
+	kdbus_msg_free(msg);
+
+	ret = kdbus_msg_send_reply(conn_src, cookie, conn_dst->id);
+	ASSERT_RETURN_VAL(ret >= 0, ret);
+
+	ret = waitpid(pid, &status, 0);
+	ASSERT_RETURN_VAL(ret >= 0, ret);
+
+	if (WIFSIGNALED(status))
+		return -1;
+
+	return (status == EXIT_SUCCESS) ? 0 : -1;
+}
+
+static void *run_thread_reply(void *data)
+{
+	int ret;
+	unsigned long status = TEST_OK;
+
+	ret = kdbus_msg_recv_poll(conn_a, 3000, NULL, NULL);
+	if (ret < 0)
+		goto exit_thread;
+
+	kdbus_printf("Thread received message, sending reply ...\n");
+
+	/* using an unknown cookie must fail */
+	ret = kdbus_msg_send_reply(conn_a, ~cookie, conn_b->id);
+	if (ret != -EPERM) {
+		status = TEST_ERR;
+		goto exit_thread;
+	}
+
+	ret = kdbus_msg_send_reply(conn_a, cookie, conn_b->id);
+	if (ret != 0) {
+		status = TEST_ERR;
+		goto exit_thread;
+	}
+
+exit_thread:
+	pthread_exit((void *) status);
+	return (void *) status;
+}
+
+int kdbus_test_sync_reply(struct kdbus_test_env *env)
+{
+	unsigned long status;
+	pthread_t thread;
+	int ret;
+
+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn_a && conn_b);
+
+	pthread_create(&thread, NULL, run_thread_reply, NULL);
+
+	ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
+				  KDBUS_MSG_EXPECT_REPLY,
+				  5000000000ULL, 0, conn_a->id, -1);
+
+	pthread_join(thread, (void *) &status);
+	ASSERT_RETURN(status == 0);
+	ASSERT_RETURN(ret == 0);
+
+	ret = interrupt_sync(conn_a, conn_b);
+	ASSERT_RETURN(ret == 0);
+
+	ret = close_epipe_sync(env->buspath);
+	ASSERT_RETURN(ret == 0);
+
+	ret = cancel_fd_sync(conn_a, conn_b);
+	ASSERT_RETURN(ret == 0);
+
+	ret = no_cancel_sync(conn_a, conn_b);
+	ASSERT_RETURN(ret == 0);
+
+	kdbus_printf("-- closing bus connections\n");
+
+	kdbus_conn_free(conn_a);
+	kdbus_conn_free(conn_b);
+
+	return TEST_OK;
+}
+
+#define BYEBYE_ME ((void*)0L)
+#define BYEBYE_THEM ((void*)1L)
+
+static void *run_thread_byebye(void *data)
+{
+	int ret;
+
+	ret = kdbus_msg_recv_poll(conn_a, 3000, NULL, NULL);
+	if (ret == 0) {
+		kdbus_printf("Thread received message, invoking BYEBYE ...\n");
+		kdbus_msg_recv(conn_a, NULL, NULL);
+		if (data == BYEBYE_ME)
+			ioctl(conn_b->fd, KDBUS_CMD_BYEBYE, 0);
+		else if (data == BYEBYE_THEM)
+			ioctl(conn_a->fd, KDBUS_CMD_BYEBYE, 0);
+	}
+
+	pthread_exit(NULL);
+	return NULL;
+}
+
+int kdbus_test_sync_byebye(struct kdbus_test_env *env)
+{
+	pthread_t thread;
+	int ret;
+
+	/*
+	 * This sends a synchronous message to a thread, which waits until it
+	 * received the message and then invokes BYEBYE on the *ORIGINAL*
+	 * connection. That is, on the same connection that synchronously waits
+	 * for a reply.
+	 * This should properly wake the connection up and cause ECONNRESET as
+	 * the connection is disconnected now.
+	 *
+	 * The second time, we do the same but invoke BYEBYE on the *TARGET*
+	 * connection. This should also wake up the synchronous sender as the
+	 * reply cannot be sent by a disconnected target.
+	 */
+
+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn_a && conn_b);
+
+	pthread_create(&thread, NULL, run_thread_byebye, BYEBYE_ME);
+
+	ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
+				  KDBUS_MSG_EXPECT_REPLY,
+				  5000000000ULL, 0, conn_a->id, -1);
+
+	ASSERT_RETURN(ret == -ECONNRESET);
+
+	pthread_join(thread, NULL);
+
+	kdbus_conn_free(conn_a);
+	kdbus_conn_free(conn_b);
+
+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn_a && conn_b);
+
+	pthread_create(&thread, NULL, run_thread_byebye, BYEBYE_THEM);
+
+	ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
+				  KDBUS_MSG_EXPECT_REPLY,
+				  5000000000ULL, 0, conn_a->id, -1);
+
+	ASSERT_RETURN(ret == -EPIPE);
+
+	pthread_join(thread, NULL);
+
+	kdbus_conn_free(conn_a);
+	kdbus_conn_free(conn_b);
+
+	return TEST_OK;
+}
diff --git a/tools/testing/selftests/kdbus/test-timeout.c b/tools/testing/selftests/kdbus/test-timeout.c
new file mode 100644
index 000000000000..0c66e79a75e4
--- /dev/null
+++ b/tools/testing/selftests/kdbus/test-timeout.c
@@ -0,0 +1,99 @@
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <errno.h>
+#include <assert.h>
+#include <poll.h>
+#include <sys/ioctl.h>
+#include <stdbool.h>
+
+#include "kdbus-test.h"
+#include "kdbus-util.h"
+#include "kdbus-enum.h"
+
+int timeout_msg_recv(struct kdbus_conn *conn, uint64_t *expected)
+{
+	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
+	struct kdbus_msg *msg;
+	int ret;
+
+	ret = ioctl(conn->fd, KDBUS_CMD_RECV, &recv);
+	if (ret < 0) {
+		kdbus_printf("error receiving message: %d (%m)\n", ret);
+		return -errno;
+	}
+
+	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
+
+	ASSERT_RETURN_VAL(msg->payload_type == KDBUS_PAYLOAD_KERNEL, -EINVAL);
+	ASSERT_RETURN_VAL(msg->src_id == KDBUS_SRC_ID_KERNEL, -EINVAL);
+	ASSERT_RETURN_VAL(msg->dst_id == conn->id, -EINVAL);
+
+	*expected &= ~(1ULL << (msg->cookie_reply & 63));
+	kdbus_printf("Got message timeout for cookie %llu\n",
+		     msg->cookie_reply);
+
+	ret = kdbus_free(conn, recv.msg.offset);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+int kdbus_test_timeout(struct kdbus_test_env *env)
+{
+	struct kdbus_conn *conn_a, *conn_b;
+	struct pollfd fd;
+	int ret, i, n_msgs = 4;
+	uint64_t expected = 0;
+	uint64_t cookie = 0xdeadbeef;
+
+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
+	ASSERT_RETURN(conn_a && conn_b);
+
+	fd.fd = conn_b->fd;
+
+	/*
+	 * send messages that expect a reply (within (i + 1) * 100 msec),
+	 * but never answer them.
+	 */
+	for (i = 0; i < n_msgs; i++, cookie++) {
+		kdbus_printf("Sending message with cookie %llu ...\n",
+			     (unsigned long long)cookie);
+		ASSERT_RETURN(kdbus_msg_send(conn_b, NULL, cookie,
+			      KDBUS_MSG_EXPECT_REPLY,
+			      (i + 1) * 100ULL * 1000000ULL, 0,
+			      conn_a->id) == 0);
+		expected |= 1ULL << (cookie & 63);
+	}
+
+	for (;;) {
+		fd.events = POLLIN | POLLPRI | POLLHUP;
+		fd.revents = 0;
+
+		ret = poll(&fd, 1, (n_msgs + 1) * 100);
+		if (ret == 0)
+			kdbus_printf("--- timeout\n");
+		if (ret <= 0)
+			break;
+
+		if (fd.revents & POLLIN)
+			ASSERT_RETURN(!timeout_msg_recv(conn_b, &expected));
+
+		if (expected == 0)
+			break;
+	}
+
+	ASSERT_RETURN(expected == 0);
+
+	kdbus_conn_free(conn_a);
+	kdbus_conn_free(conn_b);
+
+	return TEST_OK;
+}
-- 
2.2.1
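
For readers who want to try the idea behind the TEST_RACE2() helpers in
test-race.c without a kdbus-enabled kernel, here is a minimal userspace
sketch. It is not part of the patch: fake_conn, fake_byebye(), racer_main()
and race_byebye() are hypothetical names, and an atomic flag stands in for
the kernel's BYEBYE handling. It only mirrors the shape of
kdbus_test_race_byebye: two threads race on one connection, exactly one may
win, and the other must see -EALREADY.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection that can be shut down only once. */
struct fake_conn {
	atomic_int closed;
};

/* Stand-in for KDBUS_CMD_BYEBYE: first caller gets 0, later ones -EALREADY. */
static int fake_byebye(struct fake_conn *c)
{
	return atomic_exchange(&c->closed, 1) ? -EALREADY : 0;
}

/* Per-thread state: the shared connection and this thread's result. */
struct racer {
	struct fake_conn *conn;
	int ret;
};

static void *racer_main(void *arg)
{
	struct racer *r = arg;

	assert(r && r->conn);
	r->ret = fake_byebye(r->conn);
	return NULL;
}

/*
 * Run the two-thread race 'runs' times. Returns 0 if every run had exactly
 * one winner, -1 on a thread error or an unexpected combination of results.
 */
static int race_byebye(unsigned int runs)
{
	unsigned int i;

	for (i = 0; i < runs; i++) {
		struct fake_conn conn;
		struct racer a = { &conn, 0 }, b = { &conn, 0 };
		pthread_t ta, tb;

		/* fresh connection for every run, like the init code-block */
		atomic_init(&conn.closed, 0);

		if (pthread_create(&ta, NULL, racer_main, &a))
			return -1;
		if (pthread_create(&tb, NULL, racer_main, &b)) {
			pthread_join(ta, NULL);
			return -1;
		}
		pthread_join(ta, NULL);
		pthread_join(tb, NULL);

		/* same check as the exit code-block of the real test */
		if (!((a.ret == 0 && b.ret == -EALREADY) ||
		      (b.ret == 0 && a.ret == -EALREADY)))
			return -1;
	}

	return 0;
}
```

Calling race_byebye(100) from a small main() should return 0. Some
toolchains need -pthread at link time.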


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-16 22:07   ` Josh Boyer
  0 siblings, 0 replies; 143+ messages in thread
From: Josh Boyer @ 2015-01-16 22:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, Andy Lutomirski, linux-api,
	Linux-Kernel@Vger. Kernel. Org, daniel, David Herrmann, tixxdz

On Fri, Jan 16, 2015 at 2:16 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> This can also be found in a git tree, the kdbus branch of char-misc.git at:
>         https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/

Is this now the canonical tree?  I ask because the github tree hasn't
been updated in quite some time.  The code.google.com tree has commits
from 2 days ago, but it still calls d_materialise_unique in fs.c
whereas the patchset you've posted uses the correct d_splice_alias.
So the code.google.com tree doesn't actually compile against 3.19-rcX.

I'm confused where we're supposed to track things now.

josh

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
  2015-01-16 22:07   ` Josh Boyer
  (?)
@ 2015-01-16 22:18   ` Greg Kroah-Hartman
  2015-01-17  0:26       ` Daniel Mack
  -1 siblings, 1 reply; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-16 22:18 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, Andy Lutomirski, linux-api,
	Linux-Kernel@Vger. Kernel. Org, daniel, David Herrmann, tixxdz

On Fri, Jan 16, 2015 at 05:07:25PM -0500, Josh Boyer wrote:
> On Fri, Jan 16, 2015 at 2:16 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > This can also be found in a git tree, the kdbus branch of char-misc.git at:
> >         https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/
> 
> Is this now the canonical tree?  I ask because the github tree hasn't
> been updated in quite some time.

That's my fault, it's just a mirror of the code.google.com tree for
people who like to use github.  It's now updated.

> The code.google.com tree has commits
> from 2 days ago, but it still calls d_materialise_unique in fs.c
> whereas the patchset you've posted uses the correct d_splice_alias.
> So the code.google.com tree doesn't actually compile against 3.19-rcX.
> 
> I'm confused where we're supposed to track things now.

I think the google tree is the "correct" one, but when generating
patches, apis are tweaked to work properly with the latest -rc kernel
for submission.

Daniel knows more though, he's the one generating the patchsets, I don't
know how he's doing it exactly.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-17  0:26       ` Daniel Mack
  0 siblings, 0 replies; 143+ messages in thread
From: Daniel Mack @ 2015-01-17  0:26 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Josh Boyer
  Cc: Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, Andy Lutomirski, linux-api,
	Linux-Kernel@Vger. Kernel. Org, David Herrmann, tixxdz

Hi Josh,

On 01/16/2015 11:18 PM, Greg Kroah-Hartman wrote:
> On Fri, Jan 16, 2015 at 05:07:25PM -0500, Josh Boyer wrote:
>> The code.google.com tree has commits
>> from 2 days ago, but it still calls d_materialise_unique in fs.c
>> whereas the patchset you've posted uses the correct d_splice_alias.
>> So the code.google.com tree doesn't actually compile against 3.19-rcX.
>>
>> I'm confused where we're supposed to track things now.

The code.google.com repository is where we do all the development, and
the code is made to build external kernel modules for 3.18. The patches
sent in this series are meant for 3.19 and 3.20 kernels, and while they
are based on the exact same sources, the patches differ in the following
minor details:

 * Code is moved to appropriate locations, such as ipc/kdbus,
   include/uapi, tools/testing/selftests/kdbus/, Documentation/ etc.

 * Include file location amendments, "kdbus.h" vs. <uapi/linux/kdbus.h>

 * Added iov_iter_kvec() usage, as that's a new API in v3.19

 * The file system magic number is moved to include/uapi/linux/magic.h

 * d_materialise_unique() is renamed to d_splice_alias() to catch up
   with changes in 3.19
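
As an illustration of the last point, an out-of-tree build that has to
support both sides of the rename would typically key off the kernel version.
The following is only a sketch under that assumption: it runs in userspace,
re-creates the KERNEL_VERSION() encoding from <linux/version.h> locally, and
splice_call_for() and self_check() are made-up helpers that just report and
check which name applies. A real module would put the same comparison against
LINUX_VERSION_CODE in an #if.

```c
#include <assert.h>
#include <string.h>

/* Local copy of the encoding used by <linux/version.h> at the time. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical helper, for illustration only: name of the dentry-splicing
 * call that applies to a given kernel version code.
 */
static const char *splice_call_for(unsigned int version_code)
{
	if (version_code >= KERNEL_VERSION(3, 19, 0))
		return "d_splice_alias";

	return "d_materialise_unique";
}

/* Usage example: 3.18 keeps the old name, 3.19 and later use the new one. */
static void self_check(void)
{
	assert(!strcmp(splice_call_for(KERNEL_VERSION(3, 18, 0)),
		       "d_materialise_unique"));
	assert(!strcmp(splice_call_for(KERNEL_VERSION(3, 19, 0)),
		       "d_splice_alias"));
}
```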


The commit this patch set is based on is tagged as "lkml-v3" in the
upstream repository now.


Thanks,
Daniel


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-17  0:41         ` Josh Boyer
  0 siblings, 0 replies; 143+ messages in thread
From: Josh Boyer @ 2015-01-17  0:41 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	linux-api, Linux-Kernel@Vger. Kernel. Org, David Herrmann,
	tixxdz

On Fri, Jan 16, 2015 at 7:26 PM, Daniel Mack <daniel@zonque.org> wrote:
> Hi Josh,
>
> On 01/16/2015 11:18 PM, Greg Kroah-Hartman wrote:
>> On Fri, Jan 16, 2015 at 05:07:25PM -0500, Josh Boyer wrote:
>>> The code.google.com tree has commits
>>> from 2 days ago, but it still calls d_materialise_unique in fs.c
>>> whereas the patchset you've posted uses the correct d_splice_alias.
>>> So the code.google.com tree doesn't actually compile against 3.19-rcX.
>>>
>>> I'm confused where we're supposed to track things now.
>
> The code.google.com repository is where we do all the development, and
> the code is made to build external kernel modules for 3.18. The patches
> sent in this series are meant for 3.19 and 3.20 kernels, and while they
> are based on the exact same sources, the patches differ in the following
> minor details:
>
>  * Code is moved to appropriate locations, such as ipc/kdbus,
>    include/uapi, tools/testing/selftests/kdbus/, Documentation/ etc.
>
>  * Include file location amendments, "kdbus.h" vs. <uapi/linux/kdbus.h>
>
>  * Added iov_iter_kvec() usage, as that's a new API in v3.19
>
>  * The file system magic number is moved to include/uapi/linux/magic.h
>
>  * d_materialise_unique() is renamed to d_splice_alias() to catch up
>    with changes in 3.19
>
>
> The commit this patch set is based on is tagged as "lkml-v3" in the
> upstream repository now.

OK, thanks for the explanation.

I wonder if it would be possible to have branches for each kernel
version in the code.google.com repo?  I build kdbus against a number
of kernels and trying to chase down different repos and patch sets to
match might get to be a chore.  I mean, I'm all for doing closet
development to get stuff ready for upstream but it's hard for others
to keep up when the closet keeps moving :)

josh

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-19 18:06   ` Johannes Stezenbach
  0 siblings, 0 replies; 143+ messages in thread
From: Johannes Stezenbach @ 2015-01-19 18:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz

Hi Greg and Daniel,

I don't have a clue so I need to ask some stupid questions...

On Fri, Jan 16, 2015 at 11:16:04AM -0800, Greg Kroah-Hartman wrote:
> kdbus is a kernel-level IPC implementation that aims for resemblance to
> the protocol layer with the existing userspace D-Bus daemon while
> enabling some features that couldn't be implemented before in userspace.
> 
> The documentation in the first patch in this series explains the
> protocol and the API details.

How about the big picture?

> Reasons why this should be done in the kernel, instead of userspace as
> it is currently done today include the following:
[abbreviated]
> - performance
> - security
> - semantics for apps with heavy data payloads


First of all I wonder about the relationship with D-Bus.
http://dbus.freedesktop.org/doc/dbus-specification.html says:

   D-Bus is designed for two specific use cases:

       A "system bus" for notifications from the system to user
       sessions, and to allow the system to request input from
       user sessions.

       A "session bus" used to implement desktop environments such
       as GNOME and KDE. 

   D-Bus is not intended to be a generic IPC system for any
   possible application, and intentionally omits many features
   found in other IPC systems for this reason. 

Does this also apply to kdbus?  If not, what are the
suggested uses of kdbus beyond those where D-Bus is
currently used?

Another related quote by Havoc Pennington:
http://lists.freedesktop.org/archives/dbus/2012-March/015024.html

   In general, reading this, I think in some cases there are
   problems that make sense to fix in dbus, and in other cases
   there are problems that are best solved by not using dbus.
   ...
   there are about 10000 IPC solutions already, from ICE (both of
   them) to ZeroMQ to AMQP to CORBA to X11 to HTTP to SOAP to
   WebSockets to SUN-RPC to whatever-the-heck. To me, trying to
   make dbus configurable so that it can substitute for any of
   these is a Bad Idea (tm).

Do you think it also applies to kdbus?


Wrt the performance improvement achieved by kdbus, my impression
about D-Bus is that the number of messages on my system is
about a dozen per minute.  Are there actually any existing
applications using D-Bus that have a performance issue?
Or is this only about future possible uses?


Linked from http://kroah.com/log/blog/2014/01/15/kdbus-details/,
http://lwn.net/Articles/580194/ "The unveiling of kdbus" says:

   Unlike most other kernels, Linux has never had a well-designed
   IPC mechanism. Windows and Mac OS have this feature; even
   Android, based on Linux, has one in the form of the "binder"
   subsystem. Linux, instead, has only had the primitives —
   sockets, FIFOs, and shared memory — but those have never been
   knitted together into a reasonable application-level API. Kdbus
   is an attempt to do that knitting and create something that is
   at least as good as the mechanisms found on other systems.

These are bold words. I'm not sure what Windows and Mac OS
have in terms of IPC, but the above suggests that kdbus
is *the* new Linux IPC that everyone will use for everything,
rather than a special purpose facility.
True?


Thanks,
Johannes

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-19 18:06   ` Johannes Stezenbach
  0 siblings, 0 replies; 143+ messages in thread
From: Johannes Stezenbach @ 2015-01-19 18:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: arnd-r2nGTMty4D4, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io, teg-B22kvLQNl6c,
	jkosina-AlSwsSmVLrQ, luto-kltTT9wpgjJwATOyAt5JVQ,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	daniel-cYrQPVfZooxQFI55V6+gNQ,
	dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w,
	tixxdz-Umm1ozX2/EEdnm+yROfE0A

Hi Greg and Daniel,

I don't have a clue so I need to ask some stupid questions...

On Fri, Jan 16, 2015 at 11:16:04AM -0800, Greg Kroah-Hartman wrote:
> kdbus is a kernel-level IPC implementation that aims for resemblance to
> the protocol layer with the existing userspace D-Bus daemon while
> enabling some features that couldn't be implemented before in userspace.
> 
> The documentation in the first patch in this series explains the
> protocol and the API details.

How about the big picture?

> Reasons why this should be done in the kernel, instead of userspace as
> it is currently done today include the following:
[abbreviated]
> - performance
> - security
> - semantics for apps with heavy data payloads


First of all I wonder about the relationship with D-Bus.
http://dbus.freedesktop.org/doc/dbus-specification.html says:

   D-Bus is designed for two specific use cases:

       A "system bus" for notifications from the system to user
       sessions, and to allow the system to request input from
       user sessions.

       A "session bus" used to implement desktop environments such
       as GNOME and KDE. 

   D-Bus is not intended to be a generic IPC system for any
   possible application, and intentionally omits many features
   found in other IPC systems for this reason. 

Does this also apply to kdbus?  If not, what are the
suggested uses of kdbus beyond those where D-Bus is
currently used?

Another related quote by Havoc Pennington:
http://lists.freedesktop.org/archives/dbus/2012-March/015024.html

   In general, reading this, I think in some cases there are
   problems that make sense to fix in dbus, and in other cases
   there are problems that are best solved by not using dbus.
   ...
   there are about 10000 IPC solutions already, from ICE (both of
   them) to ZeroMQ to AMQP to CORBA to X11 to HTTP to SOAP to
   WebSockets to SUN-RPC to whatever-the-heck. To me, trying to
   make dbus configurable so that it can substitute for any of
   these is a Bad Idea (tm).

Do you think it also applies to kdbus?


Wrt the performance improvement achieved by kdbus, my impression
about D-Bus is that the number of messages on my system is
about a dozen per minute.  Are there actually any existing
applications using D-Bus that have a performance issue?
Or is this only about future possible uses?


Linked from http://kroah.com/log/blog/2014/01/15/kdbus-details/,
http://lwn.net/Articles/580194/ "The unveiling of kdbus" says:

   Unlike most other kernels, Linux has never had a well-designed
   IPC mechanism. Windows and Mac OS have this feature; even
   Android, based on Linux, has one in the form of the "binder"
   subsystem. Linux, instead, has only had the primitives —
   sockets, FIFOs, and shared memory — but those have never been
   knitted together into a reasonable application-level API. Kdbus
   is an attempt to do that knitting and create something that is
   at least as good as the mechanisms found on other systems.

These are bold words. I'm not sure what Windows and Mac OS
have in terms of IPC, but the above suggests that kdbus
is *the* new Linux IPC that everyone will use for everything,
rather than a special purpose facility.
True?


Thanks,
Johannes

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-19 18:33   ` Johannes Stezenbach
  0 siblings, 0 replies; 143+ messages in thread
From: Johannes Stezenbach @ 2015-01-19 18:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, Daniel Mack, dh.herrmann, tixxdz

(resend, fix Daniel's email address)

Hi Greg and Daniel,

I don't have a clue so I need to ask some stupid questions...

On Fri, Jan 16, 2015 at 11:16:04AM -0800, Greg Kroah-Hartman wrote:
> kdbus is a kernel-level IPC implementation that aims for resemblance to
> the the protocol layer with the existing userspace D-Bus daemon while
> enabling some features that couldn't be implemented before in userspace.
> 
> The documentation in the first patch in this series explains the
> protocol and the API details.

How about the big picture?

> Reasons why this should be done in the kernel, instead of userspace as
> it is currently done today include the following:
[abbreviated]
> - performance
> - security
> - semantics for apps with heavy data payloads


First of all I wonder about the relationship with D-Bus.
http://dbus.freedesktop.org/doc/dbus-specification.html says:

   D-Bus is designed for two specific use cases:

       A "system bus" for notifications from the system to user
       sessions, and to allow the system to request input from
       user sessions.

       A "session bus" used to implement desktop environments such
       as GNOME and KDE. 

   D-Bus is not intended to be a generic IPC system for any
   possible application, and intentionally omits many features
   found in other IPC systems for this reason. 

Does this also apply to kdbus?  If not, what are the
suggested uses of kdbus beyond those where D-Bus is
currently used?

Another related quote by Havoc Pennington:
http://lists.freedesktop.org/archives/dbus/2012-March/015024.html

   In general, reading this, I think in some cases there are
   problems that make sense to fix in dbus, and in other cases
   there are problems that are best solved by not using dbus.
   ...
   there are about 10000 IPC solutions already, from ICE (both of
   them) to ZeroMQ to AMQP to CORBA to X11 to HTTP to SOAP to
   WebSockets to SUN-RPC to whatever-the-heck. To me, trying to
   make dbus configurable so that it can substitute for any of
   these is a Bad Idea (tm).

Do you think it also applies to kdbus?


Wrt the performance improvement achieved by kdbus, my impression
about D-Bus is that the number of messages on my system is
about a dozen per minute.  Are there actually any existing
applications using D-Bus that have a performance issue?
Or is this only about future possible uses?


Linked from http://kroah.com/log/blog/2014/01/15/kdbus-details/,
http://lwn.net/Articles/580194/ "The unveiling of kdbus" says:

   Unlike most other kernels, Linux has never had a well-designed
   IPC mechanism. Windows and Mac OS have this feature; even
   Android, based on Linux, has one in the form of the "binder"
   subsystem. Linux, instead, has only had the primitives —
   sockets, FIFOs, and shared memory — but those have never been
   knitted together into a reasonable application-level API. Kdbus
   is an attempt to do that knitting and create something that is
   at least as good as the mechanisms found on other systems.

These are bold words. I'm not sure what Windows and Mac OS
have in terms of IPC, but the above suggests that kdbus
is *the* new Linux IPC that everyone will use for everything,
rather than a special purpose facility.
True?


Thanks,
Johannes

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
  2015-01-19 18:06   ` Johannes Stezenbach
  (?)
@ 2015-01-19 18:38   ` Greg Kroah-Hartman
  2015-01-19 20:19       ` Johannes Stezenbach
  -1 siblings, 1 reply; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-19 18:38 UTC (permalink / raw)
  To: Johannes Stezenbach
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz

On Mon, Jan 19, 2015 at 07:06:42PM +0100, Johannes Stezenbach wrote:
> Hi Greg and Daniel,

[Fixing Daniel's email, which I messed up originally...]

> On Fri, Jan 16, 2015 at 11:16:04AM -0800, Greg Kroah-Hartman wrote:
> > kdbus is a kernel-level IPC implementation that aims for resemblance to
> > the protocol layer with the existing userspace D-Bus daemon while
> > enabling some features that couldn't be implemented before in userspace.
> > 
> > The documentation in the first patch in this series explains the
> > protocol and the API details.
> 
> How about the big picture?
> 
> > Reasons why this should be done in the kernel, instead of userspace as
> > it is currently done today include the following:
> [abbreviated]
> > - performance
> > - security
> > - semantics for apps with heavy data payloads
> 
> 
> First of all I wonder about the relationship with D-Bus.
> http://dbus.freedesktop.org/doc/dbus-specification.html says:
> 
>    D-Bus is designed for two specific use cases:
> 
>        A "system bus" for notifications from the system to user
>        sessions, and to allow the system to request input from
>        user sessions.
> 
>        A "session bus" used to implement desktop environments such
>        as GNOME and KDE. 
> 
>    D-Bus is not intended to be a generic IPC system for any
>    possible application, and intentionally omits many features
>    found in other IPC systems for this reason. 
> 
> Does this also apply to kdbus?  If not, what are the
> suggested uses of kdbus beyond those where D-Bus is
> currently used?

I don't really know.  I have heard from lots of random people who are
starting to look into kdbus as to if it will work for their use cases,
which seem quite varied.  I'll leave it to them to pop up and say if it
will work for them outside of the above specific ways.  But even then,
the above two things are something almost all Linux boxes rely on today,
so it's not like this is a solution searching for a problem to solve :)

> Another related quote by Havoc Pennington:
> http://lists.freedesktop.org/archives/dbus/2012-March/015024.html
> 
>    In general, reading this, I think in some cases there are
>    problems that make sense to fix in dbus, and in other cases
>    there are problems that are best solved by not using dbus.
>    ...
>    there are about 10000 IPC solutions already, from ICE (both of
>    them) to ZeroMQ to AMQP to CORBA to X11 to HTTP to SOAP to
>    WebSockets to SUN-RPC to whatever-the-heck. To me, trying to
>    make dbus configurable so that it can substitute for any of
>    these is a Bad Idea (tm).
> 
> Do you think it also applies to kdbus?

Yes, I do agree, there are lots of existing ipc solutions today that
kdbus is not designed for, nor would it be good to use it for.  The
majority of them being IPC that crosses the network layer, as there are
lots of good solutions today for that problem.  That being said, I do
know one research group that has kdbus working cross-network, just "to
try it out", but I don't know what ever came of it.

> Wrt the performance improvement achieved by kdbus, my impression
> about D-Bus is that the number of messages on my system is
> about a dozen per minute.  Are there actually any existing
> applications using D-Bus that have a performance issue?
> Or is this only about future possible uses?

There are a number of existing applications that have this performance
issue today.  The majority of them have been ported from other operating
systems that have a fast message bus, so their process model is all
about messages.  They use a library layer on Linux to turn that message
bus into D-Bus messages, and have suffered a huge hit in performance
from their previous operating system.  Using kdbus has brought it back
in line to make it reasonable to use.

These applications can be usually found in the Automotive sector, which
has been playing with light-weight dbus library implementations for a
while now, and have done some initial kdbus testing to verify this will
work for them.

> Linked from http://kroah.com/log/blog/2014/01/15/kdbus-details/,
> http://lwn.net/Articles/580194/ "The unveiling of kdbus" says:
> 
>    Unlike most other kernels, Linux has never had a well-designed
>    IPC mechanism. Windows and Mac OS have this feature; even
>    Android, based on Linux, has one in the form of the "binder"
>    subsystem. Linux, instead, has only had the primitives —
>    sockets, FIFOs, and shared memory — but those have never been
>    knitted together into a reasonable application-level API. Kdbus
>    is an attempt to do that knitting and create something that is
>    at least as good as the mechanisms found on other systems.
> 
> These are bold words. I'm not sure what Windows and Mac OS
> have in terms of IPC, but the above suggests that kdbus
> is *the* new Linux IPC that everyone will use for everything,
> rather than a special purpose facility.

Everyone uses D-Bus today for everything on their system, so by
replacing the underlying library with kdbus, they will continue to use
it for everything without having to change any application or library
code at all.

Hope this helps,

greg k-h

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-19 20:19       ` Johannes Stezenbach
  0 siblings, 0 replies; 143+ messages in thread
From: Johannes Stezenbach @ 2015-01-19 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz

On Tue, Jan 20, 2015 at 02:38:06AM +0800, Greg Kroah-Hartman wrote:
> Yes, I do agree, there are lots of existing ipc solutions today that
> kdbus is not designed for, nor would it be good to use it for.  The
> majority of them being IPC that crosses the network layer, as there are
> lots of good solutions today for that problem.  That being said, I do
> know one research group that has kdbus working cross-network, just "to
> try it out", but I don't know what ever came of it.
...
> Everyone uses D-Bus today for everything on their system, so by
> replacing the underlying library with kdbus, they will continue to use
> it for everything without having to change any application or library
> code at all.

These two statements somehow contradict. From my admittedly very
limited experience, I never used D-Bus because it did not
fit my usage scenarios: I never needed a bus, only point-to-point
links like pipes or sockets.

Let me rephrase my previous, lengthy mail: Will kdbus only
support the same IPC model as D-Bus (just with higher
performance and some bells and whistles), or will it
be useful for other scenarios?  Like, can two programs
use it to communicate directly without the need of
any daemon?  (And if so, would there be any advantage
compared to traditional UNIX IPC methods?)

You were comparing kdbus and Binder.  Why?
So far my impression is that D-Bus and Binder are
completely separate things, not just because of
the thread vs. event-loop programming model but
also because Binder is not a bus (i.e. no multicast messaging).

> Hope this helps,

Well, it made your intentions a bit clearer, but it does
not help to sell kdbus to me, sorry ;-/


Thanks,
Johannes

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
  2015-01-19 20:19       ` Johannes Stezenbach
  (?)
@ 2015-01-19 20:31       ` Greg Kroah-Hartman
  2015-01-19 23:38           ` Johannes Stezenbach
  -1 siblings, 1 reply; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-19 20:31 UTC (permalink / raw)
  To: Johannes Stezenbach
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz

On Mon, Jan 19, 2015 at 09:19:06PM +0100, Johannes Stezenbach wrote:
> On Tue, Jan 20, 2015 at 02:38:06AM +0800, Greg Kroah-Hartman wrote:
> > Yes, I do agree, there are lots of existing ipc solutions today that
> > kdbus is not designed for, nor would it be good to use it for.  The
> > majority of them being IPC that crosses the network layer, as there are
> > lots of good solutions today for that problem.  That being said, I do
> > know one research group that has kdbus working cross-network, just "to
> > try it out", but I don't know what ever came of it.
> ...
> > Everyone uses D-Bus today for everything on their system, so by
> > replacing the underlying library with kdbus, they will continue to use
> > it for everything without having to change any application or library
> > code at all.
> 
> These two statements somehow contradict. From my admittedly very
> limited experience, I never used D-Bus because it did not
> fit my usage scenarios: I never needed a bus, only point-to-point
> links like pipes or sockets.

Great, then you don't need this, no need to worry about it at all, why
are we having this conversation? :)

> Let me rephrase my previous, lengthy mail: Will kdbus only
> support the same IPC model as D-Bus (just with higher
> performance and some bells and whistles), or will it
> be useful for other scenarios?  Like, can two programs
> use it to communicate directly without the need of
> any daemon?  (And if so, would there be any advantage
> compared to traditional UNIX IPC methods?)

It's a totally different model, as you point out from what you are
thinking of "traditional" IPC methods (side note, which of the 15+
current IPC methods do you consider "traditional", we have a lot of them
these days...)

> You were comparing kdbus and Binder.  Why?

Why not?  :)

Seriously, they are related in a way, see my long blog post for all of
the details about it if you are curious.

> So far my impression is that D-Bus and Binder are
> completely separate things, not just because of
> the thread vs. event-loop programming model but
> also because Binder is not a bus (i.e. no multicast messaging).

People compare them a lot, which is why I brought it up, it's a
discussion that needed to be made.

> > Hope this helps,
> 
> Well, it made your intentions a bit clearer, but it does
> not help to sell kdbus to me, sorry ;-/

It's not my "goal" to sell kdbus to you, if you don't want it, great,
don't worry about it, don't build it on your kernels, and the world will
be fine.  Consider it like any other "driver" or filesystem, if you
don't need it, there's nothing to even discuss.

But odds are, you are using a system with D-Bus today, if not, then you
are using Linux in a very specific and limited manner, which is
wonderful, in that case this whole thread isn't really pertinent.

Lots of people do use D-Bus, and for those users, that is what this
patchset is for.

Hope that helps clear things up,

greg k-h

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-19 23:38           ` Johannes Stezenbach
  0 siblings, 0 replies; 143+ messages in thread
From: Johannes Stezenbach @ 2015-01-19 23:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz

On Tue, Jan 20, 2015 at 04:31:55AM +0800, Greg Kroah-Hartman wrote:
> On Mon, Jan 19, 2015 at 09:19:06PM +0100, Johannes Stezenbach wrote:
> > These two statements somehow contradict. From my admittedly very
> > limited experience, I never used D-Bus because it did not
> > fit my usage scenarios: I never needed a bus, only point-to-point
> > links like pipes or sockets.
> 
> Great, then you don't need this, no need to worry about it at all, why
> are we having this conversation? :)

Well, for one because that's what I wanted to find out...

> > Well, it made your intentions a bit clearer, but it does
> > not help to sell kdbus to me, sorry ;-/
> 
> It's not my "goal" to sell kdbus to you, if you don't want it, great,

I used this language because I think you're not providing
the facts that would allow me to judge for myself whether
kdbus is a good idea.  Those automotive applications you
were talking about, what was the OS they were ported from
and what was the messaging API they used?

> But odds are, you are using a system with D-Bus today, if not, then you
> are using Linux in a very specific and limited manner, which is
> wonderful, in that case this whole thread isn't really pertinent.
> 
> Lots of people do use D-Bus, and for those users, that is what this
> patchset is for.

As I said before, I'm seeing about a dozen D-Bus messages per minute,
nothing that would justify adding kdbus to the kernel for
performance reasons.  Wrt security I'm also not aware of any
open issues with D-Bus.  Thus I doubt normal users of D-Bus
would see any benefit from kdbus.  I also think none of the
applications I can install from my distribution has any performance
issue with D-Bus.

And this is the point where I ask myself if I missed something.


Thanks,
Johannes

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-20  1:13             ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-20  1:13 UTC (permalink / raw)
  To: Johannes Stezenbach
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz

On Tue, Jan 20, 2015 at 12:38:12AM +0100, Johannes Stezenbach wrote:
> On Tue, Jan 20, 2015 at 04:31:55AM +0800, Greg Kroah-Hartman wrote:
> > On Mon, Jan 19, 2015 at 09:19:06PM +0100, Johannes Stezenbach wrote:
> > > These two statements somehow contradict. From my admittedly very
> > > limited experience, I never used D-Bus because it did not
> > > fit my usage scenarios: I never needed a bus, only point-to-point
> > > links like pipes or sockets.
> > 
> > Great, then you don't need this, no need to worry about it at all, why
> > are we having this conversation? :)
> 
> Well, for one because that's what I wanted to find out...
> 
> > > Well, it made your intentions a bit clearer, but it does
> > > not help to sell kdbus to me, sorry ;-/
> > 
> > It's not my "goal" to sell kdbus to you, if you don't want it, great,
> 
> I used this language because I think you're not providing
> the facts that would allow me to judge for myself whether
> kdbus is a good idea.  Those automotive applications you
> were talking about, what was the OS they were ported from
> and what was the messaging API they used?

They were ported from QNX and I don't know the exact api, it is wrapped
up in a library layer for them to use.  And typically, they run about
40 thousand messages in the first few seconds of startup time.  Or was
it 400 thousand?  Something huge and crazy to be doing on tiny ARM
chips, but that's the IVI industry for you :(

> > But odds are, you are using a system with D-Bus today, if not, then you
> > are using Linux in a very specific and limited manner, which is
> > wonderful, in that case this whole thread isn't really pertinent.
> > 
> > Lots of people do use D-Bus, and for those users, that is what this
> > patchset is for.
> 
> As I said before, I'm seeing about a dozen D-Bus messages per minute,
> nothing that would justify adding kdbus to the kernel for
> performance reasons.  Wrt security I'm also not aware of any
> open issues with D-Bus.  Thus I doubt normal users of D-Bus
> would see any benefit from kdbus.  I also think none of the
> applications I can install from my distribution has any performance
> issue with D-Bus.

That's because people have not done anything really needing performance
on the desktop over D-Bus in the past due to how slow the current
implementation is.  Now that this is being resolved, that can change,
and there are demos out there of even streaming audio over kdbus with no
problems.

But performance is not just the only reason we want this in the kernel,
I listed a whole long range of them.  Sure, it's great to now be faster,
cutting down the number of context switches and copies by a huge amount,
but the other things are equally important for future development
(namespaces, containers, security, early-boot, etc.)

> And this is the point where I ask myself if I missed something.

Don't focus purely on performance for your existing desktop system,
that's not the only use case here.  There are lots of others, as I
document, that can benefit and want this.

One "fun" thing I've been talking to someone about is the ability to
even port binder to be on top of kdbus.  But that's just a research
project, and requires some API changes on the userspace binder side, but
it shows real promise, and would then mean that we could deprecate the
old binder code and a few hundred million devices could then use kdbus
instead.  But that's long-term goals, not really all that relevant here,
but it shows that having a solid bus IPC mechanism is a powerful thing
that we have been missing in the past from Linux.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
  2015-01-20  1:13             ` Greg Kroah-Hartman
  (?)
@ 2015-01-20 10:57             ` Johannes Stezenbach
  2015-01-20 11:26                 ` Greg Kroah-Hartman
  -1 siblings, 1 reply; 143+ messages in thread
From: Johannes Stezenbach @ 2015-01-20 10:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz

On Tue, Jan 20, 2015 at 09:13:59AM +0800, Greg Kroah-Hartman wrote:
> On Tue, Jan 20, 2015 at 12:38:12AM +0100, Johannes Stezenbach wrote:
> > Those automotive applications you
> > were talking about, what was the OS they were ported from
> > and what was the messaging API they used?
> 
> They were ported from QNX and I don't know the exact api, it is wrapped
> up in a library layer for them to use.  And typically, they run about
> 40 thousand messages in the first few seconds of startup time.  Or was
> it 400 thousand?  Something huge and crazy to be doing on tiny ARM
> chips, but that's the IVI industry for you :(

So I did some googling and found that in QNX, servers create a channel
to receive messages, and clients connect to this channel.
Multiple clients can connect to the channel.
But it is not a bus -- no multicast/broadcast, and no name
service or policy rules like D-Bus has.  To me it looks
to be similar in functionality to UNIX domain sockets.

My guess is that the people porting from QNX were just confused
and their use of D-Bus was in error.  Maybe they should've used
plain sockets, capnproto, ZeroMQ or whatever.


> > As I said before, I'm seeing about a dozen D-Bus messages per minute,
> > nothing that would justify adding kdbus to the kernel for
> > performance reasons.  Wrt security I'm also not aware of any
> > open issues with D-Bus.  Thus I doubt normal users of D-Bus
> > would see any benefit from kdbus.  I also think none of the
> > applications I can install from my distribution has any performance
> > issue with D-Bus.
> 
> That's because people have not done anything really needing performance
> on the desktop over D-Bus in the past due to how slow the current
> implementation is.  Now that this is being resolved, that can change,
> and there are demos out there of even streaming audio over kdbus with no
> problems.
> 
> But performance is not the only reason we want this in the kernel;
> I listed a whole range of them.  Sure, it's great to now be faster,
> cutting down the number of context switches and copies by a huge amount,
> but the other things are equally important for future development
> (namespaces, containers, security, early-boot, etc.)
> 
> > And this is the point where I ask myself if I missed something.
> 
> Don't focus purely on performance for your existing desktop system,
> that's not the only use case here.  There are lots of others, as I
> document, that can benefit and want this.
> 
> One "fun" thing I've been talking to someone about is the ability to
> even port binder to be on top of kdbus.  But that's just a research
> project, and requires some API changes on the userspace binder side, but
> it shows real promise, and would then mean that we could deprecate the
> old binder code and a few hundred million devices could then use kdbus
> instead.  But that's long-term goals, not really all that relevant here,
> but it shows that having a solid bus IPC mechanism is a powerful thing
> that we have been missing in the past from Linux.

Well, IMHO you got it backwards.  Before adding a complex new IPC
API to the kernel you should do the homework and gather some
evidence that there will be enough users to justify the addition.

But maybe I misunderstood the purpose of this thread and you're
just advertising it to find possible users instead of already
suggesting to merge it?  If someone has some convincing story
to share why kdbus would solve their IPC needs, I'm all ears.

(I'm sorry this implies your responses so far were not convincing:
not verifiable facts, no numbers, no testimonials etc.)

FWIW, my gut feeling was that the earlier attempts to add a new
IPC primitive like multicast UNIX domain sockets
http://thread.gmane.org/gmane.linux.kernel/1255575/focus=1257999
were a much saner approach.  But now I think the comments
from this old thread have not been addressed, instead the
new approach just made the thing more complex and
put in ipc/ instead of net/ to bypass the guards.


Thanks,
Johannes

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-20 11:26                 ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-20 11:26 UTC (permalink / raw)
  To: Johannes Stezenbach
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz

On Tue, Jan 20, 2015 at 11:57:12AM +0100, Johannes Stezenbach wrote:
> On Tue, Jan 20, 2015 at 09:13:59AM +0800, Greg Kroah-Hartman wrote:
> > On Tue, Jan 20, 2015 at 12:38:12AM +0100, Johannes Stezenbach wrote:
> > > Those automotive applications you
> > > were talking about, what was the OS they were ported from
> > > and what was the messaging API they used?
> > 
> > They were ported from QNX and I don't know the exact api, it is wrapped
> > up in a library layer for them to use.  And typically, they run about
> > 40 thousand messages in the first few seconds of startup time.  Or was
> > it 400 thousand?  Something huge and crazy to be doing on tiny ARM
> > chips, but that's the IVI industry for you :(
> 
> So I did some googling and found in QNX servers create a channel
> to receive messages, and clients connect to this channel.
> Multiple clients can connect to the channel.

Hence, a bus :)

> But it is not a bus -- no multicast/broadcast, and no name
> service or policy rules like D-Bus has.  To me it looks
> to be similar in functionality to UNIX domain sockets.

It's not as complex as D-Bus, but it's still subscribing to things and
getting messages.

> My guess is that the people porting from QNX were just confused
> and their use of D-Bus was in error.  Maybe they should've used
> plain sockets, capnproto, ZeroMQ or whatever.

I tend to trust that they knew what they were doing, they wouldn't have
picked D-Bus for no good reason.

> Well, IMHO you got it backwards.  Before adding a complex new IPC
> API to the kernel you should do the homework and gather some
> evidence that there will be enough users to justify the addition.

systemd wants this today for early boot.  It will remove lots of code
and enable a lot of good things to happen.  The first email in this
thread describes this quite well, is that not sufficient?

> FWIW, my gut feeling was that the earlier attempts to add a new
> IPC primitive like multicast UNIX domain sockets
> http://thread.gmane.org/gmane.linux.kernel/1255575/focus=1257999
> were a much saner approach.  But now I think the comments
> from this old thread have not been addressed, instead the
> new approach just made the thing more complex and
> put in ipc/ instead of net/ to bypass the guards.

Not at all, the networking maintainers said that that proposal was not
acceptable to them at all and it should not be done in the networking
stack at all.  So this solution was created instead, which provides
a lot more things than the old networking patches did, which shows that
the networking developers were right to reject it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-20 13:24                   ` Johannes Stezenbach
  0 siblings, 0 replies; 143+ messages in thread
From: Johannes Stezenbach @ 2015-01-20 13:24 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz

On Tue, Jan 20, 2015 at 07:26:09PM +0800, Greg Kroah-Hartman wrote:
> On Tue, Jan 20, 2015 at 11:57:12AM +0100, Johannes Stezenbach wrote:
> > 
> > So I did some googling and found in QNX servers create a channel
> > to receive messages, and clients connect to this channel.
> > Multiple clients can connect to the channel.
> 
> Hence, a bus :)
> 
> > But it is not a bus -- no multicast/broadcast, and no name
> > service or policy rules like D-Bus has.  To me it looks
> > to be similar in functionality to UNIX domain sockets.
> 
> It's not as complex as D-Bus, but it's still subscribing to things and
> getting messages.

Apparently you don't read what I write, probably you're not interested
in this discussion anymore...
QNX uses the term "channel" but it does not refer to a bus
or subscription facility, it is more like a socket in listening state.
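
To make the "socket in listening state" comparison concrete, here is a
minimal sketch using a UNIX domain socket in Python.  It is not QNX code
and uses no QNX API; the function names (recv_all, serve, run_demo) and
the message contents are made up for illustration.  Several clients
connect to one listening endpoint and each gets a private reply, with no
multicast, name service or policy rules involved.

```python
import os
import socket
import tempfile
import threading


def recv_all(sock):
    """Read from sock until the peer closes its sending side."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def serve(listener, n_clients):
    """Accept n_clients connections, one at a time, and answer each privately."""
    for _ in range(n_clients):
        conn, _addr = listener.accept()
        with conn:
            request = recv_all(conn)
            # The reply travels back on this connection only: no other
            # client sees it, and nothing is broadcast or looked up by name.
            conn.sendall(b"reply to " + request)


def run_demo(n_clients=3):
    """Run one listening endpoint and n_clients clients; return the replies."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "channel.sock")  # hypothetical endpoint name
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(n_clients)
    server = threading.Thread(target=serve, args=(listener, n_clients))
    server.start()

    replies = []
    for i in range(n_clients):
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)
            client.sendall(b"client %d" % i)
            # Half-close so the server sees end-of-request.
            client.shutdown(socket.SHUT_WR)
            replies.append(recv_all(client))

    server.join()
    listener.close()
    os.unlink(path)
    os.rmdir(tmpdir)
    return replies


if __name__ == "__main__":
    print(run_demo())
```

Anything bus-like (fan-out to subscribers, well-known names, policy)
would have to be layered on top of such an endpoint in userspace.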

> > My guess is that the people porting from QNX were just confused
> > and their use of D-Bus was in error.  Maybe they should've used
> > plain sockets, capnproto, ZeroMQ or whatever.
> 
> I tend to trust that they knew what they were doing, they wouldn't have
> picked D-Bus for no good reason.

The automotive developers I had the pleasure to work with would
use anything which is available via a mouse click in the
commercial Embedded Linux SDK IDE of their choice :)
Let's face it: QNX has a single IPC solution while Linux has
a confusing multitude of possibilities.

> > Well, IMHO you got it backwards.  Before adding a complex new IPC
> > API to the kernel you should do the homework and gather some
> > evidence that there will be enough users to justify the addition.
> 
> systemd wants this today for early boot.  It will remove lots of code
> and enable a lot of good things to happen.  The first email in this
> thread describes this quite well, is that not sufficient?

The first mail in this thread doesn't even mention systemd,
instead it has a lot of "marketing" buzzwords.
Of course it is no secret that systemd is the driving force
behind kdbus, but no public record exists to explain why
kdbus was chosen and designed the way it is, what alternatives
were considered and rejected etc.  (or if there is, please send a link)

> > FWIW, my gut feeling was that the earlier attempts to add a new
> > IPC primitive like multicast UNIX domain sockets
> > http://thread.gmane.org/gmane.linux.kernel/1255575/focus=1257999
> > were a much saner approach.  But now I think the comments
> > from this old thread have not been addressed, instead the
> > new approach just made the thing more complex and
> > put in ipc/ instead of net/ to bypass the guards.
> 
> Not at all, the networking maintainers said that that proposal was not
> acceptable to them at all and it should not be done in the networking
> stack at all.  So this solution was created instead, which provides
> a lot more things than the old networking patches did, which shows that
> the networking developers were right to reject it.

Please read the gmane thread to the end.  It seems there were
several indications that D-Bus can be improved in userspace
using existing kernel facilities.  Havoc Pennington's mail
I quoted in my first response also contains some hints
about it.  I have no idea if any of this has ever been
pursued.  While adding complexity to critical net/ code paths
is problematic and a good reason to reject it, this was not
the only reason, the major one being "not necessary".


Thanks,
Johannes

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-20 13:53     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-20 13:53 UTC (permalink / raw)
  To: Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel
  Cc: mtk.manpages, daniel, dh.herrmann, tixxdz, Daniel Mack,
	Johannes Stezenbach

On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
> From: Daniel Mack <daniel@zonque.org>
> 
> kdbus is a system for low-latency, low-overhead, easy to use
> interprocess communication (IPC).
> 
> The interface to all functions in this driver is implemented via ioctls
> on files exposed through a filesystem called 'kdbusfs'. The default
> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
> documentation about the kernel level API design.

I have some detailed feedback on the contents of this file, and some
bigger questions. I'll split them out into separate mails.

So here, the bigger, general questions to start with. I've arrived late 
to this, so sorry if they've already been discussed, but the answers to 
some of the questions should actually be in this file, I would have 
expected.

This is an enormous and complex API. Why is the API ioctl() based,
rather than system-call-based? Have we learned nothing from the hydra
that the futex() multiplexing syscall became? (And kdbus is an order
of magnitude more complex, by the look of things.) At the very least,
a *good* justification of why the API is ioctl()-based should be part
of this documentation file.

An observation: The documentation below is substantial, but this API is 
enormous, so the documentation still feels rather thin. What would 
really help would be some example code in the doc. 

And on the subject of code examples... Are there any (prototype) 
working user-space applications that exercise the current kdbus 
implementation? That is, can I install these kdbus patches, and
then find a simple example application somewhere that does
something to exercise kdbus?

And then: is there any substantial real-world application (e.g., a 
full D-Bus port) that is being developed in tandem with this kernel
side patch? (I don't mean a user-space library; I mean a seriously
large application.) This is an incredibly complex API whose
failings are only going to become evident through real-world use.
Solidifying an API in the kernel and then discovering the API
problems later when writing real-world applications would make for
a sad story. A story something like that of inotify, an API which 
is an order of magnitude less complex than kdbus. (I can't help but
feel that many of the inotify problems that I discuss at
https://lwn.net/Articles/605128/ might have been fixed or mitigated
if a few real-world applications had been implemented before the
API was set in stone.)

> +For a kdbus specific userspace library implementation please refer to:
> +  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h

Is this library intended just for systemd? More generally, is there an 
intention to provide a general purpose library API for kdbus? Or is the
intention that each application will roll a library suitable to its
needs? I think an answer to that question would be useful in this 
Documentation file.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-20 13:58     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-20 13:58 UTC (permalink / raw)
  To: Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel
  Cc: mtk.manpages, daniel, dh.herrmann, tixxdz, Daniel Mack

On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
> From: Daniel Mack <daniel@zonque.org>
> 
> kdbus is a system for low-latency, low-overhead, easy to use
> interprocess communication (IPC).
> 
> The interface to all functions in this driver is implemented via ioctls
> on files exposed through a filesystem called 'kdbusfs'. The default
> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
> documentation about the kernel level API design.

And now the detailed feedback.

Please note that for the various points I raise below, even in
cases where I don't suggest/request a fix, the fact that I've
needed to ask a question probably suggests a deficiency in
the documentation that needs to be remedied.

Many of my comments below are wording and typo fixes. While
these may seem trivial, the existence of various wording problems
and typos is a significant distraction, especially while trying
to grok a document of this size.

I've marked one or two notable questions about the API with "###".

> Signed-off-by: Daniel Mack <daniel@zonque.org>
> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
> Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  Documentation/kdbus.txt | 2107 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 2107 insertions(+)
>  create mode 100644 Documentation/kdbus.txt
> 
> diff --git a/Documentation/kdbus.txt b/Documentation/kdbus.txt
> new file mode 100644
> index 000000000000..2592a7e37079
> --- /dev/null
> +++ b/Documentation/kdbus.txt
> @@ -0,0 +1,2107 @@
> +D-Bus is a system for powerful, easy to use interprocess communication (IPC).
> +
> +The focus of this document is an overview of the low-level, native kernel D-Bus
> +transport called kdbus. Kdbus exposes its functionality via files in a
> +filesystem called 'kdbusfs'. All communication between processes takes place
> +via ioctls on files exposed through the mount point of a kdbusfs. The default
> +mount point of kdbusfs is /sys/fs/kdbus.
> +
> +Please note that kdbus was designed as transport layer for D-Bus, but is in no
> +way limited, nor controlled by the D-Bus protocol specification. The D-Bus
> +protocol is one possible application layer on top of kdbus.
> +
> +For the general D-Bus protocol specification, the payload format, the
> +marshaling, and the communication semantics, please refer to:
> +  http://dbus.freedesktop.org/doc/dbus-specification.html
> +
> +For a kdbus specific userspace library implementation please refer to:
> +  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
> +
> +Articles about D-Bus and kdbus:
> +  http://lwn.net/Articles/580194/
> +
> +
> +1. Terminology
> +===============================================================================
> +
> +  Domain:
> +    A domain is created each time a kdbusfs is mounted. Each process that is
> +    capable to mount a new instance of a kdbusfs will have its own kdbus

s/is capable to mount/mounts/

> +    hierarchy. Each domain (ie, each mount point) offers its own "control"
> +    file to create new buses. Domains have no connection to each other and
> +    cannot see nor talk to each other. See section 5 for more details.

Smoother would be:

    s/cannot see nor talk/can neither see nor talk/

> +
> +  Bus:
> +    A bus is a named object inside a domain. Clients exchange messages
> +    over a bus. Multiple buses themselves have no connection to each other;
> +    messages can only be exchanged on the same bus. The default endpoint of
> +    a bus, where clients establish the connection to, is the "bus" file

Maybe:

where clients establish the connection to
==>
to which clients establish connections
?

> +    /sys/fs/kdbus/<bus name>/bus.
> +    Common operating system setups create one "system bus" per system, and one
> +    "user bus" for every logged-in user. Applications or services may create

At the kdbus level is there any difference between such system and user buses?
If not, it would perhaps be good to insert a parenthetical aside to say 
that.

> +    their own private buses.  See section 5 for more details.
> +
> +  Endpoint:
> +    An endpoint provides a file to talk to a bus. Opening an endpoint
> +    creates a new connection to the bus to which the endpoint belongs. All
> +    endpoints have unique names and are accessible as files underneath the
> +    directory of a bus, e.g., /sys/fs/kdbus/<bus>/<endpoint>
> +    Every bus has a default endpoint called "bus". A bus can optionally offer
> +    additional endpoints with custom names to provide restricted access to the
> +    bus. Custom endpoints carry additional policy which can be used to create
> +    sandboxes with locked-down, limited, filtered access to a bus.  See
> +    section 5 for more details.
> +
> +  Connection:
> +    A connection to a bus is created by opening an endpoint file of a bus and
> +    becoming an active client with the HELLO exchange. Every ordinary client
> +    connection has a unique identifier on the bus and can address messages to
> +    every other connection on the same bus by using the peer's connection id
> +    as the destination.  See section 6 for more details.
> +
> +  Pool:
> +    Each connection allocates a piece of shmem-backed memory that is used
> +    to receive messages and answers to ioctl commands from the kernel. It is
> +    never used to send anything to the kernel. In order to access that memory,
> +    userspace must mmap() it into its address space.

s/userspace/a userspace application/

> +    See section 12 for more details.
> +
> +  Well-known Name:
> +    A connection can, in addition to its implicit unique connection id, request
> +    the ownership of a textual well-known name. Well-known names are noted in
> +    reverse-domain notation, such as com.example.service1. Connections offering
> +    a service on a bus are usually reached by its well-known name. The analogy

Noun/pronoun number disagreement at "Connections... its".
==>
    A connection that offers a service on a bus is usually reached by its 
    well-known name.

> +    of connection id and well-known name is an IP address and a DNS name

Doing s/id/ID/ throughout the doc would help readability. (The doc 
already uses "ID" in some places, so consistency is a further argument 
for this change.)

> +    associated with that address.
> +
> +  Message:
> +    Connections can exchange messages with other connections by addressing
> +    the peers with their connection id or well-known name. A message consists
> +    of a message header with kernel-specific information on how to route the

What does "kernel-specific" mean here? Something needs explaining (or removing).

> +    message, and the message payload, which is a logical byte stream of
> +    arbitrary size. Messages can carry additional file descriptors to be passed

So, this is FD passing like UNIX domain sockets? If yes, it would be helpful
here to mention that analogy.

> +    from one connection to another. Every connection can specify which set of
> +    metadata the kernel should attach to the message when it is delivered
> +    to the receiving connection. Metadata contains information like: system
> +    timestamps, uid, gid, tid, proc-starttime, well-known-names, process comm,

s/well-known-names/well-known names/

See the note on "ID" above. I think UID, GID, TID throughout would help 
readability.

> +    process exe, process argv, cgroup, capabilities, seclabel, audit session,
> +    loginuid and the connection's human-readable name.
> +    See section 7 and 13 for more details.
> +
> +  Item:
> +    The API of kdbus implements a notion of items, submitted through and

s/a notion/the notion/

> +    returned by most ioctls, and stored inside data structures in the
> +    connection's pool. See section 4 for more details.
> +
> +  Broadcast and Match:
> +    Broadcast messages are potentially sent to all connections of a bus. By
> +    default, the connections will not actually receive any of the sent
> +    broadcast messages; only after installing a match for specific message
> +    properties, a broadcast message passes this filter.

"Filter" suddenly gets mentioned here without previously being defined. I suspect 
the last piece should read more like:

  a connection will receive a broadcast message only after it installs a filter
  that matches the specific message properties of the broadcast message.

> +    See section 10 for more details.
> +
> +  Policy:
> +    A policy is a set of rules that define which connections can see, talk to,
> +    or register a well-know name on the bus. A policy is attached to buses and

s/know/known/

> +    custom endpoints, and modified by policy holder connections or owners of
> +    custom endpoints. See section 11 for more details.
> +    See section 11 for more details.

Repeated last sentence.

> +  Privileged bus users:
> +    A user connecting to the bus is considered privileged if it is either the
> +    creator of the bus, or if it has the CAP_IPC_OWNER capability flag set.
> +
> +
> +2. Control Files Layout
> +===============================================================================
> +
> +The kdbus interface is exposed through files in its kdbusfs mount point
> +(defaults to /sys/fs/kdbus):
> +
> +  /sys/fs/kdbus                 (mount point of kdbusfs)
> +  |-- control                   (domain control-file)
> +  |-- 0-system                  (bus of user uid=0)
> +  |   |-- bus                   (default endpoint of bus '0-system')
> +  |   `-- ep.apache             (custom endpoint of bus '0-system')
> +  |-- 1000-user                 (bus of user uid=1000)
> +  |   `-- bus                   (default endpoint of bus '1000-user')
> +  `-- 2702-user                 (bus of user uid=2702)
> +      |-- bus                   (default endpoint of bus '2702-user')
> +      `-- ep.app                (custom endpoint of bus '2702-user')
> +
> +
> +3. Data Structures and flags
> +===============================================================================
> +
> +3.1 Data structures and interconnections
> +----------------------------------------
> +
> +  +--------------------------------------------------------------------------+
> +  | Domain (Mount Point)                                                     |
> +  | /sys/fs/kdbus/control                                                    |
> +  | +----------------------------------------------------------------------+ |
> +  | | Bus (System Bus)                                                     | |
> +  | | /sys/fs/kdbus/0-system/                                              | |
> +  | | +-------------------------------+ +--------------------------------+ | |
> +  | | | Endpoint                      | | Endpoint                       | | |
> +  | | | /sys/fs/kdbus/0-system/bus    | | /sys/fs/kdbus/0-system/ep.app  | | |
> +  | | +-------------------------------+ +--------------------------------+ | |
> +  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
> +  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
> +  | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
> +  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
> +  | +----------------------------------------------------------------------+ |
> +  |                                                                          |
> +  | +----------------------------------------------------------------------+ |
> +  | | Bus (User Bus for UID 2702)                                          | |
> +  | | /sys/fs/kdbus/2702-user/                                             | |
> +  | | +-------------------------------+ +--------------------------------+ | |
> +  | | | Endpoint                      | | Endpoint                       | | |
> +  | | | /sys/fs/kdbus/2702-user/bus   | | /sys/fs/kdbus/2702-user/ep.app | | |
> +  | | +-------------------------------+ +--------------------------------+ | |
> +  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
> +  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
> +  | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
> +  | | +--------------+ +--------------+ +--------------------------------+ | |
> +  | +----------------------------------------------------------------------+ |
> +  +--------------------------------------------------------------------------+
> +
> +The above description uses the D-Bus notation of unique connection names that
> +adds a ":1." prefix to the connection's unique ID. kdbus itself doesn't
> +use that notation, neither internally nor externally. However, libraries and
> +other userspace code that aims for compatibility to D-Bus might.

s/compatibility to/compatibility with/

> +
> +3.2 Flags
> +---------
> +
> +All ioctls used in the communication with the driver contain three 64-bit
> +fields: 'flags', 'kernel_flags' and 'return_flags'. All of them are specific
> +to the ioctl used.
> +
> +In 'flags', the behavior of the command can be tweaked. All bits that are not
> +recognized by the kernel in this field are rejected, and the ioctl fails with
> +-EINVAL.
> +
> +In 'kernel_flags', the kernel driver writes back the mask of supported bits
> +upon each call, and sets the KDBUS_FLAGS_KERNEL bit. This is a way to probe
> +possible kernel features and make userspace code forward and backward
> +compatible.

So, is 'kernel_flags' a bounding superset of what the caller may specify
in 'flags'? If yes, please make that clearer.

> +
> +In 'return_flags', the kernel can return results of the command, in addition
> +to the actual return value. This is mostly to inform userspace about non-fatal
> +conditions that occurred during the execution of the command.
> +
> +
> +4. Items
> +===============================================================================
> +
> +To flexibly augment transport structures, data blobs of type struct kdbus_item
> +can be attached to the structs passed into the ioctls. Some ioctls make items
> +of certain types mandatory, others are optional. Unsupported items will cause
> +the ioctl to fail -EINVAL.
> +
> +The total size of an item is variable and is in some cases defined by the item
> +type. In other cases, they can be of arbitrary length (for instance, a string).
> +
> +Items are also used for information stored in a connection's pool, such as
> +received messages, name lists or requested connection or bus owner information.
> +
> +Whenever items are used as part of the kdbus kernel API, they are embedded in
> +structs that have an overall size of their own, so there can be multiple items

"have an overall size of their own" is hard to grok. I think you mean

    ... are embedded inside structs that themselves include a size field
    containing the overall size of the structure. This allows multiple 
    items per ioctl.

> +per ioctl.
> +
> +The kernel expects all items to be aligned to 8-byte boundaries. Unaligned
> +items or such that are unsupported by the ioctl are rejected.

s/such/items/?
(Or otherwise replace "such" with whatever it actually means.)
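
To check my understanding of the size and alignment rules stated so far,
here is a sketch of how I would expect userspace to walk a list of items.
It is not the selftests iterator. The struct layout (a 64-bit size
followed by a 64-bit type, with the size covering the header but not the
padding) is my assumption, because the document has not defined
struct kdbus_item at this point, and all sketch_* names are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed item header; the document does not define the real layout. */
struct sketch_item {
    uint64_t size;  /* bytes in this item, header included, padding excluded */
    uint64_t type;
    /* payload bytes follow */
};

/* Round up to the next 8-byte boundary. */
#define SKETCH_ALIGN8(v) (((v) + 7ULL) & ~7ULL)

/*
 * Count the items packed into buf[0..total). Each item starts on an
 * 8-byte boundary. Returns -1 if an item header is truncated or an
 * item claims a size that does not fit into the remaining space.
 */
static int sketch_count_items(const void *buf, uint64_t total)
{
    const unsigned char *p = buf;
    uint64_t off = 0;
    int n = 0;

    assert(buf != NULL);
    while (off < total) {
        struct sketch_item hdr;

        if (total - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, p + off, sizeof(hdr));
        if (hdr.size < sizeof(hdr) || hdr.size > total - off)
            return -1;
        n++;
        off += SKETCH_ALIGN8(hdr.size);
    }
    return n;
}
```

If the real rules differ from this (for example, if the size field
includes the padding), that is exactly the kind of detail the text
should state.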

> +A simple iterator in userspace would iterate over the items until the items
> +have reached the embedding structure's overall size. An example implementation
> +of such an iterator can be found in tools/testing/selftests/kdbus/kdbus-util.h.
> +
> +
> +5. Creation of new domains, buses and endpoints
> +===============================================================================
> +
> +
> +5.1 Domains
> +-----------
> +
> +A domain is a container of buses. Domains themselves do not provide any IPC
> +functionality. Their sole purpose is to manage buses allocated in their
> +domain. Each time kdbusfs is mounted, a new kdbus domain is created, with its
> +own 'control' file. The lifetime of the domain ends once the user has unmounted
> +the kdbusfs. If you mount kdbusfs multiple times, each will have its own kdbus
> +domain internally. 

What does that last sentence mean? Somehow it needs to be reworded to better
convey whatever sense it is trying to convey.

> Operations performed on one domain do not affect any
> +other domain.
> +
> +The full kdbusfs hierarchy, any sub-directory, or file can be bind-mounted to
> +an external mount point and will remain fully functional. The kdbus domain and
> +any linked resources stay available until the original mount and all subsequent
> +bind-mounts have been unmounted.
> +
> +During creation, domains pin the user-namespace of the creator and use
> +it as controlling user-namespace for this domain. Any user accounting is done

s/as/as the/

> +relative to that user-namespace.
> +
> +Newly created kdbus domains do not have any bus pre-created. The only resource
> +available is a 'control' file, which is used to manage kdbus domains.
> +Currently, 'control' files are exclusively used to create buses via

s/exclusively used/used exclusively/

> +KDBUS_CMD_BUS_MAKE, but further ioctls might be added in the future.
> +
> +
> +5.2 Buses
> +---------
> +
> +A bus is a shared resource between connections to transmit messages. Each bus

==> A bus is a resource that is shared between connections in order to 
    transmit messages

> +is independent and operations on the bus will not have any effect on other
> +buses. A bus is a management entity, that controls the addresses of its

s/,//

> +connections, their policies and message transactions performed via this bus.
> +
> +Each bus is bound to the domain it was created on. It has a custom name that is
> +unique across all buses of a domain. In kdbusfs, a bus is presented as a
> +directory. No operations can be performed on the bus itself, instead you need

s/,/;/

> +to perform those on an endpoint associated with the bus. Endpoints are

s/those/operations/

> +accessible as files underneath the bus directory. A default endpoint called
> +"bus" is provided on each bus.
> +
> +Bus names may be chosen freely except for one restriction: the name
> +must be prefixed with the numeric UID of the creator and a dash. This

s/UID/effective UID/
(I assume it's the effective UID...)

> +is required to avoid namespace clashes between different users. When
> +creating a bus the name must be passed in properly formatted, or the

the name must be passed in properly formatted
==>
the name that is passed in must be properly formatted

> +kernel will refuse creation of the bus. Example: "1047-foobar" is an
> +OK name for a bus registered by a user with UID 1047. However,

s/OK/acceptable/

> +"1024-foobar" is not, and neither is "foobar".
> +The UID must be provided in the user-namespace of the parent domain.
> +
> +To create a new bus, you need to open the control file of a domain and run the

s/run/employ/ (One doesn't "run" a system call.)

> +KDBUS_CMD_BUS_MAKE ioctl. The control file descriptor that was used to issue
> +KDBUS_CMD_BUS_MAKE must not have been used for any other control-ioctl before

Maybe better:

have been used for any other control-ioctl before
==>
previously have been used for any other control-ioctl

> +and needs to be kept open for the entire life-time of the created bus. Closing

s/needs/must/ (just reads smoother)

> +it will immediately cleanup the entire bus and all its associated resources and
> +endpoints. Every control file descriptor can only be used to create a single
> +new bus; from that point on, it is not used for any further communication until
> +the final close().
> +
> +Each bus will generate a random, 128-bit UUID upon creation. It will be

s/It/This UUID/

> +returned to creators of connections through kdbus_cmd_hello.id128 and can
> +be used by userspace to uniquely identify buses, even across different machines
> +or containers. The UUID will have its variant bits set to 'DCE', and denote
> +version 4 (random).

I find that last sentence rather difficult to grasp. I think more detail
needs to be added.

> +When creating buses, a variable list of items that must be passed in
> +the items array is expected otherwise bus creation will fail.

What does "a variable list of items that must be passed in
the items array" mean? Something needs fixing, I think.

> +
> +5.3 Endpoints
> +-------------
> +
> +Endpoints are entry points to a bus. By default, each bus has a default
> +endpoint called 'bus'. The bus owner has the ability to create custom
> +endpoints with specific names, permissions, and policy databases (see below).
> +An endpoint is presented as file underneath the directory of the parent bus.
> +
> +To create a custom endpoint, open the default endpoint ('bus') and use the
> +KDBUS_CMD_ENDPOINT_MAKE ioctl with "struct kdbus_cmd_make". Custom endpoints
> +always have a policy database that, by default, forbids any operation. You have
> +to explicitly install policy entries to allow any operation on this endpoint.
> +Once KDBUS_CMD_ENDPOINT_MAKE succeeded, this file descriptor will manage the
> +newly created endpoint resource. It cannot be used to manage further resources.
> +
> +Endpoint names may be chosen freely except for one restriction: the name
> +must be prefixed with the numeric UID of the creator and a dash. This

s/UID/effective UID/

> +is required to avoid namespace clashes between different users. When
> +creating an endpoint the name must be passed in properly formatted, or the

creating an endpoint the name must be passed in properly formatted
==>
creating an endpoint, the name that is passed in must be properly formatted

> +kernel will refuse creation of the endpoint. Example: "1047-foobar" is an
> +OK name for an endpoint registered by a user with UID 1047. However,

s/OK/acceptable/

> +"1024-foobar" is not, and neither is "foobar".

Because this text reads almost exactly as the bus text in 5.2, I did 
a double take here. I suggest making the text more distinct in each case.

So, for example:

    Example: "1047-my-endpoint" is an acceptable name for an endpoint
    registered by a user with UID 1047. However, "1024-my-endpoint" is
    not, and neither is "my-endpoint".

(And you could do similar in the bus text in section 5.2.)
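
Incidentally, the rule itself is simple enough to state as code, which
might be a clearer way to document it. A sketch, assuming the name must
start with the decimal UID followed by a dash and at least one more
character (that last part is my guess; the text does not say whether
"1047-" alone is valid). The function name is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if 'name' begins with "<uid>-" and has at least one
 * character after the dash, 0 otherwise.
 */
static int sketch_name_ok(const char *name, unsigned int uid)
{
    char prefix[32];
    int len;

    assert(name != NULL);
    len = snprintf(prefix, sizeof(prefix), "%u-", uid);
    if (len < 0 || (size_t)len >= sizeof(prefix))
        return 0;
    return strncmp(name, prefix, (size_t)len) == 0 && name[len] != '\0';
}
```

With this reading, "1047-my-endpoint" passes for UID 1047, while
"1024-my-endpoint" and "my-endpoint" do not.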

> +The UID must be provided in the user-namespace of the parent domain.
> +
> +To create connections to a bus, you use KDBUS_CMD_HELLO. See section 6 for
> +details. Note that once KDBUS_CMD_HELLO succeeded, this file descriptor manages

Note that once KDBUS_CMD_HELLO succeeded,
==>
Note that after a successful KDBUS_CMD_HELLO,

> +the newly created connection resource. It cannot be used to manage further
> +resources.
> +
> +
> +5.4 Creating buses and endpoints
> +--------------------------------
> +
> +KDBUS_CMD_BUS_MAKE, and KDBUS_CMD_ENDPOINT_MAKE take a

s/,//

> +struct kdbus_cmd_make argument.
> +
> +struct kdbus_cmd_make {
> +  __u64 size;
> +    The overall size of the struct, including its items.
> +
> +  __u64 flags;
> +    The flags for creation.
> +
> +    KDBUS_MAKE_ACCESS_GROUP
> +      Make the bus or endpoint file group-accessible
> +
> +    KDBUS_MAKE_ACCESS_WORLD
> +      Make the bus or endpoint file world-accessible
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  struct kdbus_item items[0];
> +    A list of items that has specific meanings for KDBUS_CMD_BUS_MAKE

s/has/have/

> +    and KDBUS_CMD_ENDPOINT_MAKE (see above).
> +
> +    Following items are expected for KDBUS_CMD_BUS_MAKE:
> +    KDBUS_ITEM_MAKE_NAME
> +      Contains a string to identify the bus name.

So, up to here, I've seen no definition of 'kdbus_item', which leaves me 
asking questions like: what subfield is KDBUS_ITEM_MAKE_NAME stored in?
which subfield holds the pointer to the string?

Somewhere earlier, kdbus_item needs to be explained in more detail,
I think.

> +
> +    KDBUS_ITEM_BLOOM_PARAMETER
> +      Bus-wide bloom parameters passed in a dbus_bloom_parameter struct
> +
> +    KDBUS_ITEM_ATTACH_FLAGS_RECV
> +      An optional item that contains a set of required attach flags
> +      that connections must allow. This item is used as a negotiation
> +      measure during connection creation. If connections do not satisfy
> +      the bus requirements, they are not allowed on the bus.
> +      If not set, the bus does not require any metadata to be attached,

s/,/;/

> +      in this case connections are free to set their own attach flags.
> +
> +    KDBUS_ITEM_ATTACH_FLAGS_SEND
> +      An optional item that contains a set of attach flags that are
> +      returned to connections when they query the bus creator metadata.
> +      If not set, no metadata is returned.
> +
> +    Unrecognized items are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +
> +6. Connections
> +===============================================================================
> +
> +
> +6.1 Connection IDs and well-known connection names
> +--------------------------------------------------
> +
> +Connections are identified by their connection id, internally implemented as a
> +uint64_t counter. The IDs of every newly created bus start at 1, and every new
> +connection will increment the counter by 1. The ids are not reused.

Again, please change "ids" to "IDs" throughout.

> +
> +In higher level tools, the user visible representation of a connection is
> +defined by the D-Bus protocol specification as ":1.<id>".
> +
> +Messages with a specific uint64_t destination id are directly delivered to
> +the connection with the corresponding id. Messages with the special destination
> +id KDBUS_DST_ID_BROADCAST are broadcast messages and are potentially delivered
> +to all known connections on the bus; clients interested in broadcast messages
> +need to subscribe to the specific messages they are interested, though before
> +any broadcast message reaches them.

The piece following the semicolon would be better as this separate sentence, 
I think:

    However, in order to receive any broadcast messages, clients must
    subscribe to the specific messages in which they are interested.
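
As I read the text so far, the kernel decides between direct and
broadcast delivery from the destination ID alone, and the ":1.<id>" form
is purely a userspace convention. A small sketch of that reading follows.
The all-ones value is taken from the diagram further down in section 6.1;
the sketch_* names are mine, and other special IDs are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* All-ones, as shown for KDBUS_DST_ID_BROADCAST in the 6.1 diagram. */
#define SKETCH_DST_ID_BROADCAST UINT64_MAX

/* Return 1 for the broadcast destination, 0 for any other ID. */
static int sketch_is_broadcast(uint64_t dst_id)
{
    return dst_id == SKETCH_DST_ID_BROADCAST;
}

/*
 * Format the D-Bus style unique name ":1.<id>" into buf.
 * Returns whatever snprintf() returns.
 */
static int sketch_unique_name(char *buf, size_t len, uint64_t id)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, ":1.%llu", (unsigned long long)id);
}
```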

> +
> +Messages synthesized and sent directly by the kernel will carry the special
> +source id KDBUS_SRC_ID_KERNEL (0).
> +
> +In addition to the unique uint64_t connection id, established connections can
> +request the ownership of well-known names, under which they can be found and
> +addressed by other bus clients. A well-known name is associated with one and
> +only one connection at a time. See section 8 on name acquisition and the
> +name registry, and the validity of names.
> +
> +Messages can specify the special destination id 0 and carry a well-known name
> +in the message data. Such a message is delivered to the destination connection
> +which owns that well-known name.
> +
> +  +-------------------------------------------------------------------------+
> +  | +---------------+     +---------------------------+                     |
> +  | | Connection    |     | Message                   | -----------------+  |
> +  | | :1.22         | --> | src: 22                   |                  |  |
> +  | |               |     | dst: 25                   |                  |  |
> +  | |               |     |                           |                  |  |
> +  | |               |     |                           |                  |  |
> +  | |               |     +---------------------------+                  |  |
> +  | |               |                                                    |  |
> +  | |               | <--------------------------------------+           |  |
> +  | +---------------+                                        |           |  |
> +  |                                                          |           |  |
> +  | +---------------+     +---------------------------+      |           |  |
> +  | | Connection    |     | Message                   | -----+           |  |
> +  | | :1.25         | --> | src: 25                   |                  |  |
> +  | |               |     | dst: 0xffffffffffffffff   | -------------+   |  |
> +  | |               |     |  (KDBUS_DST_ID_BROADCAST) |              |   |  |
> +  | |               |     |                           | ---------+   |   |  |
> +  | |               |     +---------------------------+          |   |   |  |
> +  | |               |                                            |   |   |  |
> +  | |               | <--------------------------------------------------+  |
> +  | +---------------+                                            |   |      |
> +  |                                                              |   |      |
> +  | +---------------+     +---------------------------+          |   |      |
> +  | | Connection    |     | Message                   | --+      |   |      |
> +  | | :1.55         | --> | src: 55                   |   |      |   |      |
> +  | |               |     | dst: 0 / org.foo.bar      |   |      |   |      |
> +  | |               |     |                           |   |      |   |      |
> +  | |               |     |                           |   |      |   |      |
> +  | |               |     +---------------------------+   |      |   |      |
> +  | |               |                                     |      |   |      |
> +  | |               | <------------------------------------------+   |      |
> +  | +---------------+                                     |          |      |
> +  |                                                       |          |      |
> +  | +---------------+                                     |          |      |
> +  | | Connection    |                                     |          |      |
> +  | | :1.81         |                                     |          |      |
> +  | | org.foo.bar   |                                     |          |      |
> +  | |               |                                     |          |      |
> +  | |               |                                     |          |      |
> +  | |               | <-----------------------------------+          |      |
> +  | |               |                                                |      |
> +  | |               | <----------------------------------------------+      |
> +  | +---------------+                                                       |
> +  +-------------------------------------------------------------------------+
> +
> +
> +6.2 Creating connections
> +------------------------
> +
> +A connection to a bus is created by opening an endpoint file of a bus and
> +becoming an active client with the KDBUS_CMD_HELLO ioctl. Every connected client
> +connection has a unique identifier on the bus and can address messages to every
> +other connection on the same bus by using the peer's connection id as the
> +destination.
> +
> +The KDBUS_CMD_HELLO ioctl takes the following struct as argument.
> +
> +struct kdbus_cmd_hello {
> +  __u64 size;
> +    The overall size of the struct, including all attached items.
> +
> +  __u64 flags;
> +    Flags to apply to this connection:
> +
> +    KDBUS_HELLO_ACCEPT_FD
> +      When this flag is set, the connection can be sent file descriptors
> +      as message payload. If it's not set, any attempt of doing so will

s/any attempt of doing so/an attempt to send file descriptors/

> +      result in -ECOMM on the sender's side.
> +
> +    KDBUS_HELLO_ACTIVATOR
> +      Make this connection an activator (see below). With this bit set,
> +      an item of type KDBUS_ITEM_NAME has to be attached which describes

s/attached which describes/attached. This item describes/

> +      the well-known name this connection should be an activator for.
> +
> +    KDBUS_HELLO_POLICY_HOLDER
> +      Make this connection a policy holder (see below). With this bit set,
> +      an item of type KDBUS_ITEM_NAME has to be attached which describes

s/attached which describes/attached. This item describes/

> +      the well-known name this connection should hold a policy for.
> +
> +    KDBUS_HELLO_MONITOR
> +      Make this connection an eaves-dropping connection. See section 6.8 for

s/eaves-dropping/eavesdropping/

> +      more information.
> +
> +To also receive broadcast messages,
   ^
Indentation error.

> +      the connection has to upload appropriate matches as well.
> +      This flag is only valid for privileged bus connections.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  __u64 attach_flags_send;
> +      Set the bits for metadata this connection permits to be sent to the
> +      receiving peer. Only metadata items that are both allowed to be sent by
> +      the sender and that are requested by the receiver will effectively be
> +      attached to the message eventually. Note, however, that the bus may

What does "eventually" mean here?

> +      optionally enforce some of those bits to be set. If the match fails,

s/enforce/require/ ?

> +      -ECONNREFUSED will be returned. In either case, this field will be set
> +      to the mask of metadata items that are enforced by the bus. The
> +      KDBUS_FLAGS_KERNEL bit will as well be set.
> +
> +  __u64 attach_flags_recv;
> +      Request the attachment of metadata for each message received by this
> +      connection. The metadata actually attached may actually augment the list

Seems like two "actually" in the previous line is one too many.

> +      of requested items. See section 13 for more details.
> +
> +  __u64 bus_flags;
> +      Upon successful completion of the ioctl, this member will contain the
> +      flags of the bus it connected to.
> +
> +  __u64 id;
> +      Upon successful completion of the ioctl, this member will contain the
> +      id of the new connection.
> +
> +  __u64 pool_size;
> +      The size of the communication pool, in bytes. The pool can be accessed
> +      by calling mmap() on the file descriptor that was used to issue the
> +      KDBUS_CMD_HELLO ioctl.
> +
> +  __u64 offset;
> +      The kernel will return the offset in the pool where returned details
> +      will be stored.
> +
> +  __u8 id128[16];
> +      Upon successful completion of the ioctl, this member will contain the
> +      128 bit wide UUID of the connected bus.

s/128 bit wide/128-bit/

> +
> +  struct kdbus_item items[0];
> +      Variable list of items to add optional additional information. The

s/to add optional additional/containing optional additional/

> +      following items are currently expected/valid:
> +
> +      KDBUS_ITEM_CONN_DESCRIPTION
> +        Contains a string to describes this connection's name, so it can be

s/to/that/

> +        identified later.
> +
> +      KDBUS_ITEM_NAME
> +      KDBUS_ITEM_POLICY_ACCESS
> +        For activators and policy holders only, combinations of these two
> +        items describe policy access entries (see section about policy).
> +
> +      KDBUS_ITEM_CREDS
> +      KDBUS_ITEM_PIDS
> +      KDBUS_ITEM_SECLABEL
> +        Privileged bus users may submit these types in order to create
> +        connections with faked credentials. This information will be returned
> +        when peer information is queried by KDBUS_CMD_CONN_INFO. See section
> +        13 for more information.
> +
> +      Items of other types are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +At the offset returned in the 'offset' field of struct kdbus_cmd_hello, the
> +kernel will store items of the following types:
> +
> +  KDBUS_ITEM_BLOOM_PARAMETER
> +      Bloom filter parameter as defined by the bus creator (see below).
> +
> +The offset in the pool has to be freed with the KDBUS_CMD_FREE ioctl.

As far as I can tell, KDBUS_CMD_FREE is not detailed anywhere in this file.
It needs a detailed description somewhere.

> +
> +6.3 Activator and policy holder connection
> +------------------------------------------
> +
> +An activator connection is a placeholder for a well-known name. Messages sent
> +to such a connection can be used by userspace to start an implementer
> +connection, which will then get all the messages from the activator copied
> +over. An activator connection cannot be used to send any message.
> +
> +A policy holder connection only installs a policy for one or more names.
> +These policy entries are kept active as long as the connection is alive, and
> +are removed once it terminates. Such a policy connection type can be used to
> +deploy restrictions for names that are not yet active on the bus. A policy
> +holder connection cannot be used to send any message.
> +
> +The creation of activator, policy holder or monitor connections is an operation

What is a "monitor connection"? That term springs up unannounced. Is it
an "eavesdropping connection" as described above? Either define the term
"monitor connection" or use consistent terminology. (Actually, further down
in the document, it is clarified that "monitor connection" == "eavesdropper".
But that is not clear at THIS point in the document. It needs to be clearer.)

> +restricted to privileged users on the bus (see section "Terminology").
> +
> +
> +6.4 Retrieving information on a connection
> +------------------------------------------
> +
> +The KDBUS_CMD_CONN_INFO ioctl can be used to retrieve credentials and
> +properties of the initial creator of a connection. This ioctl uses the
> +following struct:
> +
> +struct kdbus_cmd_info {
> +  __u64 size;
> +    The overall size of the struct, including the name with its 0-byte string
> +    terminator.

"0-byte string terminator" reads strangely. I assume you mean "terminating 
null byte" / "null-terminated string". Best to use those more standard terms.

So, maybe:

    including the name with its terminating null byte

or:

    including the null-terminated 'name' string
    

There are multiple other instances of 0-byte in the doc, and I think they 
should also be fixed in a similar fashion.

> +
> +  __u64 flags;
> +    Specify which metadata items should be attached to the answer.

s/Specify/Specifies/

> +    See section 13 for more details.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  __u64 id;
> +    The connection's numerical ID to retrieve information for. If set to

Hard to parse sentence.

==> The numerical ID of the connection for which information is to 
    be retrieved.

> +    non-zero value, the 'name' field is ignored.

s/non-zero value/a non-zero value/

> +
> +  __u64 offset;
> +    When the ioctl returns, this value will yield the offset of the connection

s/value will yield/field will contain/

> +    information inside the caller's pool.
> +
> +  __u64 info_size;
> +    The kernel will return the size of the returned information, so applications
> +    can optionally mmap specific parts of the pool.
> +
> +  struct kdbus_item items[0];
> +    The optional item list, containing the well-known name to look up as
> +    a KDBUS_ITEM_OWNED_NAME. Only required if the 'id' field is set to 0.
> +    Items of other types are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +After the ioctl returns, the following struct will be stored in the caller's
> +pool at 'offset'.
> +
> +struct kdbus_info {
> +  __u64 size;
> +    The overall size of the struct, including all its items.
> +
> +  __u64 id;
> +    The connection's unique ID.
> +
> +  __u64 flags;
> +    The connection's flags as specified when it was created.
> +
> +  struct kdbus_item items[0];
> +    Depending on the 'flags' field in struct kdbus_cmd_info, items of
> +    types KDBUS_ITEM_OWNED_NAME and KDBUS_ITEM_CONN_DESCRIPTION are followed

s/are followed/follow/

> +    here.
> +};
> +
> +Once the caller is finished with parsing the return buffer, it needs to call
> +KDBUS_CMD_FREE for the offset.
> +
> +
> +6.5 Getting information about a connection's bus creator
> +--------------------------------------------------------
> +
> +The KDBUS_CMD_BUS_CREATOR_INFO ioctl takes the same struct as
> +KDBUS_CMD_CONN_INFO but is used to retrieve information about the creator of
> +the bus the connection is attached to. The metadata returned by this call is
> +collected during the creation of the bus and is never altered afterwards, so
> +it provides pristine information on the task that created the bus, at the
> +moment when it did so.
> +
> +In response to this call, a slice in the connection's pool is allocated and
> +filled with an object of type struct kdbus_info, pointed to by the ioctl's
> +'offset' field.
> +
> +struct kdbus_info {
> +  __u64 size;
> +    The overall size of the struct, including all its items.
> +
> +  __u64 id;
> +    The bus ID
> +
> +  __u64 flags;
> +    The bus flags as specified when it was created.

s/it/the bus/

> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  struct kdbus_item items[0];
> +    Metadata information is stored in items here. The item list contains
> +    a KDBUS_ITEM_MAKE_NAME item that indicates the bus name of the
> +    calling connection.
> +};
> +
> +Once the caller is finished with parsing the return buffer, it needs to call
> +KDBUS_CMD_FREE for the offset.
> +
> +
> +6.6 Updating connection details
> +-------------------------------
> +
> +Some of a connection's details can be updated with the KDBUS_CMD_CONN_UPDATE
> +ioctl, using the file descriptor that was used to create the connection.
> +The update command uses the following struct.
> +
> +struct kdbus_cmd_update {
> +  __u64 size;
> +    The overall size of the struct, including all its items.
> +
> +  __u64 flags;
> +    Currently no flags are supported. Reserved for future use.

> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  struct kdbus_item items[0];
> +    Items to describe the connection details to be updated. The following item
> +    types are supported:
> +
> +    KDBUS_ITEM_ATTACH_FLAGS_SEND
> +      Supply a new set of items that this connection permits to be sent along
> +      with messages.
> +
> +    KDBUS_ITEM_ATTACH_FLAGS_RECV
> +      Supply a new set of items to be attached to each message.
> +
> +    KDBUS_ITEM_NAME
> +    KDBUS_ITEM_POLICY_ACCESS
> +      Policy holder connections may supply a new set of policy information
> +      with these items. For other connection types, -EOPNOTSUPP is returned.
> +
> +    Items of other types are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +
> +6.7 Termination
> +---------------
> +
> +A connection can be terminated by simply closing its file descriptor. All
> +pending incoming messages will be discarded, and the memory in the pool will
> +be freed.
> +
> +An alternative way of closing down a connection is calling the KDBUS_CMD_BYEBYE
> +ioctl on it, which will only succeed if the message queue of the connection is
> +empty at the time of closing, otherwise, -EBUSY is returned.

The preceding is a little hard to parse. I suggest:

    An alternative way of closing down a connection is via the KDBUS_CMD_BYEBYE
    ioctl. This ioctl will succeed only if the message queue of the connection is
    empty at the time of closing; otherwise, -EBUSY is returned.

> +
> +When this ioctl returns successfully, the connection has been terminated and
> +won't accept any new messages from remote peers. This way, a connection can
> +be terminated race-free, without losing any messages.
> +
> +
> +6.8 Monitor connections ('eavesdropper')
> +----------------------------------------
> +
> +Eavesdropping connections are created by setting the KDBUS_HELLO_MONITOR flag
> +in struct kdbus_hello.flags. Such connections have all properties of any other,
> +regular connection, except for the following details:
> +
> +  * They will get every message sent over the bus, both unicasts and broadcasts
> +
> +  * Installing matches for broadcast messages is neither necessary nor allowed
> +
> +  * They cannot send messages or be directly addressed as receiver
> +
> +  * Their creation and destruction will not cause KDBUS_ITEM_ID_{ADD,REMOVE}
> +    notifications to be generated, so other connections cannot detect the
> +    presence of an eavesdropper.
> +
> +
> +7. Messages
> +===============================================================================
> +
> +Messages consist of a fixed-size header followed directly by a list of
> +variable-sized data 'items'. The overall message size is specified in the
> +header of the message. The chain of data items can contain well-defined
> +message metadata fields, raw data, references to data, or file descriptors.
> +
> +
> +7.1 Sending messages
> +--------------------
> +
> +Messages are passed to the kernel with the KDBUS_CMD_SEND ioctl. Depending
> +on the destination address of the message, the kernel delivers the message to
> +the specific destination connection or to all connections on the same bus.
> +Sending messages across buses is not possible. Messages are always queued in
> +the memory pool of the destination connection (see below).
> +
> +The KDBUS_CMD_SEND ioctl uses struct kdbus_cmd_send to describe the message
> +transfer.
> +
> +struct kdbus_cmd_send {
> +  __u64 size;
> +    The overall size of the struct, including the attached items.
> +
> +  __u64 flags;
> +    Flags for message delivery:
> +
> +    KDBUS_SEND_SYNC_REPLY
> +      By default, all calls to kdbus are considered asynchronous,
> +      non-blocking. However, as there are many use cases that need to wait
> +      for a remote peer to answer a method call, there's a way to send a
> +      message and wait for a reply in a synchronous fashion. This is what
> +      the KDBUS_MSG_SYNC_REPLY controls. The KDBUS_CMD_SEND ioctl will block
> +      until the reply has arrived, the timeout limit is reached, in case the
> +      remote connection was shut down, or if interrupted by a signal before
> +      any reply; see signal(7).
> +
> +      The offset of the reply message in the sender's pool is stored in in
> +      'offset_reply' when the ioctl has returned without error. Hence, there
> +      is no need for another KDBUS_CMD_RECV ioctl or anything else to receive
> +      the reply.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call of
> +    KDBUS_CMD_SEND.
> +
> +  __u64 kernel_msg_flags;
> +    Valid bits for message flags, returned by the kernel upon each call of
> +    KDBUS_CMD_SEND.
> +
> +  __u64 return_flags;
> +    Kernel-provided flags, returning non-fatal errors that occurred during
> +    send. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  __u64 msg_address;
> +    Userspace has to provide a pointer to a message (struct kdbus_msg) to send.
> +
> +  struct kdbus_msg_info reply;
> +    Only used for synchronous replies. See description of struct kdbus_cmd_recv
> +    for more details.
> +
> +  struct kdbus_item items[0];
> +    The following items are currently recognized:
> +
> +    KDBUS_ITEM_CANCEL_FD
> +      When this optional item is passed in, and the call is executed as SYNC
> +      call, the passed in file descriptor can be used as alternative
> +      cancellation point. The kernel will call poll() on this file descriptor,
> +      and if it reports any incoming bytes, the blocking send operation will
> +      be canceled, and the call will return -ECANCELED. Any type of file
> +      descriptor that implements poll() can be used as payload to this item.
> +      For asynchronous message sending, this item is accepted but ignored.
> +
> +    All other items are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +The message referenced by 'msg_address' above has the following layout.
> +
> +struct kdbus_msg {
> +  __u64 size;
> +    The overall size of the struct, including the attached items.
> +
> +  __u64 flags;
> +    KDBUS_MSG_EXPECT_REPLY
> +      Expect a reply from the remote peer to this message. With this bit set,

s/from the remote peer to this message/to this message from the remote peer/

> +      the timeout_ns field must be set to a non-zero number of nanoseconds in
> +      which the receiving peer is expected to reply. If such a reply is not
> +      received in time, the sender will be notified with a timeout message
> +      (see below). The value must be an absolute value, in nanoseconds and
> +      based on CLOCK_MONOTONIC.

Why is the option of a relative timeout not available?
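
As documented, every caller that thinks in terms of "wait at most N
nanoseconds" has to do the conversion itself. A minimal sketch of that
conversion, assuming only what the text says (absolute nanoseconds on
CLOCK_MONOTONIC); the helper name is mine:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: turn a relative timeout into the absolute
 * CLOCK_MONOTONIC value, in nanoseconds, that timeout_ns is documented
 * to expect. Returns 0 on success, -1 if the clock cannot be read.
 */
static int deadline_from_relative(uint64_t relative_ns, uint64_t *deadline_ns)
{
        struct timespec now;

        assert(deadline_ns != NULL);

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
                return -1;

        *deadline_ns = (uint64_t)now.tv_sec * UINT64_C(1000000000) +
                       (uint64_t)now.tv_nsec + relative_ns;
        return 0;
}
```

The result would then be stored in kdbus_msg.timeout_ns before the send.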

> +      For a message to be accepted as reply, it must be a direct message to
> +      the original sender (not a broadcast), and its kdbus_msg.reply_cookie
> +      must match the previous message's kdbus_msg.cookie.
> +
> +      Expected replies also temporarily open the policy of the sending
> +      connection, so the other peer is allowed to respond within the given
> +      time window.
> +
> +    KDBUS_MSG_NO_AUTO_START
> +      By default, when a message is sent to an activator connection, the
> +      activator notified and will start an implementer. This flag inhibits
> +      that behavior. With this bit set, and the remote being an activator,
> +      -EADDRNOTAVAIL is returned from the ioctl.
> +
> +  __s64 priority;
> +    The priority of this message. Receiving messages (see below) may
> +    optionally be constrained to messages of a minimal priority. This
> +    allows for use cases where timing critical data is interleaved with
> +    control data on the same connection. If unused, the priority should be
> +    set to zero.
> +
> +  __u64 dst_id;
> +    The numeric ID of the destination connection, or KDBUS_DST_ID_BROADCAST
> +    (~0ULL) to address every peer on the bus, or KDBUS_DST_ID_NAME (0) to look
> +    it up dynamically from the bus' name registry. In the latter case, an item
> +    of type KDBUS_ITEM_DST_NAME is mandatory.
> +
> +  __u64 src_id;
> +    Upon return of the ioctl, this member will contain the sending
> +    connection's numerical ID. Should be 0 at send time.
> +
> +  __u64 payload_type;
> +    Type of the payload in the actual data records. Currently, only
> +    KDBUS_PAYLOAD_DBUS is accepted as input value of this field. When
> +    receiving messages that are generated by the kernel (notifications),
> +    this field will yield KDBUS_PAYLOAD_KERNEL.

s/yield/contain/ ?

> +
> +  __u64 cookie;
> +    Cookie of this message, for later recognition. Also, when replying
> +    to a message (see above), the cookie_reply field must match this value.
> +
> +  __u64 timeout_ns;
> +    If the message sent requires a reply from the remote peer (see above),
> +    this field contains the timeout in absolute nanoseconds based on
> +    CLOCK_MONOTONIC.
> +
> +  __u64 cookie_reply;
> +    If the message sent is a reply to another message, this field must
> +    match the cookie of the formerly received message.
> +
> +  struct kdbus_item items[0];
> +    A dynamically sized list of items to contain additional information.
> +    The following items are expected/valid:
> +
> +    KDBUS_ITEM_PAYLOAD_VEC
> +    KDBUS_ITEM_PAYLOAD_MEMFD
> +    KDBUS_ITEM_FDS
> +      Actual data records containing the payload. See section "Passing of
> +      Payload Data".
> +
> +    KDBUS_ITEM_BLOOM_FILTER
> +      Bloom filter for matches (see below).
> +
> +    KDBUS_ITEM_DST_NAME
> +      Well-known name to send this message to. Required if dst_id is set
> +      to KDBUS_DST_ID_NAME. If a connection holding the given name can't
> +      be found, -ESRCH is returned.
> +      For messages to a unique name (ID), this item is optional. If present,
> +      the kernel will make sure the name owner matches the given unique name.
> +      This allows userspace tie the message sending to the condition that a
> +      name is currently owned by a certain unique name.
> +};
> +
> +The message will be augmented by the requested metadata items when queued into
> +the receiver's pool. See also section 13.2 ("Metadata and namespaces").
> +
> +
> +7.2 Message layout
> +------------------
> +
> +The layout of a message is shown below.
> +
> +  +-------------------------------------------------------------------------+
> +  | Message                                                                 |
> +  | +---------------------------------------------------------------------+ |
> +  | | Header                                                              | |
> +  | | size:          overall message size, including the data records     | |
> +  | | destination:   connection id of the receiver                        | |
> +  | | source:        connection id of the sender (set by kernel)          | |
> +  | | payload_type:  "DBusDBus" textual identifier stored as uint64_t     | |
> +  | +---------------------------------------------------------------------+ |
> +  | +---------------------------------------------------------------------+ |
> +  | | Data Record                                                         | |
> +  | | size:  overall record size (without padding)                        | |
> +  | | type:  type of data                                                 | |
> +  | | data:  reference to data (address or file descriptor)               | |
> +  | +---------------------------------------------------------------------+ |
> +  | +---------------------------------------------------------------------+ |
> +  | | padding bytes to the next 8 byte alignment                          | |
> +  | +---------------------------------------------------------------------+ |
> +  | +---------------------------------------------------------------------+ |
> +  | | Data Record                                                         | |
> +  | | size:  overall record size (without padding)                        | |
> +  | | ...                                                                 | |
> +  | +---------------------------------------------------------------------+ |
> +  | +---------------------------------------------------------------------+ |
> +  | | padding bytes to the next 8 byte alignment                          | |
> +  | +---------------------------------------------------------------------+ |
> +  | +---------------------------------------------------------------------+ |
> +  | | Data Record                                                         | |
> +  | | size:  overall record size                                          | |
> +  | | ...                                                                 | |
> +  | +---------------------------------------------------------------------+ |
> +  |   ... further data records ...                                          |
> +  +-------------------------------------------------------------------------+
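
A short example of walking the records would help readers of this diagram.
The following is a sketch under my own assumptions: the two-field header
(size, type, both 64-bit) is an illustrative stand-in rather than the
patch's struct kdbus_item, and the helper names are hypothetical. The only
rules taken from the diagram are that 'size' excludes padding and that the
next record starts at the next 8-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for a record header; not the patch's definition. */
struct item_hdr {
        uint64_t size;  /* record size including this header, without padding */
        uint64_t type;
};

/* Round up to the next multiple of 8. */
static uint64_t align8(uint64_t v)
{
        return (v + 7) & ~UINT64_C(7);
}

/*
 * Count the records in buf[0..len), stepping by the padded size.
 * Returns -1 if a record is shorter than its header or runs past len.
 */
static int count_items(const unsigned char *buf, size_t len)
{
        size_t off = 0;
        int n = 0;

        assert(buf != NULL);

        while (off < len) {
                struct item_hdr hdr;

                if (len - off < sizeof(hdr))
                        return -1;
                memcpy(&hdr, buf + off, sizeof(hdr));
                if (hdr.size < sizeof(hdr) || hdr.size > len - off)
                        return -1;
                /* The last record may have no trailing padding. */
                off += (size_t)align8(hdr.size);
                n++;
        }
        return n;
}
```

For instance, a 17-byte record followed by a 16-byte record occupies
24 + 16 = 40 bytes.
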
> +
> +
> +7.3 Passing of Payload Data
> +---------------------------
> +
> +When connecting to the bus, receivers request a memory pool of a given size,
> +large enough to carry all backlog of data enqueued for the connection. The
> +pool is internally backed by a shared memory file which can be mmap()ed by
> +the receiver.
> +
> +KDBUS_MSG_PAYLOAD_VEC:
> +  Messages are directly copied by the sending process into the receiver's pool,

s/,/./ and then start a new sentence.

> +  that way two peers can exchange data by effectively doing a single-copy from
> +  one process to another, the kernel will not buffer the data anywhere else.

s/,/;/

> +
> +KDBUS_MSG_PAYLOAD_MEMFD:
> +  Messages can reference memfd files which contain the data.
> +  memfd files are tmpfs-backed files that allow sealing of the content of the
> +  file, which prevents all writable access to the file content.
> +  Only memfds that have (F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_SEAL) set
> +  are accepted as payload data, which enforces reliable passing of data.
> +  The receiver can assume that neither the sender nor anyone else can alter the
> +  content after the message is sent.
> +  Apart from the sender filling-in the content into memfd files, the data will
> +  be passed as zero-copy from one process to another, read-only, shared between
> +  the peers.
> +
> +The sender must not make any assumptions on the type how data is received by the

Wording error at "the type how". Perhaps:
s/on the type how data is received/about how data is received/

> +remote peer. The kernel is free to re-pack multiple VEC and MEMFD payloads. For
> +instance, the kernel may decide to merge multiple VECs into a single VEC, inline
> +MEMFD payloads into memory or merge all passed VECs into a single MEMFD.
> +However, the kernel preserves the order of passed data. This means, the order of

s/,//

> +all VEC and MEMFD items is not changed in respect to each other.

s/in/with/

> +
> +In other words: All passed VEC and MEMFD data payloads are treated as a single
> +stream of data that may be received by the remote peer in a different set of
> +hunks than it was sent as.
> +
> +
> +7.4 Receiving messages
> +----------------------
> +
> +Messages are received by the client with the KDBUS_CMD_RECV ioctl. The endpoint
> +file of the bus supports poll() to wake up the receiving process when new

s%poll()%poll/select()/epoll%  ?
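
Whichever wording is chosen, the wait itself looks like ordinary readiness
notification. A minimal sketch with poll(), assuming the endpoint file
descriptor reports POLLIN when a message is queued (the text only says
that poll() is supported, so that is my assumption); the helper name is
hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: wait until 'fd' is readable or 'timeout_ms' expires.
 * Returns 1 if readable, 0 on timeout, -1 on a poll() error or if the fd
 * reports only an error or hang-up condition. The full timeout restarts if
 * a signal interrupts the wait.
 */
static int wait_readable(int fd, int timeout_ms)
{
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int r;

        assert(fd >= 0);

        do {
                r = poll(&pfd, 1, timeout_ms);
        } while (r < 0 && errno == EINTR);

        if (r <= 0)
                return r;
        return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A return value of 1 would be followed by the KDBUS_CMD_RECV ioctl.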

> +messages are queued up to be received.
> +
> +With the KDBUS_CMD_RECV ioctl, a struct kdbus_cmd_recv is used.
> +
> +struct kdbus_cmd_recv {
> +  __u64 size;
> +    The overall size of the struct, including the attached items.
> +
> +  __u64 flags;
> +    Flags to control the receive command.
> +
> +    KDBUS_RECV_PEEK
> +      Just return the location of the next message. Do not install file
> +      descriptors or anything else. This is usually used to determine the
> +      sender of the next queued message.
> +
> +    KDBUS_RECV_DROP
> +      Drop the next message without doing anything else with it, and free the
> +      pool slice. This a short-cut for KDBUS_RECV_PEEK and KDBUS_CMD_FREE.
> +
> +    KDBUS_RECV_USE_PRIORITY
> +      Use the priority field (see below).
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Kernel-provided flags, returning non-fatal errors that occurred during
> +    send. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  __s64 priority;
> +    With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in
> +    the queue with at least the given priority. If no such message is waiting
> +    in the queue, -ENOMSG is returned.

###
How do I simply select the highest priority message, without knowing what 
its priority is?

> +
> +  __u64 dropped_msgs;
> +    If the CMD_RECV ioctl fails with EOVERFLOW, this field is filled by
> +    the kernel with the number of messages that couldn't be transmitted to
> +    this connection. In that case, the @offset member must not be accessed.
> +
> +  struct kdbus_msg_info msg;
> +   Embedded struct to be filled when the command succeeded (see below).
> +
> +  struct kdbus_item items[0];
> +    Items to specify further details for the receive command. Currently unused.
> +};
> +
> +Both 'struct kdbus_cmd_recv' and 'struct kdbus_cmd_send' embed 'struct
> +kdbus_msg_info'. For the SEND ioctl, it is used to catch synchronous replies,
> +if one was requested, and is unused otherwise.
> +
> +struct kdbus_msg_info {
> +  __u64 offset;
> +    Upon return of the ioctl, this field contains the offset in the receiver's
> +    memory pool. The memory must be freed with KDBUS_CMD_FREE.
> +
> +  __u64 msg_size;
> +    Upon successful return of the ioctl, this field contains the size of the
> +    allocated slice at offset @offset. It is the combination of the size of
> +    the stored kdbus_msg object plus all appended VECs. You can use it in
> +    combination with @offset to map a single message, instead of mapping the
> +    whole pool.
> +
> +  __u64 return_flags;
> +    Kernel-provided return flags. Currently, the following flags are defined.
> +
> +      KDBUS_RECV_RETURN_INCOMPLETE_FDS
> +        The message contained file descriptors which couldn't be installed
> +        into the receiver's task. Most probably that happened because the
> +        maximum number of file descriptors for that task were exceeded.
> +        The message is still delivered, so this is not a fatal condition.
> +        File descriptors inside the KDBUS_ITEM_FDS item that could not be
> +        installed will be set to -1.
> +};
> +
> +Unless KDBUS_RECV_DROP was passed, and given that the ioctl succeeded, the

s/and given that the ioctl succeeded/after a successful KDBUS_CMD_RECV ioctl/
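
Related to the msg_size description above: mmap() requires its offset
argument to be a multiple of the page size, and the text does not say
whether pool slices are page-aligned. Assuming they are not guaranteed to
be, mapping a single message means rounding the offset down and adjusting
the length. A sketch of that arithmetic (struct and helper names are
mine):

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

struct map_window {
        uint64_t map_offset;  /* page-aligned offset to hand to mmap() */
        uint64_t map_length;  /* length to hand to mmap() */
        uint64_t delta;       /* where the message starts inside the mapping */
};

/* Hypothetical helper: compute the mmap() window covering one pool slice. */
static struct map_window window_for_slice(uint64_t offset, uint64_t msg_size)
{
        uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
        struct map_window w;

        assert(msg_size > 0);

        w.map_offset = offset & ~(page - 1);  /* page size is a power of two */
        w.delta = offset - w.map_offset;
        w.map_length = w.delta + msg_size;
        return w;
}
```

If slices are in fact always page-aligned, the document should say so,
since that removes the need for this adjustment.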

> +offset field contains the location of the new message inside the receiver's
> +pool. The message is stored as struct kdbus_msg at this offset, and can be
> +interpreted with the semantics described above.
> +
> +Also, if the connection allowed for file descriptor to be passed
> +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be
> +installed into the receiving process after the KDBUS_CMD_RECV ioctl returns.

###
"after"??? When exactly?

> +The receiving task is obliged to close all of them appropriately. If
> +KDBUS_RECV_PEEK is set, no file descriptors are installed. This allows for
> +peeking at a message and dropping it via KDBUS_RECV_DROP, without installing
> +the passed file descriptors into the receiving process.
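
Since the receiver is obliged to close the descriptors, and since slots
that could not be installed are documented as -1, the cleanup presumably
looks something like the sketch below. The helper name is hypothetical,
and how the array is located inside the KDBUS_ITEM_FDS item is left out
because the text does not describe it.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: close every descriptor in fds[0..n), skipping
 * slots that hold -1 (documented as "could not be installed"). Each
 * processed slot is overwritten with -1. Returns the number of
 * descriptors successfully closed.
 */
static size_t close_received_fds(int *fds, size_t n)
{
        size_t closed = 0;

        assert(fds != NULL || n == 0);

        for (size_t i = 0; i < n; i++) {
                if (fds[i] < 0)
                        continue;
                if (close(fds[i]) == 0)
                        closed++;
                fds[i] = -1;
        }
        return closed;
}
```
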
> +
> +The caller is obliged to call KDBUS_CMD_FREE with the returned offset when
> +the memory is no longer needed.
> +
> +
> +8. Name registry
> +===============================================================================
> +
> +Each bus instantiates a name registry to resolve well-known names into unique
> +connection IDs for message delivery. The registry will be queried when a
> +message is sent with kdbus_msg.dst_id set to KDBUS_DST_ID_NAME, or when a
> +registry dump is requested.
> +
> +All of the below is subject to policy rules for SEE and OWN permissions.
> +
> +
> +8.1 Name validity
> +-----------------
> +
> +A name has to comply to the following rules to be considered valid:

s/comply to/comply with/

> +
> + - The name has two or more elements separated by a period ('.') character
> + - All elements must contain at least one character
> + - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_"
> +   and must not begin with a digit
> + - The name must contain at least one '.' (period) character
> +   (and thus at least two elements)
> + - The name must not begin with a '.' (period) character
> + - The name must not exceed KDBUS_NAME_MAX_LEN (255)
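
Note that the first and fourth rules in this list say the same thing. To
make sure the rules are unambiguous as written, here is a sketch of a
checker that follows them literally. The function name is hypothetical,
and I am assuming that the 255 limit counts characters without the
terminating null byte, which the text does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NAME_MAX_LEN_SKETCH 255  /* value the text gives for KDBUS_NAME_MAX_LEN */

/* Hypothetical checker implementing the rules listed above, nothing more. */
static bool name_is_valid_sketch(const char *name)
{
        size_t len, elem_len = 0;
        bool seen_dot = false;

        assert(name != NULL);

        len = strlen(name);
        if (len == 0 || len > NAME_MAX_LEN_SKETCH)
                return false;

        for (size_t i = 0; i < len; i++) {
                char c = name[i];

                if (c == '.') {
                        if (elem_len == 0)      /* leading '.' or empty element */
                                return false;
                        seen_dot = true;
                        elem_len = 0;
                        continue;
                }
                if (c >= '0' && c <= '9') {
                        if (elem_len == 0)      /* element starts with a digit */
                                return false;
                } else if (!((c >= 'A' && c <= 'Z') ||
                             (c >= 'a' && c <= 'z') || c == '_')) {
                        return false;
                }
                elem_len++;
        }

        /* A trailing '.' leaves an empty last element. */
        return seen_dot && elem_len > 0;
}
```

Under these rules "org.freedesktop.DBus" is accepted, while "nodots",
"org..example", "org.example." and "org.1example" are rejected.
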
> +
> +
> +8.2 Acquiring a name
> +--------------------
> +
> +To acquire a name, a client uses the KDBUS_CMD_NAME_ACQUIRE ioctl with the
> +following data structure.
> +
> +struct kdbus_cmd_name {
> +  __u64 size;
> +    The overall size of this struct, including the name with its 0-byte string
> +    terminator.
> +
> +  __u64 flags;
> +    Flags to control details in the name acquisition.
> +
> +    KDBUS_NAME_REPLACE_EXISTING
> +      Acquiring a name that is already present usually fails, unless this flag
> +      is set in the call, and KDBUS_NAME_ALLOW_REPLACEMENT or (see below) was
> +      set when the current owner of the name acquired it, or if the current
> +      owner is an activator connection (see below).
> +
> +    KDBUS_NAME_ALLOW_REPLACEMENT
> +      Allow other connections to take over this name. When this happens, the
> +      former owner of the connection will be notified of the name loss.
> +
> +    KDBUS_NAME_QUEUE (acquire)
> +      A name that is already acquired by a connection, and which wasn't
> +      requested with the KDBUS_NAME_ALLOW_REPLACEMENT flag set can not be
> +      acquired again. However, a connection can put itself in a queue of
> +      connections waiting for the name to be released. Once that happens, the
> +      first connection in that queue becomes the new owner and is notified
> +      accordingly.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  struct kdbus_item items[0];
> +    Items to submit the name. Currently, one item of type KDBUS_ITEM_NAME is
> +    expected and allowed, and the contained string must be a valid bus name.
> +    Items of other types are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +
> +8.3 Releasing a name
> +--------------------
> +
> +A connection may release a name explicitly with the KDBUS_CMD_NAME_RELEASE
> +ioctl. If the connection was an implementer of an activatable name, its
> +pending messages are moved back to the activator. If there are any connections
> +queued up as waiters for the name, the oldest one of them will become the new
> +owner. The same happens implicitly for all names once a connection terminates.
> +
> +The KDBUS_CMD_NAME_RELEASE ioctl uses the same data structure as the
> +acquisition call, but with slightly different field usage.
> +
> +struct kdbus_cmd_name {
> +  __u64 size;
> +    The overall size of this struct, including the name with its 0-byte string
> +    terminator.
> +
> +  __u64 flags;
> +    Flags to the command. Currently unused.

And, so presumably must be 0?  Best to note that.

> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +  struct kdbus_item items[0];
> +    Items to submit the name. Currently, one item of type KDBUS_ITEM_NAME is
> +    expected and allowed, and the contained string must be a valid bus name.
> +};
> +
> +
> +8.4 Dumping the name registry
> +-----------------------------
> +
> +A connection may request a complete or filtered dump of currently active bus
> +names with the KDBUS_CMD_NAME_LIST ioctl, which takes a struct
> +kdbus_cmd_name_list as argument.
> +
> +struct kdbus_cmd_name_list {
> +  __u64 flags;
> +    Any combination of flags to specify which names should be dumped.
> +
> +    KDBUS_NAME_LIST_UNIQUE
> +      List the unique (numeric) IDs of the connection, whether it owns a name
> +      or not.
> +
> +    KDBUS_NAME_LIST_NAMES
> +      List well-known names stored in the database which are actively owned by
> +      a real connection (not an activator).
> +
> +    KDBUS_NAME_LIST_ACTIVATORS
> +      List names that are owned by an activator.
> +
> +    KDBUS_NAME_LIST_QUEUED
> +      List connections that are not yet owning a name but are waiting for it
> +      to become available.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  __u64 offset;
> +    When the ioctl returns successfully, the offset to the name registry dump
> +    inside the connection's pool will be stored in this field.
> +};
> +
> +The returned list of names is stored in a struct kdbus_name_list that in turn
> +contains a dynamic number of struct kdbus_cmd_name that carry the actual
> +information. The fields inside that struct kdbus_cmd_name is described next.
> +
> +struct kdbus_name_info {
> +  __u64 size;
> +    The overall size of this struct, including the name with its 0-byte string
> +    terminator.
> +
> +  __u64 owner_id;
> +    The owning connection's unique ID.
> +
> +  __u64 conn_flags;
> +    The flags of the owning connection.
> +
> +  struct kdbus_item items[0];
> +    Items containing the actual name. Currently, one item of type
> +    KDBUS_ITEM_OWNED_NAME will be attached, including the name's flags. In that
> +    item, the flags field of the name may carry the following bits:
> +
> +    KDBUS_NAME_ALLOW_REPLACEMENT
> +      Other connections are allowed to take over this name from the
> +      connection that owns it.
> +
> +    KDBUS_NAME_IN_QUEUE (list)
> +      When retrieving a list of currently acquired name in the registry, this
> +      flag indicates whether the connection actually owns the name or is
> +      currently waiting for it to become available.
> +
> +    KDBUS_NAME_ACTIVATOR (list)
> +      An activator connection owns a name as a placeholder for an implementer,
> +      which is started on demand as soon as the first message arrives. There's
> +      some more information on this topic below. In contrast to
> +      KDBUS_NAME_REPLACE_EXISTING, when a name is taken over from an activator
> +      connection, all the messages that have been queued in the activator
> +      connection will be moved over to the new owner. The activator connection
> +      will still be tracked for the name and will take control again if the
> +      implementer connection terminates.
> +      This flag can not be used when acquiring a name, but is implicitly set
> +      through KDBUS_CMD_HELLO with KDBUS_HELLO_ACTIVATOR set in
> +      kdbus_cmd_hello.conn_flags.
> +};
> +
> +The returned buffer must be freed with the KDBUS_CMD_FREE ioctl when the user
> +is finished with it.
> +
> +
> +9. Notifications
> +===============================================================================
> +
> +The kernel will notify its users of the following events.
> +
> +  * When connection A is terminated while connection B is waiting for a reply
> +    from it, connection B is notified with a message with an item of type
> +    KDBUS_ITEM_REPLY_DEAD.
> +
> +  * When connection A does not receive a reply from connection B within the
> +    specified timeout window, connection A will receive a message with an item
> +    of type KDBUS_ITEM_REPLY_TIMEOUT.
> +
> +  * When an ordinary connection (not a monitor) is created on or removed from
> +    a bus, messages with an item of type KDBUS_ITEM_ID_ADD or
> +    KDBUS_ITEM_ID_REMOVE, respectively, are sent to all bus members that match
> +    these messages through their match database. Eavesdroppers (monitor
> +    connections) do not cause such notifications to be sent. They are invisible
> +    on the bus.
> +
> +  * When a connection gains or loses ownership of a name, messages with an item
> +    of type KDBUS_ITEM_NAME_ADD, KDBUS_ITEM_NAME_REMOVE or
> +    KDBUS_ITEM_NAME_CHANGE are sent to all bus members that match these
> +    messages through their match database.
> +
> +A kernel notification is a regular kdbus message with the following details.
> +
> +  * kdbus_msg.src_id == KDBUS_SRC_ID_KERNEL
> +  * kdbus_msg.dst_id == KDBUS_DST_ID_BROADCAST
> +  * kdbus_msg.payload_type == KDBUS_PAYLOAD_KERNEL
> +  * Has exactly one of the aforementioned items attached
> +
> +Kernel notifications have an item of type KDBUS_ITEM_TIMESTAMP attached.
> +
> +
> +10. Message Matching, Bloom filters
> +===============================================================================
> +
> +10.1 Matches for broadcast messages from other connections
> +----------------------------------------------------------
> +
> +A message addressed at the connection ID KDBUS_DST_ID_BROADCAST (~0ULL) is a
> +broadcast message, delivered to all connected peers which installed a rule to
> +match certain properties of the message. Without any rules installed in the
> +connection, no broadcast message or kernel-side notifications will be delivered
> +to the connection. Broadcast messages are subject to policy rules and TALK
> +access checks.
> +
> +See section 11 for details on policies, and section 11.5 for more
> +details on implicit policies.
> +
> +Matches for messages from other connections (not kernel notifications) are
> +implemented as bloom filters. The sender adds certain properties of the message
> +as elements to a bloom filter bit field, and sends that along with the
> +broadcast message.
> +
> +The connection adds the message properties it is interested as elements to a
> +bloom mask bit field, and uploads the mask to the match rules of the
> +connection.
> +
> +The kernel will match the broadcast message's bloom filter against the
> +connections bloom mask (simply by &-ing it), and decide whether the message
> +should be delivered to the connection.
> +
> +The kernel has no notion of any specific properties of the message, all it
> +sees are the bit fields of the bloom filter and mask to match against. The
> +use of bloom filters allows simple and efficient matching, without exposing
> +any message properties or internals to the kernel side. Clients need to deal
> +with the fact that they might receive broadcasts which they did not subscribe
> +to, as the bloom filter might allow false-positives to pass the filter.
> +
> +To allow the future extension of the set of elements in the bloom filter, the
> +filter specifies a "generation" number. A later generation must always contain
> +all elements of the set of the previous generation, but can add new elements
> +to the set. The match rules mask can carry an array with all previous
> +generations of masks individually stored. When the filter and mask are matched
> +by the kernel, the mask with the closest matching "generation" is selected
> +as the index into the mask array.
> +
> +
> +10.2 Matches for kernel notifications
> +------------------------------------
> +
> +To receive kernel generated notifications (see section 9), a connection must
> +install special match rules that are different from the bloom filter matches
> +described in the section above. They can be filtered by a sender connection's
> +ID, by one of the name the sender connection owns at the time of sending the
> +message, or by type of the notification (id/name add/remove/change).

s/type/the type/

> +
> +10.3 Adding a match
> +-------------------
> +
> +To add a match, the KDBUS_CMD_MATCH_ADD ioctl is used, which takes a struct
> +of the struct described below.
> +
> +Note that each of the items attached to this command will internally create
> +one match 'rule', and the collection of them, which is submitted as one block
> +via the ioctl is called a 'match'. To allow a message to pass, all rules of a

s/ioctl/ioctl,/

> +match have to be satisfied. Hence, adding more items to the command will only
> +narrow the possibility of a match to effectively let the message pass, and will
> +cause the connection's user space process to wake up less likely.

Make that last line

decrease the chance that the connection's userspace process will be woken up

> +
> +Multiple matches can be installed per connection. As long as one of it has a

s/it/them/ ?
(If that change is not correct, then the sentence is quite confused.)

> +set of rules which allows the message to pass, this one will be decisive.
> +
> +struct kdbus_cmd_match {
> +  __u64 size;
> +    The overall size of the struct, including its items.
> +
> +  __u64 cookie;
> +    A cookie which identifies the match, so it can be referred to at removal
> +    time.
> +
> +  __u64 flags;
> +    Flags to control the behavior of the ioctl.
> +
> +    KDBUS_MATCH_REPLACE:
> +      Remove all entries with the given cookie before installing the new one.
> +      This allows for race-free replacement of matches.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  struct kdbus_item items[0];
> +    Items to define the actual rules of the matches. The following item types
> +    are expected. Each item will cause one new match rule to be created.
> +
> +    KDBUS_ITEM_BLOOM_MASK
> +      An item that carries the bloom filter mask to match against in its
> +      data field. The payload size must match the bloom filter size that
> +      was specified when the bus was created.
> +      See section 10.4 for more information.
> +
> +    KDBUS_ITEM_NAME
> +      Specify a name that a sending connection must own at a time of sending

s/a time/the time/

> +      a broadcast message in order to match this rule.
> +
> +    KDBUS_ITEM_ID
> +      Specify a sender connection's ID that will match this rule.
> +
> +    KDBUS_ITEM_NAME_ADD
> +    KDBUS_ITEM_NAME_REMOVE
> +    KDBUS_ITEM_NAME_CHANGE
> +      These items request delivery of broadcast messages that describe a name
> +      acquisition, loss, or change. The details are stored in the item's
> +      kdbus_notify_name_change member. All information specified must be
> +      matched in order to make the message pass. Use KDBUS_MATCH_ID_ANY to
> +      match against any unique connection ID.
> +
> +    KDBUS_ITEM_ID_ADD
> +    KDBUS_ITEM_ID_REMOVE
> +      These items request delivery of broadcast messages that are generated
> +      when a connection is created or terminated. struct kdbus_notify_id_change
> +      is used to store the actual match information. This item can be used to
> +      monitor one particular connection ID, or, when the id field is set to
> +      KDBUS_MATCH_ID_ANY, all of them.
> +
> +    Items of other types are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
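
The rule/match relationship described in section 10.3 (all rules of one
match must be satisfied; one passing match is enough) might be easier to
follow with a small example. A rough sketch, using made-up types that are
not the kdbus structures and only two illustrative rule kinds:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative types only; these are not the kdbus data structures. */
struct msg_props {
	uint64_t sender_id;
	uint64_t type;
};

enum rule_kind { RULE_SENDER_ID, RULE_TYPE };

struct rule {
	enum rule_kind kind;
	uint64_t value;
};

struct match {
	const struct rule *rules;
	size_t n_rules;
};

static bool rule_satisfied(const struct rule *r, const struct msg_props *m)
{
	switch (r->kind) {
	case RULE_SENDER_ID:
		return m->sender_id == r->value;
	case RULE_TYPE:
		return m->type == r->value;
	}
	return false;
}

/* A match passes only if every one of its rules is satisfied (AND). */
static bool match_passes(const struct match *mt, const struct msg_props *m)
{
	size_t i;

	for (i = 0; i < mt->n_rules; i++)
		if (!rule_satisfied(&mt->rules[i], m))
			return false;
	return true;
}

/* A message is delivered if at least one installed match passes (OR). */
static bool message_delivered(const struct match *matches, size_t n_matches,
			      const struct msg_props *m)
{
	size_t i;

	for (i = 0; i < n_matches; i++)
		if (match_passes(&matches[i], m))
			return true;
	return false;
}
```

This also shows why adding items to one command can only narrow what that
match lets through, while installing a further match can only widen it.
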
> +
> +10.4 Bloom filters
> +------------------
> +
> +Bloom filters allow checking whether a given word is present in a dictionary.
> +This allows connections to set up a mask for information it is interested in,
> +and will be delivered signal messages that have a matching filter.
> +
> +For general information on bloom filters, see
> +
> +  https://en.wikipedia.org/wiki/Bloom_filter
> +
> +The size of the bloom filter is defined per bus when it is created, in
> +kdbus_bloom_parameter.size. All bloom filters attached to signals on the bus
> +must match this size, and all bloom filter matches uploaded by connections must
> +also match the size, or a multiple thereof (see below).
> +
> +The calculation of the mask has to be done on the userspace side. The kernel
> +just checks the bitmasks to decide whether or not to let the message pass. All
> +bits in the mask must match the filter in and bit-wise AND logic, but the

Parse error at "in and bit-wise AND logic". I am not sure what you mean
there, but something needs fixing.

> +mask may have more bits set than the filter. Consequently, false positive
> +matches are expected to happen, and userspace must deal with that fact.
> +
> +Masks are entities that are always passed to the kernel as part of a match
> +(with an item of type KDBUS_ITEM_BLOOM_MASK), and filters can be attached to
> +signals, with an item of type KDBUS_ITEM_BLOOM_FILTER.
> +
> +For a filter to match, all its bits have to be set in the match mask as well.
> +For example, consider a bus has a bloom size of 8 bytes, and the following

s/has/that has/

> +mask/filter combinations:
> +
> +    filter  0x0101010101010101
> +    mask    0x0101010101010101
> +            -> matches
> +
> +    filter  0x0303030303030303
> +    mask    0x0101010101010101
> +            -> doesn't match
> +
> +    filter  0x0101010101010101
> +    mask    0x0303030303030303
> +            -> matches
> +
> +Hence, in order to catch all messages, a mask filled with 0xff bytes can be
> +installed as a wildcard match rule.
> +
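
It might be worth spelling the comparison out in code next to these
examples. A minimal sketch, assuming the filter and mask are handled as
arrays of 64-bit words of equal length; bloom_filter_matches() is a
hypothetical name, not a kdbus function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * The filter matches if every bit set in the filter is also set in the
 * mask. The mask may have additional bits set.
 */
static bool bloom_filter_matches(const uint64_t *filter, const uint64_t *mask,
				 size_t n_words)
{
	size_t i;

	for (i = 0; i < n_words; i++)
		if ((filter[i] & mask[i]) != filter[i])
			return false;
	return true;
}
```

Run against the three combinations above (a bloom size of 8 bytes is one
word), this gives match, no match, match, and a mask of all 0xff bytes
matches any filter.
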
> +Uploaded matches may contain multiple masks, each of which in the size of the

Parse error at "each of which in the size of"
s/in/is/?

> +bloom size defined by the bus. Each block of a mask is called a 'generation',
> +starting at index 0.
> +
> +At match time, when a signal is about to be delivered, a bloom mask generation
> +is passed, which denotes which of the bloom masks the filter should be matched
> +against. This allows userspace to provide backward compatible masks at upload
> +time, while older clients can still match against older versions of filters.
> +
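
The generation lookup could also use an example. The text does not say
what happens when the filter's generation is higher than the number of
masks that were uploaded; the sketch below assumes the newest uploaded
mask is used in that case, which is only one possible reading.
select_mask() is a hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * masks holds n_generations masks back to back, each words_per_mask
 * 64-bit words long, with generation 0 first.
 */
static const uint64_t *select_mask(const uint64_t *masks, size_t n_generations,
				   size_t words_per_mask, uint64_t generation)
{
	size_t idx;

	if (n_generations == 0)
		return NULL;

	/* Assumption: fall back to the newest mask for unknown generations. */
	idx = generation < n_generations ? (size_t)generation
					 : n_generations - 1;
	return masks + idx * words_per_mask;
}
```
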
> +
> +10.5 Removing a match
> +--------------------
> +
> +Matches can be removed through the KDBUS_CMD_MATCH_REMOVE ioctl, which again
> +takes struct kdbus_cmd_match as argument, but its fields are used slightly
> +differently.
> +
> +struct kdbus_cmd_match {
> +  __u64 size;
> +    The overall size of the struct. As it has no items in this use case, the
> +    value should yield 16.

s/yield/contain/ ?
> +
> +  __u64 cookie;
> +    The cookie of the match, as it was passed when the match was added.
> +    All matches that have this cookie will be removed.
> +
> +  __u64 flags;
> +    Unused for this use case,
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  struct kdbus_item items[0];
> +    Unused und not allowed for this use case.
> +};
> +
> +
> +11. Policy
> +===============================================================================
> +
> +A policy databases restrict the possibilities of connections to own, see and
> +talk to well-known names. It can be associated with a bus (through a policy

s/It/A policy/

> +holder connection) or a custom endpoint.
> +
> +See section 8.1 for more details on the validity of well-known names.
> +
> +Default endpoints of buses always have a policy database. The default
> +policy is to deny all operations except for operations that are covered by
> +implicit policies. Custom endpoints always have a policy, and by default,
> +a policy database is empty. Therefore, unless policy rules are added, all

s/a policy database/the policy database/

> +operations will also be denied by default.
> +
> +See section 11.5 for more details on implicit policies.
> +
> +A set of policy rules is described by a name and multiple access rules, defined
> +by the following struct.
> +
> +struct kdbus_policy_access {
> +  __u64 type;	/* USER, GROUP, WORLD */
> +    One of the following.
> +
> +    KDBUS_POLICY_ACCESS_USER
> +      Grant access to a user with the uid stored in the 'id' field.
> +
> +    KDBUS_POLICY_ACCESS_GROUP
> +      Grant access to a user with the gid stored in the 'id' field.
> +
> +    KDBUS_POLICY_ACCESS_WORLD
> +      Grant access to everyone. The 'id' field is ignored.
> +
> +  __u64 access;	/* OWN, TALK, SEE */
> +    The access to grant.
> +
> +    KDBUS_POLICY_SEE
> +      Allow the name to be seen.
> +
> +    KDBUS_POLICY_TALK
> +      Allow the name to be talked to.
> +
> +    KDBUS_POLICY_OWN
> +      Allow the name to be owned.
> +
> +  __u64 id;
> +    For KDBUS_POLICY_ACCESS_USER, stores the uid.
> +    For KDBUS_POLICY_ACCESS_GROUP, stores the gid.
> +};
> +
> +Policies are set through KDBUS_CMD_HELLO (when creating a policy holder
> +connection), KDBUS_CMD_CONN_UPDATE (when updating a policy holder connection),
> +KDBUS_CMD_ENDPOINT_MAKE (creating a custom endpoint) or
> +KDBUS_CMD_ENDPOINT_UPDATE (updating a custom endpoint). In all cases, the name
> +and policy access information is stored in items of type KDBUS_ITEM_NAME and
> +KDBUS_ITEM_POLICY_ACCESS. For this transport, the following rules apply.
> +
> +  * An item of type KDBUS_ITEM_NAME must be followed by at least one
> +    KDBUS_ITEM_POLICY_ACCESS item
> +  * An item of type KDBUS_ITEM_NAME can be followed by an arbitrary number of
> +    KDBUS_ITEM_POLICY_ACCESS items
> +  * An arbitrary number of groups of names and access levels can be passed
> +
> +uids and gids are internally always stored in the kernel's view of global ids,
> +and are translated back and forth on the ioctl level accordingly.
> +
> +
> +11.2 Wildcard names
> +-------------------
> +
> +Policy holder connections may upload names that contain the wild card suffix

s/wild card/wildcard/

> +(".*"). That way, a policy can be uploaded that is effective for every
> +well-known name that extends the provided name by exactly one more level.
> +
> +For example, if an item of a set up uploaded policy rules contains the name

I find that last line very difficult to parse. Something is broken.
What are you trying to say?

> +"foo.bar.*", both "foo.bar.baz" and "foo.bar.bazbaz" are valid, but

s/both/then both/ to help parsing.

> +"foo.bar.baz.baz" is not.
> +
> +This allows connections to take control over multiple names that the policy
> +holder doesn't need to know about when uploading the policy.
> +
> +Such wildcard entries are not allowed for custom endpoints.
> +
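
An example of the intended matching would remove any doubt about what
"exactly one more level" means. A sketch of my reading, in which
"foo.bar.*" does not match "foo.bar" itself; policy_name_matches() is a
hypothetical helper, not a kdbus function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true if name is covered by pattern. A pattern ending in ".*"
 * covers names that extend the prefix by exactly one more element; any
 * other pattern must be equal to the name.
 */
static bool policy_name_matches(const char *pattern, const char *name)
{
	size_t plen = strlen(pattern);
	size_t prefix;
	const char *rest;

	if (plen < 2 || strcmp(pattern + plen - 2, ".*") != 0)
		return strcmp(pattern, name) == 0;

	/* Keep the trailing '.' so that "foo.bar." is the prefix. */
	prefix = plen - 1;
	if (strncmp(pattern, name, prefix) != 0)
		return false;

	/* Exactly one more element: non-empty and without another '.'. */
	rest = name + prefix;
	return *rest != '\0' && strchr(rest, '.') == NULL;
}
```
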
> +
> +11.3 Policy example
> +-------------------
> +
> +For example, a set of policy rules may look like this:
> +
> +  KDBUS_ITEM_NAME: str='org.foo.bar'
> +  KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, id=1000
> +  KDBUS_ITEM_POLICY_ACCESS: type=USER, access=TALK, id=1001
> +  KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=SEE
> +  KDBUS_ITEM_NAME: str='org.blah.baz'
> +  KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, id=0
> +  KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=TALK
> +
> +That means that 'org.foo.bar' may only be owned by uid 1000, but every user on
> +the bus is allowed to see the name. However, only uid 1001 may actually send
> +a message to the connection and receive a reply from it.
> +
> +The second rule allows 'org.blah.baz' to be owned by uid 0 only, but every user
> +may talk to it.
> +
> +
> +11.4 TALK access and multiple well-known names per connection
> +-------------------------------------------------------------
> +
> +Note that TALK access is checked against all names of a connection.
> +For example, if a connection owns both 'org.foo.bar' and 'org.blah.baz', and
> +the policy database allows 'org.blah.baz' to be talked to by WORLD, then this
> +permission is also granted to 'org.foo.bar'. That might sound illogical, but
> +after all, we allow messages to be directed to either the ID or a well-known
> +name, and policy is applied to the connection, not the name. In other words,
> +the effective TALK policy for a connection is the most permissive of all names
> +the connection owns.
> +
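
Since this behavior is surprising, an example would help. A rough sketch
using stand-in types rather than the kdbus uapi definitions. The quoted
text does not say whether a higher access level implies the lower ones,
so the sketch compares levels for equality, and it gives each caller a
single gid for brevity; both are my assumptions:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-ins, not the kdbus uapi definitions. */
enum access_type { ACCESS_USER, ACCESS_GROUP, ACCESS_WORLD };
enum access_level { POLICY_SEE, POLICY_TALK, POLICY_OWN };

struct policy_entry {
	const char *name;
	enum access_type type;
	enum access_level access;
	uint64_t id; /* uid or gid; ignored for ACCESS_WORLD */
};

static bool entry_grants(const struct policy_entry *e, const char *name,
			 enum access_level want, uint64_t uid, uint64_t gid)
{
	if (strcmp(e->name, name) != 0 || e->access != want)
		return false;

	switch (e->type) {
	case ACCESS_USER:
		return e->id == uid;
	case ACCESS_GROUP:
		return e->id == gid;
	case ACCESS_WORLD:
		return true;
	}
	return false;
}

/* Does any entry grant the wanted access on this one name? */
static bool name_access(const struct policy_entry *db, size_t n,
			const char *name, enum access_level want,
			uint64_t uid, uint64_t gid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (entry_grants(&db[i], name, want, uid, gid))
			return true;
	return false;
}

/* TALK is granted if any name the destination owns grants it. */
static bool talk_allowed(const struct policy_entry *db, size_t n,
			 const char *const *owned, size_t n_owned,
			 uint64_t uid, uint64_t gid)
{
	size_t i;

	for (i = 0; i < n_owned; i++)
		if (name_access(db, n, owned[i], POLICY_TALK, uid, gid))
			return true;
	return false;
}
```

With the rules from section 11.3, uid 1002 cannot talk to a connection
that owns only 'org.foo.bar', but can once that connection also owns
'org.blah.baz'.
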
> +For broadcast messages, the receiver needs TALK permissions to the sender to
> +receive the broadcast.
> +
> +If a policy database exists for a bus (because a policy holder created one on
> +demand) or for a custom endpoint (which always has one), each one is consulted

By "created one on demand" do you mean "explicitly created one"? If so,
please use the latter wording, since it's clearer.

> +during name registry listing, name owning or message delivery. If either one

What is "name owning"? I think this could be worded better.

> +fails, the operation is failed with -EPERM.
> +
> +For best practices, connections that own names with a restricted TALK
> +access should not install matches. This avoids cases where the sent
> +message may pass the bloom filter due to false-positives and may also
> +satisfy the policy rules.
> +
> +
> +11.5 Implicit policies
> +----------------------
> +
> +Depending on the type of the endpoint, a set of implicit rules that
> +override installed policies might be enforced.
> +
> +On default endpoints, the following set is enforced and checked before
> +any user-supplied policy is checked.
> +
> +  * Privileged connections always override any installed policy. Those
> +    connections could easily install their own policies, so there is no
> +    reason to enforce installed policies.
> +  * Connections can always talk to connections of the same user. This
> +    includes broadcast messages.
> +
> +Custom endpoints have stricter policies. The following rules apply:
> +
> +  * Policy rules are always enforced, even if the connection is a privileged
> +    connection.
> +  * Policy rules are always enforced for TALK access, even if both ends are
> +    running under the same user. This includes broadcast messages.
> +  * To restrict the set of names that can be seen, endpoint policies can
> +    install "SEE" policies.
> +
> +
> +12. Pool
> +===============================================================================
> +
> +A pool for data received from the kernel is installed for every connection of
> +the bus, and is sized according to the information stored in the
> +KDBUS_ITEM_BLOOM_PARAMETER item that is returned by KDBUS_CMD_HELLO.
> +
> +The pool is written to by the kernel when one of the following ioctls is issued:
> +
> +  * KDBUS_CMD_HELLO, to receive details about the bus the connection was made to
> +  * KDBUS_CMD_RECV, to receive a message
> +  * KDBUS_CMD_NAME_LIST, to dump the name registry
> +  * KDBUS_CMD_CONN_INFO, to retrieve information on a connection
> +
> +The offsets returned by either one of the aforementioned ioctls describe offsets
> +inside the pool. In order to make the slice available for subsequent calls,
> +KDBUS_CMD_FREE has to be called on the offset.
> +
> +To access the memory, the caller is expected to mmap() it to its task, like

s/to its task//

> +this:
> +
> +  /*
> +   * POOL_SIZE has to be a multiple of PAGE_SIZE, and it must match the
> +   * value that was previously returned through the KDBUS_ITEM_BLOOM_PARAMETER
> +   * item field when the KDBUS_CMD_HELLO ioctl returned.
> +   */
> +
> +  buf = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, conn_fd, 0);
> +
> +Alternatively, instead of mapping the entire pool buffer, only parts of it can
> +be mapped. The length of the response is returned by the kernel along with the
> +offset for each of the ioctls listed above.
> +
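
The partial-mapping case deserves an example too, since mmap() requires
the file offset to be a multiple of the page size while slice offsets
presumably need not be. A sketch of the arithmetic, assuming page_size is
a power of two; struct slice_map and slice_map_for() are hypothetical
names:

```c
#include <stddef.h>
#include <stdint.h>

struct slice_map {
	uint64_t map_offset; /* page-aligned offset to pass to mmap() */
	size_t map_len;      /* length to pass to mmap() */
	size_t delta;        /* where the slice starts inside the mapping */
};

/* Compute the smallest page-aligned window that covers one pool slice. */
static struct slice_map slice_map_for(uint64_t offset, size_t len,
				      uint64_t page_size)
{
	struct slice_map m;

	/* Round the slice offset down to the start of its page. */
	m.map_offset = offset & ~(page_size - 1);
	m.delta = (size_t)(offset - m.map_offset);
	m.map_len = m.delta + len;
	return m;
}
```

The caller would then pass m.map_len and m.map_offset to mmap() in place
of POOL_SIZE and 0 in the call above, and read the slice starting
m.delta bytes into the returned mapping.
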
> +
> +13. Metadata
> +===============================================================================
> +
> +kdbus records data about the system in certain situations. Such metadata can
> +refer to the currently active process (creds, PIDs, current user groups, process
> +names and its executable path, cgroup membership, capabilities, security label
> +and audit information), connection information (description string, currently
> +owned names) and the timestamp.
> +
> +Metadata is collected in the following circumstances:
> +
> +  * When a bus is created (KDBUS_CMD_MAKE), information about the calling task
> +    is collected. This data is returned by the kernel via the
> +    KDBUS_CMD_BUS_CREATOR_INFO call-

s/-/./

> +
> +  * When a connection is created (KDBUS_CMD_HELLO), information about the
> +    calling task is collected. Alternatively, a privileged connection may
> +    provide 'faked' information about credentials, PIDs and a security labels

s/a security/security/

> +    which will be taken instead. This data is returned by the kernel as
> +    information on a connection (KDBUS_CMD_CONN_INFO).
> +
> +  * When a message is sent (KDBUS_CMD_SEND), information about the sending task
> +    and the sending connection are collected. This metadata will be attached
> +    to the message when it arrives in the receiver's pool. If the connection
> +    sending the message installed faked credentials (see above), the message
> +    will not be augmented by any information about the currently sending task.
> +
> +Which metadata items are actually delivered depends on the following sets and
> +masks:
> +
> +    (a) the system-wide kmod creds mask (module parameter 'attach_flags_mask')
> +    (b) the per-connection send creds mask, set by the connecting client
> +    (c) the per-connection receive creds mask, set by the connecting client
> +    (d) the per-bus minimal creds mask, set by the bus creator
> +    (e) the per-bus owner creds mask, set by the bus creator
> +    (f) the mask specified when querying creds of a bus peer
> +    (g) the mask specified when querying creds of a bus owner
> +
> +With the following rules:
> +
> +    [1] The creds attached to messages are determined as (a & b & c).
> +    [2] When connecting to a bus (KDBUS_CMD_HELLO), and (~b & d) != 0, the call
> +        will fail, the connection is refused.
> +    [3] When querying creds of a bus peer, the creds returned are  (a & b & f)
> +    [4] When querying creds of a bus owner, the creds returned are (a & e & g)
> +    [5] When creating a new bus, and (d & ~a) != 0, then bus creation will fail
> +
> +Hence, userspace might not always get all requested metadata items that it
> +requested. Code must be written so that it can cope with this fact.
> +
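
Rules [1] to [5] are compact enough that restating them as code might
help readers verify them. A sketch that treats each mask as a plain
64-bit value, with parameters named after the letters (a) to (g) above;
the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* [1] creds attached to a message */
static uint64_t msg_creds(uint64_t a, uint64_t b, uint64_t c)
{
	return a & b & c;
}

/* [2] HELLO is refused if the bus requires a bit the sender withholds */
static bool hello_refused(uint64_t b, uint64_t d)
{
	return (~b & d) != 0;
}

/* [3] creds returned when querying a bus peer */
static uint64_t peer_creds(uint64_t a, uint64_t b, uint64_t f)
{
	return a & b & f;
}

/* [4] creds returned when querying the bus owner */
static uint64_t owner_creds(uint64_t a, uint64_t e, uint64_t g)
{
	return a & e & g;
}

/* [5] bus creation fails if it requires a bit the module mask forbids */
static bool bus_make_fails(uint64_t a, uint64_t d)
{
	return (d & ~a) != 0;
}
```

For instance, with a = 0x7, b = 0x3 and c = 0x6, only bit 0x2 is attached
to messages, which illustrates why a receiver may get fewer items than it
asked for.
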
> +
> +13.1 Known item types
> +---------------------
> +
> +The following attach flags are currently supported.
> +
> +  KDBUS_ATTACH_TIMESTAMP
> +    Attaches an item of type KDBUS_ITEM_TIMESTAMP which contains both the
> +    monotonic and the realtime timestamp, taken when the message was
> +    processed on the kernel side.
> +
> +  KDBUS_ATTACH_CREDS
> +    Attaches an item of type KDBUS_ITEM_CREDS, containing credentials as
> +    described in struct kdbus_creds: the user and group IDs in the usual four
> +    flavors: real, effective, saved and file-system related.
> +
> +  KDBUS_ATTACH_PIDS
> +    Attaches an item of type KDBUS_ITEM_PIDS, containing information on the
> +    process. In particular, the PID (process ID), TID (thread ID), and PPID
> +    (PID of the parent process).
> +
> +  KDBUS_ATTACH_AUXGROUPS
> +    Attaches an item of type KDBUS_ITEM_AUXGROUPS, containing a dynamic
> +    number of auxiliary groups the sending task was a member of.
> +
> +  KDBUS_ATTACH_NAMES
> +    Attaches items of type KDBUS_ITEM_OWNED_NAME, one for each name the sending
> +    connection currently owns. The name and flags are stored in kdbus_item.name
> +    for each of them.
> +
> +  KDBUS_ATTACH_TID_COMM [*]
> +    Attaches an items of type KDBUS_ITEM_TID_COMM, transporting the sending
> +    task's 'comm', for the tid.  The string is stored in kdbus_item.str.
> +
> +  KDBUS_ATTACH_PID_COMM [*]
> +    Attaches an items of type KDBUS_ITEM_PID_COMM, transporting the sending
> +    task's 'comm', for the pid.  The string is stored in kdbus_item.str.
> +
> +  KDBUS_ATTACH_EXE [*]
> +    Attaches an item of type KDBUS_ITEM_EXE, containing the path to the
> +    executable of the sending task, stored in kdbus_item.str.
> +
> +  KDBUS_ATTACH_CMDLINE [*]
> +    Attaches an item of type KDBUS_ITEM_CMDLINE, containing the command line
> +    arguments of the sending task, as an array of strings, stored in
> +    kdbus_item.str.
> +
> +  KDBUS_ATTACH_CGROUP
> +    Attaches an item of type KDBUS_ITEM_CGROUP with the task's cgroup path.
> +
> +  KDBUS_ATTACH_CAPS
> +    Attaches an item of type KDBUS_ITEM_CAPS, carrying sets of capabilities
> +    that should be accessed via kdbus_item.caps.caps. Also, userspace should
> +    be written in a way that it takes kdbus_item.caps.last_cap into account,
> +    and derive the number of sets and rows from the item size and the reported
> +    number of valid capability bits.
> +
> +  KDBUS_ATTACH_SECLABEL
> +    Attaches an item of type KDBUS_ITEM_SECLABEL, which contains the SELinux
> +    security label of the sending task. SELinux and other MACs might want to
> +    do additional per-service security checks. For example, a service manager
> +    might want to check the security label of a service file against the
> +    security label of the client process checking the SELinux database before
> +    allowing access.  The label can be accessed via kdbus_item->str.
> +
> +  KDBUS_ATTACH_AUDIT
> +    Attaches an item of type KDBUS_ITEM_AUDIT, which contains the audit
> +    sessionid and loginuid of the sending task. Access via kdbus_item->audit.
> +
> +  KDBUS_ATTACH_CONN_DESCRIPTION
> +    Attaches an item of type KDBUS_ITEM_CONN_DESCRIPTION that contains a
> +    descriptive string of the sending peer. That string can be supplied
> +    during HELLO by attaching an item of type KDBUS_ITEM_CONN_DESCRIPTION.
> +
> +
> +[*] Note that the content stored in these items can easily be tampered by
> +    the sending tasks. Therefore, they should NOT be used for any sort of
> +    security relevant assumptions. The only reason why they are transmitted is
> +    to let receivers know about details that were set when metadata was
> +    collected, even though the task they were collected from is not active any
> +    longer when the items are received.
> +
> +
> +13.2 Metadata and namespaces
> +----------------------------
> +
> +Metadata such as PIDs, UIDs or GIDs are automatically translated to the
> +namespaces of the task that receives them.
> +
> +
> +14. Error codes
> +===============================================================================
> +
> +Below is a list of error codes that might be returned by the individual
> +ioctl commands. The list focuses on the return values from kdbus code itself,
> +and might not cover those of all kernel internal functions.
> +
> +For all ioctls:
> +
> +  -ENOMEM	The kernel memory is exhausted
> +  -ENOTTY	Illegal ioctl command issued for the file descriptor

Why ENOTTY here, rather than EINVAL? The latter is, I believe, the usual
ioctl() error for invalid commands. (If you keep ENOTTY, add an
explanation in this document.)

> +  -ENOSYS	The requested functionality is not available

Maybe add here an explanation or examples of why the functionality is 
not available?

> +  -EINVAL	Unsupported item attached to command
> +
> +For all ioctls that carry a struct as payload:
> +
> +  -EFAULT	The supplied data pointer was not 64-bit aligned, or was
> +		inaccessible from the kernel side.
> +  -EINVAL	The size inside the supplied struct was smaller than expected
> +  -EMSGSIZE	The size inside the supplied struct was bigger than expected

Why two different errors for smaller and larger than expected? (If you keep things this
way, please explain the reason in this document.)

> +  -ENAMETOOLONG	A supplied name is larger than the allowed maximum size
> +
> +For KDBUS_CMD_BUS_MAKE:
> +
> +  -EINVAL	The flags supplied in the kdbus_cmd_make struct are invalid or
> +		the supplied name does not start with the current uid and a '-'
> +  -EEXIST	A bus of that name already exists
> +  -ESHUTDOWN	The domain for the bus is already shut down
> +  -EMFILE	The maximum number of buses for the current user is exhausted
> +
> +For KDBUS_CMD_ENDPOINT_MAKE:
> +
> +  -EPERM	The calling user is not privileged (see Terminology)
> +  -EINVAL	The flags supplied in the kdbus_cmd_make struct are invalid
> +  -EEXIST	An endpoint of that name already exists
> +
> +For KDBUS_CMD_HELLO:
> +
> +  -EFAULT	The supplied pool size was 0 or not a multiple of the page size
> +  -EINVAL	The flags supplied in the kdbus_cmd_make struct are invalid, or
> +		an illegal combination of KDBUS_HELLO_MONITOR,
> +		KDBUS_HELLO_ACTIVATOR and KDBUS_HELLO_POLICY_HOLDER was passed
> +		in the flags, or an invalid set of items was supplied
> +  -ECONNREFUSED	The attach_flags_send field did not satisfy the requirements of
> +		the bus
> +  -EPERM	An KDBUS_ITEM_CREDS items was supplied, but the current user is
> +		not privileged
> +  -ESHUTDOWN	The bus has already been shut down
> +  -EMFILE	The maximum number of connection on the bus has been reached

s/connection/connections/

> +  -EOPNOTSUPP	The endpoint does not support the connection flags
> +		supplied in the kdbus_cmd_hello struct
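
The pool size rule quoted above for KDBUS_CMD_HELLO (nonzero, multiple of
the page size) is easy to get wrong, so a short example in the document
might help. A minimal sketch of how userspace could round a requested size
before the HELLO call; the demo_* names are made up for illustration and
are not part of the kdbus API:

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Round 'want' up to a nonzero multiple of 'page'.
 * 'page' must be a power of two (true for system page sizes).
 */
static uint64_t demo_pool_round_up(uint64_t want, uint64_t page)
{
	assert(page != 0 && (page & (page - 1)) == 0);
	if (want == 0)
		want = 1;	/* a pool size of 0 is rejected */
	return (want + page - 1) & ~(page - 1);
}

/* Same, using the page size reported by the running system. */
static uint64_t demo_pool_size(uint64_t want)
{
	long page = sysconf(_SC_PAGESIZE);

	return page > 0 ? demo_pool_round_up(want, (uint64_t)page) : 0;
}
```
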
> +
> +For KDBUS_CMD_BYEBYE:
> +
> +  -EALREADY	The connection has already been shut down
> +  -EBUSY	There are still messages queued up in the connection's pool
> +
> +For KDBUS_CMD_SEND:
> +
> +  -EOPNOTSUPP	The connection is not an ordinary connection, or the passed
> +		file descriptors are either kdbus handles or unix domain
> +		sockets. Both are currently unsupported
> +  -EINVAL	The submitted payload type is KDBUS_PAYLOAD_KERNEL,
> +		KDBUS_MSG_EXPECT_REPLY was set without timeout or cookie
> +		values, KDBUS_MSG_SYNC_REPLY was set without
> +		KDBUS_MSG_EXPECT_REPLY, an invalid item was supplied,
> +		src_id was != 0 and different from the current connection's ID,

s/src_id was != 0 and different/src_id was nonzero and was different/

> +		a supplied memfd had a size of 0, a string was not properly
> +		null-terminated
> +  -ENOTUNIQ	The supplied destination is KDBUS_DST_ID_BROADCAST, a file
> +		descriptor was passed, KDBUS_MSG_EXPECT_REPLY was set,
> +		or a timeout was given for a broadcast message
> +  -E2BIG	Too many items
> +  -EMSGSIZE	The size of the message header and items or the payload vector
> +		is too big.
> +  -EEXIST	Multiple KDBUS_ITEM_FDS, KDBUS_ITEM_BLOOM_FILTER or
> +		KDBUS_ITEM_DST_NAME items were supplied
> +  -EBADF	The supplied KDBUS_ITEM_FDS or KDBUS_MSG_PAYLOAD_MEMFD items
> +		contained an illegal file descriptor
> +  -EMEDIUMTYPE	The supplied memfd is not a sealed kdbus memfd
> +  -EMFILE	Too many file descriptors inside a KDBUS_ITEM_FDS
> +  -EBADMSG	An item had illegal size, both a dst_id and a
> +		KDBUS_ITEM_DST_NAME was given, or both a name and a bloom
> +		filter was given
> +  -ETXTBSY	The supplied kdbus memfd file cannot be sealed or the seal
> +		was removed, because it is shared with other processes or
> +		still mmap()ed
> +  -ECOMM	A peer does not accept the file descriptors addressed to it
> +  -EFAULT	The supplied bloom filter size was not 64-bit aligned
> +  -EDOM		The supplied bloom filter size did not match the bloom filter
> +		size of the bus
> +  -EDESTADDRREQ	dst_id was set to KDBUS_DST_ID_NAME, but no KDBUS_ITEM_DST_NAME
> +		was attached
> +  -ESRCH	The name to look up was not found in the name registry
> +  -EADDRNOTAVAIL KDBUS_MSG_NO_AUTO_START was given but the destination
> +		 connection is an activator.
> +  -ENXIO	The passed numeric destination connection ID couldn't be found,
> +		or is not connected
> +  -ECONNRESET	The destination connection is no longer active
> +  -ETIMEDOUT	Timeout while synchronously waiting for a reply
> +  -EINTR	System call interrupted while synchronously waiting for a reply
> +  -EPIPE	When sending a message, a synchronous reply from the receiving
> +		connection was expected but the connection died before
> +		answering
> +  -ENOBUFS	Too many pending messages on the receiver side
> +  -EREMCHG	Both a well-known name and a unique name (ID) was given, but
> +		the name is not currently owned by that connection.
> +  -EXFULL	The memory pool of the receiver is full
> +  -EREMOTEIO	While synchronously waiting for a reply, the remote peer
> +		failed with an I/O error.
> +
> +For KDBUS_CMD_RECV:
> +
> +  -EINVAL	Invalid flags or offset
> +  -EAGAIN	No message found in the queue
> +  -ENOMSG	No message of the requested priority found
> +  -EOVERFLOW	Broadcast messages have been lost
> +
> +For KDBUS_CMD_FREE:
> +
> +  -ENXIO	No pool slice found at given offset
> +  -EINVAL	Invalid flags provided, the offset is valid, but the user is
> +		not allowed to free the slice. This happens, for example, if
> +		the offset was retrieved with KDBUS_RECV_PEEK.

It would be easier to read if this was written to clarify that there are 
two distinct error cases:

  -EINVAL	Invalid flags provided.
  -EINVAL       The offset is valid, but the user is
		not allowed to free the slice. This happens, for example, if
		the offset was retrieved with KDBUS_RECV_PEEK.

> +For KDBUS_CMD_NAME_ACQUIRE:
> +
> +  -EINVAL	Illegal command flags, illegal name provided, or an activator
> +		tried to acquire a second name
> +  -EPERM	Policy prohibited name ownership
> +  -EALREADY	Connection already owns that name
> +  -EEXIST	The name already exists and can not be taken over
> +  -E2BIG	The maximum number of well-known names per connection
> +		is exhausted
> +  -ECONNRESET	The connection was reset during the call
> +
> +For KDBUS_CMD_NAME_RELEASE:
> +
> +  -EINVAL	Invalid command flags, or invalid name provided
> +  -ESRCH	Name is not found found in the registry
> +  -EADDRINUSE	Name is owned by a different connection and can't be released
> +
> +For KDBUS_CMD_NAME_LIST:
> +
> +  -EINVAL	Invalid flags
> +  -ENOBUFS	No available memory in the connection's pool.
> +
> +For KDBUS_CMD_CONN_INFO:
> +
> +  -EINVAL	Invalid flags, or neither an ID nor a name was provided,
> +		or the name is invalid.
> +  -ESRCH	Connection lookup by name failed
> +  -ENXIO	No connection with the provided connection ID found
> +
> +For KDBUS_CMD_CONN_UPDATE:
> +
> +  -EINVAL	Illegal flags or items
> +  -EOPNOTSUPP	Operation not supported by connection.
> +  -E2BIG	Too many policy items attached
> +  -EINVAL	Wildcards submitted in policy entries, or illegal sequence
> +		of policy items
> +
> +For KDBUS_CMD_ENDPOINT_UPDATE:
> +
> +  -E2BIG	Too many policy items attached
> +  -EINVAL	Invalid flags, or wildcards submitted in policy entries,
> +		or illegal sequence of policy items
> +
> +For KDBUS_CMD_MATCH_ADD:
> +
> +  -EINVAL	Illegal flags or items
> +  -EDOM		Illegal bloom filter size
> +  -EMFILE	Too many matches for this connection
> +
> +For KDBUS_CMD_MATCH_REMOVE:
> +
> +  -EINVAL	Illegal flags
> +  -ENOENT	A match entry with the given cookie could not be found.
> +
> +
> +15. Internal object relations
> +===============================================================================
> +
> +This is a simplified outline of the internal kdbus object relations, for
> +those interested in the inner life of the driver implementation.
> +
> +From the a mount point's (domain's) perspective:
> +
> +struct kdbus_domain
> +  |» struct kdbus_domain_user *user (many, owned)
> +  '» struct kdbus_node node (embedded)
> +      |» struct kdbus_node children (many, referenced)
> +      |» struct kdbus_node *parent (pinned)
> +      '» struct kdbus_bus (many, pinned)
> +          |» struct kdbus_node node (embedded)
> +          '» struct kdbus_ep (many, pinned)
> +              |» struct kdbus_node node (embedded)
> +              |» struct kdbus_bus *bus (pinned)
> +              |» struct kdbus_conn conn_list (many, pinned)
> +              |   |» struct kdbus_ep *ep (pinned)
> +              |   |» struct kdbus_name_entry *activator_of (owned)
> +              |   |» struct kdbus_match_db *match_db (owned)
> +              |   |» struct kdbus_meta *meta (owned)
> +              |   |» struct kdbus_match_db *match_db (owned)
> +              |   |    '» struct kdbus_match_entry (many, owned)
> +              |   |
> +              |   |» struct kdbus_pool *pool (owned)
> +              |   |    '» struct kdbus_pool_slice *slices (many, owned)
> +              |   |       '» struct kdbus_pool *pool (pinned)
> +              |   |
> +              |   |» struct kdbus_domain_user *user (pinned)
> +              |   `» struct kdbus_queue_entry entries (many, embedded)
> +              |        |» struct kdbus_pool_slice *slice (pinned)
> +              |        |» struct kdbus_conn_reply *reply (owned)
> +              |        '» struct kdbus_domain_user *user (pinned)
> +              |
> +              '» struct kdbus_domain_user *user (pinned)
> +                  '» struct kdbus_policy_db policy_db (embedded)
> +                       |» struct kdbus_policy_db_entry (many, owned)
> +                       |   |» struct kdbus_conn (pinned)
> +                       |   '» struct kdbus_ep (pinned)
> +                       |
> +                       '» struct kdbus_policy_db_cache_entry (many, owned)
> +                           '» struct kdbus_conn (pinned)
> +
> +
> +For the life-time of a file descriptor derived from calling open() on a file
> +inside the mount point:
> +
> +struct kdbus_handle
> +  |» struct kdbus_meta *meta (owned)
> +  |» struct kdbus_ep *ep (pinned)
> +  |» struct kdbus_conn *conn (owned)
> +  '» struct kdbus_ep *ep (owned)

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-20 13:58     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-20 13:58 UTC (permalink / raw)
  To: Greg Kroah-Hartman, arnd-r2nGTMty4D4,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io, teg-B22kvLQNl6c,
	jkosina-AlSwsSmVLrQ, luto-kltTT9wpgjJwATOyAt5JVQ,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	daniel-cYrQPVfZooxQFI55V6+gNQ,
	dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w,
	tixxdz-Umm1ozX2/EEdnm+yROfE0A, Daniel Mack

On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
> From: Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
> 
> kdbus is a system for low-latency, low-overhead, easy to use
> interprocess communication (IPC).
> 
> The interface to all functions in this driver is implemented via ioctls
> on files exposed through a filesystem called 'kdbusfs'. The default
> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
> documentation about the kernel level API design.

And now the detailed feedback.

Please note that for the various points I raise below, even in
cases where I don't suggest/request a fix, the fact that I've
needed to ask a question probably suggests a deficiency in
the documentation that needs to be remedied.

Many of my comments below are wording and typo fixes. While
these may seem trivial, the existence of various wording problems
and typos is a significant distraction, especially while trying
to grok a document of this size.

I've marked one or two notable questions about the API with "###".

> Signed-off-by: Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
> Signed-off-by: David Herrmann <dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
> ---
>  Documentation/kdbus.txt | 2107 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 2107 insertions(+)
>  create mode 100644 Documentation/kdbus.txt
> 
> diff --git a/Documentation/kdbus.txt b/Documentation/kdbus.txt
> new file mode 100644
> index 000000000000..2592a7e37079
> --- /dev/null
> +++ b/Documentation/kdbus.txt
> @@ -0,0 +1,2107 @@
> +D-Bus is a system for powerful, easy to use interprocess communication (IPC).
> +
> +The focus of this document is an overview of the low-level, native kernel D-Bus
> +transport called kdbus. Kdbus exposes its functionality via files in a
> +filesystem called 'kdbusfs'. All communication between processes takes place
> +via ioctls on files exposed through the mount point of a kdbusfs. The default
> +mount point of kdbusfs is /sys/fs/kdbus.
> +
> +Please note that kdbus was designed as transport layer for D-Bus, but is in no
> +way limited, nor controlled by the D-Bus protocol specification. The D-Bus
> +protocol is one possible application layer on top of kdbus.
> +
> +For the general D-Bus protocol specification, the payload format, the
> +marshaling, and the communication semantics, please refer to:
> +  http://dbus.freedesktop.org/doc/dbus-specification.html
> +
> +For a kdbus specific userspace library implementation please refer to:
> +  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
> +
> +Articles about D-Bus and kdbus:
> +  http://lwn.net/Articles/580194/
> +
> +
> +1. Terminology
> +===============================================================================
> +
> +  Domain:
> +    A domain is created each time a kdbusfs is mounted. Each process that is
> +    capable to mount a new instance of a kdbusfs will have its own kdbus

s/is capable to mount/mounts/

> +    hierarchy. Each domain (ie, each mount point) offers its own "control"
> +    file to create new buses. Domains have no connection to each other and
> +    cannot see nor talk to each other. See section 5 for more details.

Smoother would be:

    s/cannot see nor talk/can neither see nor talk/

> +
> +  Bus:
> +    A bus is a named object inside a domain. Clients exchange messages
> +    over a bus. Multiple buses themselves have no connection to each other;
> +    messages can only be exchanged on the same bus. The default endpoint of
> +    a bus, where clients establish the connection to, is the "bus" file

Maybe:

where clients establish the connection to
==>
to which clients establish connections
?

> +    /sys/fs/kdbus/<bus name>/bus.
> +    Common operating system setups create one "system bus" per system, and one
> +    "user bus" for every logged-in user. Applications or services may create

At the kdbus level is there any difference between such system and user buses?
If not, it would perhaps be good to insert a parenthetical aside to say 
that.

> +    their own private buses.  See section 5 for more details.
> +
> +  Endpoint:
> +    An endpoint provides a file to talk to a bus. Opening an endpoint
> +    creates a new connection to the bus to which the endpoint belongs. All
> +    endpoints have unique names and are accessible as files underneath the
> +    directory of a bus, e.g., /sys/fs/kdbus/<bus>/<endpoint>
> +    Every bus has a default endpoint called "bus". A bus can optionally offer
> +    additional endpoints with custom names to provide restricted access to the
> +    bus. Custom endpoints carry additional policy which can be used to create
> +    sandboxes with locked-down, limited, filtered access to a bus.  See
> +    section 5 for more details.
> +
> +  Connection:
> +    A connection to a bus is created by opening an endpoint file of a bus and
> +    becoming an active client with the HELLO exchange. Every ordinary client
> +    connection has a unique identifier on the bus and can address messages to
> +    every other connection on the same bus by using the peer's connection id
> +    as the destination.  See section 6 for more details.
> +
> +  Pool:
> +    Each connection allocates a piece of shmem-backed memory that is used
> +    to receive messages and answers to ioctl commands from the kernel. It is
> +    never used to send anything to the kernel. In order to access that memory,
> +    userspace must mmap() it into its address space.

s/userspace/a userspace application/

> +    See section 12 for more details.
> +
> +  Well-known Name:
> +    A connection can, in addition to its implicit unique connection id, request
> +    the ownership of a textual well-known name. Well-known names are noted in
> +    reverse-domain notation, such as com.example.service1. Connections offering
> +    a service on a bus are usually reached by its well-known name. The analogy

Noun/pronoun number disagreement at "Connections... its".
==>
    A connection that offers a service on a bus is usually reached by its 
    well-known name.

> +    of connection id and well-known name is an IP address and a DNS name

Doing s/id/ID/ throughout the doc would help readability. (The doc 
already uses "ID" in some places, so consistency is a further argument 
for this change.)

> +    associated with that address.
> +
> +  Message:
> +    Connections can exchange messages with other connections by addressing
> +    the peers with their connection id or well-known name. A message consists
> +    of a message header with kernel-specific information on how to route the

What does "kernel-specific" mean here? Something needs explaining (or removing).

> +    message, and the message payload, which is a logical byte stream of
> +    arbitrary size. Messages can carry additional file descriptors to be passed

So, this is FD passing like UNIX domain sockets? If yes, it would be helpful
here to mention that analogy.

> +    from one connection to another. Every connection can specify which set of
> +    metadata the kernel should attach to the message when it is delivered
> +    to the receiving connection. Metadata contains information like: system
> +    timestamps, uid, gid, tid, proc-starttime, well-known-names, process comm,

s/well-known-names/well-known names/

See the note on "ID" above. I think UID, GID, TID throughout would help 
readability.

> +    process exe, process argv, cgroup, capabilities, seclabel, audit session,
> +    loginuid and the connection's human-readable name.
> +    See section 7 and 13 for more details.
> +
> +  Item:
> +    The API of kdbus implements a notion of items, submitted through and

s/a notion/the notion/

> +    returned by most ioctls, and stored inside data structures in the
> +    connection's pool. See section 4 for more details.
> +
> +  Broadcast and Match:
> +    Broadcast messages are potentially sent to all connections of a bus. By
> +    default, the connections will not actually receive any of the sent
> +    broadcast messages; only after installing a match for specific message
> +    properties, a broadcast message passes this filter.

"Filter" suddenly gets mentioned here without previously being defined. I suspect 
the last piece should read more like:

  a connection will receive a broadcast message only after it installs a filter
  that matches the specific message properties of the broadcast message.

> +    See section 10 for more details.
> +
> +  Policy:
> +    A policy is a set of rules that define which connections can see, talk to,
> +    or register a well-know name on the bus. A policy is attached to buses and

s/know/known/

> +    custom endpoints, and modified by policy holder connections or owners of
> +    custom endpoints. See section 11 for more details.
> +    See section 11 for more details.

Repeated last sentence.

> +  Privileged bus users:
> +    A user connecting to the bus is considered privileged if it is either the
> +    creator of the bus, or if it has the CAP_IPC_OWNER capability flag set.
> +
> +
> +2. Control Files Layout
> +===============================================================================
> +
> +The kdbus interface is exposed through files in its kdbusfs mount point
> +(defaults to /sys/fs/kdbus):
> +
> +  /sys/fs/kdbus                 (mount point of kdbusfs)
> +  |-- control                   (domain control-file)
> +  |-- 0-system                  (bus of user uid=0)
> +  |   |-- bus                   (default endpoint of bus '0-system')
> +  |   `-- ep.apache             (custom endpoint of bus '0-system')
> +  |-- 1000-user                 (bus of user uid=1000)
> +  |   `-- bus                   (default endpoint of bus '1000-user')
> +  `-- 2702-user                 (bus of user uid=2702)
> +      |-- bus                   (default endpoint of bus '2702-user')
> +      `-- ep.app                (custom endpoint of bus '2702-user')
> +
> +
> +3. Data Structures and flags
> +===============================================================================
> +
> +3.1 Data structures and interconnections
> +----------------------------------------
> +
> +  +--------------------------------------------------------------------------+
> +  | Domain (Mount Point)                                                     |
> +  | /sys/fs/kdbus/control                                                    |
> +  | +----------------------------------------------------------------------+ |
> +  | | Bus (System Bus)                                                     | |
> +  | | /sys/fs/kdbus/0-system/                                              | |
> +  | | +-------------------------------+ +--------------------------------+ | |
> +  | | | Endpoint                      | | Endpoint                       | | |
> +  | | | /sys/fs/kdbus/0-system/bus    | | /sys/fs/kdbus/0-system/ep.app  | | |
> +  | | +-------------------------------+ +--------------------------------+ | |
> +  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
> +  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
> +  | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
> +  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
> +  | +----------------------------------------------------------------------+ |
> +  |                                                                          |
> +  | +----------------------------------------------------------------------+ |
> +  | | Bus (User Bus for UID 2702)                                          | |
> +  | | /sys/fs/kdbus/2702-user/                                             | |
> +  | | +-------------------------------+ +--------------------------------+ | |
> +  | | | Endpoint                      | | Endpoint                       | | |
> +  | | | /sys/fs/kdbus/2702-user/bus   | | /sys/fs/kdbus/2702-user/ep.app | | |
> +  | | +-------------------------------+ +--------------------------------+ | |
> +  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
> +  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
> +  | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
> +  | | +--------------+ +--------------+ +--------------------------------+ | |
> +  | +----------------------------------------------------------------------+ |
> +  +--------------------------------------------------------------------------+
> +
> +The above description uses the D-Bus notation of unique connection names that
> +adds a ":1." prefix to the connection's unique ID. kdbus itself doesn't
> +use that notation, neither internally nor externally. However, libraries and
> +other userspace code that aims for compatibility to D-Bus might.

s/compatibility to/compatibility with/

> +
> +3.2 Flags
> +---------
> +
> +All ioctls used in the communication with the driver contain three 64-bit
> +fields: 'flags', 'kernel_flags' and 'return_flags'. All of them are specific
> +to the ioctl used.
> +
> +In 'flags', the behavior of the command can be tweaked. All bits that are not
> +recognized by the kernel in this field are rejected, and the ioctl fails with
> +-EINVAL.
> +
> +In 'kernel_flags', the kernel driver writes back the mask of supported bits
> +upon each call, and sets the KDBUS_FLAGS_KERNEL bit. This is a way to probe
> +possible kernel features and make userspace code forward and backward
> +compatible.

So, is 'kernel_flags' a bounding superset of what the caller may specify
in 'flags'? If yes, please make that clearer.
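
If the answer is yes, a probing example in the document would make the
intent obvious. A minimal sketch under that assumption; the value chosen
for DEMO_FLAGS_KERNEL and the function name are hypothetical, not taken
from the kdbus headers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for KDBUS_FLAGS_KERNEL; the real value is
 * whatever the kdbus UAPI header defines. */
#define DEMO_FLAGS_KERNEL (1ULL << 63)

/*
 * Given the 'kernel_flags' mask written back by a previous call, return
 * the bits of 'wanted' that the kernel did not advertise as supported.
 * A result of 0 means every wanted flag is in the supported mask.
 */
static uint64_t demo_unsupported_flags(uint64_t wanted, uint64_t kernel_flags)
{
	/* The marker bit shows that the kernel filled in the field. */
	assert(kernel_flags & DEMO_FLAGS_KERNEL);
	return wanted & ~(kernel_flags & ~DEMO_FLAGS_KERNEL);
}
```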

> +
> +In 'return_flags', the kernel can return results of the command, in addition
> +to the actual return value. This is mostly to inform userspace about non-fatal
> +conditions that occurred during the execution of the command.
> +
> +
> +4. Items
> +===============================================================================
> +
> +To flexibly augment transport structures, data blobs of type struct kdbus_item
> +can be attached to the structs passed into the ioctls. Some ioctls make items
> +of certain types mandatory, others are optional. Unsupported items will cause
> +the ioctl to fail -EINVAL.
> +
> +The total size of an item is variable and is in some cases defined by the item
> +type. In other cases, they can be of arbitrary length (for instance, a string).
> +
> +Items are also used for information stored in a connection's pool, such as
> +received messages, name lists or requested connection or bus owner information.
> +
> +Whenever items are used as part of the kdbus kernel API, they are embedded in
> +structs that have an overall size of their own, so there can be multiple items

"have an overall size of their own" is hard to grok. I think you mean

    ... are embedded inside structs that themselves include a size field
    containing the overall size of the structure. This allows multiple 
    items per ioctl.

> +per ioctl.
> +
> +The kernel expects all items to be aligned to 8-byte boundaries. Unaligned
> +items or such that are unsupported by the ioctl are rejected.

s/such/items/?
(Or otherwise replace "such" with whatever it actually means.)

> +A simple iterator in userspace would iterate over the items until the items
> +have reached the embedding structure's overall size. An example implementation
> +of such an iterator can be found in tools/testing/selftests/kdbus/kdbus-util.h.
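
Since the selftest header is some distance from this document, an inline
example might be worth having here. A minimal sketch of such an iterator,
assuming each item starts with a 64-bit size (header included, padding
excluded) followed by a 64-bit type; the demo_* names are hypothetical
stand-ins, not the definitions from kdbus.h or kdbus-util.h:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fixed header of struct kdbus_item. */
struct demo_item_hdr {
	uint64_t size;	/* header plus payload, without trailing padding */
	uint64_t type;
};

#define DEMO_ALIGN8(v) (((v) + 7) & ~(uint64_t)7)

/*
 * Walk the items stored in 'buf' ('len' bytes, i.e. the item area of the
 * embedding structure). Returns the number of items, or -1 if an item
 * header is truncated or reports an impossible size.
 */
static int demo_count_items(const void *buf, uint64_t len)
{
	const uint8_t *p = buf;
	uint64_t off = 0;
	int n = 0;

	assert(buf != NULL);
	while (off < len) {
		struct demo_item_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, p + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		/* The next item starts at the next 8-byte boundary. */
		off += DEMO_ALIGN8(hdr.size);
		n++;
	}
	return n;
}
```
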
> +
> +
> +5. Creation of new domains, buses and endpoints
> +===============================================================================
> +
> +
> +5.1 Domains
> +-----------
> +
> +A domain is a container of buses. Domains themselves do not provide any IPC
> +functionality. Their sole purpose is to manage buses allocated in their
> +domain. Each time kdbusfs is mounted, a new kdbus domain is created, with its
> +own 'control' file. The lifetime of the domain ends once the user has unmounted
> +the kdbusfs. If you mount kdbusfs multiple times, each will have its own kdbus
> +domain internally. 

What does that last sentence mean? Somehow it needs to be reworded to better
convey whatever sense it is trying to convey.

> Operations performed on one domain do not affect any
> +other domain.
> +
> +The full kdbusfs hierarchy, any sub-directory, or file can be bind-mounted to
> +an external mount point and will remain fully functional. The kdbus domain and
> +any linked resources stay available until the original mount and all subsequent
> +bind-mounts have been unmounted.
> +
> +During creation, domains pin the user-namespace of the creator and use
> +it as controlling user-namespace for this domain. Any user accounting is done

s/as/as the/

> +relative to that user-namespace.
> +
> +Newly created kdbus domains do not have any bus pre-created. The only resource
> +available is a 'control' file, which is used to manage kdbus domains.
> +Currently, 'control' files are exclusively used to create buses via

s/exclusively used/used exclusively/

> +KDBUS_CMD_BUS_MAKE, but further ioctls might be added in the future.
> +
> +
> +5.2 Buses
> +---------
> +
> +A bus is a shared resource between connections to transmit messages. Each bus

==> A bus is a resource that is shared between connections in order to 
    transmit messages

> +is independent and operations on the bus will not have any effect on other
> +buses. A bus is a management entity, that controls the addresses of its

s/,//

> +connections, their policies and message transactions performed via this bus.
> +
> +Each bus is bound to the domain it was created on. It has a custom name that is
> +unique across all buses of a domain. In kdbusfs, a bus is presented as a
> +directory. No operations can be performed on the bus itself, instead you need

s/,/;/

> +to perform those on an endpoint associated with the bus. Endpoints are

s/those/operations/

> +accessible as files underneath the bus directory. A default endpoint called
> +"bus" is provided on each bus.
> +
> +Bus names may be chosen freely except for one restriction: the name
> +must be prefixed with the numeric UID of the creator and a dash. This

s/UID/effective UID/
(I assume it's the effective UID...)

> +is required to avoid namespace clashes between different users. When
> +creating a bus the name must be passed in properly formatted, or the

the name must be passed in properly formatted
==>
the name that is passed in must be properly formatted

> +kernel will refuse creation of the bus. Example: "1047-foobar" is an
> +OK name for a bus registered by a user with UID 1047. However,

s/OK/acceptable/

> +"1024-foobar" is not, and neither is "foobar".
> +The UID must be provided in the user-namespace of the parent domain.
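
The naming rule could also be shown as code. A minimal sketch of the prefix
check as I read the text above; the function name is hypothetical, and
which UID the kernel actually compares against (real or effective) is the
open question raised earlier:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if 'name' starts with "<uid>-" (decimal UID and a dash),
 * 0 otherwise. Only the prefix rule described in the text is checked.
 */
static int demo_bus_name_has_uid_prefix(const char *name, unsigned long uid)
{
	char prefix[32];
	int n;

	assert(name != NULL);
	n = snprintf(prefix, sizeof(prefix), "%lu-", uid);
	if (n < 0 || (size_t)n >= sizeof(prefix))
		return 0;
	return strncmp(name, prefix, (size_t)n) == 0;
}
```

With UID 1047, this accepts "1047-foobar" and rejects both "1024-foobar"
and "foobar", matching the examples in the quoted text.
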
> +
> +To create a new bus, you need to open the control file of a domain and run the

s/run/employ/ (One doesn't "run" a system call.)

> +KDBUS_CMD_BUS_MAKE ioctl. The control file descriptor that was used to issue
> +KDBUS_CMD_BUS_MAKE must not have been used for any other control-ioctl before

Maybe better:

have been used for any other control-ioctl before
==>
previously have been used for any other control-ioctl

> +and needs to be kept open for the entire life-time of the created bus. Closing

s/needs/must/ (just reads smoother)

> +it will immediately cleanup the entire bus and all its associated resources and
> +endpoints. Every control file descriptor can only be used to create a single
> +new bus; from that point on, it is not used for any further communication until
> +the final close().
> +
> +Each bus will generate a random, 128-bit UUID upon creation. It will be

s/It/This UUID/

> +returned to creators of connections through kdbus_cmd_hello.id128 and can
> +be used by userspace to uniquely identify buses, even across different machines
> +or containers. The UUID will have its variant bits set to 'DCE', and denote
> +version 4 (random).

I find that last sentence rather difficult to grasp. I think more detail
needs to be added.
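
For instance, if the intent is the usual RFC 4122 layout, then something
like the following sketch is what I would expect the sentence to mean.
The function names are mine, and I am assuming that id128 is stored in
the RFC 4122 byte order (version in the high nibble of byte 6, variant
in the top bits of byte 8); the document should state this explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative helpers, assuming the RFC 4122 byte layout.
 * The RFC 4122 ("DCE") variant has the top two bits of byte 8
 * set to 1 and 0.
 */
bool uuid_is_dce_variant(const uint8_t id[16])
{
	assert(id != NULL);
	return (id[8] & 0xc0) == 0x80;
}

/* The version number is the high nibble of byte 6; 4 means "random". */
unsigned int uuid_version(const uint8_t id[16])
{
	assert(id != NULL);
	return id[6] >> 4;
}
```
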

> +When creating buses, a variable list of items that must be passed in
> +the items array is expected otherwise bus creation will fail.

What does "a variable list of items that must be passed in
the items array" mean? Something needs fixing, I think.

> +
> +5.3 Endpoints
> +-------------
> +
> +Endpoints are entry points to a bus. By default, each bus has a default
> +endpoint called 'bus'. The bus owner has the ability to create custom
> +endpoints with specific names, permissions, and policy databases (see below).
> +An endpoint is presented as file underneath the directory of the parent bus.
> +
> +To create a custom endpoint, open the default endpoint ('bus') and use the
> +KDBUS_CMD_ENDPOINT_MAKE ioctl with "struct kdbus_cmd_make". Custom endpoints
> +always have a policy database that, by default, forbids any operation. You have
> +to explicitly install policy entries to allow any operation on this endpoint.
> +Once KDBUS_CMD_ENDPOINT_MAKE succeeded, this file descriptor will manage the
> +newly created endpoint resource. It cannot be used to manage further resources.
> +
> +Endpoint names may be chosen freely except for one restriction: the name
> +must be prefixed with the numeric UID of the creator and a dash. This

s/UID/effective UID/

> +is required to avoid namespace clashes between different users. When
> +creating an endpoint the name must be passed in properly formatted, or the

creating an endpoint the name must be passed in properly formatted
==>
creating an endpoint, the name that is passed in must be properly formatted

> +kernel will refuse creation of the endpoint. Example: "1047-foobar" is an
> +OK name for an endpoint registered by a user with UID 1047. However,

s/OK/acceptable/

> +"1024-foobar" is not, and neither is "foobar".

Because this text reads almost exactly as the bus text in 5.2, I did 
a double take here. I suggest making the text more distinct in each case.

So, for example:

    Example: "1047-my-endpoint" is an OK name for an endpoint registered 
    by a user with UID 1047. However, "1024-my-endpoint" is not, and 
    neither is "my-endpoint".

(And you could do similar in the bus text in section 5.2.)

> +The UID must be provided in the user-namespace of the parent domain.
> +
> +To create connections to a bus, you use KDBUS_CMD_HELLO. See section 6 for
> +details. Note that once KDBUS_CMD_HELLO succeeded, this file descriptor manages

Note that once KDBUS_CMD_HELLO succeeded,
==>
Note that after a successful KDBUS_CMD_HELLO,

> +the newly created connection resource. It cannot be used to manage further
> +resources.
> +
> +
> +5.4 Creating buses and endpoints
> +--------------------------------
> +
> +KDBUS_CMD_BUS_MAKE, and KDBUS_CMD_ENDPOINT_MAKE take a

s/,//

> +struct kdbus_cmd_make argument.
> +
> +struct kdbus_cmd_make {
> +  __u64 size;
> +    The overall size of the struct, including its items.
> +
> +  __u64 flags;
> +    The flags for creation.
> +
> +    KDBUS_MAKE_ACCESS_GROUP
> +      Make the bus or endpoint file group-accessible
> +
> +    KDBUS_MAKE_ACCESS_WORLD
> +      Make the bus or endpoint file world-accessible
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  struct kdbus_item items[0];
> +    A list of items that has specific meanings for KDBUS_CMD_BUS_MAKE

s/has/have/

> +    and KDBUS_CMD_ENDPOINT_MAKE (see above).
> +
> +    Following items are expected for KDBUS_CMD_BUS_MAKE:
> +    KDBUS_ITEM_MAKE_NAME
> +      Contains a string to identify the bus name.

So, up to here, I've seen no definition of 'kdbus_item', which leaves me 
asking questions like: what subfield is KDBUS_ITEM_MAKE_NAME stored in?
which subfield holds the pointer to the string?

Somewhere earlier, kdbus_item needs to be explained in more detail,
I think.
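
To illustrate the kind of detail I am looking for, here is a sketch of
how I would guess such an item list is walked. Everything in it is an
assumption on my part, not taken from the patch: the header layout
(a 64-bit size followed by a 64-bit type), the payload directly after
the header, and the padding of each item to an 8-byte boundary. If the
real layout differs, that is all the more reason to document it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed item header, for illustration only (not copied from kdbus). */
struct sketch_item_hdr {
	uint64_t size;	/* header plus payload, excluding padding (assumed) */
	uint64_t type;	/* item type, such as KDBUS_ITEM_MAKE_NAME */
	/* payload (for example, a null-terminated string) follows here */
};

/* Assumption: each item starts on an 8-byte boundary. */
#define SKETCH_ALIGN8(n) (((n) + 7) & ~(uint64_t)7)

/*
 * Count the items of the given type in a buffer of 'len' bytes.
 * Returns -1 if an item header is truncated or its size is implausible.
 */
int sketch_count_items(const void *buf, size_t len, uint64_t type)
{
	const uint8_t *p = buf;
	size_t off = 0;
	int n = 0;

	assert(buf != NULL);

	while (off < len) {
		struct sketch_item_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, p + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		if (hdr.type == type)
			n++;
		off += SKETCH_ALIGN8(hdr.size);
	}
	return n;
}
```
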

> +
> +    KDBUS_ITEM_BLOOM_PARAMETER
> +      Bus-wide bloom parameters passed in a dbus_bloom_parameter struct
> +
> +    KDBUS_ITEM_ATTACH_FLAGS_RECV
> +      An optional item that contains a set of required attach flags
> +      that connections must allow. This item is used as a negotiation
> +      measure during connection creation. If connections do not satisfy
> +      the bus requirements, they are not allowed on the bus.
> +      If not set, the bus does not require any metadata to be attached,

s/,/;/

> +      in this case connections are free to set their own attach flags.
> +
> +    KDBUS_ITEM_ATTACH_FLAGS_SEND
> +      An optional item that contains a set of attach flags that are
> +      returned to connections when they query the bus creator metadata.
> +      If not set, no metadata is returned.
> +
> +    Unrecognized items are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +
> +6. Connections
> +===============================================================================
> +
> +
> +6.1 Connection IDs and well-known connection names
> +--------------------------------------------------
> +
> +Connections are identified by their connection id, internally implemented as a
> +uint64_t counter. The IDs of every newly created bus start at 1, and every new
> +connection will increment the counter by 1. The ids are not reused.

Again, please change "ids" to "IDs" throughout.

> +
> +In higher level tools, the user visible representation of a connection is
> +defined by the D-Bus protocol specification as ":1.<id>".
> +
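
A short example might help here. The following sketch shows the mapping
from a numeric connection ID to the ":1.<id>" form mentioned above; the
helper name is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the D-Bus style unique name for a
 * connection ID into 'buf'. Returns the number of characters written
 * (excluding the terminating null byte), or -1 if 'buf' is too small.
 */
int format_unique_name(char *buf, size_t len, uint64_t id)
{
	int n;

	assert(buf != NULL);

	n = snprintf(buf, len, ":1.%" PRIu64, id);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

So connection ID 22 corresponds to ":1.22", as in the diagram further
down.
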
> +Messages with a specific uint64_t destination id are directly delivered to
> +the connection with the corresponding id. Messages with the special destination
> +id KDBUS_DST_ID_BROADCAST are broadcast messages and are potentially delivered
> +to all known connections on the bus; clients interested in broadcast messages
> +need to subscribe to the specific messages they are interested, though before
> +any broadcast message reaches them.

The piece following the semicolon would be better as this separate sentence, 
I think:

    However, in order to receive any broadcast messages, clients must
    subscribe to the specific messages in which they are interested.

> +
> +Messages synthesized and sent directly by the kernel will carry the special
> +source id KDBUS_SRC_ID_KERNEL (0).
> +
> +In addition to the unique uint64_t connection id, established connections can
> +request the ownership of well-known names, under which they can be found and
> +addressed by other bus clients. A well-known name is associated with one and
> +only one connection at a time. See section 8 on name acquisition and the
> +name registry, and the validity of names.
> +
> +Messages can specify the special destination id 0 and carry a well-known name
> +in the message data. Such a message is delivered to the destination connection
> +which owns that well-known name.
> +
> +  +-------------------------------------------------------------------------+
> +  | +---------------+     +---------------------------+                     |
> +  | | Connection    |     | Message                   | -----------------+  |
> +  | | :1.22         | --> | src: 22                   |                  |  |
> +  | |               |     | dst: 25                   |                  |  |
> +  | |               |     |                           |                  |  |
> +  | |               |     |                           |                  |  |
> +  | |               |     +---------------------------+                  |  |
> +  | |               |                                                    |  |
> +  | |               | <--------------------------------------+           |  |
> +  | +---------------+                                        |           |  |
> +  |                                                          |           |  |
> +  | +---------------+     +---------------------------+      |           |  |
> +  | | Connection    |     | Message                   | -----+           |  |
> +  | | :1.25         | --> | src: 25                   |                  |  |
> +  | |               |     | dst: 0xffffffffffffffff   | -------------+   |  |
> +  | |               |     |  (KDBUS_DST_ID_BROADCAST) |              |   |  |
> +  | |               |     |                           | ---------+   |   |  |
> +  | |               |     +---------------------------+          |   |   |  |
> +  | |               |                                            |   |   |  |
> +  | |               | <--------------------------------------------------+  |
> +  | +---------------+                                            |   |      |
> +  |                                                              |   |      |
> +  | +---------------+     +---------------------------+          |   |      |
> +  | | Connection    |     | Message                   | --+      |   |      |
> +  | | :1.55         | --> | src: 55                   |   |      |   |      |
> +  | |               |     | dst: 0 / org.foo.bar      |   |      |   |      |
> +  | |               |     |                           |   |      |   |      |
> +  | |               |     |                           |   |      |   |      |
> +  | |               |     +---------------------------+   |      |   |      |
> +  | |               |                                     |      |   |      |
> +  | |               | <------------------------------------------+   |      |
> +  | +---------------+                                     |          |      |
> +  |                                                       |          |      |
> +  | +---------------+                                     |          |      |
> +  | | Connection    |                                     |          |      |
> +  | | :1.81         |                                     |          |      |
> +  | | org.foo.bar   |                                     |          |      |
> +  | |               |                                     |          |      |
> +  | |               |                                     |          |      |
> +  | |               | <-----------------------------------+          |      |
> +  | |               |                                                |      |
> +  | |               | <----------------------------------------------+      |
> +  | +---------------+                                                       |
> +  +-------------------------------------------------------------------------+
> +
> +
> +6.2 Creating connections
> +------------------------
> +
> +A connection to a bus is created by opening an endpoint file of a bus and
> +becoming an active client with the KDBUS_CMD_HELLO ioctl. Every connected client
> +connection has a unique identifier on the bus and can address messages to every
> +other connection on the same bus by using the peer's connection id as the
> +destination.
> +
> +The KDBUS_CMD_HELLO ioctl takes the following struct as argument.
> +
> +struct kdbus_cmd_hello {
> +  __u64 size;
> +    The overall size of the struct, including all attached items.
> +
> +  __u64 flags;
> +    Flags to apply to this connection:
> +
> +    KDBUS_HELLO_ACCEPT_FD
> +      When this flag is set, the connection can be sent file descriptors
> +      as message payload. If it's not set, any attempt of doing so will

s/any attempt of doing so/an attempt to send file descriptors/

> +      result in -ECOMM on the sender's side.
> +
> +    KDBUS_HELLO_ACTIVATOR
> +      Make this connection an activator (see below). With this bit set,
> +      an item of type KDBUS_ITEM_NAME has to be attached which describes

s/attached which describes/attached. This item describes/

> +      the well-known name this connection should be an activator for.
> +
> +    KDBUS_HELLO_POLICY_HOLDER
> +      Make this connection a policy holder (see below). With this bit set,
> +      an item of type KDBUS_ITEM_NAME has to be attached which describes

s/attached which describes/attached. This item describes/

> +      the well-known name this connection should hold a policy for.
> +
> +    KDBUS_HELLO_MONITOR
> +      Make this connection an eaves-dropping connection. See section 6.8 for

s/eaves-dropping/eavesdropping/

> +      more information.
> +
> +To also receive broadcast messages,
   ^
Indentation error.

> +      the connection has to upload appropriate matches as well.
> +      This flag is only valid for privileged bus connections.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  __u64 attach_flags_send;
> +      Set the bits for metadata this connection permits to be sent to the
> +      receiving peer. Only metadata items that are both allowed to be sent by
> +      the sender and that are requested by the receiver will effectively be
> +      attached to the message eventually. Note, however, that the bus may

What does "eventually" mean here?

> +      optionally enforce some of those bits to be set. If the match fails,

s/enforce/require/ ?

> +      -ECONNREFUSED will be returned. In either case, this field will be set
> +      to the mask of metadata items that are enforced by the bus. The
> +      KDBUS_FLAGS_KERNEL bit will as well be set.
> +
> +  __u64 attach_flags_recv;
> +      Request the attachment of metadata for each message received by this
> +      connection. The metadata actually attached may actually augment the list

Seems like two "actually" in the previous line is one too many.

> +      of requested items. See section 13 for more details.
> +
> +  __u64 bus_flags;
> +      Upon successful completion of the ioctl, this member will contain the
> +      flags of the bus it connected to.
> +
> +  __u64 id;
> +      Upon successful completion of the ioctl, this member will contain the
> +      id of the new connection.
> +
> +  __u64 pool_size;
> +      The size of the communication pool, in bytes. The pool can be accessed
> +      by calling mmap() on the file descriptor that was used to issue the
> +      KDBUS_CMD_HELLO ioctl.
> +
> +  __u64 offset;
> +      The kernel will return the offset in the pool where returned details
> +      will be stored.
> +
> +  __u8 id128[16];
> +      Upon successful completion of the ioctl, this member will contain the
> +      128 bit wide UUID of the connected bus.

s/128 bit wide/128-bit/

> +
> +  struct kdbus_item items[0];
> +      Variable list of items to add optional additional information. The

s/to add optional additional/containing optional additional/

> +      following items are currently expected/valid:
> +
> +      KDBUS_ITEM_CONN_DESCRIPTION
> +        Contains a string to describes this connection's name, so it can be

s/to/that/

> +        identified later.
> +
> +      KDBUS_ITEM_NAME
> +      KDBUS_ITEM_POLICY_ACCESS
> +        For activators and policy holders only, combinations of these two
> +        items describe policy access entries (see section about policy).
> +
> +      KDBUS_ITEM_CREDS
> +      KDBUS_ITEM_PIDS
> +      KDBUS_ITEM_SECLABEL
> +        Privileged bus users may submit these types in order to create
> +        connections with faked credentials. This information will be returned
> +        when peer information is queried by KDBUS_CMD_CONN_INFO. See section
> +        13 for more information.
> +
> +      Items of other types are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +At the offset returned in the 'offset' field of struct kdbus_cmd_hello, the
> +kernel will store items of the following types:
> +
> +  KDBUS_ITEM_BLOOM_PARAMETER
> +      Bloom filter parameter as defined by the bus creator (see below).
> +
> +The offset in the pool has to be freed with the KDBUS_CMD_FREE ioctl.

As far as I can tell, KDBUS_CMD_FREE is not detailed anywhere in this file.
It needs a detailed description somewhere.

> +
> +6.3 Activator and policy holder connection
> +------------------------------------------
> +
> +An activator connection is a placeholder for a well-known name. Messages sent
> +to such a connection can be used by userspace to start an implementer
> +connection, which will then get all the messages from the activator copied
> +over. An activator connection cannot be used to send any message.
> +
> +A policy holder connection only installs a policy for one or more names.
> +These policy entries are kept active as long as the connection is alive, and
> +are removed once it terminates. Such a policy connection type can be used to
> +deploy restrictions for names that are not yet active on the bus. A policy
> +holder connection cannot be used to send any message.
> +
> +The creation of activator, policy holder or monitor connections is an operation

What is a "monitor connection"? That term springs up unannounced. Is it
an "eavesdropping connection" as described above? Either define the term
"monitor connection" or use consistent terminology. (Actually, further down
in the document, it is clarified that "monitor connection" == "eavesdropper".
But that is not clear at THIS point in the document. It needs to be clearer.)

> +restricted to privileged users on the bus (see section "Terminology").
> +
> +
> +6.4 Retrieving information on a connection
> +------------------------------------------
> +
> +The KDBUS_CMD_CONN_INFO ioctl can be used to retrieve credentials and
> +properties of the initial creator of a connection. This ioctl uses the
> +following struct:
> +
> +struct kdbus_cmd_info {
> +  __u64 size;
> +    The overall size of the struct, including the name with its 0-byte string
> +    terminator.

"0-byte string terminator" reads strangely. I assume you mean "terminating 
null byte" / "null-terminated string". Best to use those more standard terms.

So, maybe:

    including the name with its terminating null byte

or:

    including the null-terminated 'name' string
    

There are multiple other instances of 0-byte in the doc, and I think they 
should also be fixed in a similar fashion.

> +
> +  __u64 flags;
> +    Specify which metadata items should be attached to the answer.

s/Specify/Specifies/

> +    See section 13 for more details.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  __u64 id;
> +    The connection's numerical ID to retrieve information for. If set to

Hard to parse sentence.

==> The numerical ID of the connection for which information is to 
    be retrieved.

> +    non-zero value, the 'name' field is ignored.

s/non-zero value/a non-zero value/

> +
> +  __u64 offset;
> +    When the ioctl returns, this value will yield the offset of the connection

s/value will yield/field will contain/

> +    information inside the caller's pool.
> +
> +  __u64 info_size;
> +    The kernel will return the size of the returned information, so applications
> +    can optionally mmap specific parts of the pool.
> +
> +  struct kdbus_item items[0];
> +    The optional item list, containing the well-known name to look up as
> +    a KDBUS_ITEM_OWNED_NAME. Only required if the 'id' field is set to 0.
> +    Items of other types are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +After the ioctl returns, the following struct will be stored in the caller's
> +pool at 'offset'.
> +
> +struct kdbus_info {
> +  __u64 size;
> +    The overall size of the struct, including all its items.
> +
> +  __u64 id;
> +    The connection's unique ID.
> +
> +  __u64 flags;
> +    The connection's flags as specified when it was created.
> +
> +  struct kdbus_item items[0];
> +    Depending on the 'flags' field in struct kdbus_cmd_info, items of
> +    types KDBUS_ITEM_OWNED_NAME and KDBUS_ITEM_CONN_DESCRIPTION are followed

s/are followed/follow/

> +    here.
> +};
> +
> +Once the caller is finished with parsing the return buffer, it needs to call
> +KDBUS_CMD_FREE for the offset.
> +
> +
> +6.5 Getting information about a connection's bus creator
> +--------------------------------------------------------
> +
> +The KDBUS_CMD_BUS_CREATOR_INFO ioctl takes the same struct as
> +KDBUS_CMD_CONN_INFO but is used to retrieve information about the creator of
> +the bus the connection is attached to. The metadata returned by this call is
> +collected during the creation of the bus and is never altered afterwards, so
> +it provides pristine information on the task that created the bus, at the
> +moment when it did so.
> +
> +In response to this call, a slice in the connection's pool is allocated and
> +filled with an object of type struct kdbus_info, pointed to by the ioctl's
> +'offset' field.
> +
> +struct kdbus_info {
> +  __u64 size;
> +    The overall size of the struct, including all its items.
> +
> +  __u64 id;
> +    The bus ID
> +
> +  __u64 flags;
> +    The bus flags as specified when it was created.

s/it/the bus/

> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  struct kdbus_item items[0];
> +    Metadata information is stored in items here. The item list contains
> +    a KDBUS_ITEM_MAKE_NAME item that indicates the bus name of the
> +    calling connection.
> +};
> +
> +Once the caller is finished with parsing the return buffer, it needs to call
> +KDBUS_CMD_FREE for the offset.
> +
> +
> +6.6 Updating connection details
> +-------------------------------
> +
> +Some of a connection's details can be updated with the KDBUS_CMD_CONN_UPDATE
> +ioctl, using the file descriptor that was used to create the connection.
> +The update command uses the following struct.
> +
> +struct kdbus_cmd_update {
> +  __u64 size;
> +    The overall size of the struct, including all its items.
> +
> +  __u64 flags;
> +    Currently no flags are supported. Reserved for future use.

> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  struct kdbus_item items[0];
> +    Items to describe the connection details to be updated. The following item
> +    types are supported:
> +
> +    KDBUS_ITEM_ATTACH_FLAGS_SEND
> +      Supply a new set of items that this connection permits to be sent along
> +      with messages.
> +
> +    KDBUS_ITEM_ATTACH_FLAGS_RECV
> +      Supply a new set of items to be attached to each message.
> +
> +    KDBUS_ITEM_NAME
> +    KDBUS_ITEM_POLICY_ACCESS
> +      Policy holder connections may supply a new set of policy information
> +      with these items. For other connection types, -EOPNOTSUPP is returned.
> +
> +    Items of other types are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +
> +6.7 Termination
> +---------------
> +
> +A connection can be terminated by simply closing its file descriptor. All
> +pending incoming messages will be discarded, and the memory in the pool will
> +be freed.
> +
> +An alternative way of closing down a connection is calling the KDBUS_CMD_BYEBYE
> +ioctl on it, which will only succeed if the message queue of the connection is
> +empty at the time of closing, otherwise, -EBUSY is returned.

The preceding is a little hard to parse. I suggest:

    An alternative way of closing down a connection is via the KDBUS_CMD_BYEBYE
    ioctl. This ioctl will succeed only if the message queue of the connection is
    empty at the time of closing; otherwise, -EBUSY is returned.

> +
> +When this ioctl returns successfully, the connection has been terminated and
> +won't accept any new messages from remote peers. This way, a connection can
> +be terminated race-free, without losing any messages.
> +
> +
> +6.8 Monitor connections ('eavesdropper')
> +----------------------------------------
> +
> +Eavesdropping connections are created by setting the KDBUS_HELLO_MONITOR flag
> +in struct kdbus_hello.flags. Such connections have all properties of any other,
> +regular connection, except for the following details:
> +
> +  * They will get every message sent over the bus, both unicasts and broadcasts
> +
> +  * Installing matches for broadcast messages is neither necessary nor allowed
> +
> +  * They cannot send messages or be directly addressed as receiver
> +
> +  * Their creation and destruction will not cause KDBUS_ITEM_ID_{ADD,REMOVE}
> +    notifications to be generated, so other connections cannot detect the
> +    presence of an eavesdropper.
> +
> +
> +7. Messages
> +===============================================================================
> +
> +Messages consist of a fixed-size header followed directly by a list of
> +variable-sized data 'items'. The overall message size is specified in the
> +header of the message. The chain of data items can contain well-defined
> +message metadata fields, raw data, references to data, or file descriptors.
> +
> +
> +7.1 Sending messages
> +--------------------
> +
> +Messages are passed to the kernel with the KDBUS_CMD_SEND ioctl. Depending
> +on the destination address of the message, the kernel delivers the message to
> +the specific destination connection or to all connections on the same bus.
> +Sending messages across buses is not possible. Messages are always queued in
> +the memory pool of the destination connection (see below).
> +
> +The KDBUS_CMD_SEND ioctl uses struct kdbus_cmd_send to describe the message
> +transfer.
> +
> +struct kdbus_cmd_send {
> +  __u64 size;
> +    The overall size of the struct, including the attached items.
> +
> +  __u64 flags;
> +    Flags for message delivery:
> +
> +    KDBUS_SEND_SYNC_REPLY
> +      By default, all calls to kdbus are considered asynchronous,
> +      non-blocking. However, as there are many use cases that need to wait
> +      for a remote peer to answer a method call, there's a way to send a
> +      message and wait for a reply in a synchronous fashion. This is what
> +      the KDBUS_MSG_SYNC_REPLY controls. The KDBUS_CMD_SEND ioctl will block
> +      until the reply has arrived, the timeout limit is reached, in case the
> +      remote connection was shut down, or if interrupted by a signal before
> +      any reply; see signal(7).
> +
> +      The offset of the reply message in the sender's pool is stored in in
> +      'offset_reply' when the ioctl has returned without error. Hence, there
> +      is no need for another KDBUS_CMD_RECV ioctl or anything else to receive
> +      the reply.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call of
> +    KDBUS_CMD_SEND.
> +
> +  __u64 kernel_msg_flags;
> +    Valid bits for message flags, returned by the kernel upon each call of
> +    KDBUS_CMD_SEND.
> +
> +  __u64 return_flags;
> +    Kernel-provided flags, returning non-fatal errors that occurred during
> +    send. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  __u64 msg_address;
> +    Userspace has to provide a pointer to a message (struct kdbus_msg) to send.
> +
> +  struct kdbus_msg_info reply;
> +    Only used for synchronous replies. See description of struct kdbus_cmd_recv
> +    for more details.
> +
> +  struct kdbus_item items[0];
> +    The following items are currently recognized:
> +
> +    KDBUS_ITEM_CANCEL_FD
> +      When this optional item is passed in, and the call is executed as SYNC
> +      call, the passed in file descriptor can be used as alternative
> +      cancellation point. The kernel will call poll() on this file descriptor,
> +      and if it reports any incoming bytes, the blocking send operation will
> +      be canceled, and the call will return -ECANCELED. Any type of file
> +      descriptor that implements poll() can be used as payload to this item.
> +      For asynchronous message sending, this item is accepted but ignored.
> +
> +    All other items are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +The message referenced by 'msg_address' above has the following layout.
> +
> +struct kdbus_msg {
> +  __u64 size;
> +    The overall size of the struct, including the attached items.
> +
> +  __u64 flags;
> +    KDBUS_MSG_EXPECT_REPLY
> +      Expect a reply from the remote peer to this message. With this bit set,

s/from the remote peer to this message/to this message from the remote peer/

> +      the timeout_ns field must be set to a non-zero number of nanoseconds in
> +      which the receiving peer is expected to reply. If such a reply is not
> +      received in time, the sender will be notified with a timeout message
> +      (see below). The value must be an absolute value, in nanoseconds and
> +      based on CLOCK_MONOTONIC.

Why is the option of a relative timeout not available?
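
As it stands, every caller that thinks in relative terms has to do
something like the following first. This is only a sketch under the
assumption that timeout_ns is simply the CLOCK_MONOTONIC time, in
nanoseconds, at which the reply is due; the helper name is mine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: turn a relative timeout (in nanoseconds) into
 * an absolute CLOCK_MONOTONIC deadline. Returns 0 if the clock cannot
 * be read.
 */
uint64_t deadline_from_relative_ns(uint64_t rel_ns)
{
	struct timespec now;

	/* The text above requires a non-zero timeout. */
	assert(rel_ns != 0);

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return 0;

	return (uint64_t)now.tv_sec * UINT64_C(1000000000)
	       + (uint64_t)now.tv_nsec + rel_ns;
}
```

If that is the expected usage, it would be worth showing it in the
document.
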

> +      For a message to be accepted as reply, it must be a direct message to
> +      the original sender (not a broadcast), and its kdbus_msg.reply_cookie
> +      must match the previous message's kdbus_msg.cookie.
> +
> +      Expected replies also temporarily open the policy of the sending
> +      connection, so the other peer is allowed to respond within the given
> +      time window.
> +
> +    KDBUS_MSG_NO_AUTO_START
> +      By default, when a message is sent to an activator connection, the
> +      activator notified and will start an implementer. This flag inhibits
> +      that behavior. With this bit set, and the remote being an activator,
> +      -EADDRNOTAVAIL is returned from the ioctl.
> +
> +  __s64 priority;
> +    The priority of this message. Receiving messages (see below) may
> +    optionally be constrained to messages of a minimal priority. This
> +    allows for use cases where timing critical data is interleaved with
> +    control data on the same connection. If unused, the priority should be
> +    set to zero.
> +
> +  __u64 dst_id;
> +    The numeric ID of the destination connection, or KDBUS_DST_ID_BROADCAST
> +    (~0ULL) to address every peer on the bus, or KDBUS_DST_ID_NAME (0) to look
> +    it up dynamically from the bus' name registry. In the latter case, an item
> +    of type KDBUS_ITEM_DST_NAME is mandatory.
> +
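
The three cases for dst_id could be summarized for the reader along
these lines. The constant values are the ones given in the text above
(0 and ~0ULL); the SKETCH_ names and the enum are mine, purely for
illustration.

```c
#include <stdint.h>

/* Values as stated in the quoted text; names are illustrative only. */
#define SKETCH_DST_ID_NAME	UINT64_C(0)
#define SKETCH_DST_ID_BROADCAST	(~UINT64_C(0))

enum sketch_dst_kind {
	SKETCH_DST_BY_NAME,	/* look up KDBUS_ITEM_DST_NAME in the registry */
	SKETCH_DST_BROADCAST,	/* potentially delivered to all connections */
	SKETCH_DST_UNICAST,	/* delivered to the connection with this ID */
};

/* Classify a destination ID according to the rules quoted above. */
enum sketch_dst_kind sketch_classify_dst(uint64_t dst_id)
{
	if (dst_id == SKETCH_DST_ID_NAME)
		return SKETCH_DST_BY_NAME;
	if (dst_id == SKETCH_DST_ID_BROADCAST)
		return SKETCH_DST_BROADCAST;
	return SKETCH_DST_UNICAST;
}
```
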
> +  __u64 src_id;
> +    Upon return of the ioctl, this member will contain the sending
> +    connection's numerical ID. Should be 0 at send time.
> +
> +  __u64 payload_type;
> +    Type of the payload in the actual data records. Currently, only
> +    KDBUS_PAYLOAD_DBUS is accepted as input value of this field. When
> +    receiving messages that are generated by the kernel (notifications),
> +    this field will yield KDBUS_PAYLOAD_KERNEL.

s/yield/contain/ ?

> +
> +  __u64 cookie;
> +    Cookie of this message, for later recognition. Also, when replying
> +    to a message (see above), the cookie_reply field must match this value.
> +
> +  __u64 timeout_ns;
> +    If the message sent requires a reply from the remote peer (see above),
> +    this field contains the timeout in absolute nanoseconds based on
> +    CLOCK_MONOTONIC.
> +
> +  __u64 cookie_reply;
> +    If the message sent is a reply to another message, this field must
> +    match the cookie of the formerly received message.
> +
> +  struct kdbus_item items[0];
> +    A dynamically sized list of items to contain additional information.
> +    The following items are expected/valid:
> +
> +    KDBUS_ITEM_PAYLOAD_VEC
> +    KDBUS_ITEM_PAYLOAD_MEMFD
> +    KDBUS_ITEM_FDS
> +      Actual data records containing the payload. See section "Passing of
> +      Payload Data".
> +
> +    KDBUS_ITEM_BLOOM_FILTER
> +      Bloom filter for matches (see below).
> +
> +    KDBUS_ITEM_DST_NAME
> +      Well-known name to send this message to. Required if dst_id is set
> +      to KDBUS_DST_ID_NAME. If a connection holding the given name can't
> +      be found, -ESRCH is returned.
> +      For messages to a unique name (ID), this item is optional. If present,
> +      the kernel will make sure the name owner matches the given unique name.
> +      This allows userspace tie the message sending to the condition that a
> +      name is currently owned by a certain unique name.
> +};
> +
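
An example might help with timeout_ns, since "absolute" is easy to miss when
callers usually think in relative timeouts. A minimal sketch, assuming the
caller starts from a relative value; deadline_from_relative_ns() is a name I
made up, not part of the API:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not part of kdbus): turn a relative timeout into
 * the absolute CLOCK_MONOTONIC value that timeout_ns is documented to
 * carry. Returns 0 if the clock cannot be read.
 */
uint64_t deadline_from_relative_ns(uint64_t relative_ns)
{
	struct timespec now;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return 0;

	return (uint64_t)now.tv_sec * 1000000000ULL +
	       (uint64_t)now.tv_nsec + relative_ns;
}
```

The result would be stored in kdbus_msg.timeout_ns before the send ioctl.
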
> +The message will be augmented by the requested metadata items when queued into
> +the receiver's pool. See also section 13.2 ("Metadata and namespaces").
> +
> +
> +7.2 Message layout
> +------------------
> +
> +The layout of a message is shown below.
> +
> +  +-------------------------------------------------------------------------+
> +  | Message                                                                 |
> +  | +---------------------------------------------------------------------+ |
> +  | | Header                                                              | |
> +  | | size:          overall message size, including the data records     | |
> +  | | destination:   connection id of the receiver                        | |
> +  | | source:        connection id of the sender (set by kernel)          | |
> +  | | payload_type:  "DBusDBus" textual identifier stored as uint64_t     | |
> +  | +---------------------------------------------------------------------+ |
> +  | +---------------------------------------------------------------------+ |
> +  | | Data Record                                                         | |
> +  | | size:  overall record size (without padding)                        | |
> +  | | type:  type of data                                                 | |
> +  | | data:  reference to data (address or file descriptor)               | |
> +  | +---------------------------------------------------------------------+ |
> +  | +---------------------------------------------------------------------+ |
> +  | | padding bytes to the next 8 byte alignment                          | |
> +  | +---------------------------------------------------------------------+ |
> +  | +---------------------------------------------------------------------+ |
> +  | | Data Record                                                         | |
> +  | | size:  overall record size (without padding)                        | |
> +  | | ...                                                                 | |
> +  | +---------------------------------------------------------------------+ |
> +  | +---------------------------------------------------------------------+ |
> +  | | padding bytes to the next 8 byte alignment                          | |
> +  | +---------------------------------------------------------------------+ |
> +  | +---------------------------------------------------------------------+ |
> +  | | Data Record                                                         | |
> +  | | size:  overall record size                                          | |
> +  | | ...                                                                 | |
> +  | +---------------------------------------------------------------------+ |
> +  |   ... further data records ...                                          |
> +  +-------------------------------------------------------------------------+
> +
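
The diagram implies that a reader has to round each record size up to 8 bytes
to find the next record. A short sketch of that walk might be worth adding.
The struct below is illustrative only (it is not the real struct kdbus_item),
and count_records() is a made-up name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round up to the next multiple of 8, as the layout diagram requires. */
#define ALIGN8(x) (((x) + 7ULL) & ~7ULL)

/* Illustrative record header; stands in for the real item header. */
struct record_hdr {
	uint64_t size;	/* header plus data, without trailing padding */
	uint64_t type;
};

/* Count the records in buf[0..len). Returns -1 on a malformed record. */
int count_records(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (off < len) {
		struct record_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		/* memcpy avoids assuming anything about buf's alignment. */
		memcpy(&hdr, buf + off, sizeof(hdr));

		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;

		n++;
		/* The next record starts at the next 8-byte boundary. */
		off += ALIGN8(hdr.size);
	}
	return n;
}
```
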
> +
> +7.3 Passing of Payload Data
> +---------------------------
> +
> +When connecting to the bus, receivers request a memory pool of a given size,
> +large enough to carry all backlog of data enqueued for the connection. The
> +pool is internally backed by a shared memory file which can be mmap()ed by
> +the receiver.
> +
> +KDBUS_MSG_PAYLOAD_VEC:
> +  Messages are directly copied by the sending process into the receiver's pool,

s/,/./ and then start a new sentence.

> +  that way two peers can exchange data by effectively doing a single-copy from
> +  one process to another, the kernel will not buffer the data anywhere else.

s/,/;/

> +
> +KDBUS_MSG_PAYLOAD_MEMFD:
> +  Messages can reference memfd files which contain the data.
> +  memfd files are tmpfs-backed files that allow sealing of the content of the
> +  file, which prevents all writable access to the file content.
> +  Only memfds that have (F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_SEAL) set
> +  are accepted as payload data, which enforces reliable passing of data.
> +  The receiver can assume that neither the sender nor anyone else can alter the
> +  content after the message is sent.
> +  Apart from the sender filling-in the content into memfd files, the data will
> +  be passed as zero-copy from one process to another, read-only, shared between
> +  the peers.
> +
> +The sender must not make any assumptions on the type how data is received by the

Wording error at "the type how". Some fix is needed.

> +remote peer. The kernel is free to re-pack multiple VEC and MEMFD payloads. For
> +instance, the kernel may decide to merge multiple VECs into a single VEC, inline
> +MEMFD payloads into memory or merge all passed VECs into a single MEMFD.
> +However, the kernel preserves the order of passed data. This means, the order of

s/,//

> +all VEC and MEMFD items is not changed in respect to each other.

s/in/with/

> +
> +In other words: All passed VEC and MEMFD data payloads are treated as a single
> +stream of data that may be received by the remote peer in a different set of
> +hunks than it was sent as.
> +
> +
> +7.4 Receiving messages
> +----------------------
> +
> +Messages are received by the client with the KDBUS_CMD_RECV ioctl. The endpoint
> +file of the bus supports poll() to wake up the receiving process when new

s%poll()%poll/select()/epoll%  ?

> +messages are queued up to be received.
> +
> +With the KDBUS_CMD_RECV ioctl, a struct kdbus_cmd_recv is used.
> +
> +struct kdbus_cmd_recv {
> +  __u64 size;
> +    The overall size of the struct, including the attached items.
> +
> +  __u64 flags;
> +    Flags to control the receive command.
> +
> +    KDBUS_RECV_PEEK
> +      Just return the location of the next message. Do not install file
> +      descriptors or anything else. This is usually used to determine the
> +      sender of the next queued message.
> +
> +    KDBUS_RECV_DROP
> +      Drop the next message without doing anything else with it, and free the
> +      pool slice. This a short-cut for KDBUS_RECV_PEEK and KDBUS_CMD_FREE.
> +
> +    KDBUS_RECV_USE_PRIORITY
> +      Use the priority field (see below).
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Kernel-provided flags, returning non-fatal errors that occurred during
> +    send. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  __s64 priority;
> +    With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in
> +    the queue with at least the given priority. If no such message is waiting
> +    in the queue, -ENOMSG is returned.

###
How do I simply select the highest priority message, without knowing what 
its priority is?

> +
> +  __u64 dropped_msgs;
> +    If the CMD_RECV ioctl fails with EOVERFLOW, this field is filled by
> +    the kernel with the number of messages that couldn't be transmitted to
> +    this connection. In that case, the @offset member must not be accessed.
> +
> +  struct kdbus_msg_info msg;
> +   Embedded struct to be filled when the command succeeded (see below).
> +
> +  struct kdbus_item items[0];
> +    Items to specify further details for the receive command. Currently unused.
> +};
> +
> +Both 'struct kdbus_cmd_recv' and 'struct kdbus_cmd_send' embed 'struct
> +kdbus_msg_info'. For the SEND ioctl, it is used to catch synchronous replies,
> +if one was requested, and is unused otherwise.
> +
> +struct kdbus_msg_info {
> +  __u64 offset;
> +    Upon return of the ioctl, this field contains the offset in the receiver's
> +    memory pool. The memory must be freed with KDBUS_CMD_FREE.
> +
> +  __u64 msg_size;
> +    Upon successful return of the ioctl, this field contains the size of the
> +    allocated slice at offset @offset. It is the combination of the size of
> +    the stored kdbus_msg object plus all appended VECs. You can use it in
> +    combination with @offset to map a single message, instead of mapping the
> +    whole pool.
> +
> +  __u64 return_flags;
> +    Kernel-provided return flags. Currently, the following flags are defined.
> +
> +      KDBUS_RECV_RETURN_INCOMPLETE_FDS
> +        The message contained file descriptors which couldn't be installed
> +        into the receiver's task. Most probably that happened because the
> +        maximum number of file descriptors for that task were exceeded.
> +        The message is still delivered, so this is not a fatal condition.
> +        File descriptors inside the KDBUS_ITEM_FDS item that could not be
> +        installed will be set to -1.
> +};
> +
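
Since msg_size is advertised as a way to map a single message, it may be
worth showing the arithmetic. mmap() needs a page-aligned file offset, and
the text does not say whether pool offsets are page-aligned, so the sketch
below assumes they may not be. The struct and function names are mine:

```c
#include <stdint.h>

/* Hypothetical helper type; not part of the kdbus API. */
struct map_window {
	uint64_t map_offset;	/* page-aligned offset to pass to mmap() */
	uint64_t map_len;	/* length to pass to mmap() */
	uint64_t delta;		/* where the message starts in the mapping */
};

/*
 * Round @offset down to a page boundary and extend the length by the
 * same amount. @page_size must be a power of two, for example the
 * value of sysconf(_SC_PAGESIZE).
 */
struct map_window window_for_msg(uint64_t offset, uint64_t msg_size,
				 uint64_t page_size)
{
	struct map_window w;

	w.map_offset = offset & ~(page_size - 1);
	w.delta = offset - w.map_offset;
	w.map_len = w.delta + msg_size;
	return w;
}
```

The message then starts delta bytes into the address returned by mmap().
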
> +Unless KDBUS_RECV_DROP was passed, and given that the ioctl succeeded, the

s/and given that the ioctl succeeded/after a successful KDBUS_CMD_RECV ioctl/

> +offset field contains the location of the new message inside the receiver's
> +pool. The message is stored as struct kdbus_msg at this offset, and can be
> +interpreted with the semantics described above.
> +
> +Also, if the connection allowed for file descriptor to be passed
> +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be
> +installed into the receiving process after the KDBUS_CMD_RECV ioctl returns.

###
"after"??? When exactly?

> +The receiving task is obliged to close all of them appropriately. If
> +KDBUS_RECV_PEEK is set, no file descriptors are installed. This allows for
> +peeking at a message and dropping it via KDBUS_RECV_DROP, without installing
> +the passed file descriptors into the receiving process.
> +
> +The caller is obliged to call KDBUS_CMD_FREE with the returned offset when
> +the memory is no longer needed.
> +
> +
> +8. Name registry
> +===============================================================================
> +
> +Each bus instantiates a name registry to resolve well-known names into unique
> +connection IDs for message delivery. The registry will be queried when a
> +message is sent with kdbus_msg.dst_id set to KDBUS_DST_ID_NAME, or when a
> +registry dump is requested.
> +
> +All of the below is subject to policy rules for SEE and OWN permissions.
> +
> +
> +8.1 Name validity
> +-----------------
> +
> +A name has to comply to the following rules to be considered valid:

s/comply to/comply with/

> +
> + - The name has two or more elements separated by a period ('.') character
> + - All elements must contain at least one character
> + - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_"
> +   and must not begin with a digit
> + - The name must contain at least one '.' (period) character
> +   (and thus at least two elements)
> + - The name must not begin with a '.' (period) character
> + - The name must not exceed KDBUS_NAME_MAX_LEN (255)
> +
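
The first and fourth rules overlap, so the list could be shortened. For what
it is worth, here is how I read the rules as code. This is a sketch of the
quoted text only, not of the validation in the patch; name_is_valid() is a
made-up name, and I am assuming the 255 limit excludes the terminating 0 byte:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 255	/* KDBUS_NAME_MAX_LEN as quoted above */

/* Characters the quoted rules allow inside an element. */
static bool is_elem_char(char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
	       (c >= '0' && c <= '9') || c == '_';
}

bool name_is_valid(const char *name)
{
	size_t len = strlen(name);
	size_t elems = 0;
	bool at_elem_start = true;
	size_t i;

	if (len == 0 || len > NAME_MAX_LEN)
		return false;

	for (i = 0; i < len; i++) {
		char c = name[i];

		if (c == '.') {
			/* Empty element: leading '.', or two '.' in a row. */
			if (at_elem_start)
				return false;
			at_elem_start = true;
			continue;
		}
		if (!is_elem_char(c))
			return false;
		if (at_elem_start) {
			/* An element must not begin with a digit. */
			if (c >= '0' && c <= '9')
				return false;
			elems++;
			at_elem_start = false;
		}
	}

	/* A trailing '.' leaves an empty last element. */
	if (at_elem_start)
		return false;

	return elems >= 2;
}
```
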
> +
> +8.2 Acquiring a name
> +--------------------
> +
> +To acquire a name, a client uses the KDBUS_CMD_NAME_ACQUIRE ioctl with the
> +following data structure.
> +
> +struct kdbus_cmd_name {
> +  __u64 size;
> +    The overall size of this struct, including the name with its 0-byte string
> +    terminator.
> +
> +  __u64 flags;
> +    Flags to control details in the name acquisition.
> +
> +    KDBUS_NAME_REPLACE_EXISTING
> +      Acquiring a name that is already present usually fails, unless this flag
> +      is set in the call, and KDBUS_NAME_ALLOW_REPLACEMENT or (see below) was
> +      set when the current owner of the name acquired it, or if the current
> +      owner is an activator connection (see below).
> +
> +    KDBUS_NAME_ALLOW_REPLACEMENT
> +      Allow other connections to take over this name. When this happens, the
> +      former owner of the connection will be notified of the name loss.
> +
> +    KDBUS_NAME_QUEUE (acquire)
> +      A name that is already acquired by a connection, and which wasn't
> +      requested with the KDBUS_NAME_ALLOW_REPLACEMENT flag set can not be
> +      acquired again. However, a connection can put itself in a queue of
> +      connections waiting for the name to be released. Once that happens, the
> +      first connection in that queue becomes the new owner and is notified
> +      accordingly.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  struct kdbus_item items[0];
> +    Items to submit the name. Currently, one item of type KDBUS_ITEM_NAME is
> +    expected and allowed, and the contained string must be a valid bus name.
> +    Items of other types are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +
> +8.3 Releasing a name
> +--------------------
> +
> +A connection may release a name explicitly with the KDBUS_CMD_NAME_RELEASE
> +ioctl. If the connection was an implementer of an activatable name, its
> +pending messages are moved back to the activator. If there are any connections
> +queued up as waiters for the name, the oldest one of them will become the new
> +owner. The same happens implicitly for all names once a connection terminates.
> +
> +The KDBUS_CMD_NAME_RELEASE ioctl uses the same data structure as the
> +acquisition call, but with slightly different field usage.
> +
> +struct kdbus_cmd_name {
> +  __u64 size;
> +    The overall size of this struct, including the name with its 0-byte string
> +    terminator.
> +
> +  __u64 flags;
> +    Flags to the command. Currently unused.

And, so presumably must be 0?  Best to note that.

> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +  struct kdbus_item items[0];
> +    Items to submit the name. Currently, one item of type KDBUS_ITEM_NAME is
> +    expected and allowed, and the contained string must be a valid bus name.
> +};
> +
> +
> +8.4 Dumping the name registry
> +-----------------------------
> +
> +A connection may request a complete or filtered dump of currently active bus
> +names with the KDBUS_CMD_NAME_LIST ioctl, which takes a struct
> +kdbus_cmd_name_list as argument.
> +
> +struct kdbus_cmd_name_list {
> +  __u64 flags;
> +    Any combination of flags to specify which names should be dumped.
> +
> +    KDBUS_NAME_LIST_UNIQUE
> +      List the unique (numeric) IDs of the connection, whether it owns a name
> +      or not.
> +
> +    KDBUS_NAME_LIST_NAMES
> +      List well-known names stored in the database which are actively owned by
> +      a real connection (not an activator).
> +
> +    KDBUS_NAME_LIST_ACTIVATORS
> +      List names that are owned by an activator.
> +
> +    KDBUS_NAME_LIST_QUEUED
> +      List connections that are not yet owning a name but are waiting for it
> +      to become available.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  __u64 offset;
> +    When the ioctl returns successfully, the offset to the name registry dump
> +    inside the connection's pool will be stored in this field.
> +};
> +
> +The returned list of names is stored in a struct kdbus_name_list that in turn
> +contains a dynamic number of struct kdbus_cmd_name that carry the actual
> +information. The fields inside that struct kdbus_cmd_name is described next.
> +
> +struct kdbus_name_info {
> +  __u64 size;
> +    The overall size of this struct, including the name with its 0-byte string
> +    terminator.
> +
> +  __u64 owner_id;
> +    The owning connection's unique ID.
> +
> +  __u64 conn_flags;
> +    The flags of the owning connection.
> +
> +  struct kdbus_item items[0];
> +    Items containing the actual name. Currently, one item of type
> +    KDBUS_ITEM_OWNED_NAME will be attached, including the name's flags. In that
> +    item, the flags field of the name may carry the following bits:
> +
> +    KDBUS_NAME_ALLOW_REPLACEMENT
> +      Other connections are allowed to take over this name from the
> +      connection that owns it.
> +
> +    KDBUS_NAME_IN_QUEUE (list)
> +      When retrieving a list of currently acquired name in the registry, this
> +      flag indicates whether the connection actually owns the name or is
> +      currently waiting for it to become available.
> +
> +    KDBUS_NAME_ACTIVATOR (list)
> +      An activator connection owns a name as a placeholder for an implementer,
> +      which is started on demand as soon as the first message arrives. There's
> +      some more information on this topic below. In contrast to
> +      KDBUS_NAME_REPLACE_EXISTING, when a name is taken over from an activator
> +      connection, all the messages that have been queued in the activator
> +      connection will be moved over to the new owner. The activator connection
> +      will still be tracked for the name and will take control again if the
> +      implementer connection terminates.
> +      This flag can not be used when acquiring a name, but is implicitly set
> +      through KDBUS_CMD_HELLO with KDBUS_HELLO_ACTIVATOR set in
> +      kdbus_cmd_hello.conn_flags.
> +};
> +
> +The returned buffer must be freed with the KDBUS_CMD_FREE ioctl when the user
> +is finished with it.
> +
> +
> +9. Notifications
> +===============================================================================
> +
> +The kernel will notify its users of the following events.
> +
> +  * When connection A is terminated while connection B is waiting for a reply
> +    from it, connection B is notified with a message with an item of type
> +    KDBUS_ITEM_REPLY_DEAD.
> +
> +  * When connection A does not receive a reply from connection B within the
> +    specified timeout window, connection A will receive a message with an item
> +    of type KDBUS_ITEM_REPLY_TIMEOUT.
> +
> +  * When an ordinary connection (not a monitor) is created on or removed from
> +    a bus, messages with an item of type KDBUS_ITEM_ID_ADD or
> +    KDBUS_ITEM_ID_REMOVE, respectively, are sent to all bus members that match
> +    these messages through their match database. Eavesdroppers (monitor
> +    connections) do not cause such notifications to be sent. They are invisible
> +    on the bus.
> +
> +  * When a connection gains or loses ownership of a name, messages with an item
> +    of type KDBUS_ITEM_NAME_ADD, KDBUS_ITEM_NAME_REMOVE or
> +    KDBUS_ITEM_NAME_CHANGE are sent to all bus members that match these
> +    messages through their match database.
> +
> +A kernel notification is a regular kdbus message with the following details.
> +
> +  * kdbus_msg.src_id == KDBUS_SRC_ID_KERNEL
> +  * kdbus_msg.dst_id == KDBUS_DST_ID_BROADCAST
> +  * kdbus_msg.payload_type == KDBUS_PAYLOAD_KERNEL
> +  * Has exactly one of the aforementioned items attached
> +
> +Kernel notifications have an item of type KDBUS_ITEM_TIMESTAMP attached.
> +
> +
> +10. Message Matching, Bloom filters
> +===============================================================================
> +
> +10.1 Matches for broadcast messages from other connections
> +----------------------------------------------------------
> +
> +A message addressed at the connection ID KDBUS_DST_ID_BROADCAST (~0ULL) is a
> +broadcast message, delivered to all connected peers which installed a rule to
> +match certain properties of the message. Without any rules installed in the
> +connection, no broadcast message or kernel-side notifications will be delivered
> +to the connection. Broadcast messages are subject to policy rules and TALK
> +access checks.
> +
> +See section 11 for details on policies, and section 11.5 for more
> +details on implicit policies.
> +
> +Matches for messages from other connections (not kernel notifications) are
> +implemented as bloom filters. The sender adds certain properties of the message
> +as elements to a bloom filter bit field, and sends that along with the
> +broadcast message.
> +
> +The connection adds the message properties it is interested as elements to a
> +bloom mask bit field, and uploads the mask to the match rules of the
> +connection.
> +
> +The kernel will match the broadcast message's bloom filter against the
> +connections bloom mask (simply by &-ing it), and decide whether the message
> +should be delivered to the connection.
> +
> +The kernel has no notion of any specific properties of the message, all it
> +sees are the bit fields of the bloom filter and mask to match against. The
> +use of bloom filters allows simple and efficient matching, without exposing
> +any message properties or internals to the kernel side. Clients need to deal
> +with the fact that they might receive broadcasts which they did not subscribe
> +to, as the bloom filter might allow false-positives to pass the filter.
> +
> +To allow the future extension of the set of elements in the bloom filter, the
> +filter specifies a "generation" number. A later generation must always contain
> +all elements of the set of the previous generation, but can add new elements
> +to the set. The match rules mask can carry an array with all previous
> +generations of masks individually stored. When the filter and mask are matched
> +by the kernel, the mask with the closest matching "generation" is selected
> +as the index into the mask array.
> +
> +
> +10.2 Matches for kernel notifications
> +------------------------------------
> +
> +To receive kernel generated notifications (see section 9), a connection must
> +install special match rules that are different from the bloom filter matches
> +described in the section above. They can be filtered by a sender connection's
> +ID, by one of the name the sender connection owns at the time of sending the
> +message, or by type of the notification (id/name add/remove/change).

s/type/the type/

> +
> +10.3 Adding a match
> +-------------------
> +
> +To add a match, the KDBUS_CMD_MATCH_ADD ioctl is used, which takes a struct
> +of the struct described below.
> +
> +Note that each of the items attached to this command will internally create
> +one match 'rule', and the collection of them, which is submitted as one block
> +via the ioctl is called a 'match'. To allow a message to pass, all rules of a

s/ioctl/ioctl,/

> +match have to be satisfied. Hence, adding more items to the command will only
> +narrow the possibility of a match to effectively let the message pass, and will
> +cause the connection's user space process to wake up less likely.

Make that last line

decrease the chance that the connection's userspace process will be woken up

> +
> +Multiple matches can be installed per connection. As long as one of it has a

s/it/them/ ?
(If that change is not correct, then the sentence is quite confused.)

> +set of rules which allows the message to pass, this one will be decisive.
> +
> +struct kdbus_cmd_match {
> +  __u64 size;
> +    The overall size of the struct, including its items.
> +
> +  __u64 cookie;
> +    A cookie which identifies the match, so it can be referred to at removal
> +    time.
> +
> +  __u64 flags;
> +    Flags to control the behavior of the ioctl.
> +
> +    KDBUS_MATCH_REPLACE:
> +      Remove all entries with the given cookie before installing the new one.
> +      This allows for race-free replacement of matches.
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  struct kdbus_item items[0];
> +    Items to define the actual rules of the matches. The following item types
> +    are expected. Each item will cause one new match rule to be created.
> +
> +    KDBUS_ITEM_BLOOM_MASK
> +      An item that carries the bloom filter mask to match against in its
> +      data field. The payload size must match the bloom filter size that
> +      was specified when the bus was created.
> +      See section 10.4 for more information.
> +
> +    KDBUS_ITEM_NAME
> +      Specify a name that a sending connection must own at a time of sending

s/a time/the time/

> +      a broadcast message in order to match this rule.
> +
> +    KDBUS_ITEM_ID
> +      Specify a sender connection's ID that will match this rule.
> +
> +    KDBUS_ITEM_NAME_ADD
> +    KDBUS_ITEM_NAME_REMOVE
> +    KDBUS_ITEM_NAME_CHANGE
> +      These items request delivery of broadcast messages that describe a name
> +      acquisition, loss, or change. The details are stored in the item's
> +      kdbus_notify_name_change member. All information specified must be
> +      matched in order to make the message pass. Use KDBUS_MATCH_ID_ANY to
> +      match against any unique connection ID.
> +
> +    KDBUS_ITEM_ID_ADD
> +    KDBUS_ITEM_ID_REMOVE
> +      These items request delivery of broadcast messages that are generated
> +      when a connection is created or terminated. struct kdbus_notify_id_change
> +      is used to store the actual match information. This item can be used to
> +      monitor one particular connection ID, or, when the id field is set to
> +      KDBUS_MATCH_ID_ANY, all of them.
> +
> +    Items of other types are rejected, and the ioctl will fail with -EINVAL.
> +};
> +
> +
> +10.4 Bloom filters
> +------------------
> +
> +Bloom filters allow checking whether a given word is present in a dictionary.
> +This allows connections to set up a mask for information it is interested in,
> +and will be delivered signal messages that have a matching filter.
> +
> +For general information on bloom filters, see
> +
> +  https://en.wikipedia.org/wiki/Bloom_filter
> +
> +The size of the bloom filter is defined per bus when it is created, in
> +kdbus_bloom_parameter.size. All bloom filters attached to signals on the bus
> +must match this size, and all bloom filter matches uploaded by connections must
> +also match the size, or a multiple thereof (see below).
> +
> +The calculation of the mask has to be done on the userspace side. The kernel
> +just checks the bitmasks to decide whether or not to let the message pass. All
> +bits in the mask must match the filter in and bit-wise AND logic, but the

Parse error at "in and bit-wise AND logic". I am not sure what you mean
there, but something needs fixing.

> +mask may have more bits set than the filter. Consequently, false positive
> +matches are expected to happen, and userspace must deal with that fact.
> +
> +Masks are entities that are always passed to the kernel as part of a match
> +(with an item of type KDBUS_ITEM_BLOOM_MASK), and filters can be attached to
> +signals, with an item of type KDBUS_ITEM_BLOOM_FILTER.
> +
> +For a filter to match, all its bits have to be set in the match mask as well.
> +For example, consider a bus has a bloom size of 8 bytes, and the following

s/has/that has/

> +mask/filter combinations:
> +
> +    filter  0x0101010101010101
> +    mask    0x0101010101010101
> +            -> matches
> +
> +    filter  0x0303030303030303
> +    mask    0x0101010101010101
> +            -> doesn't match
> +
> +    filter  0x0101010101010101
> +    mask    0x0303030303030303
> +            -> matches
> +
> +Hence, in order to catch all messages, a mask filled with 0xff bytes can be
> +installed as a wildcard match rule.
> +
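
The three examples above can be written as a few lines of C, which might be
worth including in the text. This follows the quoted wording literally (every
bit of the filter must also be set in the mask); I have not checked it against
the matching code in the series, and bloom_filter_passes() is a made-up name:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if every bit set in @filter is also set in @mask.
 * @n_words is the bloom size in 64-bit words (1 for an 8-byte bloom).
 */
bool bloom_filter_passes(const uint64_t *filter, const uint64_t *mask,
			 size_t n_words)
{
	size_t i;

	for (i = 0; i < n_words; i++)
		if ((filter[i] & mask[i]) != filter[i])
			return false;

	return true;
}
```

With this reading, an all-0xff mask passes every filter, which is the
wildcard case described above.
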
> +Uploaded matches may contain multiple masks, each of which in the size of the

Parse error at "each of which in the size of"
s/in/is/?

> +bloom size defined by the bus. Each block of a mask is called a 'generation',
> +starting at index 0.
> +
> +At match time, when a signal is about to be delivered, a bloom mask generation
> +is passed, which denotes which of the bloom masks the filter should be matched
> +against. This allows userspace to provide backward compatible masks at upload
> +time, while older clients can still match against older versions of filters.
> +
> +
> +10.5 Removing a match
> +--------------------
> +
> +Matches can be removed through the KDBUS_CMD_MATCH_REMOVE ioctl, which again
> +takes struct kdbus_cmd_match as argument, but its fields are used slightly
> +differently.
> +
> +struct kdbus_cmd_match {
> +  __u64 size;
> +    The overall size of the struct. As it has no items in this use case, the
> +    value should yield 16.

s/yield/contain/ ?

> +
> +  __u64 cookie;
> +    The cookie of the match, as it was passed when the match was added.
> +    All matches that have this cookie will be removed.
> +
> +  __u64 flags;
> +    Unused for this use case,
> +
> +  __u64 kernel_flags;
> +    Valid flags for this command, returned by the kernel upon each call.
> +
> +  __u64 return_flags;
> +    Flags returned by the kernel. Currently unused.

And, so presumably always returned as 0?  Best to note that.

> +
> +  struct kdbus_item items[0];
> +    Unused und not allowed for this use case.
> +};
> +
> +
> +11. Policy
> +===============================================================================
> +
> +A policy databases restrict the possibilities of connections to own, see and
> +talk to well-known names. It can be associated with a bus (through a policy

s/It/A policy/

> +holder connection) or a custom endpoint.
> +
> +See section 8.1 for more details on the validity of well-known names.
> +
> +Default endpoints of buses always have a policy database. The default
> +policy is to deny all operations except for operations that are covered by
> +implicit policies. Custom endpoints always have a policy, and by default,
> +a policy database is empty. Therefore, unless policy rules are added, all

s/a policy database/the policy database/

> +operations will also be denied by default.
> +
> +See section 11.5 for more details on implicit policies.
> +
> +A set of policy rules is described by a name and multiple access rules, defined
> +by the following struct.
> +
> +struct kdbus_policy_access {
> +  __u64 type;	/* USER, GROUP, WORLD */
> +    One of the following.
> +
> +    KDBUS_POLICY_ACCESS_USER
> +      Grant access to a user with the uid stored in the 'id' field.
> +
> +    KDBUS_POLICY_ACCESS_GROUP
> +      Grant access to a user with the gid stored in the 'id' field.
> +
> +    KDBUS_POLICY_ACCESS_WORLD
> +      Grant access to everyone. The 'id' field is ignored.
> +
> +  __u64 access;	/* OWN, TALK, SEE */
> +    The access to grant.
> +
> +    KDBUS_POLICY_SEE
> +      Allow the name to be seen.
> +
> +    KDBUS_POLICY_TALK
> +      Allow the name to be talked to.
> +
> +    KDBUS_POLICY_OWN
> +      Allow the name to be owned.
> +
> +  __u64 id;
> +    For KDBUS_POLICY_ACCESS_USER, stores the uid.
> +    For KDBUS_POLICY_ACCESS_GROUP, stores the gid.
> +};
> +
> +Policies are set through KDBUS_CMD_HELLO (when creating a policy holder
> +connection), KDBUS_CMD_CONN_UPDATE (when updating a policy holder connection),
> +KDBUS_CMD_ENDPOINT_MAKE (creating a custom endpoint) or
> +KDBUS_CMD_ENDPOINT_UPDATE (updating a custom endpoint). In all cases, the name
> +and policy access information is stored in items of type KDBUS_ITEM_NAME and
> +KDBUS_ITEM_POLICY_ACCESS. For this transport, the following rules apply.
> +
> +  * An item of type KDBUS_ITEM_NAME must be followed by at least one
> +    KDBUS_ITEM_POLICY_ACCESS item
> +  * An item of type KDBUS_ITEM_NAME can be followed by an arbitrary number of
> +    KDBUS_ITEM_POLICY_ACCESS items
> +  * An arbitrary number of groups of names and access levels can be passed
> +
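
The ordering rules above can be sketched as a small validator. This is only
an illustration: the enum values and the helper name policy_items_valid()
are hypothetical, not kdbus identifiers. Rejecting an access item that has no
preceding name, and accepting an empty list, are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two item types; not the real kdbus values. */
enum item_type { ITEM_NAME, ITEM_POLICY_ACCESS };

/*
 * Return true if every ITEM_NAME is followed by at least one
 * ITEM_POLICY_ACCESS and no ITEM_POLICY_ACCESS precedes the first name.
 */
static bool policy_items_valid(const enum item_type *items, size_t n)
{
	bool have_name = false;       /* at least one name seen so far */
	bool name_has_access = false; /* current name has >= 1 access item */

	for (size_t i = 0; i < n; i++) {
		if (items[i] == ITEM_NAME) {
			if (have_name && !name_has_access)
				return false; /* previous name had no access item */
			have_name = true;
			name_has_access = false;
		} else {
			if (!have_name)
				return false; /* access item without a name (assumed illegal) */
			name_has_access = true;
		}
	}

	/* The last name also needs at least one access item. */
	return !have_name || name_has_access;
}
```
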
> +uids and gids are internally always stored in the kernel's view of global ids,
> +and are translated back and forth on the ioctl level accordingly.
> +
> +
> +11.2 Wildcard names
> +-------------------
> +
> +Policy holder connections may upload names that contain the wild card suffix

s/wild card/wildcard/

> +(".*"). That way, a policy can be uploaded that is effective for every
> +well-known name that extends the provided name by exactly one more level.
> +
> +For example, if an item of a set up uploaded policy rules contains the name

I find that last line very difficult to parse. Something is broken.
What are you trying to say?

> +"foo.bar.*", both "foo.bar.baz" and "foo.bar.bazbaz" are valid, but

s/both/then both/ to help parsing.

> +"foo.bar.baz.baz" is not.
> +
> +This allows connections to take control over multiple names that the policy
> +holder doesn't need to know about when uploading the policy.
> +
> +Such wildcard entries are not allowed for custom endpoints.
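
The "exactly one more level" rule might be expressed as follows. This is a
minimal sketch of the matching semantics as described in the text, not the
kernel implementation; wildcard_matches() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if 'name' extends the wildcard 'pattern' (which must end
 * in ".*") by exactly one more level.
 */
static bool wildcard_matches(const char *pattern, const char *name)
{
	size_t plen = strlen(pattern);

	/* The pattern must end in ".*" and have something in front of it. */
	if (plen < 3 || strcmp(pattern + plen - 2, ".*") != 0)
		return false;

	/* Compare everything up to and including the final dot. */
	size_t prefix = plen - 1;
	if (strncmp(pattern, name, prefix) != 0)
		return false;

	/* Exactly one more level: non-empty and without further dots. */
	const char *rest = name + prefix;
	return *rest != '\0' && strchr(rest, '.') == NULL;
}
```
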
> +
> +
> +11.3 Policy example
> +-------------------
> +
> +For example, a set of policy rules may look like this:
> +
> +  KDBUS_ITEM_NAME: str='org.foo.bar'
> +  KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, id=1000
> +  KDBUS_ITEM_POLICY_ACCESS: type=USER, access=TALK, id=1001
> +  KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=SEE
> +  KDBUS_ITEM_NAME: str='org.blah.baz'
> +  KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, id=0
> +  KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=TALK
> +
> +That means that 'org.foo.bar' may only be owned by uid 1000, but every user on
> +the bus is allowed to see the name. However, only uid 1001 may actually send
> +a message to the connection and receive a reply from it.
> +
> +The second rule allows 'org.blah.baz' to be owned by uid 0 only, but every user
> +may talk to it.
> +
> +
> +11.4 TALK access and multiple well-known names per connection
> +-------------------------------------------------------------
> +
> +Note that TALK access is checked against all names of a connection.
> +For example, if a connection owns both 'org.foo.bar' and 'org.blah.baz', and
> +the policy database allows 'org.blah.baz' to be talked to by WORLD, then this
> +permission is also granted to 'org.foo.bar'. That might sound illogical, but
> +after all, we allow messages to be directed to either the ID or a well-known
> +name, and policy is applied to the connection, not the name. In other words,
> +the effective TALK policy for a connection is the most permissive of all names
> +the connection owns.
> +
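
The "most permissive of all names" rule can be illustrated with the rule set
from section 11.3. The types and function names below are hypothetical and
exist only for this sketch. It handles USER and WORLD rules only, and it
matches rules whose access is exactly TALK; whether a higher access level
implies a lower one is not stated here and is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum acc_type { ACC_USER, ACC_GROUP, ACC_WORLD };
enum acc_level { ACC_SEE, ACC_TALK, ACC_OWN };

struct rule {
	const char *name;
	enum acc_type type;
	enum acc_level access;
	uint64_t id;
};

/* The rule set from the policy example in section 11.3. */
static const struct rule example_db[] = {
	{ "org.foo.bar",  ACC_USER,  ACC_OWN,  1000 },
	{ "org.foo.bar",  ACC_USER,  ACC_TALK, 1001 },
	{ "org.foo.bar",  ACC_WORLD, ACC_SEE,  0 },
	{ "org.blah.baz", ACC_USER,  ACC_OWN,  0 },
	{ "org.blah.baz", ACC_WORLD, ACC_TALK, 0 },
};

/* Does this single rule grant TALK to 'uid'? (GROUP rules are ignored.) */
static bool rule_grants_talk(const struct rule *r, uint64_t uid)
{
	if (r->access != ACC_TALK)
		return false;
	return r->type == ACC_WORLD || (r->type == ACC_USER && r->id == uid);
}

/* TALK is granted if any name owned by the destination grants it. */
static bool may_talk(const struct rule *db, size_t n_rules,
		     const char *const *owned, size_t n_owned, uint64_t uid)
{
	for (size_t i = 0; i < n_owned; i++)
		for (size_t j = 0; j < n_rules; j++)
			if (strcmp(db[j].name, owned[i]) == 0 &&
			    rule_grants_talk(&db[j], uid))
				return true;
	return false;
}
```

With only 'org.foo.bar' owned, uid 1002 is refused; once the same connection
also owns 'org.blah.baz', uid 1002 may talk to it through the WORLD rule.
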
> +For broadcast messages, the receiver needs TALK permissions to the sender to
> +receive the broadcast.
> +
> +If a policy database exists for a bus (because a policy holder created one on
> +demand) or for a custom endpoint (which always has one), each one is consulted

By "created one on demand" do you mean "explicitly created one"? If so,
please use the latter wording, since it's clearer.

> +during name registry listing, name owning or message delivery. If either one

What is "name owning"? I think this could be worded better.

> +fails, the operation is failed with -EPERM.
> +
> +For best practices, connections that own names with a restricted TALK
> +access should not install matches. This avoids cases where the sent
> +message may pass the bloom filter due to false-positives and may also
> +satisfy the policy rules.
> +
> +
> +11.5 Implicit policies
> +----------------------
> +
> +Depending on the type of the endpoint, a set of implicit rules that
> +override installed policies might be enforced.
> +
> +On default endpoints, the following set is enforced and checked before
> +any user-supplied policy is checked.
> +
> +  * Privileged connections always override any installed policy. Those
> +    connections could easily install their own policies, so there is no
> +    reason to enforce installed policies.
> +  * Connections can always talk to connections of the same user. This
> +    includes broadcast messages.
> +
> +Custom endpoints have stricter policies. The following rules apply:
> +
> +  * Policy rules are always enforced, even if the connection is a privileged
> +    connection.
> +  * Policy rules are always enforced for TALK access, even if both ends are
> +    running under the same user. This includes broadcast messages.
> +  * To restrict the set of names that can be seen, endpoint policies can
> +    install "SEE" policies.
> +
> +
> +12. Pool
> +===============================================================================
> +
> +A pool for data received from the kernel is installed for every connection of
> +the bus, and is sized according to the information stored in the
> +KDBUS_ITEM_BLOOM_PARAMETER item that is returned by KDBUS_CMD_HELLO.
> +
> +The pool is written to by the kernel when one of the following ioctls is issued:
> +
> +  * KDBUS_CMD_HELLO, to receive details about the bus the connection was made to
> +  * KDBUS_CMD_RECV, to receive a message
> +  * KDBUS_CMD_NAME_LIST, to dump the name registry
> +  * KDBUS_CMD_CONN_INFO, to retrieve information on a connection
> +
> +The offsets returned by either one of the aforementioned ioctls describe offsets
> +inside the pool. In order to make the slice available for subsequent calls,
> +KDBUS_CMD_FREE has to be called on the offset.
> +
> +To access the memory, the caller is expected to mmap() it to its task, like

s/to its task//

> +this:
> +
> +  /*
> +   * POOL_SIZE has to be a multiple of PAGE_SIZE, and it must match the
> +   * value that was previously returned through the KDBUS_ITEM_BLOOM_PARAMETER
> +   * item field when the KDBUS_CMD_HELLO ioctl returned.
> +   */
> +
> +  buf = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, conn_fd, 0);
> +
> +Alternatively, instead of mapping the entire pool buffer, only parts of it can
> +be mapped. The length of the response is returned by the kernel along with the
> +offset for each of the ioctls listed above.
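
Mapping only part of the pool needs some care, because the offset passed to
mmap() must be a multiple of the page size while slice offsets need not be.
A minimal sketch of the arithmetic follows; struct map_window and
slice_window() are hypothetical names, and page_size is assumed to be a
power of two.

```c
#include <stddef.h>
#include <stdint.h>

struct map_window {
	uint64_t map_offset; /* page-aligned offset to pass to mmap() */
	size_t map_length;   /* number of bytes to map */
	size_t delta;        /* where the slice starts inside the mapping */
};

/*
 * Compute the mapping that covers a slice of 'size' bytes at 'offset'
 * in the pool. 'page_size' is assumed to be a power of two.
 */
static struct map_window slice_window(uint64_t offset, uint64_t size,
				      uint64_t page_size)
{
	struct map_window w;

	w.map_offset = offset & ~(page_size - 1); /* round down to a page */
	w.delta = (size_t)(offset - w.map_offset);
	w.map_length = (size_t)(size + w.delta);
	return w;
}
```

The caller would then pass w.map_length and w.map_offset to mmap() on the
connection fd, and read the slice starting at the returned address plus
w.delta.
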
> +
> +
> +13. Metadata
> +===============================================================================
> +
> +kdbus records data about the system in certain situations. Such metadata can
> +refer to the currently active process (creds, PIDs, current user groups, process
> +names and its executable path, cgroup membership, capabilities, security label
> +and audit information), connection information (description string, currently
> +owned names) and the timestamp.
> +
> +Metadata is collected in the following circumstances:
> +
> +  * When a bus is created (KDBUS_CMD_MAKE), information about the calling task
> +    is collected. This data is returned by the kernel via the
> +    KDBUS_CMD_BUS_CREATOR_INFO call-

s/-/./

> +
> +  * When a connection is created (KDBUS_CMD_HELLO), information about the
> +    calling task is collected. Alternatively, a privileged connection may
> +    provide 'faked' information about credentials, PIDs and a security labels

s/a security/security/

> +    which will be taken instead. This data is returned by the kernel as
> +    information on a connection (KDBUS_CMD_CONN_INFO).
> +
> +  * When a message is sent (KDBUS_CMD_SEND), information about the sending task
> +    and the sending connection are collected. This metadata will be attached
> +    to the message when it arrives in the receiver's pool. If the connection
> +    sending the message installed faked credentials (see above), the message
> +    will not be augmented by any information about the currently sending task.
> +
> +Which metadata items are actually delivered depends on the following sets and
> +masks:
> +
> +    (a) the system-wide kmod creds mask (module parameter 'attach_flags_mask')
> +    (b) the per-connection send creds mask, set by the connecting client
> +    (c) the per-connection receive creds mask, set by the connecting client
> +    (d) the per-bus minimal creds mask, set by the bus creator
> +    (e) the per-bus owner creds mask, set by the bus creator
> +    (f) the mask specified when querying creds of a bus peer
> +    (g) the mask specified when querying creds of a bus owner
> +
> +With the following rules:
> +
> +    [1] The creds attached to messages are determined as (a & b & c).
> +    [2] When connecting to a bus (KDBUS_CMD_HELLO), and (~b & d) != 0, the call
> +        will fail, the connection is refused.
> +    [3] When querying creds of a bus peer, the creds returned are  (a & b & f)
> +    [4] When querying creds of a bus owner, the creds returned are (a & e & g)
> +    [5] When creating a new bus, and (d & ~a) != 0, then bus creation will fail
> +
> +Hence, userspace might not always get all the metadata items that it
> +requested. Code must be written so that it can cope with this fact.
> +
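
Rules [1] to [5] are plain bit operations on the masks (a) to (g). They can
be written out as below; the function names are hypothetical and the
parameters are named after the letters used in the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rule [1]: creds attached to a message. */
static uint64_t msg_creds(uint64_t a, uint64_t b, uint64_t c)
{
	return a & b & c;
}

/* Rule [2]: HELLO is refused if the send mask lacks a bit the bus requires. */
static bool hello_allowed(uint64_t b, uint64_t d)
{
	return (~b & d) == 0;
}

/* Rule [3]: creds returned when querying a bus peer. */
static uint64_t peer_creds(uint64_t a, uint64_t b, uint64_t f)
{
	return a & b & f;
}

/* Rule [4]: creds returned when querying a bus owner. */
static uint64_t owner_creds(uint64_t a, uint64_t e, uint64_t g)
{
	return a & e & g;
}

/* Rule [5]: bus creation fails if the bus requires bits the module masks out. */
static bool bus_make_allowed(uint64_t a, uint64_t d)
{
	return (d & ~a) == 0;
}
```
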
> +
> +13.1 Known item types
> +---------------------
> +
> +The following attach flags are currently supported.
> +
> +  KDBUS_ATTACH_TIMESTAMP
> +    Attaches an item of type KDBUS_ITEM_TIMESTAMP which contains both the
> +    monotonic and the realtime timestamp, taken when the message was
> +    processed on the kernel side.
> +
> +  KDBUS_ATTACH_CREDS
> +    Attaches an item of type KDBUS_ITEM_CREDS, containing credentials as
> +    described in struct kdbus_creds: the user and group IDs in the usual four
> +    flavors: real, effective, saved and file-system related.
> +
> +  KDBUS_ATTACH_PIDS
> +    Attaches an item of type KDBUS_ITEM_PIDS, containing information on the
> +    process. In particular, the PID (process ID), TID (thread ID), and PPID
> +    (PID of the parent process).
> +
> +  KDBUS_ATTACH_AUXGROUPS
> +    Attaches an item of type KDBUS_ITEM_AUXGROUPS, containing a dynamic
> +    number of auxiliary groups the sending task was a member of.
> +
> +  KDBUS_ATTACH_NAMES
> +    Attaches items of type KDBUS_ITEM_OWNED_NAME, one for each name the sending
> +    connection currently owns. The name and flags are stored in kdbus_item.name
> +    for each of them.
> +
> +  KDBUS_ATTACH_TID_COMM [*]
> +    Attaches an item of type KDBUS_ITEM_TID_COMM, transporting the sending
> +    task's 'comm', for the tid.  The string is stored in kdbus_item.str.
> +
> +  KDBUS_ATTACH_PID_COMM [*]
> +    Attaches an item of type KDBUS_ITEM_PID_COMM, transporting the sending
> +    task's 'comm', for the pid.  The string is stored in kdbus_item.str.
> +
> +  KDBUS_ATTACH_EXE [*]
> +    Attaches an item of type KDBUS_ITEM_EXE, containing the path to the
> +    executable of the sending task, stored in kdbus_item.str.
> +
> +  KDBUS_ATTACH_CMDLINE [*]
> +    Attaches an item of type KDBUS_ITEM_CMDLINE, containing the command line
> +    arguments of the sending task, as an array of strings, stored in
> +    kdbus_item.str.
> +
> +  KDBUS_ATTACH_CGROUP
> +    Attaches an item of type KDBUS_ITEM_CGROUP with the task's cgroup path.
> +
> +  KDBUS_ATTACH_CAPS
> +    Attaches an item of type KDBUS_ITEM_CAPS, carrying sets of capabilities
> +    that should be accessed via kdbus_item.caps.caps. Also, userspace should
> +    be written in a way that it takes kdbus_item.caps.last_cap into account,
> +    and derive the number of sets and rows from the item size and the reported
> +    number of valid capability bits.
> +
> +  KDBUS_ATTACH_SECLABEL
> +    Attaches an item of type KDBUS_ITEM_SECLABEL, which contains the SELinux
> +    security label of the sending task. SELinux and other MACs might want to
> +    do additional per-service security checks. For example, a service manager
> +    might want to check the security label of a service file against the
> +    security label of the client process checking the SELinux database before
> +    allowing access.  The label can be accessed via kdbus_item->str.
> +
> +  KDBUS_ATTACH_AUDIT
> +    Attaches an item of type KDBUS_ITEM_AUDIT, which contains the audit
> +    sessionid and loginuid of the sending task. Access via kdbus_item->audit.
> +
> +  KDBUS_ATTACH_CONN_DESCRIPTION
> +    Attaches an item of type KDBUS_ITEM_CONN_DESCRIPTION that contains a
> +    descriptive string of the sending peer. That string can be supplied
> +    during HELLO by attaching an item of type KDBUS_ITEM_CONN_DESCRIPTION.
> +
> +
> +[*] Note that the content stored in these items can easily be tampered with by
> +    the sending tasks. Therefore, they should NOT be used for any sort of
> +    security relevant assumptions. The only reason why they are transmitted is
> +    to let receivers know about details that were set when metadata was
> +    collected, even though the task they were collected from is not active any
> +    longer when the items are received.
> +
> +
> +13.2 Metadata and namespaces
> +----------------------------
> +
> +Metadata such as PIDs, UIDs or GIDs are automatically translated to the
> +namespaces of the task that receives them.
> +
> +
> +14. Error codes
> +===============================================================================
> +
> +Below is a list of error codes that might be returned by the individual
> +ioctl commands. The list focuses on the return values from kdbus code itself,
> +and might not cover those of all kernel internal functions.
> +
> +For all ioctls:
> +
> +  -ENOMEM	The kernel memory is exhausted
> +  -ENOTTY	Illegal ioctl command issued for the file descriptor

Why ENOTTY here, rather than EINVAL? The latter is, I believe, the usual
ioctl() error for invalid commands. (If you keep ENOTTY, add an
explanation in this document.)

> +  -ENOSYS	The requested functionality is not available

Maybe add here an explanation or examples of why the functionality is 
not available?

> +  -EINVAL	Unsupported item attached to command
> +
> +For all ioctls that carry a struct as payload:
> +
> +  -EFAULT	The supplied data pointer was not 64-bit aligned, or was
> +		inaccessible from the kernel side.
> +  -EINVAL	The size inside the supplied struct was smaller than expected
> +  -EMSGSIZE	The size inside the supplied struct was bigger than expected

Why two different errors for smaller and larger than expected? (If you keep
things this way, please explain the reason in this document.)

> +  -ENAMETOOLONG	A supplied name is larger than the allowed maximum size
> +
> +For KDBUS_CMD_BUS_MAKE:
> +
> +  -EINVAL	The flags supplied in the kdbus_cmd_make struct are invalid or
> +		the supplied name does not start with the current uid and a '-'
> +  -EEXIST	A bus of that name already exists
> +  -ESHUTDOWN	The domain for the bus is already shut down
> +  -EMFILE	The maximum number of buses for the current user is exhausted
> +
> +For KDBUS_CMD_ENDPOINT_MAKE:
> +
> +  -EPERM	The calling user is not privileged (see Terminology)
> +  -EINVAL	The flags supplied in the kdbus_cmd_make struct are invalid
> +  -EEXIST	An endpoint of that name already exists
> +
> +For KDBUS_CMD_HELLO:
> +
> +  -EFAULT	The supplied pool size was 0 or not a multiple of the page size
> +  -EINVAL	The flags supplied in the kdbus_cmd_make struct are invalid, or
> +		an illegal combination of KDBUS_HELLO_MONITOR,
> +		KDBUS_HELLO_ACTIVATOR and KDBUS_HELLO_POLICY_HOLDER was passed
> +		in the flags, or an invalid set of items was supplied
> +  -ECONNREFUSED	The attach_flags_send field did not satisfy the requirements of
> +		the bus
> +  -EPERM	A KDBUS_ITEM_CREDS item was supplied, but the current user is
> +		not privileged
> +  -ESHUTDOWN	The bus has already been shut down
> +  -EMFILE	The maximum number of connection on the bus has been reached

s/connection/connections/

> +  -EOPNOTSUPP	The endpoint does not support the connection flags
> +		supplied in the kdbus_cmd_hello struct
> +
> +For KDBUS_CMD_BYEBYE:
> +
> +  -EALREADY	The connection has already been shut down
> +  -EBUSY	There are still messages queued up in the connection's pool
> +
> +For KDBUS_CMD_SEND:
> +
> +  -EOPNOTSUPP	The connection is not an ordinary connection, or the passed
> +		file descriptors are either kdbus handles or unix domain
> +		sockets. Both are currently unsupported
> +  -EINVAL	The submitted payload type is KDBUS_PAYLOAD_KERNEL,
> +		KDBUS_MSG_EXPECT_REPLY was set without timeout or cookie
> +		values, KDBUS_MSG_SYNC_REPLY was set without
> +		KDBUS_MSG_EXPECT_REPLY, an invalid item was supplied,
> +		src_id was != 0 and different from the current connection's ID,

s/src_id was != 0 and different/src_id was nonzero and was different/

> +		a supplied memfd had a size of 0, a string was not properly
> +		null-terminated
> +  -ENOTUNIQ	The supplied destination is KDBUS_DST_ID_BROADCAST, a file
> +		descriptor was passed, KDBUS_MSG_EXPECT_REPLY was set,
> +		or a timeout was given for a broadcast message
> +  -E2BIG	Too many items
> +  -EMSGSIZE	The size of the message header and items or the payload vector
> +		is too big.
> +  -EEXIST	Multiple KDBUS_ITEM_FDS, KDBUS_ITEM_BLOOM_FILTER or
> +		KDBUS_ITEM_DST_NAME items were supplied
> +  -EBADF	The supplied KDBUS_ITEM_FDS or KDBUS_MSG_PAYLOAD_MEMFD items
> +		contained an illegal file descriptor
> +  -EMEDIUMTYPE	The supplied memfd is not a sealed kdbus memfd
> +  -EMFILE	Too many file descriptors inside a KDBUS_ITEM_FDS
> +  -EBADMSG	An item had illegal size, both a dst_id and a
> +		KDBUS_ITEM_DST_NAME was given, or both a name and a bloom
> +		filter was given
> +  -ETXTBSY	The supplied kdbus memfd file cannot be sealed or the seal
> +		was removed, because it is shared with other processes or
> +		still mmap()ed
> +  -ECOMM	A peer does not accept the file descriptors addressed to it
> +  -EFAULT	The supplied bloom filter size was not 64-bit aligned
> +  -EDOM		The supplied bloom filter size did not match the bloom filter
> +		size of the bus
> +  -EDESTADDRREQ	dst_id was set to KDBUS_DST_ID_NAME, but no KDBUS_ITEM_DST_NAME
> +		was attached
> +  -ESRCH	The name to look up was not found in the name registry
> +  -EADDRNOTAVAIL KDBUS_MSG_NO_AUTO_START was given but the destination
> +		 connection is an activator.
> +  -ENXIO	The passed numeric destination connection ID couldn't be found,
> +		or is not connected
> +  -ECONNRESET	The destination connection is no longer active
> +  -ETIMEDOUT	Timeout while synchronously waiting for a reply
> +  -EINTR	System call interrupted while synchronously waiting for a reply
> +  -EPIPE	When sending a message, a synchronous reply from the receiving
> +		connection was expected but the connection died before
> +		answering
> +  -ENOBUFS	Too many pending messages on the receiver side
> +  -EREMCHG	Both a well-known name and a unique name (ID) was given, but
> +		the name is not currently owned by that connection.
> +  -EXFULL	The memory pool of the receiver is full
> +  -EREMOTEIO	While synchronously waiting for a reply, the remote peer
> +		failed with an I/O error.
> +
> +For KDBUS_CMD_RECV:
> +
> +  -EINVAL	Invalid flags or offset
> +  -EAGAIN	No message found in the queue
> +  -ENOMSG	No message of the requested priority found
> +  -EOVERFLOW	Broadcast messages have been lost
> +
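
A receiver would typically branch on the KDBUS_CMD_RECV errors listed above.
The sketch below is only one possible mapping: the enum and the function
classify_recv_error() are hypothetical, and the argument is the positive
errno value that userspace sees after a failed ioctl(), not the negative
code in the kernel-style notation used in the list.

```c
#include <errno.h>

/* Hypothetical actions a receiver could take; not part of the kdbus API. */
enum recv_action {
	RECV_WAIT,         /* queue empty: poll and try again */
	RECV_NO_PRIORITY,  /* nothing queued at the requested priority */
	RECV_LOST,         /* broadcasts were dropped: resynchronize state */
	RECV_CALLER_BUG,   /* invalid flags or offset were passed */
	RECV_OTHER         /* anything not covered by the list above */
};

static enum recv_action classify_recv_error(int err)
{
	switch (err) {
	case EAGAIN:
		return RECV_WAIT;
	case ENOMSG:
		return RECV_NO_PRIORITY;
	case EOVERFLOW:
		return RECV_LOST;
	case EINVAL:
		return RECV_CALLER_BUG;
	default:
		return RECV_OTHER;
	}
}
```
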
> +For KDBUS_CMD_FREE:
> +
> +  -ENXIO	No pool slice found at given offset
> +  -EINVAL	Invalid flags provided, the offset is valid, but the user is
> +		not allowed to free the slice. This happens, for example, if
> +		the offset was retrieved with KDBUS_RECV_PEEK.

It would be easier to read if this was written to clarify that there are 
two distinct error cases:

  -EINVAL	Invalid flags provided.
  -EINVAL	The offset is valid, but the user is
		not allowed to free the slice. This happens, for example, if
		the offset was retrieved with KDBUS_RECV_PEEK.

> +For KDBUS_CMD_NAME_ACQUIRE:
> +
> +  -EINVAL	Illegal command flags, illegal name provided, or an activator
> +		tried to acquire a second name
> +  -EPERM	Policy prohibited name ownership
> +  -EALREADY	Connection already owns that name
> +  -EEXIST	The name already exists and can not be taken over
> +  -E2BIG	The maximum number of well-known names per connection
> +		is exhausted
> +  -ECONNRESET	The connection was reset during the call
> +
> +For KDBUS_CMD_NAME_RELEASE:
> +
> +  -EINVAL	Invalid command flags, or invalid name provided
> +  -ESRCH	Name is not found in the registry
> +  -EADDRINUSE	Name is owned by a different connection and can't be released
> +
> +For KDBUS_CMD_NAME_LIST:
> +
> +  -EINVAL	Invalid flags
> +  -ENOBUFS	No available memory in the connection's pool.
> +
> +For KDBUS_CMD_CONN_INFO:
> +
> +  -EINVAL	Invalid flags, or neither an ID nor a name was provided,
> +		or the name is invalid.
> +  -ESRCH	Connection lookup by name failed
> +  -ENXIO	No connection with the provided connection ID found
> +
> +For KDBUS_CMD_CONN_UPDATE:
> +
> +  -EINVAL	Illegal flags or items
> +  -EOPNOTSUPP	Operation not supported by connection.
> +  -E2BIG	Too many policy items attached
> +  -EINVAL	Wildcards submitted in policy entries, or illegal sequence
> +		of policy items
> +
> +For KDBUS_CMD_ENDPOINT_UPDATE:
> +
> +  -E2BIG	Too many policy items attached
> +  -EINVAL	Invalid flags, or wildcards submitted in policy entries,
> +		or illegal sequence of policy items
> +
> +For KDBUS_CMD_MATCH_ADD:
> +
> +  -EINVAL	Illegal flags or items
> +  -EDOM		Illegal bloom filter size
> +  -EMFILE	Too many matches for this connection
> +
> +For KDBUS_CMD_MATCH_REMOVE:
> +
> +  -EINVAL	Illegal flags
> +  -ENOENT	A match entry with the given cookie could not be found.
> +
> +
> +15. Internal object relations
> +===============================================================================
> +
> +This is a simplified outline of the internal kdbus object relations, for
> +those interested in the inner life of the driver implementation.
> +
> +From a mount point's (domain's) perspective:
> +
> +struct kdbus_domain
> +  |» struct kdbus_domain_user *user (many, owned)
> +  '» struct kdbus_node node (embedded)
> +      |» struct kdbus_node children (many, referenced)
> +      |» struct kdbus_node *parent (pinned)
> +      '» struct kdbus_bus (many, pinned)
> +          |» struct kdbus_node node (embedded)
> +          '» struct kdbus_ep (many, pinned)
> +              |» struct kdbus_node node (embedded)
> +              |» struct kdbus_bus *bus (pinned)
> +              |» struct kdbus_conn conn_list (many, pinned)
> +              |   |» struct kdbus_ep *ep (pinned)
> +              |   |» struct kdbus_name_entry *activator_of (owned)
> +              |   |» struct kdbus_match_db *match_db (owned)
> +              |   |» struct kdbus_meta *meta (owned)
> +              |   |» struct kdbus_match_db *match_db (owned)
> +              |   |    '» struct kdbus_match_entry (many, owned)
> +              |   |
> +              |   |» struct kdbus_pool *pool (owned)
> +              |   |    '» struct kdbus_pool_slice *slices (many, owned)
> +              |   |       '» struct kdbus_pool *pool (pinned)
> +              |   |
> +              |   |» struct kdbus_domain_user *user (pinned)
> +              |   `» struct kdbus_queue_entry entries (many, embedded)
> +              |        |» struct kdbus_pool_slice *slice (pinned)
> +              |        |» struct kdbus_conn_reply *reply (owned)
> +              |        '» struct kdbus_domain_user *user (pinned)
> +              |
> +              '» struct kdbus_domain_user *user (pinned)
> +                  '» struct kdbus_policy_db policy_db (embedded)
> +                       |» struct kdbus_policy_db_entry (many, owned)
> +                       |   |» struct kdbus_conn (pinned)
> +                       |   '» struct kdbus_ep (pinned)
> +                       |
> +                       '» struct kdbus_policy_db_cache_entry (many, owned)
> +                           '» struct kdbus_conn (pinned)
> +
> +
> +For the life-time of a file descriptor derived from calling open() on a file
> +inside the mount point:
> +
> +struct kdbus_handle
> +  |» struct kdbus_meta *meta (owned)
> +  |» struct kdbus_ep *ep (pinned)
> +  |» struct kdbus_conn *conn (owned)
> +  '» struct kdbus_ep *ep (owned)

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-20 14:05   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-20 14:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel
  Cc: mtk.manpages, daniel, dh.herrmann, tixxdz

On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
> kdbus is a kernel-level IPC implementation that aims for resemblance to
> the protocol layer with the existing userspace D-Bus daemon while
> enabling some features that couldn't be implemented before in userspace.
> 
> The documentation in the first patch in this series explains the
> protocol and the API details.
> 
> Full details on what has changed from the v2 submission are at the
> bottom of this email.
> 
> Reasons why this should be done in the kernel, instead of userspace as
> it is currently done today include the following:
> 
> - performance: fewer process context switches, fewer copies, fewer
>   syscalls, larger memory chunks via memfd.  This is really important
>   for a whole class of userspace programs that are ported from other
>   operating systems that are run on tiny ARM systems that rely on
>   hundreds of thousands of messages passed at boot time, and at
>   "critical" times in their user interaction loops.
> - security: the peers which communicate do not have to trust each other,
>   as the only trustworthy component in the game is the kernel which
>   adds metadata and ensures that all data passed as payload is either
>   copied or sealed, so that the receiver can parse the data without
>   having to protect against changing memory while parsing buffers.  Also,
>   all the data transfer is controlled by the kernel, so that LSMs can
>   track and control what is going on, without involving userspace.
>   Because of the LSM issue, security people are much happier with this
>   model than the current scheme of having to hook into dbus to mediate
>   things.
> - more metadata can be attached to messages than in userspace
> - semantics for apps with heavy data payloads (media apps, for instance)
>   with optional priority message dequeuing, and global message ordering.
>   Some "crazy" people are playing with using kdbus for audio data in the
>   system.  I'm not saying that this is the best model for this, but
>   until now, there wasn't any other way to do this without having to
>   create custom "busses", one for each application library.
> - being in the kernel closes a lot of races which can't be fixed with
>   the current userspace solutions.  For example, with kdbus, there is a
>   way a client can disconnect from a bus, but do so only if no further
>   messages are present in its queue, which is crucial for implementing
>   race-free "exit-on-idle" services
> - eavesdropping on the kernel level, so privileged users can hook into
>   the message stream without hacking support for that into their
>   userspace processes
> - a number of smaller benefits: for example kdbus learned a way to peek
>   full messages without dequeuing them, which is really useful for
>   logging metadata when handling bus-activation requests.
> 
> Of course, some of the bits above could be implemented in userspace
> alone, for example with more sophisticated memory management APIs, but
> this is usually done by losing out on the other details.  For example,
> for many of the memory management APIs, it's hard to not require the
> communicating peers to fully trust each other.  And we _really_ don't
> want peers to have to trust each other.
> 
> Another benefit of having this in the kernel, rather than as a userspace
> daemon, is that you can now easily use the bus from the initrd, or up to
> the very end when the system shuts down.  On current userspace D-Bus,
> this is not really possible, as this requires passing the bus instance
> around between initrd and the "real" system.  Such a transition of all
> fds also requires keeping full state of what has already been read from
> the connection fds.  kdbus makes this much simpler, as we can change the
> ownership of the bus, just by passing one fd over from one part to the
> other.

I tend to think that much of the above should also be part of the 
documentation file (patch 01/13).

Cheers,

Michael


 
> Regarding binder: binder and kdbus follow very different design
> concepts.  Binder implies the use of thread-pools to dispatch incoming
> method calls.  This is a very efficient scheme, and completely natural
> in programming languages like Java.  On most Linux programs, however,
> there's a much stronger focus on central poll() loops that dispatch all
> sources a program cares about.  kdbus is much more usable in such
> environments, as it doesn't enforce a threading model, and it is happy
> with serialized dispatching.  In fact, this major difference had an
> effect on much of the design decisions: binder does not guarantee global
> message ordering due to the parallel dispatching in the thread-pools,
> but kdbus does.  Moreover, there's also a difference in the way messages
> are handled.  In kdbus, every message is basically taken and dispatched as
> one blob, while in binder, continuous connections to other peers are
> created, which are then used to send messages on.  Hence, the models are
> quite different, and they serve different needs.  I believe that the
> D-Bus/kdbus model is more compatible and friendly with how Linux
> programs are usually implemented.
> 
> This can also be found in a git tree, the kdbus branch of char-misc.git at:
>         https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/
> 
> Changes since v2:
> 
>   * Add FS_USERNS_MOUNT to the file system flags, so users can mount
>     their own kdbusfs instances without being root in the parent
>     user-ns. Spotted by Andy Lutomirski.
> 
>   * Rewrite major parts of the metadata implementation to allow for
>     per-recipient namespace translations. For this, namespaces are
>     now not pinned by domains anymore. Instead, metadata is recorded
>     in kernel scope, and exported into the currently active namespaces
>     at the time of message installing.
> 
>   * Split PID and TID from KDBUS_ITEM_CREDS into KDBUS_ITEM_PIDS.
>     The starttime is there to detect re-used PIDs, so move it to that
>     new item type as well. Consequently, introduce struct kdbus_pids
>     to accommodate the information. Requested by Andy Lutomirski.
> 
>   * Add {e,s,fs}{u,g}id to KDBUS_ITEM_CREDS, so users have a way to
>     get more fine-grained credential information.
> 
>   * Removed KDBUS_CMD_CANCEL. The interface was not usable from
>     threaded userspace implementation due to inherent races. Instead,
>     add an item type CANCEL_FD which can be used to pass a file
>     descriptor to the CMD_SEND ioctl. When the SEND is done
>     synchronously, it will get cancelled as soon as the passed
>     FD signals POLLIN.
> 
>   * Dropped starttime from KDBUS_ITEM_PIDS
> 
>   * Restrict names of custom endpoints to names with a "<uid>-" prefix,
>     just like we do for buses.
> 
>   * Provide module-parameter "kdbus.attach_flags_mask" to specify
>     a mask of metadata items that is applied on all exported items.
> 
>   * Monitors are now entirely invisible (IOW, there won't be any
>     notification when they are created) and they don't need to install
>     filters for broadcast messages anymore.
> 
>   * All information exposed via a connection's pool now also reports
>     the length in addition to the offset. That way, userspace
>     applications can mmap() only parts of the pool on demand.
> 
>   * Due to the metadata rework, KDBUS_ITEM_PAYLOAD_OFF items now
>     describe the offset relative to the pool, where they used to be
>     relative to the message header.
> 
>   * Added return_flags bitmask to all kdbus_cmd_* structs, so the
>     kernel can report details of the command processing. This is
>     mostly reserved for future extensions.
> 
>   * Some fixes in kdbus.txt and tests, spotted by Harald Hoyer, Andy
>     Lutomirski, Michele Curti, Sergei Zviagintsev, Sheng Yong, Torstein
>     Husebø and Hristo Venev.
> 
>   * Fixed compiler warnings in test-message by Michele Curti
> 
>   * Unexpected items are now rejected with -EINVAL
> 
>   * Split signal and broadcast handling. Unicast signals are now
>     supported, and messages have a new KDBUS_MSG_SIGNAL flag.
> 
>   * KDBUS_CMD_MSG_SEND was renamed to KDBUS_CMD_SEND, and now takes
>     a struct kdbus_cmd_send instead of a kdbus_msg.
> 
>   * KDBUS_CMD_MSG_RECV was renamed to KDBUS_CMD_RECV.
> 
>   * Test case memory leak plugged, and various other cleanups and
>     fixes, by Rui Miguel Silva.
> 
>   * Build fix for s390
> 
>   * Test case fix for 32bit archs
> 
>   * The test framework now supports mount, pid and user namespaces.
> 
>   * The test framework learned a --tap command line parameter to
>     format its output in the "Test Anything Protocol". This format
>     is chosen by default when "make kselftest" is invoked.
> 
>   * Fixed buses and custom endpoints name validation, reported by
>     Andy Lutomirski.
> 
>   * copy_from_user() return code issue fixed, reported by
>     Dan Carpenter.
> 
>   * Avoid signed int overflow on archs without atomic_sub
> 
>   * Avoid variable size stack items. Fixes a sparse warning in queue.c.
> 
>   * New test case for kernel notification quota
> 
>   * Switched back to enums for the list of ioctls. This has advantages
>     for userspace code as gdb, for instance, is able to resolve the
>     numbers into names. Added features can easily be detected with
>     autotools, and new ioctls can get #defines as well. Having #defines
>     for the initial set of ioctls is unnecessary.
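
The pool item in the list above (offset plus length, so that only part
of the pool needs to be mapped) can be illustrated roughly as follows.
This is a minimal sketch, not kdbus code: a temporary file stands in for
the pool fd, and struct slice, slice_map(), slice_unmap() and
demo_slice() are hypothetical names made up for illustration. The only
real constraint it shows is that mmap() wants a page-aligned file
offset, so the caller rounds the offset down and skips the difference.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* A read-only window over the bytes [off, off + len) of a file. */
struct slice {
	void *base;          /* start of the page-aligned mapping */
	size_t map_len;      /* length handed to mmap() and munmap() */
	const uint8_t *data; /* first byte the caller actually asked for */
};

/* Map only the requested range; returns 0 on success, -1 on failure. */
static int slice_map(int fd, uint64_t off, size_t len, struct slice *s)
{
	uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
	uint64_t aligned = off & ~(page - 1);   /* round down to a page */
	size_t delta = (size_t)(off - aligned); /* bytes to skip in the map */

	assert(s != NULL && len > 0);

	s->map_len = len + delta;
	s->base = mmap(NULL, s->map_len, PROT_READ, MAP_SHARED, fd,
		       (off_t)aligned);
	if (s->base == MAP_FAILED)
		return -1;

	s->data = (const uint8_t *)s->base + delta;
	return 0;
}

static void slice_unmap(struct slice *s)
{
	munmap(s->base, s->map_len);
}

/* Demo with a temporary file as stand-in "pool"; returns 0 on success. */
static int demo_slice(void)
{
	FILE *f = tmpfile();
	long page = sysconf(_SC_PAGESIZE);
	uint64_t off = (uint64_t)page + 100; /* deliberately unaligned */
	struct slice s;
	int ok;

	if (!f)
		return -1;

	if (ftruncate(fileno(f), 3 * page) < 0 ||
	    pwrite(fileno(f), "hello", 5, (off_t)off) != 5 ||
	    slice_map(fileno(f), off, 5, &s) < 0) {
		fclose(f);
		return -1;
	}

	ok = memcmp(s.data, "hello", 5) == 0;
	slice_unmap(&s);
	fclose(f);
	return ok ? 0 : -1;
}
```

Without a reported length, a receiver would have to map the whole pool
up front, or guess how far past the offset to map.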
> 
> Daniel Mack (13):
>   kdbus: add documentation
>   kdbus: add header file
>   kdbus: add driver skeleton, ioctl entry points and utility functions
>   kdbus: add connection pool implementation
>   kdbus: add connection, queue handling and message validation code
>   kdbus: add node and filesystem implementation
>   kdbus: add code to gather metadata
>   kdbus: add code for notifications and matches
>   kdbus: add code for buses, domains and endpoints
>   kdbus: add name registry implementation
>   kdbus: add policy database implementation
>   kdbus: add Makefile, Kconfig and MAINTAINERS entry
>   kdbus: add selftests
> 
>  Documentation/ioctl/ioctl-number.txt              |    1 +
>  Documentation/kdbus.txt                           | 2107 +++++++++++++++++++++
>  MAINTAINERS                                       |   12 +
>  include/uapi/linux/Kbuild                         |    1 +
>  include/uapi/linux/kdbus.h                        | 1049 ++++++++++
>  include/uapi/linux/magic.h                        |    2 +
>  init/Kconfig                                      |   12 +
>  ipc/Makefile                                      |    2 +-
>  ipc/kdbus/Makefile                                |   22 +
>  ipc/kdbus/bus.c                                   |  553 ++++++
>  ipc/kdbus/bus.h                                   |  103 +
>  ipc/kdbus/connection.c                            | 2004 ++++++++++++++++++++
>  ipc/kdbus/connection.h                            |  262 +++
>  ipc/kdbus/domain.c                                |  350 ++++
>  ipc/kdbus/domain.h                                |   84 +
>  ipc/kdbus/endpoint.c                              |  232 +++
>  ipc/kdbus/endpoint.h                              |   68 +
>  ipc/kdbus/fs.c                                    |  519 +++++
>  ipc/kdbus/fs.h                                    |   25 +
>  ipc/kdbus/handle.c                                | 1134 +++++++++++
>  ipc/kdbus/handle.h                                |   20 +
>  ipc/kdbus/item.c                                  |  309 +++
>  ipc/kdbus/item.h                                  |   57 +
>  ipc/kdbus/limits.h                                |   95 +
>  ipc/kdbus/main.c                                  |   72 +
>  ipc/kdbus/match.c                                 |  535 ++++++
>  ipc/kdbus/match.h                                 |   32 +
>  ipc/kdbus/message.c                               |  598 ++++++
>  ipc/kdbus/message.h                               |  133 ++
>  ipc/kdbus/metadata.c                              | 1066 +++++++++++
>  ipc/kdbus/metadata.h                              |   52 +
>  ipc/kdbus/names.c                                 |  891 +++++++++
>  ipc/kdbus/names.h                                 |   82 +
>  ipc/kdbus/node.c                                  |  910 +++++++++
>  ipc/kdbus/node.h                                  |   87 +
>  ipc/kdbus/notify.c                                |  244 +++
>  ipc/kdbus/notify.h                                |   30 +
>  ipc/kdbus/policy.c                                |  481 +++++
>  ipc/kdbus/policy.h                                |   51 +
>  ipc/kdbus/pool.c                                  |  784 ++++++++
>  ipc/kdbus/pool.h                                  |   47 +
>  ipc/kdbus/queue.c                                 |  505 +++++
>  ipc/kdbus/queue.h                                 |  108 ++
>  ipc/kdbus/reply.c                                 |  262 +++
>  ipc/kdbus/reply.h                                 |   68 +
>  ipc/kdbus/util.c                                  |  317 ++++
>  ipc/kdbus/util.h                                  |  133 ++
>  tools/testing/selftests/Makefile                  |    1 +
>  tools/testing/selftests/kdbus/.gitignore          |   11 +
>  tools/testing/selftests/kdbus/Makefile            |   46 +
>  tools/testing/selftests/kdbus/kdbus-enum.c        |   95 +
>  tools/testing/selftests/kdbus/kdbus-enum.h        |   14 +
>  tools/testing/selftests/kdbus/kdbus-test.c        |  920 +++++++++
>  tools/testing/selftests/kdbus/kdbus-test.h        |   85 +
>  tools/testing/selftests/kdbus/kdbus-util.c        | 1646 ++++++++++++++++
>  tools/testing/selftests/kdbus/kdbus-util.h        |  216 +++
>  tools/testing/selftests/kdbus/test-activator.c    |  319 ++++
>  tools/testing/selftests/kdbus/test-attach-flags.c |  751 ++++++++
>  tools/testing/selftests/kdbus/test-benchmark.c    |  427 +++++
>  tools/testing/selftests/kdbus/test-bus.c          |  174 ++
>  tools/testing/selftests/kdbus/test-chat.c         |  123 ++
>  tools/testing/selftests/kdbus/test-connection.c   |  611 ++++++
>  tools/testing/selftests/kdbus/test-daemon.c       |   66 +
>  tools/testing/selftests/kdbus/test-endpoint.c     |  344 ++++
>  tools/testing/selftests/kdbus/test-fd.c           |  710 +++++++
>  tools/testing/selftests/kdbus/test-free.c         |   36 +
>  tools/testing/selftests/kdbus/test-match.c        |  442 +++++
>  tools/testing/selftests/kdbus/test-message.c      |  658 +++++++
>  tools/testing/selftests/kdbus/test-metadata-ns.c  |  507 +++++
>  tools/testing/selftests/kdbus/test-monitor.c      |  158 ++
>  tools/testing/selftests/kdbus/test-names.c        |  184 ++
>  tools/testing/selftests/kdbus/test-policy-ns.c    |  633 +++++++
>  tools/testing/selftests/kdbus/test-policy-priv.c  | 1270 +++++++++++++
>  tools/testing/selftests/kdbus/test-policy.c       |   81 +
>  tools/testing/selftests/kdbus/test-race.c         |  313 +++
>  tools/testing/selftests/kdbus/test-sync.c         |  368 ++++
>  tools/testing/selftests/kdbus/test-timeout.c      |   99 +
>  77 files changed, 27818 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/kdbus.txt
>  create mode 100644 include/uapi/linux/kdbus.h
>  create mode 100644 ipc/kdbus/Makefile
>  create mode 100644 ipc/kdbus/bus.c
>  create mode 100644 ipc/kdbus/bus.h
>  create mode 100644 ipc/kdbus/connection.c
>  create mode 100644 ipc/kdbus/connection.h
>  create mode 100644 ipc/kdbus/domain.c
>  create mode 100644 ipc/kdbus/domain.h
>  create mode 100644 ipc/kdbus/endpoint.c
>  create mode 100644 ipc/kdbus/endpoint.h
>  create mode 100644 ipc/kdbus/fs.c
>  create mode 100644 ipc/kdbus/fs.h
>  create mode 100644 ipc/kdbus/handle.c
>  create mode 100644 ipc/kdbus/handle.h
>  create mode 100644 ipc/kdbus/item.c
>  create mode 100644 ipc/kdbus/item.h
>  create mode 100644 ipc/kdbus/limits.h
>  create mode 100644 ipc/kdbus/main.c
>  create mode 100644 ipc/kdbus/match.c
>  create mode 100644 ipc/kdbus/match.h
>  create mode 100644 ipc/kdbus/message.c
>  create mode 100644 ipc/kdbus/message.h
>  create mode 100644 ipc/kdbus/metadata.c
>  create mode 100644 ipc/kdbus/metadata.h
>  create mode 100644 ipc/kdbus/names.c
>  create mode 100644 ipc/kdbus/names.h
>  create mode 100644 ipc/kdbus/node.c
>  create mode 100644 ipc/kdbus/node.h
>  create mode 100644 ipc/kdbus/notify.c
>  create mode 100644 ipc/kdbus/notify.h
>  create mode 100644 ipc/kdbus/policy.c
>  create mode 100644 ipc/kdbus/policy.h
>  create mode 100644 ipc/kdbus/pool.c
>  create mode 100644 ipc/kdbus/pool.h
>  create mode 100644 ipc/kdbus/queue.c
>  create mode 100644 ipc/kdbus/queue.h
>  create mode 100644 ipc/kdbus/reply.c
>  create mode 100644 ipc/kdbus/reply.h
>  create mode 100644 ipc/kdbus/util.c
>  create mode 100644 ipc/kdbus/util.h
>  create mode 100644 tools/testing/selftests/kdbus/.gitignore
>  create mode 100644 tools/testing/selftests/kdbus/Makefile
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.c
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.h
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-test.c
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-test.h
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-util.c
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-util.h
>  create mode 100644 tools/testing/selftests/kdbus/test-activator.c
>  create mode 100644 tools/testing/selftests/kdbus/test-attach-flags.c
>  create mode 100644 tools/testing/selftests/kdbus/test-benchmark.c
>  create mode 100644 tools/testing/selftests/kdbus/test-bus.c
>  create mode 100644 tools/testing/selftests/kdbus/test-chat.c
>  create mode 100644 tools/testing/selftests/kdbus/test-connection.c
>  create mode 100644 tools/testing/selftests/kdbus/test-daemon.c
>  create mode 100644 tools/testing/selftests/kdbus/test-endpoint.c
>  create mode 100644 tools/testing/selftests/kdbus/test-fd.c
>  create mode 100644 tools/testing/selftests/kdbus/test-free.c
>  create mode 100644 tools/testing/selftests/kdbus/test-match.c
>  create mode 100644 tools/testing/selftests/kdbus/test-message.c
>  create mode 100644 tools/testing/selftests/kdbus/test-metadata-ns.c
>  create mode 100644 tools/testing/selftests/kdbus/test-monitor.c
>  create mode 100644 tools/testing/selftests/kdbus/test-names.c
>  create mode 100644 tools/testing/selftests/kdbus/test-policy-ns.c
>  create mode 100644 tools/testing/selftests/kdbus/test-policy-priv.c
>  create mode 100644 tools/testing/selftests/kdbus/test-policy.c
>  create mode 100644 tools/testing/selftests/kdbus/test-race.c
>  create mode 100644 tools/testing/selftests/kdbus/test-sync.c
>  create mode 100644 tools/testing/selftests/kdbus/test-timeout.c
> 
> 
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-20 14:05   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-20 14:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman, arnd-r2nGTMty4D4,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io, teg-B22kvLQNl6c,
	jkosina-AlSwsSmVLrQ, luto-kltTT9wpgjJwATOyAt5JVQ,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	daniel-cYrQPVfZooxQFI55V6+gNQ,
	dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w,
	tixxdz-Umm1ozX2/EEdnm+yROfE0A

On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
> kdbus is a kernel-level IPC implementation that aims for resemblance to
> the protocol layer with the existing userspace D-Bus daemon while
> enabling some features that couldn't be implemented before in userspace.
> 
> The documentation in the first patch in this series explains the
> protocol and the API details.
> 
> Full details on what has changed from the v2 submission are at the
> bottom of this email.
> 
> Reasons why this should be done in the kernel, instead of userspace as
> it is currently done today include the following:
> 
> - performance: fewer process context switches, fewer copies, fewer
>   syscalls, larger memory chunks via memfd.  This is really important
>   for a whole class of userspace programs that are ported from other
>   operating systems that are run on tiny ARM systems that rely on
>   hundreds of thousands of messages passed at boot time, and at
>   "critical" times in their user interaction loops.
> - security: the peers which communicate do not have to trust each other,
>   as the only trustworthy component in the game is the kernel which
>   adds metadata and ensures that all data passed as payload is either
>   copied or sealed, so that the receiver can parse the data without
>   having to protect against changing memory while parsing buffers.  Also,
>   all the data transfer is controlled by the kernel, so that LSMs can
>   track and control what is going on, without involving userspace.
>   Because of the LSM issue, security people are much happier with this
>   model than the current scheme of having to hook into dbus to mediate
>   things.
> - more metadata can be attached to messages than in userspace
> - semantics for apps with heavy data payloads (media apps, for instance)
>   with optional priority message dequeuing, and global message ordering.
>   Some "crazy" people are playing with using kdbus for audio data in the
>   system.  I'm not saying that this is the best model for this, but
>   until now, there wasn't any other way to do this without having to
>   create custom "busses", one for each application library.
> - being in the kernel closes a lot of races which can't be fixed with
>   the current userspace solutions.  For example, with kdbus, there is a
>   way a client can disconnect from a bus, but do so only if no further
>   messages are present in its queue, which is crucial for implementing
>   race-free "exit-on-idle" services
> - eavesdropping on the kernel level, so privileged users can hook into
>   the message stream without hacking support for that into their
>   userspace processes
> - a number of smaller benefits: for example kdbus learned a way to peek
>   full messages without dequeuing them, which is really useful for
>   logging metadata when handling bus-activation requests.
> 
> Of course, some of the bits above could be implemented in userspace
> alone, for example with more sophisticated memory management APIs, but
> this is usually done by losing out on the other details.  For example,
> for many of the memory management APIs, it's hard to not require the
> communicating peers to fully trust each other.  And we _really_ don't
> want peers to have to trust each other.
> 
> Another benefit of having this in the kernel, rather than as a userspace
> daemon, is that you can now easily use the bus from the initrd, or up to
> the very end when the system shuts down.  On current userspace D-Bus,
> this is not really possible, as this requires passing the bus instance
> around between initrd and the "real" system.  Such a transition of all
> fds also requires keeping full state of what has already been read from
> the connection fds.  kdbus makes this much simpler, as we can change the
> ownership of the bus, just by passing one fd over from one part to the
> other.

I tend to think that much of the above should also be part of the 
documentation file (patch 01/13).

Cheers,

Michael


 
> Regarding binder: binder and kdbus follow very different design
> concepts.  Binder implies the use of thread-pools to dispatch incoming
> method calls.  This is a very efficient scheme, and completely natural
> in programming languages like Java.  On most Linux programs, however,
> there's a much stronger focus on central poll() loops that dispatch all
> sources a program cares about.  kdbus is much more usable in such
> environments, as it doesn't enforce a threading model, and it is happy
> with serialized dispatching.  In fact, this major difference had an
> effect on much of the design decisions: binder does not guarantee global
> message ordering due to the parallel dispatching in the thread-pools,
> but kdbus does.  Moreover, there's also a difference in the way messages
> are handled.  In kdbus, every message is basically taken and dispatched as
> one blob, while in binder, continuous connections to other peers are
> created, which are then used to send messages on.  Hence, the models are
> quite different, and they serve different needs.  I believe that the
> D-Bus/kdbus model is more compatible and friendly with how Linux
> programs are usually implemented.
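
For readers less familiar with the "central poll() loop" style the
paragraph above refers to, here is a minimal sketch of one loop
iteration. It is not kdbus code: a pipe and an eventfd stand in for the
sources a program cares about, and struct source, dispatch_once() and
the callbacks are hypothetical names. It only shows that a single thread
can wait on several fds and dispatch them strictly one after another.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_SOURCES 8

typedef void (*source_cb)(int fd, void *userdata);

/* One event source registered with the loop. */
struct source {
	int fd;
	source_cb cb;
	void *userdata;
};

/*
 * One loop iteration: block until at least one source is readable (or
 * the timeout expires), then run the callbacks of all readable sources
 * in array order. Returns the number of callbacks run, or -1 on error.
 */
static int dispatch_once(struct source *src, size_t n, int timeout_ms)
{
	struct pollfd pfd[MAX_SOURCES];
	int handled = 0;
	size_t i;

	assert(n <= MAX_SOURCES);

	for (i = 0; i < n; i++) {
		pfd[i].fd = src[i].fd;
		pfd[i].events = POLLIN;
		pfd[i].revents = 0;
	}

	if (poll(pfd, n, timeout_ms) < 0)
		return -1;

	for (i = 0; i < n; i++) {
		if (pfd[i].revents & POLLIN) {
			src[i].cb(src[i].fd, src[i].userdata);
			handled++;
		}
	}
	return handled;
}

/* Callback for the pipe: count the bytes that arrived. */
static void on_pipe(int fd, void *userdata)
{
	char buf[64];
	ssize_t r = read(fd, buf, sizeof(buf));

	if (r > 0)
		*(size_t *)userdata += (size_t)r;
}

/* Callback for the eventfd: add its counter value to a running total. */
static void on_eventfd(int fd, void *userdata)
{
	uint64_t v;

	if (read(fd, &v, sizeof(v)) == (ssize_t)sizeof(v))
		*(uint64_t *)userdata += v;
}

/* Demo: make both sources readable, run one iteration; 0 on success. */
static int demo_poll_loop(void)
{
	int p[2], efd, handled = -1;
	uint64_t events = 0, one = 1;
	size_t bytes = 0;
	struct source src[2];

	if (pipe(p) < 0)
		return -1;
	efd = eventfd(0, 0);
	if (efd < 0) {
		close(p[0]);
		close(p[1]);
		return -1;
	}

	src[0] = (struct source){ p[0], on_pipe, &bytes };
	src[1] = (struct source){ efd, on_eventfd, &events };

	if (write(p[1], "ping", 4) == 4 &&
	    write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one))
		handled = dispatch_once(src, 2, 1000);

	close(p[0]);
	close(p[1]);
	close(efd);
	return (handled == 2 && bytes == 4 && events == 1) ? 0 : -1;
}
```

A real program would call dispatch_once() in a loop; the point is only
that no thread pool is needed and callbacks never run concurrently.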
> 
> This can also be found in a git tree, the kdbus branch of char-misc.git at:
>         https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/
> 
> Changes since v2:
> 
>   * Add FS_USERNS_MOUNT to the file system flags, so users can mount
>     their own kdbusfs instances without being root in the parent
>     user-ns. Spotted by Andy Lutomirski.
> 
>   * Rewrite major parts of the metadata implementation to allow for
>     per-recipient namespace translations. For this, namespaces are
>     now not pinned by domains anymore. Instead, metadata is recorded
>     in kernel scope, and exported into the currently active namespaces
>     at the time of message installing.
> 
>   * Split PID and TID from KDBUS_ITEM_CREDS into KDBUS_ITEM_PIDS.
>     The starttime is there to detect re-used PIDs, so move it to that
>     new item type as well. Consequently, introduce struct kdbus_pids
>     to accommodate the information. Requested by Andy Lutomirski.
> 
>   * Add {e,s,fs}{u,g}id to KDBUS_ITEM_CREDS, so users have a way to
>     get more fine-grained credential information.
> 
>   * Removed KDBUS_CMD_CANCEL. The interface was not usable from
>     threaded userspace implementations due to inherent races. Instead,
>     add an item type CANCEL_FD which can be used to pass a file
>     descriptor to the CMD_SEND ioctl. When the SEND is done
>     synchronously, it will get cancelled as soon as the passed
>     FD signals POLLIN.
> 
>   * Dropped starttime from KDBUS_ITEM_PIDS
> 
>   * Restrict names of custom endpoints to names with a "<uid>-" prefix,
>     just like we do for buses.
> 
>   * Provide module-parameter "kdbus.attach_flags_mask" to specify
>     a mask of metadata items that is applied on all exported items.
> 
>   * Monitors are now entirely invisible (IOW, there won't be any
>     notification when they are created) and they don't need to install
>     filters for broadcast messages anymore.
> 
>   * All information exposed via a connection's pool now also reports
>     the length in addition to the offset. That way, userspace
>     applications can mmap() only parts of the pool on demand.
> 
>   * Due to the metadata rework, KDBUS_ITEM_PAYLOAD_OFF items now
>     describe the offset relative to the pool, where they used to be
>     relative to the message header.
> 
>   * Added return_flags bitmask to all kdbus_cmd_* structs, so the
>     kernel can report details of the command processing. This is
>     mostly reserved for future extensions.
> 
>   * Some fixes in kdbus.txt and tests, spotted by Harald Hoyer, Andy
>     Lutomirski, Michele Curti, Sergei Zviagintsev, Sheng Yong, Torstein
>     Husebø and Hristo Venev.
> 
>   * Fixed compiler warnings in test-message by Michele Curti
> 
>   * Unexpected items are now rejected with -EINVAL
> 
>   * Split signal and broadcast handling. Unicast signals are now
>     supported, and messages have a new KDBUS_MSG_SIGNAL flag.
> 
>   * KDBUS_CMD_MSG_SEND was renamed to KDBUS_CMD_SEND, and now takes
>     a struct kdbus_cmd_send instead of a kdbus_msg.
> 
>   * KDBUS_CMD_MSG_RECV was renamed to KDBUS_CMD_RECV.
> 
>   * Test case memory leak plugged, and various other cleanups and
>     fixes, by Rui Miguel Silva.
> 
>   * Build fix for s390
> 
>   * Test case fix for 32bit archs
> 
>   * The test framework now supports mount, pid and user namespaces.
> 
>   * The test framework learned a --tap command line parameter to
>     format its output in the "Test Anything Protocol". This format
>     is chosen by default when "make kselftest" is invoked.
> 
>   * Fixed buses and custom endpoints name validation, reported by
>     Andy Lutomirski.
> 
>   * copy_from_user() return code issue fixed, reported by
>     Dan Carpenter.
> 
>   * Avoid signed int overflow on archs without atomic_sub
> 
>   * Avoid variable size stack items. Fixes a sparse warning in queue.c.
> 
>   * New test case for kernel notification quota
> 
>   * Switched back to enums for the list of ioctls. This has advantages
>     for userspace code as gdb, for instance, is able to resolve the
>     numbers into names. Added features can easily be detected with
>     autotools, and new ioctls can get #defines as well. Having #defines
>     for the initial set of ioctls is unnecessary.
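
To make the last item concrete, this is roughly what an enum-based ioctl
list looks like. It is a sketch under stated assumptions, not the
contents of kdbus.h: DEMO_IOCTL_MAGIC, struct demo_cmd and the
DEMO_CMD_* names are all hypothetical. The _IOW() and _IOC_*() macros
are the usual ones from <linux/ioctl.h>.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic number and payload, for illustration only. */
#define DEMO_IOCTL_MAGIC 0xF5

struct demo_cmd {
	uint64_t size;
	uint64_t flags;
};

/*
 * Because the commands are enumerators rather than #defines, their
 * names end up in the debug info, so a debugger can print the name
 * instead of the raw number.
 */
enum demo_ioctl {
	DEMO_CMD_HELLO = _IOW(DEMO_IOCTL_MAGIC, 0x00, struct demo_cmd),
	DEMO_CMD_BYE   = _IOW(DEMO_IOCTL_MAGIC, 0x01, struct demo_cmd),
};

/* Translate a command number back into its name, e.g. for logging. */
static const char *demo_ioctl_name(unsigned long nr)
{
	switch (nr) {
	case DEMO_CMD_HELLO:
		return "DEMO_CMD_HELLO";
	case DEMO_CMD_BYE:
		return "DEMO_CMD_BYE";
	default:
		return "unknown";
	}
}

/* Sanity check: the encoded fields can be read back with _IOC_*(). */
static void demo_check_encoding(void)
{
	assert(_IOC_TYPE(DEMO_CMD_HELLO) == DEMO_IOCTL_MAGIC);
	assert(_IOC_NR(DEMO_CMD_BYE) == 0x01);
	assert(_IOC_SIZE(DEMO_CMD_HELLO) == sizeof(struct demo_cmd));
}
```

The trade-off is that userspace cannot test for an enumerator with
#ifdef, which is presumably why the item above suggests that ioctls
added later also get a #define.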
> 
> Daniel Mack (13):
>   kdbus: add documentation
>   kdbus: add header file
>   kdbus: add driver skeleton, ioctl entry points and utility functions
>   kdbus: add connection pool implementation
>   kdbus: add connection, queue handling and message validation code
>   kdbus: add node and filesystem implementation
>   kdbus: add code to gather metadata
>   kdbus: add code for notifications and matches
>   kdbus: add code for buses, domains and endpoints
>   kdbus: add name registry implementation
>   kdbus: add policy database implementation
>   kdbus: add Makefile, Kconfig and MAINTAINERS entry
>   kdbus: add selftests
> 
>  Documentation/ioctl/ioctl-number.txt              |    1 +
>  Documentation/kdbus.txt                           | 2107 +++++++++++++++++++++
>  MAINTAINERS                                       |   12 +
>  include/uapi/linux/Kbuild                         |    1 +
>  include/uapi/linux/kdbus.h                        | 1049 ++++++++++
>  include/uapi/linux/magic.h                        |    2 +
>  init/Kconfig                                      |   12 +
>  ipc/Makefile                                      |    2 +-
>  ipc/kdbus/Makefile                                |   22 +
>  ipc/kdbus/bus.c                                   |  553 ++++++
>  ipc/kdbus/bus.h                                   |  103 +
>  ipc/kdbus/connection.c                            | 2004 ++++++++++++++++++++
>  ipc/kdbus/connection.h                            |  262 +++
>  ipc/kdbus/domain.c                                |  350 ++++
>  ipc/kdbus/domain.h                                |   84 +
>  ipc/kdbus/endpoint.c                              |  232 +++
>  ipc/kdbus/endpoint.h                              |   68 +
>  ipc/kdbus/fs.c                                    |  519 +++++
>  ipc/kdbus/fs.h                                    |   25 +
>  ipc/kdbus/handle.c                                | 1134 +++++++++++
>  ipc/kdbus/handle.h                                |   20 +
>  ipc/kdbus/item.c                                  |  309 +++
>  ipc/kdbus/item.h                                  |   57 +
>  ipc/kdbus/limits.h                                |   95 +
>  ipc/kdbus/main.c                                  |   72 +
>  ipc/kdbus/match.c                                 |  535 ++++++
>  ipc/kdbus/match.h                                 |   32 +
>  ipc/kdbus/message.c                               |  598 ++++++
>  ipc/kdbus/message.h                               |  133 ++
>  ipc/kdbus/metadata.c                              | 1066 +++++++++++
>  ipc/kdbus/metadata.h                              |   52 +
>  ipc/kdbus/names.c                                 |  891 +++++++++
>  ipc/kdbus/names.h                                 |   82 +
>  ipc/kdbus/node.c                                  |  910 +++++++++
>  ipc/kdbus/node.h                                  |   87 +
>  ipc/kdbus/notify.c                                |  244 +++
>  ipc/kdbus/notify.h                                |   30 +
>  ipc/kdbus/policy.c                                |  481 +++++
>  ipc/kdbus/policy.h                                |   51 +
>  ipc/kdbus/pool.c                                  |  784 ++++++++
>  ipc/kdbus/pool.h                                  |   47 +
>  ipc/kdbus/queue.c                                 |  505 +++++
>  ipc/kdbus/queue.h                                 |  108 ++
>  ipc/kdbus/reply.c                                 |  262 +++
>  ipc/kdbus/reply.h                                 |   68 +
>  ipc/kdbus/util.c                                  |  317 ++++
>  ipc/kdbus/util.h                                  |  133 ++
>  tools/testing/selftests/Makefile                  |    1 +
>  tools/testing/selftests/kdbus/.gitignore          |   11 +
>  tools/testing/selftests/kdbus/Makefile            |   46 +
>  tools/testing/selftests/kdbus/kdbus-enum.c        |   95 +
>  tools/testing/selftests/kdbus/kdbus-enum.h        |   14 +
>  tools/testing/selftests/kdbus/kdbus-test.c        |  920 +++++++++
>  tools/testing/selftests/kdbus/kdbus-test.h        |   85 +
>  tools/testing/selftests/kdbus/kdbus-util.c        | 1646 ++++++++++++++++
>  tools/testing/selftests/kdbus/kdbus-util.h        |  216 +++
>  tools/testing/selftests/kdbus/test-activator.c    |  319 ++++
>  tools/testing/selftests/kdbus/test-attach-flags.c |  751 ++++++++
>  tools/testing/selftests/kdbus/test-benchmark.c    |  427 +++++
>  tools/testing/selftests/kdbus/test-bus.c          |  174 ++
>  tools/testing/selftests/kdbus/test-chat.c         |  123 ++
>  tools/testing/selftests/kdbus/test-connection.c   |  611 ++++++
>  tools/testing/selftests/kdbus/test-daemon.c       |   66 +
>  tools/testing/selftests/kdbus/test-endpoint.c     |  344 ++++
>  tools/testing/selftests/kdbus/test-fd.c           |  710 +++++++
>  tools/testing/selftests/kdbus/test-free.c         |   36 +
>  tools/testing/selftests/kdbus/test-match.c        |  442 +++++
>  tools/testing/selftests/kdbus/test-message.c      |  658 +++++++
>  tools/testing/selftests/kdbus/test-metadata-ns.c  |  507 +++++
>  tools/testing/selftests/kdbus/test-monitor.c      |  158 ++
>  tools/testing/selftests/kdbus/test-names.c        |  184 ++
>  tools/testing/selftests/kdbus/test-policy-ns.c    |  633 +++++++
>  tools/testing/selftests/kdbus/test-policy-priv.c  | 1270 +++++++++++++
>  tools/testing/selftests/kdbus/test-policy.c       |   81 +
>  tools/testing/selftests/kdbus/test-race.c         |  313 +++
>  tools/testing/selftests/kdbus/test-sync.c         |  368 ++++
>  tools/testing/selftests/kdbus/test-timeout.c      |   99 +
>  77 files changed, 27818 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/kdbus.txt
>  create mode 100644 include/uapi/linux/kdbus.h
>  create mode 100644 ipc/kdbus/Makefile
>  create mode 100644 ipc/kdbus/bus.c
>  create mode 100644 ipc/kdbus/bus.h
>  create mode 100644 ipc/kdbus/connection.c
>  create mode 100644 ipc/kdbus/connection.h
>  create mode 100644 ipc/kdbus/domain.c
>  create mode 100644 ipc/kdbus/domain.h
>  create mode 100644 ipc/kdbus/endpoint.c
>  create mode 100644 ipc/kdbus/endpoint.h
>  create mode 100644 ipc/kdbus/fs.c
>  create mode 100644 ipc/kdbus/fs.h
>  create mode 100644 ipc/kdbus/handle.c
>  create mode 100644 ipc/kdbus/handle.h
>  create mode 100644 ipc/kdbus/item.c
>  create mode 100644 ipc/kdbus/item.h
>  create mode 100644 ipc/kdbus/limits.h
>  create mode 100644 ipc/kdbus/main.c
>  create mode 100644 ipc/kdbus/match.c
>  create mode 100644 ipc/kdbus/match.h
>  create mode 100644 ipc/kdbus/message.c
>  create mode 100644 ipc/kdbus/message.h
>  create mode 100644 ipc/kdbus/metadata.c
>  create mode 100644 ipc/kdbus/metadata.h
>  create mode 100644 ipc/kdbus/names.c
>  create mode 100644 ipc/kdbus/names.h
>  create mode 100644 ipc/kdbus/node.c
>  create mode 100644 ipc/kdbus/node.h
>  create mode 100644 ipc/kdbus/notify.c
>  create mode 100644 ipc/kdbus/notify.h
>  create mode 100644 ipc/kdbus/policy.c
>  create mode 100644 ipc/kdbus/policy.h
>  create mode 100644 ipc/kdbus/pool.c
>  create mode 100644 ipc/kdbus/pool.h
>  create mode 100644 ipc/kdbus/queue.c
>  create mode 100644 ipc/kdbus/queue.h
>  create mode 100644 ipc/kdbus/reply.c
>  create mode 100644 ipc/kdbus/reply.h
>  create mode 100644 ipc/kdbus/util.c
>  create mode 100644 ipc/kdbus/util.h
>  create mode 100644 tools/testing/selftests/kdbus/.gitignore
>  create mode 100644 tools/testing/selftests/kdbus/Makefile
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.c
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.h
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-test.c
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-test.h
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-util.c
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-util.h
>  create mode 100644 tools/testing/selftests/kdbus/test-activator.c
>  create mode 100644 tools/testing/selftests/kdbus/test-attach-flags.c
>  create mode 100644 tools/testing/selftests/kdbus/test-benchmark.c
>  create mode 100644 tools/testing/selftests/kdbus/test-bus.c
>  create mode 100644 tools/testing/selftests/kdbus/test-chat.c
>  create mode 100644 tools/testing/selftests/kdbus/test-connection.c
>  create mode 100644 tools/testing/selftests/kdbus/test-daemon.c
>  create mode 100644 tools/testing/selftests/kdbus/test-endpoint.c
>  create mode 100644 tools/testing/selftests/kdbus/test-fd.c
>  create mode 100644 tools/testing/selftests/kdbus/test-free.c
>  create mode 100644 tools/testing/selftests/kdbus/test-match.c
>  create mode 100644 tools/testing/selftests/kdbus/test-message.c
>  create mode 100644 tools/testing/selftests/kdbus/test-metadata-ns.c
>  create mode 100644 tools/testing/selftests/kdbus/test-monitor.c
>  create mode 100644 tools/testing/selftests/kdbus/test-names.c
>  create mode 100644 tools/testing/selftests/kdbus/test-policy-ns.c
>  create mode 100644 tools/testing/selftests/kdbus/test-policy-priv.c
>  create mode 100644 tools/testing/selftests/kdbus/test-policy.c
>  create mode 100644 tools/testing/selftests/kdbus/test-race.c
>  create mode 100644 tools/testing/selftests/kdbus/test-sync.c
>  create mode 100644 tools/testing/selftests/kdbus/test-timeout.c
> 
> 
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [PATCH v3 00/13] Add kdbus implementation
  2015-01-20 13:24                   ` Johannes Stezenbach
  (?)
@ 2015-01-20 14:12                   ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-20 14:12 UTC (permalink / raw)
  To: Johannes Stezenbach, Greg Kroah-Hartman
  Cc: mtk.manpages, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel, daniel, dh.herrmann, tixxdz

On 01/20/2015 02:24 PM, Johannes Stezenbach wrote:
> On Tue, Jan 20, 2015 at 07:26:09PM +0800, Greg Kroah-Hartman wrote:
>> On Tue, Jan 20, 2015 at 11:57:12AM +0100, Johannes Stezenbach wrote:

>>> My guess is that the people porting from QNX were just confused
>>> and their use of D-Bus was in error.  Maybe they should've used
>>> plain sockets, capnproto, ZeroMQ or whatever.
>>
>> I tend to trust that they knew what they were doing, they wouldn't have
>> picked D-Bus for no good reason.
> 
> The automotive developers I had the pleasure to work with would
> use anything which is available via a mouse click in the
> commercial Embedded Linux SDK IDE of their choice :)
> Let's face it: QNX has a single IPC solution while Linux has
> a confusing multitude of possibilities.

Greg, from my spell in IVI, I too have to say your faith in the
wisdom of IVI developers' choices is touching. I think D-Bus was
picked mainly because it had some nice features, but then
people realized it had no bandwidth, and the solution has been
"make D-Bus faster", rather than "maybe we should explore
other (mixed model) solutions". This isn't to say that I'm
against adding kdbus, but I don't think there's much strength to
the argument you make above.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
  2015-01-16 19:16 ` Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  (?)
@ 2015-01-20 14:15 ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-20 14:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel
  Cc: mtk.manpages, Daniel Mack, dh.herrmann, tixxdz

[Bother. Futzed Daniel Mack's email address. Resending]

On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
> kdbus is a kernel-level IPC implementation that aims to resemble the
> protocol layer of the existing userspace D-Bus daemon while
> enabling some features that couldn't be implemented before in userspace.
> 
> The documentation in the first patch in this series explains the
> protocol and the API details.
> 
> Full details on what has changed from the v2 submission are at the
> bottom of this email.
> 
> Reasons why this should be done in the kernel, instead of userspace as
> it is currently done today include the following:
> 
> - performance: fewer process context switches, fewer copies, fewer
>   syscalls, larger memory chunks via memfd.  This is really important
>   for a whole class of userspace programs that are ported from other
>   operating systems that are run on tiny ARM systems that rely on
>   hundreds of thousands of messages passed at boot time, and at
>   "critical" times in their user interaction loops.
> - security: the peers which communicate do not have to trust each other,
>   as the only trustworthy component in the game is the kernel which
>   adds metadata and ensures that all data passed as payload is either
>   copied or sealed, so that the receiver can parse the data without
>   having to protect against changing memory while parsing buffers.  Also,
>   all the data transfer is controlled by the kernel, so that LSMs can
>   track and control what is going on, without involving userspace.
>   Because of the LSM issue, security people are much happier with this
>   model than the current scheme of having to hook into dbus to mediate
>   things.
> - more metadata can be attached to messages than in userspace
> - semantics for apps with heavy data payloads (media apps, for instance)
>   with optional priority message dequeuing, and global message ordering.
>   Some "crazy" people are playing with using kdbus for audio data in the
>   system.  I'm not saying that this is the best model for this, but
>   until now, there wasn't any other way to do this without having to
>   create custom "busses", one for each application library.
> - being in the kernel closes a lot of races which can't be fixed with
>   the current userspace solutions.  For example, with kdbus, there is a
>   way a client can disconnect from a bus, but do so only if no further
>   messages are present in its queue, which is crucial for implementing
>   race-free "exit-on-idle" services.
> - eavesdropping on the kernel level, so privileged users can hook into
>   the message stream without hacking support for that into their
>   userspace processes
> - a number of smaller benefits: for example kdbus learned a way to peek
>   full messages without dequeuing them, which is really useful for
>   logging metadata when handling bus-activation requests.
> 
> Of course, some of the bits above could be implemented in userspace
> alone, for example with more sophisticated memory management APIs, but
> this is usually done by losing out on the other details.  For example,
> for many of the memory management APIs, it's hard to not require the
> communicating peers to fully trust each other.  And we _really_ don't
> want peers to have to trust each other.
> 
> Another benefit of having this in the kernel, rather than as a userspace
> daemon, is that you can now easily use the bus from the initrd, or up to
> the very end when the system shuts down.  On current userspace D-Bus,
> this is not really possible, as this requires passing the bus instance
> around between initrd and the "real" system.  Such a transition of all
> fds also requires keeping full state of what has already been read from
> the connection fds.  kdbus makes this much simpler, as we can change the
> ownership of the bus, just by passing one fd over from one part to the
> other.

I tend to think that much of the above should also be part of the 
documentation file (patch 01/13).

Cheers,

Michael


 
> Regarding binder: binder and kdbus follow very different design
> concepts.  Binder implies the use of thread-pools to dispatch incoming
> method calls.  This is a very efficient scheme, and completely natural
> in programming languages like Java.  On most Linux programs, however,
> there's a much stronger focus on central poll() loops that dispatch all
> sources a program cares about.  kdbus is much more usable in such
> environments, as it doesn't enforce a threading model, and it is happy
> with serialized dispatching.  In fact, this major difference had an
> effect on much of the design decisions: binder does not guarantee global
> message ordering due to the parallel dispatching in the thread-pools,
> but kdbus does.  Moreover, there's also a difference in the way messages
> are handled.  In kdbus, every message is basically taken and dispatched as
> one blob, while in binder, continuous connections to other peers are
> created, which are then used to send messages on.  Hence, the models are
> quite different, and they serve different needs.  I believe that the
> D-Bus/kdbus model is more compatible and friendly with how Linux
> programs are usually implemented.
> 
> This can also be found in a git tree, the kdbus branch of char-misc.git at:
>         https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/
> 
> Changes since v2:
> 
>   * Add FS_USERNS_MOUNT to the file system flags, so users can mount
>     their own kdbusfs instances without being root in the parent
>     user-ns. Spotted by Andy Lutomirski.
> 
>   * Rewrite major parts of the metadata implementation to allow for
>     per-recipient namespace translations. For this, namespaces are
>     now not pinned by domains anymore. Instead, metadata is recorded
>     in kernel scope, and exported into the currently active namespaces
>     at the time of message installing.
> 
>   * Split PID and TID from KDBUS_ITEM_CREDS into KDBUS_ITEM_PIDS.
>     The starttime is there to detect re-used PIDs, so move it to that
>     new item type as well. Consequently, introduce struct kdbus_pids
>     to accommodate the information. Requested by Andy Lutomirski.
> 
>   * Add {e,s,fs}{u,g}id to KDBUS_ITEM_CREDS, so users have a way to
>     get more fine-grained credential information.
> 
>   * Removed KDBUS_CMD_CANCEL. The interface was not usable from
>     threaded userspace implementation due to inherent races. Instead,
>     add an item type CANCEL_FD which can be used to pass a file
>     descriptor to the CMD_SEND ioctl. When the SEND is done
>     synchronously, it will get cancelled as soon as the passed
>     FD signals POLLIN.
> 
>   * Dropped starttime from KDBUS_ITEM_PIDS
> 
>   * Restrict names of custom endpoints to names with a "<uid>-" prefix,
>     just like we do for buses.
> 
>   * Provide module-parameter "kdbus.attach_flags_mask" to specify
>     a mask of metadata items that is applied to all exported items.
> 
>   * Monitors are now entirely invisible (IOW, there won't be any
>     notification when they are created) and they don't need to install
>     filters for broadcast messages anymore.
> 
>   * All information exposed via a connection's pool now also reports
>     the length in addition to the offset. That way, userspace
>     applications can mmap() only parts of the pool on demand.
> 
>   * Due to the metadata rework, KDBUS_ITEM_PAYLOAD_OFF items now
>     describe the offset relative to the pool, where they used to be
>     relative to the message header.
> 
>   * Added return_flags bitmask to all kdbus_cmd_* structs, so the
>     kernel can report details of the command processing. This is
>     mostly reserved for future extensions.
> 
>   * Some fixes in kdbus.txt and tests, spotted by Harald Hoyer, Andy
>     Lutomirski, Michele Curti, Sergei Zviagintsev, Sheng Yong, Torstein
>     Husebø and Hristo Venev.
> 
>   * Fixed compiler warnings in test-message by Michele Curti
> 
>   * Unexpected items are now rejected with -EINVAL
> 
>   * Split signal and broadcast handling. Unicast signals are now
>     supported, and messages have a new KDBUS_MSG_SIGNAL flag.
> 
>   * KDBUS_CMD_MSG_SEND was renamed to KDBUS_CMD_SEND, and now takes
>     a struct kdbus_cmd_send instead of a kdbus_msg.
> 
>   * KDBUS_CMD_MSG_RECV was renamed to KDBUS_CMD_RECV.
> 
>   * Test case memory leak plugged, and various other cleanups and
>     fixes, by Rui Miguel Silva.
> 
>   * Build fix for s390
> 
>   * Test case fix for 32bit archs
> 
>   * The test framework now supports mount, pid and user namespaces.
> 
>   * The test framework learned a --tap command line parameter to
>     format its output in the "Test Anything Protocol". This format
>     is chosen by default when "make kselftest" is invoked.
> 
>   * Fixed buses and custom endpoints name validation, reported by
>     Andy Lutomirski.
> 
>   * copy_from_user() return code issue fixed, reported by
>     Dan Carpenter.
> 
>   * Avoid signed int overflow on archs without atomic_sub
> 
>   * Avoid variable size stack items. Fixes a sparse warning in queue.c.
> 
>   * New test case for kernel notification quota
> 
>   * Switched back to enums for the list of ioctls. This has advantages
>     for userspace code as gdb, for instance, is able to resolve the
>     numbers into names. Added features can easily be detected with
>     autotools, and new ioctls can get #defines as well. Having #defines
>     for the initial set of ioctls is unnecessary.
> 
> Daniel Mack (13):
>   kdbus: add documentation
>   kdbus: add header file
>   kdbus: add driver skeleton, ioctl entry points and utility functions
>   kdbus: add connection pool implementation
>   kdbus: add connection, queue handling and message validation code
>   kdbus: add node and filesystem implementation
>   kdbus: add code to gather metadata
>   kdbus: add code for notifications and matches
>   kdbus: add code for buses, domains and endpoints
>   kdbus: add name registry implementation
>   kdbus: add policy database implementation
>   kdbus: add Makefile, Kconfig and MAINTAINERS entry
>   kdbus: add selftests
> 
>  Documentation/ioctl/ioctl-number.txt              |    1 +
>  Documentation/kdbus.txt                           | 2107 +++++++++++++++++++++
>  MAINTAINERS                                       |   12 +
>  include/uapi/linux/Kbuild                         |    1 +
>  include/uapi/linux/kdbus.h                        | 1049 ++++++++++
>  include/uapi/linux/magic.h                        |    2 +
>  init/Kconfig                                      |   12 +
>  ipc/Makefile                                      |    2 +-
>  ipc/kdbus/Makefile                                |   22 +
>  ipc/kdbus/bus.c                                   |  553 ++++++
>  ipc/kdbus/bus.h                                   |  103 +
>  ipc/kdbus/connection.c                            | 2004 ++++++++++++++++++++
>  ipc/kdbus/connection.h                            |  262 +++
>  ipc/kdbus/domain.c                                |  350 ++++
>  ipc/kdbus/domain.h                                |   84 +
>  ipc/kdbus/endpoint.c                              |  232 +++
>  ipc/kdbus/endpoint.h                              |   68 +
>  ipc/kdbus/fs.c                                    |  519 +++++
>  ipc/kdbus/fs.h                                    |   25 +
>  ipc/kdbus/handle.c                                | 1134 +++++++++++
>  ipc/kdbus/handle.h                                |   20 +
>  ipc/kdbus/item.c                                  |  309 +++
>  ipc/kdbus/item.h                                  |   57 +
>  ipc/kdbus/limits.h                                |   95 +
>  ipc/kdbus/main.c                                  |   72 +
>  ipc/kdbus/match.c                                 |  535 ++++++
>  ipc/kdbus/match.h                                 |   32 +
>  ipc/kdbus/message.c                               |  598 ++++++
>  ipc/kdbus/message.h                               |  133 ++
>  ipc/kdbus/metadata.c                              | 1066 +++++++++++
>  ipc/kdbus/metadata.h                              |   52 +
>  ipc/kdbus/names.c                                 |  891 +++++++++
>  ipc/kdbus/names.h                                 |   82 +
>  ipc/kdbus/node.c                                  |  910 +++++++++
>  ipc/kdbus/node.h                                  |   87 +
>  ipc/kdbus/notify.c                                |  244 +++
>  ipc/kdbus/notify.h                                |   30 +
>  ipc/kdbus/policy.c                                |  481 +++++
>  ipc/kdbus/policy.h                                |   51 +
>  ipc/kdbus/pool.c                                  |  784 ++++++++
>  ipc/kdbus/pool.h                                  |   47 +
>  ipc/kdbus/queue.c                                 |  505 +++++
>  ipc/kdbus/queue.h                                 |  108 ++
>  ipc/kdbus/reply.c                                 |  262 +++
>  ipc/kdbus/reply.h                                 |   68 +
>  ipc/kdbus/util.c                                  |  317 ++++
>  ipc/kdbus/util.h                                  |  133 ++
>  tools/testing/selftests/Makefile                  |    1 +
>  tools/testing/selftests/kdbus/.gitignore          |   11 +
>  tools/testing/selftests/kdbus/Makefile            |   46 +
>  tools/testing/selftests/kdbus/kdbus-enum.c        |   95 +
>  tools/testing/selftests/kdbus/kdbus-enum.h        |   14 +
>  tools/testing/selftests/kdbus/kdbus-test.c        |  920 +++++++++
>  tools/testing/selftests/kdbus/kdbus-test.h        |   85 +
>  tools/testing/selftests/kdbus/kdbus-util.c        | 1646 ++++++++++++++++
>  tools/testing/selftests/kdbus/kdbus-util.h        |  216 +++
>  tools/testing/selftests/kdbus/test-activator.c    |  319 ++++
>  tools/testing/selftests/kdbus/test-attach-flags.c |  751 ++++++++
>  tools/testing/selftests/kdbus/test-benchmark.c    |  427 +++++
>  tools/testing/selftests/kdbus/test-bus.c          |  174 ++
>  tools/testing/selftests/kdbus/test-chat.c         |  123 ++
>  tools/testing/selftests/kdbus/test-connection.c   |  611 ++++++
>  tools/testing/selftests/kdbus/test-daemon.c       |   66 +
>  tools/testing/selftests/kdbus/test-endpoint.c     |  344 ++++
>  tools/testing/selftests/kdbus/test-fd.c           |  710 +++++++
>  tools/testing/selftests/kdbus/test-free.c         |   36 +
>  tools/testing/selftests/kdbus/test-match.c        |  442 +++++
>  tools/testing/selftests/kdbus/test-message.c      |  658 +++++++
>  tools/testing/selftests/kdbus/test-metadata-ns.c  |  507 +++++
>  tools/testing/selftests/kdbus/test-monitor.c      |  158 ++
>  tools/testing/selftests/kdbus/test-names.c        |  184 ++
>  tools/testing/selftests/kdbus/test-policy-ns.c    |  633 +++++++
>  tools/testing/selftests/kdbus/test-policy-priv.c  | 1270 +++++++++++++
>  tools/testing/selftests/kdbus/test-policy.c       |   81 +
>  tools/testing/selftests/kdbus/test-race.c         |  313 +++
>  tools/testing/selftests/kdbus/test-sync.c         |  368 ++++
>  tools/testing/selftests/kdbus/test-timeout.c      |   99 +
>  77 files changed, 27818 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/kdbus.txt
>  create mode 100644 include/uapi/linux/kdbus.h
>  create mode 100644 ipc/kdbus/Makefile
>  create mode 100644 ipc/kdbus/bus.c
>  create mode 100644 ipc/kdbus/bus.h
>  create mode 100644 ipc/kdbus/connection.c
>  create mode 100644 ipc/kdbus/connection.h
>  create mode 100644 ipc/kdbus/domain.c
>  create mode 100644 ipc/kdbus/domain.h
>  create mode 100644 ipc/kdbus/endpoint.c
>  create mode 100644 ipc/kdbus/endpoint.h
>  create mode 100644 ipc/kdbus/fs.c
>  create mode 100644 ipc/kdbus/fs.h
>  create mode 100644 ipc/kdbus/handle.c
>  create mode 100644 ipc/kdbus/handle.h
>  create mode 100644 ipc/kdbus/item.c
>  create mode 100644 ipc/kdbus/item.h
>  create mode 100644 ipc/kdbus/limits.h
>  create mode 100644 ipc/kdbus/main.c
>  create mode 100644 ipc/kdbus/match.c
>  create mode 100644 ipc/kdbus/match.h
>  create mode 100644 ipc/kdbus/message.c
>  create mode 100644 ipc/kdbus/message.h
>  create mode 100644 ipc/kdbus/metadata.c
>  create mode 100644 ipc/kdbus/metadata.h
>  create mode 100644 ipc/kdbus/names.c
>  create mode 100644 ipc/kdbus/names.h
>  create mode 100644 ipc/kdbus/node.c
>  create mode 100644 ipc/kdbus/node.h
>  create mode 100644 ipc/kdbus/notify.c
>  create mode 100644 ipc/kdbus/notify.h
>  create mode 100644 ipc/kdbus/policy.c
>  create mode 100644 ipc/kdbus/policy.h
>  create mode 100644 ipc/kdbus/pool.c
>  create mode 100644 ipc/kdbus/pool.h
>  create mode 100644 ipc/kdbus/queue.c
>  create mode 100644 ipc/kdbus/queue.h
>  create mode 100644 ipc/kdbus/reply.c
>  create mode 100644 ipc/kdbus/reply.h
>  create mode 100644 ipc/kdbus/util.c
>  create mode 100644 ipc/kdbus/util.h
>  create mode 100644 tools/testing/selftests/kdbus/.gitignore
>  create mode 100644 tools/testing/selftests/kdbus/Makefile
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.c
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.h
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-test.c
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-test.h
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-util.c
>  create mode 100644 tools/testing/selftests/kdbus/kdbus-util.h
>  create mode 100644 tools/testing/selftests/kdbus/test-activator.c
>  create mode 100644 tools/testing/selftests/kdbus/test-attach-flags.c
>  create mode 100644 tools/testing/selftests/kdbus/test-benchmark.c
>  create mode 100644 tools/testing/selftests/kdbus/test-bus.c
>  create mode 100644 tools/testing/selftests/kdbus/test-chat.c
>  create mode 100644 tools/testing/selftests/kdbus/test-connection.c
>  create mode 100644 tools/testing/selftests/kdbus/test-daemon.c
>  create mode 100644 tools/testing/selftests/kdbus/test-endpoint.c
>  create mode 100644 tools/testing/selftests/kdbus/test-fd.c
>  create mode 100644 tools/testing/selftests/kdbus/test-free.c
>  create mode 100644 tools/testing/selftests/kdbus/test-match.c
>  create mode 100644 tools/testing/selftests/kdbus/test-message.c
>  create mode 100644 tools/testing/selftests/kdbus/test-metadata-ns.c
>  create mode 100644 tools/testing/selftests/kdbus/test-monitor.c
>  create mode 100644 tools/testing/selftests/kdbus/test-names.c
>  create mode 100644 tools/testing/selftests/kdbus/test-policy-ns.c
>  create mode 100644 tools/testing/selftests/kdbus/test-policy-priv.c
>  create mode 100644 tools/testing/selftests/kdbus/test-policy.c
>  create mode 100644 tools/testing/selftests/kdbus/test-race.c
>  create mode 100644 tools/testing/selftests/kdbus/test-sync.c
>  create mode 100644 tools/testing/selftests/kdbus/test-timeout.c
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-20 14:31       ` David Herrmann
  0 siblings, 0 replies; 143+ messages in thread
From: David Herrmann @ 2015-01-20 14:31 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	Linux API, linux-kernel, Daniel Mack, Djalal Harouni,
	Daniel Mack, Johannes Stezenbach

Hi Michael

On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
>> From: Daniel Mack <daniel@zonque.org>
>>
>> kdbus is a system for low-latency, low-overhead, easy to use
>> interprocess communication (IPC).
>>
>> The interface to all functions in this driver is implemented via ioctls
>> on files exposed through a filesystem called 'kdbusfs'. The default
>> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
>> documentation about the kernel level API design.
>
> I have some detailed feedback on the contents of this file, and some
> bigger questions. I'll split them out into separate mails.
>
> So here, the bigger, general questions to start with. I've arrived late
> to this, so sorry if they've already been discussed, but the answers to
> some of the questions should actually be in this file, I would have
> expected.
>
> This is an enormous and complex API. Why is the API ioctl() based,
> rather than system-call-based? Have we learned nothing from the hydra
> that the futex() multiplexing syscall became? (And kdbus is an order
> of magnitude more complex, by the look of things.) At the very least,
> a *good* justification of why the API is ioctl()-based should be part
> of this documentation file.
>
> An observation: The documentation below is substantial, but this API is
> enormous, so the documentation still feels rather thin. What would
> really help would be some example code in the doc.
>
> And on the subject of code examples... Are there any (prototype)
> working user-space applications that exercise the current kdbus
> implementation? That is, can I install these kdbus patches, and
> then find a simple example application somewhere that does
> something to exercise kdbus?

If you run a 3.18 kernel, you can install kdbus.ko from our repository
and boot a full Fedora system running Gnome3 with kdbus, given that
you compiled systemd with --enable-kdbus (which is still
experimental). No legacy dbus1 daemon is running. Instead, we have a
bus-proxy that converts classic dbus1 to kdbus, so all
bus-communication runs on kdbus.

> And then: is there any substantial real-world application (e.g., a
> full D-Bus port) that is being developed in tandem with this kernel
> side patch? (I don't mean a user-space library; I mean a seriously
> large application.) This is an incredibly complex API whose
> failings are only going to become evident through real-world use.
> Solidifying an API in the kernel and then discovering the API
> problems later when writing real-world applications would make for
> a sad story. A story something like that of inotify, an API which
> is an order of magnitude less complex than kdbus. (I can't help but
> feel that many of the inotify problems that I discuss at
> https://lwn.net/Articles/605128/ might have been fixed or mitigated
> if a few real-world applications had been implemented before the
> API was set in stone.)

I think running a whole Gnome3 stack counts as "substantial real-world
application", right? Sure, it's a dbus1-to-kdbus layer, but all the
systemd tools use kdbus natively and it works just fine. In fact, we
all run kdbus on our main systems every day.

We've spent over a year fixing races and API misdesigns, we've talked
to other toolkit developers (glib, qt, ..) and made sure we're
backwards compatible with dbus1. I don't think the API is perfect;
everyone makes mistakes. But with bus-proxy and systemd we have two
huge users of kdbus that put a lot of pressure on API design.

>> +For a kdbus specific userspace library implementation please refer to:
>> +  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
>
> Is this library intended just for systemd? More generally, is there an
> intention to provide a general purpose library API for kdbus? Or is the
> intention that each application will roll a library suitable to its
> needs? I think an answer to that question would be useful in this
> Documentation file.

kdbus is in no way bound to systemd. There are ongoing efforts to port
glib and qt to kdbus natively. The API is pretty simple and I don't
see how a libkdbus would simplify things. In fact, even our tests only
have slim wrappers around the ioctls to simplify error-handling in
test-scenarios.
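
To give a rough idea of what such a slim wrapper can look like, here is
a minimal sketch. It is not taken from the kdbus tree or its selftests;
the name ioctl_errno() is hypothetical and chosen only for illustration.
It folds ioctl()'s "-1 plus errno" convention into a single return
value, so a test can check one integer instead of two:

```c
#include <errno.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper, for illustration only (not part of kdbus).
 * Issues the ioctl and returns its non-negative result on success,
 * or a negative errno code (for example -EBADF) on failure.
 */
static int ioctl_errno(int fd, unsigned long request, void *arg)
{
	int ret = ioctl(fd, request, arg);

	/* ioctl() signals failure with -1 and sets errno. */
	if (ret < 0)
		return -errno;

	return ret;
}
```

A test scenario would then pass whatever request code and argument
struct the kdbus header defines and compare the result against 0 or an
expected negative error code in a single assertion.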

Note that most of the toolkit work is on the marshaling level, which
is independent of kdbus. kdbus just provides the transport level. D-Bus
is just one, albeit significant, application layer on top of kdbus. Our
test-cases use kdbus exclusively to transport raw byte streams.

Thanks
David

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-20 14:42         ` Josh Boyer
  0 siblings, 0 replies; 143+ messages in thread
From: Josh Boyer @ 2015-01-20 14:42 UTC (permalink / raw)
  To: David Herrmann
  Cc: Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	Linux API, linux-kernel, Djalal Harouni, Daniel Mack,
	Johannes Stezenbach

On Tue, Jan 20, 2015 at 9:31 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
> Hi Michael
>
> On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
>>> From: Daniel Mack <daniel@zonque.org>
>>>
>>> kdbus is a system for low-latency, low-overhead, easy to use
>>> interprocess communication (IPC).
>>>
>>> The interface to all functions in this driver is implemented via ioctls
>>> on files exposed through a filesystem called 'kdbusfs'. The default
>>> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
>>> documentation about the kernel level API design.
>>
>> I have some detailed feedback on the contents of this file, and some
>> bigger questions. I'll split them out into separate mails.
>>
>> So here, the bigger, general questions to start with. I've arrived late
>> to this, so sorry if they've already been discussed, but the answers to
>> some of the questions should actually be in this file, I would have
>> expected.
>>
>> This is an enormous and complex API. Why is the API ioctl() based,
>> rather than system-call-based? Have we learned nothing from the hydra
>> that the futex() multiplexing syscall became? (And kdbus is an order
>> of magnitude more complex, by the look of things.) At the very least,
>> a *good* justification of why the API is ioctl()-based should be part
>> of this documentation file.
>>
>> An observation: The documentation below is substantial, but this API is
>> enormous, so the documentation still feels rather thin. What would
>> really help would be some example code in the doc.
>>
>> And on the subject of code examples... Are there any (prototype)
>> working user-space applications that exercise the current kdbus
>> implementation? That is, can I install these kdbus patches, and
>> then find a simple example application somewhere that does
>> something to exercise kdbus?
>
> If you run a 3.18 kernel, you can install kdbus.ko from our repository
> and boot a full Fedora system running Gnome3 with kdbus, given that
> you compiled systemd with --enable-kdbus (which is still
> experimental). No legacy dbus1 daemon is running. Instead, we have a
> bus-proxy that converts classic dbus1 to kdbus, so all
> bus-communication runs on kdbus.

FWIW, we've been building a "playground" repository for the kernel
that contains this already for Fedora.  If you have a stock Fedora 21
or rawhide install, you can use:

https://copr.fedoraproject.org/coprs/jwboyer/kernel-playground/

which has the kernel+kdbus and systemd built with --enable-kdbus
already.  Easy enough to throw in a VM for testing.

josh

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-20 14:53           ` Djalal Harouni
  0 siblings, 0 replies; 143+ messages in thread
From: Djalal Harouni @ 2015-01-20 14:53 UTC (permalink / raw)
  To: Josh Boyer
  Cc: David Herrmann, Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	Linux API, linux-kernel, Daniel Mack, Johannes Stezenbach

Hi,

On Tue, Jan 20, 2015 at 09:42:59AM -0500, Josh Boyer wrote:
> On Tue, Jan 20, 2015 at 9:31 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
> > Hi Michael
> >
> > On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages)
> > <mtk.manpages@gmail.com> wrote:
> >> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
> >>> From: Daniel Mack <daniel@zonque.org>
> >>>
> >>> kdbus is a system for low-latency, low-overhead, easy to use
> >>> interprocess communication (IPC).
> >>>
> >>> The interface to all functions in this driver is implemented via ioctls
> >>> on files exposed through a filesystem called 'kdbusfs'. The default
> >>> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
> >>> documentation about the kernel level API design.
> >>
> >> I have some detailed feedback on the contents of this file, and some
> >> bigger questions. I'll split them out into separate mails.
> >>
> >> So here, the bigger, general questions to start with. I've arrived late
> >> to this, so sorry if they've already been discussed, but the answers to
> >> some of the questions should actually be in this file, I would have
> >> expected.
> >>
> >> This is an enormous and complex API. Why is the API ioctl() based,
> >> rather than system-call-based? Have we learned nothing from the hydra
> >> that the futex() multiplexing syscall became? (And kdbus is an order
> >> of magnitude more complex, by the look of things.) At the very least,
> >> a *good* justification of why the API is ioctl()-based should be part
> >> of this documentation file.
> >>
> >> An observation: The documentation below is substantial, but this API is
> >> enormous, so the documentation still feels rather thin. What would
> >> really help would be some example code in the doc.
> >>
> >> And on the subject of code examples... Are there any (prototype)
> >> working user-space applications that exercise the current kdbus
> >> implementation? That is, can I install these kdbus patches, and
> >> then find a simple example application somewhere that does
> >> something to exercise kdbus?
> >
> > If you run a 3.18 kernel, you can install kdbus.ko from our repository
> > and boot a full Fedora system running Gnome3 with kdbus, given that
> > you compiled systemd with --enable-kdbus (which is still
> > experimental). No legacy dbus1 daemon is running. Instead, we have a
> > bus-proxy that converts classic dbus1 to kdbus, so all
> > bus-communication runs on kdbus.
> 
> FWIW, we've been building a "playground" repository for the kernel
> that contains this already for Fedora.  If you have a stock Fedora 21
> or rawhide install, you can use:
> 
> https://copr.fedoraproject.org/coprs/jwboyer/kernel-playground/
> 
> which has the kernel+kdbus and systemd built with --enable-kdbus
> already.  Easy enough to throw in a VM for testing.
> 
> josh

Yes, thanks Josh!

One more addition: if kdbus is installed and loaded, you can also use
systemd-nspawn to boot a full system (systemd compiled with
--enable-kdbus) in a container [1]; kdbusfs will be mounted in the
container.

There is also the busctl tool to query kdbus...

[1] http://www.freedesktop.org/wiki/Software/systemd/VirtualizedTesting/

-- 
Djalal Harouni
http://opendz.org

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-20 14:53           ` Djalal Harouni
  (?)
@ 2015-01-20 16:08           ` Johannes Stezenbach
  2015-01-20 17:00               ` David Herrmann
  -1 siblings, 1 reply; 143+ messages in thread
From: Johannes Stezenbach @ 2015-01-20 16:08 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Josh Boyer, David Herrmann, Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	Linux API, linux-kernel, Daniel Mack

Hi all,

On Tue, Jan 20, 2015 at 03:53:53PM +0100, Djalal Harouni wrote:
> On Tue, Jan 20, 2015 at 09:42:59AM -0500, Josh Boyer wrote:
> > On Tue, Jan 20, 2015 at 9:31 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
> > >
> > > If you run a 3.18 kernel, you can install kdbus.ko from our repository
> > > and boot a full Fedora system running Gnome3 with kdbus, given that
> > > you compiled systemd with --enable-kdbus (which is still
> > > experimental). No legacy dbus1 daemon is running. Instead, we have a
> > > bus-proxy that converts classic dbus1 to kdbus, so all
> > > bus-communication runs on kdbus.
> > 
> > FWIW, we've been building a "playground" repository for the kernel
> > that contains this already for Fedora.  If you have a stock Fedora 21
> > or rawhide install, you can use:
> > 
> > https://copr.fedoraproject.org/coprs/jwboyer/kernel-playground/
> > 
> > which has the kernel+kdbus and systemd built with --enable-kdbus
> > already.  Easy enough to throw in a VM for testing.
> 
> Another addition, if kdbus is installed and loaded, you could also use
> systemd-nspawn to boot a full system (systemd compiled with
> --enable-kdbus) in a container [1], kdbusfs will be mounted in the
> container.
> 
> There is also the busctl tool to query kdbus...
> 
> http://www.freedesktop.org/wiki/Software/systemd/VirtualizedTesting/

It is reassuring that kdbus actually works :)

However, let me repeat and rephrase my previous questions:
Is there a noticeable or measurable improvement from using kdbus?
IOW, is the added complexity of kdbus worth the result?

I have stated my belief that current usage of D-Bus is not
performance sensitive and the number of messages exchanged
is low.  I would love it if you would prove me wrong.
Or if you could show that any D-Bus related bug in Gnome3
is fixed by kdbus.

I would sooo love it if someone would finally post some
data that proves kdbus is useful beyond systemd.


Thanks,
Johannes

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-20 17:00               ` David Herrmann
  0 siblings, 0 replies; 143+ messages in thread
From: David Herrmann @ 2015-01-20 17:00 UTC (permalink / raw)
  To: Johannes Stezenbach
  Cc: Djalal Harouni, Josh Boyer, Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	Linux API, linux-kernel, Daniel Mack

Hi

On Tue, Jan 20, 2015 at 5:08 PM, Johannes Stezenbach <js@sig21.net> wrote:
> However, let me repeat and rephrase my previous questions:
> Is there a noticeable or measurable improvement from using kdbus?
> IOW, is the added complexity of kdbus worth the result?
>
> I have stated my belief that current usage of D-Bus is not
> performance sensitive and the number of messages exchanged
> is low.  I would love it if you would prove me wrong.
> Or if you could show that any D-Bus related bug in Gnome3
> is fixed by kdbus.

DBus is not used for performance sensitive applications because DBus
is slow. We want to make it fast so we can finally use it for
low-latency, high-throughput applications. A simple DBus
method-call+reply takes 200us here, with kdbus it takes 8us (with UDS
about 2us). If I increase the packet size from 8k to 128k, kdbus even
tops UDS thanks to single-copy transfers.
The fact that there is no performance-critical application using DBus
is, imho, an argument *pro* kdbus. People haven't been able to make
classic dbus1 fast enough for low-latency audio; thus, we
present kdbus.

Starting up 'gdm' sends ~5k dbus messages on my machine. It takes >1s
to transmit the messages alone. Each dbus1 message has to be copied 4
times for each direction. With kdbus, each message is copied only once
for each transmission (or not at all, if you use memfds, though that
doesn't mean it's necessarily faster). No intermediate context-switch
is needed. This makes kdbus capable of transmitting low-latency audio data
*inline*.
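
As a rough back-of-envelope check of the figures above, the following
sketch multiplies a message count by a per-message cost. It assumes, purely
for illustration, that each of the ~5k messages costs about as much as one
of the quoted call+reply round trips; the helper name total_transfer_us is
hypothetical and not part of kdbus or D-Bus.

```c
#include <stdint.h>

/*
 * Illustrative helper (not from the thread): total transfer time in
 * microseconds for `count` messages at `per_msg_us` microseconds each.
 * ASSUMPTION: every message costs the same, which real traffic will not.
 */
static uint64_t total_transfer_us(uint64_t count, uint64_t per_msg_us)
{
	return count * per_msg_us;
}
```

Under that assumption, 5000 messages at 200us each come to 1,000,000us
(1s), which is in line with the ">1s" observation, while 5000 messages at
8us each come to 40,000us (40ms).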

DBus marshaling is the de-facto standard in all major(!) linux desktop
systems. It is well established and accepted by many DEs. It also
solves many other problems, including: policy,
authentication/authorization, well-known name registry, efficient
broadcasts/multicasts, peer discovery, bus discovery, metadata
transmission, and more.
It is a shame that we cannot use this well-established protocol for
low-latency applications. We, effectively, have to duplicate all this
code on custom UDS and other transports just because DBus is too slow.

kdbus tries to unify those efforts, so that we don't need multiple
policy implementations, name registries and peer discovery mechanisms.
Furthermore, kdbus implements comprehensive, yet optional, metadata
transmission that allows peers to be identified and authenticated in a
race-free manner (which is *not* possible with UDS).
Also, kdbus provides a single transport bus with sequential message
numbering. If you use multiple channels, you cannot give any ordering
guarantees across peers (for instance, regarding parallel
name-registry changes).


Given these theoretical advantages, here're some examples:

*) The Tizen developers have been complaining about the high latency
of DBus for polkit'ish policy queries. That's why their authentication
framework uses custom UDS sockets (called 'Cynara'). If a
UI-interaction needs multiple authentication-queries, you don't want
it to take multiple milliseconds, given that you usually want to
render the result in the same frame.

*) PulseAudio doesn't use DBus for data transmission. They had to
implement their own marshaling code, transport layer and so on, just
because DBus1-latency is horrible. With kdbus, we can basically drop
this code-duplication and unify the IPC layer. Same is true for
Wayland, btw.

*) By moving broadcast-transmission into the kernel, we can use the
time-slices of the sender to perform heavy operations. This is also
true for policy decisions, etc. With a userspace daemon, we cannot
perform operations in a time-slice of the caller. This makes DoS
attacks much harder.

*) With priority-inheritance, we can do synchronous calls into trusted
peers and let them optionally use our time-slice to perform the
action. This allows syscall-like/binder-like method-calls into other
processes. Without priority-inheritance, this is not possible in a
secure manner (see 'priority-inheritance').

*) Logging-daemons often want to attach metadata to log-messages so
debugging/filtering gets easier. If short-lived programs send
log-messages, the destination peer might not be able to read such
metadata from /proc, as the process might no longer be available at
that time. Same is true for policy-decisions like polkit does. You
cannot send off method-calls and exit. You have to wait for a reply,
even though you might not even care for it. If you don't wait, the
other side might not be able to verify your identity and as such
reject the request.

*) Even though the dbus traffic on idle-systems might be low, this
doesn't mean it's not significant at boot-times or under high-load. If
you run a dbus-monitor of your choice, you will see there is a
significant number of messages exchanged during VT-switches, startup,
shutdown, suspend, wakeup, hotplugging and similar situations where
lots of control-messages are exchanged. We don't want to spend
hundreds of ms just to transmit those messages.

*) dbus-daemon is not available during early-boot or shutdown.


These are just examples off the top of my head, but I think they're
already pretty convincing.
David

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-20 13:58     ` Michael Kerrisk (man-pages)
  (?)
@ 2015-01-20 17:50     ` Daniel Mack
  2015-01-21  8:57         ` Michael Kerrisk (man-pages)
  -1 siblings, 1 reply; 143+ messages in thread
From: Daniel Mack @ 2015-01-20 17:50 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz

Hi Michael,

Thanks a lot for the intense review of the documentation. Much appreciated.

I've addressed all but the below issues, following your suggestions.


On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote:
>> +    and KDBUS_CMD_ENDPOINT_MAKE (see above).
>> +
>> +    Following items are expected for KDBUS_CMD_BUS_MAKE:
>> +    KDBUS_ITEM_MAKE_NAME
>> +      Contains a string to identify the bus name.
> 
> So, up to here, I've seen no definition of 'kdbus_item', which leaves me 
> asking questions like: what subfield is KDBUS_ITEM_MAKE_NAME stored in?
> which subfield holds the pointer to the string?
> 
> Somewhere earlier, kdbus_item needs to be explained in more detail,
> I think.

Hmm, you're quoting text from section 5, and section 4 actually
describes the concept of items quite well I believe?

>> +  __s64 priority;
>> +    With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in
>> +    the queue with at least the given priority. If no such message is waiting
>> +    in the queue, -ENOMSG is returned.
> 
> ###
> How do I simply select the highest priority message, without knowing what 
> its priority is?

The wording is indeed unclear here. KDBUS_RECV_USE_PRIORITY causes the
messages to be dequeued by their priority. The 'priority' field is
simply a filter that requests a minimum priority. By setting this field
to the highest possible value, you effectively bypass the filter. I've
added a better description now.
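
The filter semantics described above can be sketched as follows. This is a
minimal model, not kdbus code: struct demo_msg and demo_recv_by_priority()
are hypothetical names, and the sketch ASSUMES that a numerically lower
value means a more urgent message, which is what makes "set the field to
the highest possible value" equivalent to disabling the filter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue entry; real kdbus messages carry far more state. */
struct demo_msg {
	int64_t priority;	/* ASSUMPTION: lower value = more urgent */
	int id;
};

/*
 * Return the index of the most urgent message whose priority value does
 * not exceed `filter`, or -ENOMSG if no queued message passes the filter.
 * Among equally urgent messages the earliest one in the queue wins.
 */
static int demo_recv_by_priority(const struct demo_msg *q, size_t n,
				 int64_t filter)
{
	int best = -ENOMSG;
	size_t i;

	assert(q != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (q[i].priority > filter)
			continue;	/* filtered out: not urgent enough */
		if (best < 0 || q[i].priority < q[best].priority)
			best = (int)i;
	}
	return best;
}
```

Passing INT64_MAX as the filter lets every message through, so the call
simply returns the most urgent one without the caller needing to know its
priority in advance.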

>> +  -ENOMEM	The kernel memory is exhausted
>> +  -ENOTTY	Illegal ioctl command issued for the file descriptor
> 
> Why ENOTTY here, rather than EINVAL? The latter is, I believe, the usual
> ioctl() error for invalid commands. (If you keep ENOTTY, add an
> explanation in this document.)

Hmm, no. -ENOTTY is commonly used as return code when calling ioctls
that can't be handled by the FDs they're called on. 'man errno(3)' even
states: "ENOTTY   Inappropriate I/O control operation (POSIX.1)".
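
The general convention can be observed outside of kdbus with a small
userspace sketch. It assumes a Linux system where /dev/null exists and does
not implement the terminal ioctl TIOCGWINSZ; the function name is
hypothetical and nothing here touches the kdbus API.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Issue a terminal-only ioctl (TIOCGWINSZ) on /dev/null, which is not a
 * terminal, and return the resulting errno. Returns 0 if the call
 * unexpectedly succeeded and -1 if /dev/null could not be opened.
 */
static int demo_ioctl_errno_on_non_tty(void)
{
	struct winsize ws;
	int fd, err = 0;

	fd = open("/dev/null", O_RDONLY);
	if (fd < 0)
		return -1;
	if (ioctl(fd, TIOCGWINSZ, &ws) < 0)
		err = errno;	/* expected: the fd cannot handle this command */
	close(fd);
	return err;
}
```

On the assumed setup the returned value is ENOTTY rather than EINVAL: the
command itself is valid, it just cannot be handled by that file descriptor.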

>> +  -EINVAL	Unsupported item attached to command
>> +
>> +For all ioctls that carry a struct as payload:
>> +
>> +  -EFAULT	The supplied data pointer was not 64-bit aligned, or was
>> +		inaccessible from the kernel side.
>> +  -EINVAL	The size inside the supplied struct was smaller than expected
>> +  -EMSGSIZE	The size inside the supplied struct was bigger than expected
> 
> Why two different errors for smaller and larger than expected? (If you keep things this
> way, please explain the reason in this document.)

Providing a struct that is smaller than the minimum doesn't give the
ioctl a valid set of information to process the request. Hence, I think
-EINVAL is appropriate. However, -EMSGSIZE is something that users might
hit when they make message payloads too big, and I think it's good to
have a chance to distinguish the two cases in error handling. I added
something in the document now.
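
The distinction can be sketched as a simple range check. This is only a
model of the rule as described in this mail, not the kdbus implementation;
demo_check_struct_size() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical size check modelled on the rule described above:
 * a struct smaller than the minimum cannot be processed (-EINVAL),
 * one larger than the maximum is an oversized payload (-EMSGSIZE),
 * and anything in between is accepted (0).
 */
static int demo_check_struct_size(uint64_t size, uint64_t min_size,
				  uint64_t max_size)
{
	assert(min_size <= max_size);
	if (size < min_size)
		return -EINVAL;
	if (size > max_size)
		return -EMSGSIZE;
	return 0;
}
```

A caller can then treat -EMSGSIZE as "shrink or split the payload" while
treating -EINVAL as a programming error in how the struct was filled in.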


Again, thanks a lot for reading the documentation so accurately!

Daniel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-20 18:23       ` Daniel Mack
  0 siblings, 0 replies; 143+ messages in thread
From: Daniel Mack @ 2015-01-20 18:23 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz, Johannes Stezenbach

On 01/20/2015 02:53 PM, Michael Kerrisk (man-pages) wrote:
> This is an enormous and complex API. Why is the API ioctl() based,
> rather than system-call-based? Have we learned nothing from the hydra
> that the futex() multiplexing syscall became? (And kdbus is an order
> of magnitude more complex, by the look of things.) At the very least,
> a *good* justification of why the API is ioctl()-based should be part
> of this documentation file.

I think the simplest reason is because we want to be able to build kdbus
as a module. It's rather an optional driver than a core kernel feature.
IMO, kernel primitives should be syscalls, but kdbus is not a primitive
but an elaborate subsystem.

Also, the context the kdbus commands operate on originates from a
mountable special-purpose file system. Hence, we decided not to use a
global kernel interface but specific ioctls on the nodes exposed by kdbusfs.


Thanks,
Daniel


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-20 22:00                 ` Johannes Stezenbach
  0 siblings, 0 replies; 143+ messages in thread
From: Johannes Stezenbach @ 2015-01-20 22:00 UTC (permalink / raw)
  To: David Herrmann
  Cc: Djalal Harouni, Josh Boyer, Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	Linux API, linux-kernel, Daniel Mack

Hi David,

On Tue, Jan 20, 2015 at 06:00:28PM +0100, David Herrmann wrote:
[big snip]
> These are just examples off the top of my head, but I think they're
> already pretty convincing.

Thank you for writing this up.  This is the information I was
looking for which puts kdbus into context and explains
the motivation for its development.  Naturally I don't agree
with all of it, but I'm content with what I learned so far.

Daniel informed me off-list that he (and probably others) does
not understand what my questions were aiming at.  I'm sorry
about that, I thought it was clear I was just lacking
the background information to understand what kdbus is and
what it is not, and why it exists -- information I couldn't find
in some hours of googling.


Thanks,
Johannes

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-21  8:57         ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-21  8:57 UTC (permalink / raw)
  To: Daniel Mack, Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg,
	jkosina, luto, linux-api, linux-kernel
  Cc: mtk.manpages, daniel, dh.herrmann, tixxdz

Hi Daniel,

On 01/20/2015 06:50 PM, Daniel Mack wrote:
> Hi Michael,
> 
> Thanks a lot for the intense review of the documentation. Much appreciated.
> 
> I've addressed all but the below issues, following your suggestions.

Are your changes already visible somewhere?

> On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote:
>>> +    and KDBUS_CMD_ENDPOINT_MAKE (see above).
>>> +
>>> +    Following items are expected for KDBUS_CMD_BUS_MAKE:
>>> +    KDBUS_ITEM_MAKE_NAME
>>> +      Contains a string to identify the bus name.
>>
>> So, up to here, I've seen no definition of 'kdbus_item', which leaves me 
>> asking questions like: what subfield is KDBUS_ITEM_MAKE_NAME stored in?
>> which subfield holds the pointer to the string?
>>
>> Somewhere earlier, kdbus_item needs to be explained in more detail,
>> I think.
> 
> Hmm, you're quoting text from section 5, and section 4 actually
> describes the concept of items quite well I believe?

Well, Section 4 is pretty short. My point is that most of the various 
blob formats (e.g., kdbus_pids, kdbus_caps, kdbus_memfd) are not
documented in kdbus.txt. They all should be, IMO.

>>> +  __s64 priority;
>>> +    With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in
>>> +    the queue with at least the given priority. If no such message is waiting
>>> +    in the queue, -ENOMSG is returned.
>>
>> ###
>> How do I simply select the highest priority message, without knowing what 
>> its priority is?
> 
> The wording is indeed unclear here. KDBUS_RECV_USE_PRIORITY causes the
> messages to be dequeued by their priority. The 'priority' field is
> simply a filter that requests a minimum priority. By setting this field
> to the highest possible value, you effectively bypass the filter. I've
> added a better description now.

Thanks for the clarification.
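
One way to read the clarified semantics is sketched below. This is a user-space model only, under an assumption that is not confirmed in this thread: a numerically lower value means a more urgent message, so a filter value of INT64_MAX lets every message through. The function names and the -1 return value (standing in for -ENOMSG) are invented for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed convention (not confirmed by the thread): a numerically
 * lower value means a more urgent message.
 */
static bool passes_filter(int64_t msg_priority, int64_t filter)
{
	return msg_priority <= filter;
}

/*
 * Return the index of the most urgent queued message that passes the
 * filter, or -1 if none does (the model's stand-in for -ENOMSG).
 */
static ptrdiff_t pick_message(const int64_t *prio, size_t n, int64_t filter)
{
	ptrdiff_t best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!passes_filter(prio[i], filter))
			continue;
		if (best < 0 || prio[i] < prio[best])
			best = (ptrdiff_t)i;
	}
	return best;
}
```

In this model, passing INT64_MAX as the filter selects the most urgent message without the caller knowing its priority in advance, which is the case the question was about.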

>>> +  -ENOMEM	The kernel memory is exhausted
>>> +  -ENOTTY	Illegal ioctl command issued for the file descriptor
>>
>> Why ENOTTY here, rather than EINVAL? The latter is, I believe, the usual
>> ioctl() error for invalid commands. (If you keep ENOTTY, add an
>> explanation in this document.)
> 
> Hmm, no. -ENOTTY is commonly used as return code when calling ioctls
> that can't be handled by the FDs they're called on. 'man errno(3)' even
> states: "ENOTTY   Inappropriate I/O control operation (POSIX.1)".

Okay.

>>> +  -EINVAL	Unsupported item attached to command
>>> +
>>> +For all ioctls that carry a struct as payload:
>>> +
>>> +  -EFAULT	The supplied data pointer was not 64-bit aligned, or was
>>> +		inaccessible from the kernel side.
>>> +  -EINVAL	The size inside the supplied struct was smaller than expected
>>> +  -EMSGSIZE	The size inside the supplied struct was bigger than expected
>>
>> Why two different errors for smaller and larger than expected? (If you keep things this
>> way, please explain the reason in this document.)
> 
> Providing a struct that is smaller than the minimum doesn't give the
> ioctl a valid set of information to process the request. Hence, I think
> -EINVAL is appropriate. However, -EMSGSIZE is something that users might
> hit when they make message payloads too big, and I think it's good to
> have a chance to distinguish the two cases in error handling. I added
> something in the document now.

Thanks.

> Again, thanks a lot for reading the documentation so accurately!

You're welcome.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-21  8:57         ` Michael Kerrisk (man-pages)
  (?)
@ 2015-01-21  9:07         ` Daniel Mack
  -1 siblings, 0 replies; 143+ messages in thread
From: Daniel Mack @ 2015-01-21  9:07 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel
  Cc: daniel, dh.herrmann, tixxdz

Hi Michael,

On 01/21/2015 09:57 AM, Michael Kerrisk (man-pages) wrote:
> On 01/20/2015 06:50 PM, Daniel Mack wrote:

>> I've addressed all but the below issues, following your suggestions.
> 
> Are your changes already visible somewhere?

Yes, in the upstream repo for the standalone module, which we also use
to build the patch set from:

  https://code.google.com/p/d-bus/source/browse/kdbus.txt

>> Hmm, you're quoting text from section 5, and section 4 actually
>> describes the concept of items quite well I believe?
> 
> Well, Section 4 is pretty short. My point is that most of the various 
> blob formats (e.g., kdbus_pids, kdbus_caps, kdbus_memfd) are not
> documented in kdbus.txt. They all should be, IMO.

Okay, I'll add some text about them.


Best regards,
Daniel


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-21  9:07       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-21  9:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel, Daniel Mack
  Cc: mtk.manpages, dh.herrmann, tixxdz

Daniel,

On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote:
> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
>> From: Daniel Mack <daniel@zonque.org>
>>

[...]

>> +offset field contains the location of the new message inside the receiver's
>> +pool. The message is stored as struct kdbus_msg at this offset, and can be
>> +interpreted with the semantics described above.
>> +
>> +Also, if the connection allowed for file descriptors to be passed
>> +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be
>> +installed into the receiving process after the KDBUS_CMD_RECV ioctl returns.
> 
> ###
> "after"??? When exactly?

By the way, what was the answer to this question?

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-21  9:12         ` Daniel Mack
  0 siblings, 0 replies; 143+ messages in thread
From: Daniel Mack @ 2015-01-21  9:12 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel
  Cc: dh.herrmann, tixxdz

On 01/21/2015 10:07 AM, Michael Kerrisk (man-pages) wrote:
> On 01/20/2015 02:58 PM, Michael Kerrisk (man-pages) wrote:

>>> +Also, if the connection allowed for file descriptors to be passed
>>> +(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be
>>> +installed into the receiving process after the KDBUS_CMD_RECV ioctl returns.
>>
>> ###
>> "after"??? When exactly?
> 
> By the way, what was the answer to this question?

I've corrected the wording on this. The file descriptors are installed
when the RECV ioctl is called, so they are ready to use when the call
returns.


Thanks,
Daniel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-21 10:28         ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-21 10:28 UTC (permalink / raw)
  To: David Herrmann
  Cc: mtk.manpages, Greg Kroah-Hartman, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Andy Lutomirski, Linux API, linux-kernel,
	Daniel Mack, Djalal Harouni, Daniel Mack, Johannes Stezenbach

Hi David,

On 01/20/2015 03:31 PM, David Herrmann wrote:
> Hi Michael
> 
> On Tue, Jan 20, 2015 at 2:53 PM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> On 01/16/2015 08:16 PM, Greg Kroah-Hartman wrote:
>>> From: Daniel Mack <daniel@zonque.org>
>>>
>>> kdbus is a system for low-latency, low-overhead, easy to use
>>> interprocess communication (IPC).
>>>
>>> The interface to all functions in this driver is implemented via ioctls
>>> on files exposed through a filesystem called 'kdbusfs'. The default
>>> mount point of kdbusfs is /sys/fs/kdbus. This patch adds detailed
>>> documentation about the kernel level API design.
>>
>> I have some detailed feedback on the contents of this file, and some
>> bigger questions. I'll split them out into separate mails.
>>
>> So here, the bigger, general questions to start with. I've arrived late
>> to this, so sorry if they've already been discussed, but the answers to
>> some of the questions should actually be in this file, I would have
>> expected.
>>
>> This is an enormous and complex API. Why is the API ioctl() based,
>> rather than system-call-based? Have we learned nothing from the hydra
>> that the futex() multiplexing syscall became? (And kdbus is an order
>> of magnitude more complex, by the look of things.) At the very least,
>> a *good* justification of why the API is ioctl()-based should be part
>> of this documentation file.
>>
>> An observation: The documentation below is substantial, but this API is
>> enormous, so the documentation still feels rather thin. What would
>> really help would be some example code in the doc.
>>
>> And on the subject of code examples... Are there any (prototype)
>> working user-space applications that exercise the current kdbus
>> implementation? That is, can I install these kdbus patches, and
>> then find a simple example application somewhere that does
>> something to exercise kdbus?
> 
> If you run a 3.18 kernel, you can install kdbus.ko from our repository
> and boot a full Fedora system running Gnome3 with kdbus, given that
> you compiled systemd with --enable-kdbus (which is still
> experimental). No legacy dbus1 daemon is running. Instead, we have a
> bus-proxy that converts classic dbus1 to kdbus, so all
> bus-communication runs on kdbus.

Good to hear.  I think that some info like this should go out in 
the "00/" covering mails for future patch revisions, so that people
can get some sense of the testing that has been done.

>> And then: is there any substantial real-world application (e.g., a
>> full D-Bus port) that is being developed in tandem with this kernel
>> side patch? (I don't mean a user-space library; I mean a seriously
>> large application.) This is an incredibly complex API whose
>> failings are only going to become evident through real-world use.
>> Solidifying an API in the kernel and then discovering the API
>> problems later when writing real-world applications would make for
>> a sad story. A story something like that of inotify, an API which
>> is an order of magnitude less complex than kdbus. (I can't help but
>> feel that many of the inotify problems that I discuss at
>> https://lwn.net/Articles/605128/ might have been fixed or mitigated
>> if a few real-world applications had been implemented before the
>> API was set in stone.)
> 
> I think running a whole Gnome3 stack counts as "substantial real-world
> application", right? 

Yes, I'll give you that ;-).

>  Sure, it's a dbus1-to-kdbus layer, but all the
> systemd tools use kdbus natively and it works just fine. In fact, we
> all run kdbus on our main-systems every day.
> 
> We've spent over a year fixing races and API misdesigns, we've talked
> to other toolkit developers (glib, qt, ..) and made sure we're
> backwards compatible to dbus1. I don't think the API is perfect,
> everyone makes mistakes. But with bus-proxy and systemd we have two
> huge users of kdbus that put a lot of pressure on API design.

I'll say more about that in another mail in a moment. I'm not enthusiastic
about the API.

>>> +For a kdbus specific userspace library implementation please refer to:
>>> +  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
>>
>> Is this library intended just for systemd? More generally, is there an
>> intention to provide a general purpose library API for kdbus? Or is the
>> intention that each application will roll a library suitable to its
>> needs? I think an answer to that question would be useful in this
>> Documentation file.
> 
> kdbus is in no way bound to systemd. There are ongoing efforts to port
> glib and qt to kdbus natively. The API is pretty simple 
                                 ^^^^^^^^^^^^^^^^^^^^^^^^
I think you and I must have quite different definitions of "simple"...
(For more on this point, see my reply to Daniel in a moment.)

> and I don't
> see how a libkdbus would simplify things. In fact, even our tests only
> have slim wrappers around the ioctls to simplify error-handling in
> test-scenarios.

Again, the above info would be useful in the Documentation file.

> Note that most of the toolkit work is on the marshaling level, which
> is independent of kdbus. kdbus just provides the transport level. DBus
> is just one, yet significant, application-layer on top of kdbus. Our
> test-cases use kdbus exclusively to transport raw byte streams.

Okay. Thanks for the info.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-21 10:32         ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-21 10:32 UTC (permalink / raw)
  To: Daniel Mack, Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg,
	jkosina, luto, linux-api, linux-kernel
  Cc: mtk.manpages, dh.herrmann, tixxdz, Johannes Stezenbach

Hello Daniel,

On 01/20/2015 07:23 PM, Daniel Mack wrote:
> On 01/20/2015 02:53 PM, Michael Kerrisk (man-pages) wrote:
>> This is an enormous and complex API. Why is the API ioctl() based,
>> rather than system-call-based? Have we learned nothing from the hydra
>> that the futex() multiplexing syscall became? (And kdbus is an order
>> of magnitude more complex, by the look of things.) At the very least,
>> a *good* justification of why the API is ioctl()-based should be part
>> of this documentation file.
> 
> I think the simplest reason is because we want to be able to build kdbus
> as a module. 

This isn't any _good_ justification...

> It's rather an optional driver than a core kernel feature.

Given the various things that I've seen said about kdbus, the
preceding sentence makes little sense to me:

* kdbus will be the framework supporting user-space D-Bus in the
  future, and also used by systemd, and so on pretty much every 
  desktop system.
* kdbus solves much of the bandwidth problem of D-Bus1. That,
  along with a host of other features, means that there will be
  a lot of user-space developers interested in using this API.
* Various parties in user space are already expressing strong 
  interest in kdbus.

My guess from the above? This will NOT be an "optional driver". 
It *will be* a core kernel feature.

> IMO, kernel primitives should be syscalls, but kdbus is not a primitive
> but an elaborate subsystem.

Agreed. It's an elaborate subsystem. But that fact doesn't in itself
dictate any particular API design choice.

> Also, the context the kdbus commands operate on originates from a
> mountable special-purpose file system.

It's not clear to me how this point implies any particular API design
choice.

> Hence, we decided not to use a
> global kernel interface but specific ioctls on the nodes exposed by kdbusfs.

I don't follow the reasoning here at all. Here's what we have, if I
have grasped it roughly correctly:

* 16 ioctls exposed to user space.
* some 20 different structures exchanged between kernel and user space
* about 14k lines of kernel code implement the above
* some rather thin documentation of the whole lot

Sorry if that last point seems rather harsh. I know that you personally 
have done a lot of work on the kdbus.txt file. David Herrmann asserts
that this is a simple API. It is not. He also suggests that there is
no need for a libkdbus. I don't know whether that's right or not, but the
point is then that there's an expectation that the raw kernel API is what
user space will need to work with. 

Notwithstanding the fact that there's a lot of (good) information in
kdbus.txt, there's not nearly enough for someone to create useful, 
robust applications that use that API (and not enough that I as a
reviewer feel comfortable about reviewing the API). As things stand,
user-space developers will be forced to decipher large amounts of kernel
code and existing applications in order to actually build things. And
when they do, they'll be using one of the worst APIs known to man: ioctl(),
an API that provides no type safety at all.

ioctl() is a get-out-of-jail free card when it comes to API design. Rather
than thinking carefully and long about a set of coherent, stable APIs that 
provide some degree of type-safety to user-space, one just adds/changes/removes
an ioctl. And that process seems to be frequent and ongoing even now. (And 
it's to your great credit that the API/ABI breaks are clearly and honestly 
marked in the kdbus.h changelog.) All of this lightens the burden of API
design for kernel developers, but I'm concerned that the long-term pain
for user-space developers who use an API which (in my estimation) may
come to be widely used will be enormous.

Concretely, I'd like to see the following in kdbus.txt:
* A lot more detail, adding the various pieces that are currently missing.
  I've mentioned already the absence of detail on the item blob structures, 
  but there's probably several other pieces as well. My problem is that the
  API is so big and hard to grok that it's hard to even begin to work out
  what's missing from the documentation.
* Fleshing out the API summaries with code snippets that illustrate the
  use of the APIs.
* At least one simple working example application, complete with a walk
  through of how it's built and run.

Yes, all of this is a big demand. But this is a big API that is being added 
to the kernel, and one that may become widely used and for a long time.
It's imperative that the API is well documented, and as well designed as
possible. Furthermore, with such improved documentation I feel we'd be in 
a better position to evaluate the merits of an ioctl()-based API versus
some other approach.

Thanks,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-21 15:19           ` Theodore Ts'o
  0 siblings, 0 replies; 143+ messages in thread
From: Theodore Ts'o @ 2015-01-21 15:19 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Daniel Mack, Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg,
	jkosina, luto, linux-api, linux-kernel, dh.herrmann, tixxdz,
	Johannes Stezenbach

On Wed, Jan 21, 2015 at 11:32:59AM +0100, Michael Kerrisk (man-pages) wrote:
> It's rather an optional driver than a core kernel feature.
> 
> Given the various things that I've seen said about kdbus, the
> preceding sentence makes little sense to me:
> 
> * kdbus will be the framework supporting user-space D-Bus in the
>   future, and also used by systemd, and so on pretty much every 
>   desktop system.

I have to agree with Michael here; it's really, **really**
disingenuous to say that if you don't want kdbus, you can just
#ifconfig it out.  The fact that systemd will be using it means
that it will very shortly become a core kernel feature which is
absolutely mandatory.  Sure, maybe it can be configured out for "tiny
kernels", just as in theory we can configure out the VM system for
really tiny embedded systems.  But we should be treating this as
something that is not optional, because the reality is that's the way
it's going to be in very short order.  So if that means to use proper
system calls instead of ioctls, we should do that.

       	     	     		   - Ted

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-21 16:58           ` Daniel Mack
  0 siblings, 0 replies; 143+ messages in thread
From: Daniel Mack @ 2015-01-21 16:58 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel
  Cc: dh.herrmann, tixxdz, Johannes Stezenbach

Hi Michael,

On 01/21/2015 11:32 AM, Michael Kerrisk (man-pages) wrote:
> On 01/20/2015 07:23 PM, Daniel Mack wrote:

>> It's rather an optional driver than a core kernel feature.
> 
> Given the various things that I've seen said about kdbus, the
> preceding sentence makes little sense to me:
> 
> * kdbus will be the framework supporting user-space D-Bus in the
>   future, and also used by systemd, and so on pretty much every 
>   desktop system.
> * kdbus solves much of the bandwidth problem of D-Bus1. That,
>   along with a host of other features, means that there will be
>   a lot of user-space developers interested in using this API.
> * Various parties in user space are already expressing strong 
>   interest in kdbus.
> 
> My guess from the above? This will NOT be an "optional driver". 
> It *will be* a core kernel feature.

There will be userlands that will depend on kdbus, but that will still
not turn the "driver" into a core Linux kernel feature. We really want
it to be loosely coupled to the rest of the kernel and optional after
all. The kernel people are working toward making more and more things
optional these days, and there will still be lots of systems that won't
be using kdbus.

>> Also, the context the kdbus commands operate on originates from a
>> mountable special-purpose file system.
> 
> It's not clear to me how this point implies any particular API design
> choice.

It emphasizes the fact that our ioctl API can only be used with the
nodes exposed by kdbusfs, and vice versa. I think operations on
driver-specific files do not justify a new 'generic' syscall API.

> Notwithstanding the fact that there's a lot of (good) information in
> kdbus.txt, there's not nearly enough for someone to create useful, 
> robust applications that use that API (and not enough that I as a
> reviewer feel comfortable about reviewing the API). As things stand,
> user-space developers will be forced to decipher large amounts of kernel
> code and existing applications in order to actually build things. And
> when they do, they'll be using one of the worst APIs known to man: ioctl(),
> an API that provides no type safety at all.

I don't see how ioctls are any worse than syscalls with pointers to
structures. One can screw up compatibility either way. How is an ioctl
wrapper/prototype any less type-safe than a syscall wrapper?

> ioctl() is a get-out-of-jail free card when it comes to API design.

And how are syscalls different in that regard when they would both
transport the same data structures? Also note that all kdbus ioctls
necessarily operate on a file descriptor context, which an ioctl passes
along by default.

> Rather
> than thinking carefully and long about a set of coherent, stable APIs that 
> provide some degree of type-safety to user-space, one just adds/changes/removes
> an ioctl.

Adding another ioctl in the future for cases we didn't think about
earlier would of course be considered a workaround; and even though such
things happen all the time, it's certainly something we'd like to avoid.

However, we would also need to add another syscall in such cases, which
is even worse IMO. So that's really not an argument against ioctls after
all. What fundamental difference between a raw syscall and an ioctl,
especially with regard to type safety, is there which would help us here?

> And that process seems to be frequent and ongoing even now. (And 
> it's to your great credit that the API/ABI breaks are clearly and honestly 
> marked in the kdbus.h changelog.) All of this lightens the burden of API
> design for kernel developers, but I'm concerned that the long-term pain
> for user-space developers who use an API which (in my estimation) may
> come to be widely used will be enormous.

Yes, we've jointly reviewed the API details again until just recently to
unify structs and enums etc., and added fields to make the ioctl structs
more versatile for possible future additions. By that, we effectively
broke the ABI, but we did that because we know we can't do such things
again in the future.

But again - I don't see how this would be different when using syscalls
rather than ioctls to transport information between the driver and
userspace. Could you elaborate?

> Concretely, I'd like to see the following in kdbus.txt:
> * A lot more detail, adding the various pieces that are currently missing.
>   I've mentioned already the absence of detail on the item blob structures, 
>   but there's probably several other pieces as well. My problem is that the
>   API is so big and hard to grok that it's hard to even begin to work out
>   what's missing from the documentation.
> * Fleshing out the API summaries with code snippets that illustrate the
>   use of the APIs.
> * At least one simple working example application, complete with a walk
>   through of how it's built and run.
> 
> Yes, all of this is a big demand. But this is a big API that is being added 
> to the kernel, and one that may become widely used and for a long time.

Fair enough. Everything that helps people understand and use the API in
a sane way is a good thing to have, and so is getting an assessment from
people who are exposed to this API for the first time.

We'll be working on restructuring the documentation.


Thanks,
Daniel


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-22 10:18             ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-22 10:18 UTC (permalink / raw)
  To: Daniel Mack, Greg Kroah-Hartman, arnd, ebiederm, gnomes, teg,
	jkosina, luto, linux-api, linux-kernel
  Cc: mtk.manpages, dh.herrmann, tixxdz, Johannes Stezenbach,
	Theodore Ts'o

Hi Daniel,

On 01/21/2015 05:58 PM, Daniel Mack wrote:
> Hi Michael,
> 
> On 01/21/2015 11:32 AM, Michael Kerrisk (man-pages) wrote:
>> On 01/20/2015 07:23 PM, Daniel Mack wrote:
> 
>>> It's rather an optional driver than a core kernel feature.
>>
>> Given the various things that I've seen said about kdbus, the
>> preceding sentence makes little sense to me:
>>
>> * kdbus will be the framework supporting user-space D-Bus in the
>>   future, and also used by systemd, and so on pretty much every 
>>   desktop system.
>> * kdbus solves much of the bandwidth problem of D-Bus1. That,
>>   along with a host of other features, means that there will be
>>   a lot of user-space developers interested in using this API.
>> * Various parties in user space are already expressing strong 
>>   interest in kdbus.
>>
>> My guess from the above? This will NOT be an "optional driver". 
>> It *will be* a core kernel feature.
> 
> There will be userlands that will depend on kdbus, but that will still
> not turn the "driver" into a core Linux kernel feature. We really want
> it to be loosely coupled to the rest of the kernel and optional after
> all. The kernel people are working toward making more and more things
> optional these days, and there will still be lots of systems that won't
> be using kdbus.

Make it optional and configurable if you want. But that misses my
point. kdbus is very likely to become an essential, widely used
piece of *general-purpose* ABI infrastructure that will
be configured into virtually every type of system. As such, the same
standards should apply as for a "core kernel feature", whether or
not you want to call it that.

>>> Also, the context the kdbus commands operate on originates from a
>>> mountable special-purpose file system.
>>
>> It's not clear to me how this point implies any particular API design
>> choice.
> 
> It emphasizes the fact that our ioctl API can only be used with the
> nodes exposed by kdbusfs, and vice versa. I think operations on
> driver-specific files do not justify a new 'generic' syscall API.

I see your (repeated) use of words like "driver" as just a distraction, 
I'm sorry. "Driver-specific files" is an implementation detail. The
point is that kdbus provides a complex, general-purpose feature for all
of userland. It is core infrastructure that will be used by key pieces 
of the plumbing layer, and likely by many other applications as well.
*Please* stop suggesting that it is not core infrastructure: kdbus
has the potential to be a great IPC that will be very useful to many,
many user-space developers.

(By the way, we have precedents for device/filesystem-specific system
calls. Even a recent one, in memfd_create().)

>> Notwithstanding the fact that there's a lot of (good) information in
>> kdbus.txt, there's not nearly enough for someone to create useful, 
>> robust applications that use that API (and not enough that I as a
>> reviewer feel comfortable about reviewing the API). As things stand,
>> user-space developers will be forced to decipher large amounts of kernel
>> code and existing applications in order to actually build things. And
>> when they do, they'll be using one of the worst APIs known to man: ioctl(),
>> an API that provides no type safety at all.
> 
> I don't see how ioctls are any worse than syscalls with pointers to
> structures. One can screw up compatibility either way. How is an ioctl
> wrapper/prototype any less type-safe than a syscall wrapper?

Taking that argument to the extreme, we would have no system calls 
at all, just one gigantic ioctl() ;-).

>> ioctl() is a get-out-of-jail free card when it comes to API design.
> 
> And how are syscalls different in that regard when they would both
> transport the same data structures? 

My general impression is that when it comes to system calls,
there's usually a lot more up front thought about API design.

> Also note that all kdbus ioctls
> necessarily operate on a file descriptor context, which an ioctl passes
> along by default.

I fail to see your point here. The same statement applies to multiple
special-purpose system calls (epoll, inotify, various shared memory APIs, 
and so on).
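
As a minimal sketch of that point, here is how one of those
special-purpose APIs (epoll) operates on a file descriptor context
while still exposing a typed prototype. Only the standard pipe(2),
epoll_create1(2), epoll_ctl(2) and epoll_wait(2) calls are used; the
helper name epoll_pipe_demo() and the use of a pipe are my own
illustration and have nothing to do with kdbus itself.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Registers the read end of a pipe with an epoll instance, makes it
 * readable, and polls once without blocking.
 *
 * Returns the number of ready descriptors reported by epoll_wait(),
 * or -1 if any step fails or the wrong descriptor is reported.
 */
static inline int epoll_pipe_demo(void)
{
	int pipefd[2];
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out;
	int epfd;
	int n = -1;

	if (pipe(pipefd) < 0)
		return -1;

	/* The epoll instance is itself a file descriptor context. */
	epfd = epoll_create1(0);
	if (epfd < 0)
		goto out_pipe;

	ev.data.fd = pipefd[0];

	/*
	 * Typed prototype: the compiler checks that the last argument
	 * is a struct epoll_event pointer.
	 */
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
		goto out_epoll;

	/* Make the read end readable, then poll with a zero timeout. */
	if (write(pipefd[1], "x", 1) != 1)
		goto out_epoll;

	n = epoll_wait(epfd, &out, 1, 0);
	if (n == 1 && out.data.fd != pipefd[0])
		n = -1;

out_epoll:
	close(epfd);
out_pipe:
	close(pipefd[0]);
	close(pipefd[1]);
	return n;
}
```

Every call after epoll_create1() takes the epoll file descriptor as
its first argument, much as an ioctl would, yet each argument has a
declared type that the compiler can check.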

>> Rather
>> than thinking carefully and long about a set of coherent, stable APIs that 
>> provide some degree of type-safety to user-space, one just adds/changes/removes
>> an ioctl.
> 
> Adding another ioctl in the future for cases we didn't think about
> earlier would of course be considered a workaround; and even though such
> things happen all the time, it's certainly something we'd like to avoid.
> 
> However, we would also need to add another syscall in such cases, which
> is even worse IMO. So that's really not an argument against ioctls after
> all. What fundamental difference between a raw syscall and an ioctl,
> especially with regard to type safety, is there which would help us here?

Type safety helps user-space application developers. System calls have 
it, ioctl() does not.
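
To make the type-safety point concrete, here is a minimal sketch.
Everything named demo_* or DEMO_* is hypothetical and is not part of
kdbus or any real driver; the only real interfaces used are ioctl(2)
and the standard _IOWR() macro. ioctl() is declared with a variadic
argument list, so the compiler cannot check the argument that follows
the request number, whereas a function with a declared parameter type
(a system call wrapper, or a typed wrapper around the ioctl) can be
checked.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical command payload; not a real kdbus structure. */
struct demo_hello {
	uint64_t size;   /* total size of this structure in bytes */
	uint64_t flags;  /* hypothetical flag bits */
};

/* Hypothetical request number, built with the standard _IOWR() macro. */
#define DEMO_CMD_HELLO _IOWR('x', 0x00, struct demo_hello)

/*
 * Untyped path: because ioctl() is variadic, any pointer (or even an
 * integer) compiles cleanly as the third argument. Nothing ties 'arg'
 * to the structure that DEMO_CMD_HELLO expects.
 *
 * Returns 0 on success or a negative errno value on failure.
 */
static inline int demo_hello_raw(int fd, void *arg)
{
	return ioctl(fd, DEMO_CMD_HELLO, arg) < 0 ? -errno : 0;
}

/*
 * Typed path: the prototype forces callers to pass a
 * struct demo_hello pointer, so passing a pointer to some other type
 * is diagnosed by the compiler at the call site.
 *
 * Returns 0 on success or a negative errno value on failure.
 */
static inline int demo_hello(int fd, struct demo_hello *hello)
{
	return ioctl(fd, DEMO_CMD_HELLO, hello) < 0 ? -errno : 0;
}
```

Called on an invalid descriptor, for example demo_hello(-1, &h), both
helpers return -EBADF. The difference is at compile time only: passing
the address of an unrelated object is accepted silently by
demo_hello_raw() and diagnosed by demo_hello().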

>> And that process seems to be frequent and ongoing even now. (And 
>> it's to your great credit that the API/ABI breaks are clearly and honestly 
>> marked in the kdbus.h changelog.) All of this lightens the burden of API
>> design for kernel developers, but I'm concerned that the long-term pain
>> for user-space developers who use an API which (in my estimation) may
>> come to be widely used will be enormous.
> 
> Yes, we've jointly reviewed the API details again until just recently to
> unify structs and enums etc., and added fields to make the ioctl structs
> more versatile for possible future additions. By that, we effectively
> broke the ABI, but we did that because we know we can't do such things
> again in the future.
> 
> But again - I don't see how this would be different when using syscalls
> rather than ioctls to transport information between the driver and
> userspace. Could you elaborate?

My suspicion is that not nearly enough thinking has yet been done about
the design of the API. That's based on these observations:

* Documentation that is, considering the size of the API, *way* too thin.
* Some parts of the API not documented at all (various kdbus_item blobs)
* ABI changes happening even quite recently
* API oddities such as the 'kernel_flags' fields. Why do I need to
  be told what flags the kernel supports on *every* operation?

The above is just after a day of looking hard at kdbus.txt. I strongly
suspect I'd find a lot of other issues if I spent more time on kdbus.

>> Concretely, I'd like to see the following in kdbus.txt:
>> * A lot more detail, adding the various pieces that are currently missing.
>>   I've mentioned already the absence of detail on the item blob structures, 
>>   but there's probably several other pieces as well. My problem is that the
>>   API is so big and hard to grok that it's hard to even begin to work out
>>   what's missing from the documentation.
>> * Fleshing out the API summaries with code snippets that illustrate the
>>   use of the APIs.
>> * At least one simple working example application, complete with a walk
>>   through of how it's built and run.
>>
>> Yes, all of this is a big demand. But this is a big API that is being added 
>> to the kernel, and one that may become widely used and for a long time.
> 
> Fair enough. Everything that helps people understand and use the API in
> a sane way is a good thing to have, and so is getting an assessment from
> people who are exposed to this API for the first time.
> 
> We'll be working on restructuring the documentation.

Thanks. I know it's a big effort, and I wish you success.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-22 13:46               ` David Herrmann
  0 siblings, 0 replies; 143+ messages in thread
From: David Herrmann @ 2015-01-22 13:46 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Daniel Mack, Greg Kroah-Hartman, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Andy Lutomirski, Linux API, linux-kernel,
	Djalal Harouni, Johannes Stezenbach, Theodore T'so

Hi Michael

On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> On 01/21/2015 05:58 PM, Daniel Mack wrote:
>>>> Also, the context the kdbus commands operate on originate from a
>>>> mountable special-purpose file system.
>>>
>>> It's not clear to me how this point implies any particular API design
>>> choice.
>>
>> It emphasizes the fact that our ioctl API can only be used with the
>> nodes exposed by kdbusfs, and vice versa. I think operations on
>> driver-specific files do not justify a new 'generic' syscall API.
>
> I see your (repeated) use of words like "driver" as just a distraction,
> I'm sorry. "Driver-specific files" is an implementation detail. The
> point is that kdbus provides a complex, general-purpose feature for all
> of userland. It is core infrastructure that will be used by key pieces
> of the plumbing layer, and likely by many other applications as well.
> *Please* stop suggesting that it is not core infrastructure: kdbus
> has the potential to be a great IPC that will be very useful to many,
> many user-space developers.

We called it an 'ipc driver' so far. It is in no way meant as
distraction. Feel free to call it 'core infrastructure'. I think we
can agree that we want it to be generically useful, like other ipc
mechanisms, including UDS and netlink.

> (By the way, we have precedents for device/filesystem-specific system
> calls. Even a recent one, in memfd_create().)

memfd_create() is in no way file-system specific. Internally, it uses
shmem, but that's an implementation detail. The API does not expose
this in any way. If you were referring to sealing, it's implemented as
fcntl(), not as a separate syscall. Furthermore, sealing is only
limited to shmem as it's the only volatile storage. It's not an API
restriction. Other volatile file-systems are free to implement
sealing.
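
To make the sealing point concrete, here is a minimal sketch (not code
from the kdbus series). It assumes Linux 3.17 or later and a glibc that
declares memfd_create() in <sys/mman.h> (2.27 or later); demo_seal() is
a made-up name. The seal is added with plain fcntl(), and a later
write() on the same descriptor is expected to fail with EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: create a memfd, write to it, seal it against
 * further writes via fcntl(), then confirm that writing is refused.
 * Returns 0 if the seal behaved as expected, -1 otherwise.
 */
static int demo_seal(void)
{
	int fd, seals, ret = -1;

	fd = memfd_create("demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;

	/* Writing works before the seal is in place. */
	if (write(fd, "data", 4) != 4)
		goto out;

	/* Sealing is an fcntl() command, not a separate syscall. */
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0)
		goto out;

	seals = fcntl(fd, F_GET_SEALS);
	if (seals < 0 || !(seals & F_SEAL_WRITE))
		goto out;

	/* After F_SEAL_WRITE, write() should fail with EPERM. */
	errno = 0;
	if (write(fd, "more", 4) < 0 && errno == EPERM)
		ret = 0;
out:
	close(fd);
	return ret;
}
```

Calling demo_seal() from main() and checking for 0 is enough to try it.
Nothing in the calls names shmem, which is the sense in which the API
itself is not file-system specific.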

>>> ioctl() is a get-out-of-jail free card when it comes to API design.
>>
>> And how are syscalls different in that regard when they would both
>> transport the same data structures?
>
> My general impression is that when it comes to system calls,
> there's usually a lot more up front thought about API design.

This is not a technical reason to use syscalls over ioctls. IMHO,
it's rather a reason to improve the kernel's ioctl-review process.

>> Also note that all kdbus ioctls
>> necessarily operate on a file descriptor context, which an ioctl passes
>> along by default.
>
> I fail to see your point here. The same statement applies to multiple
> special-purpose system calls (epoll, inotify, various shared memory APIs,
> and so on).

epoll and inotify don't have a 'parent' object living in the
file-system. They *need* an entry-point. We can use open() for that.

You're right, from a technical viewpoint, there's no restriction.
There are examples of both (e.g., see Solaris /dev/poll, an
ioctl()-based 'epoll').

>>> Rather
>>> than thinking carefully and long about a set of coherent, stable APIs that
>>> provide some degree of type-safety to user-space, one just adds/changes/removes
>>> an ioctl.
>>
>> Adding another ioctl in the future for cases we didn't think about
>> earlier would of course be considered a workaround; and even though such
>> things happen all the time, it's certainly something we'd like to avoid.
>>
>> However, we would also need to add another syscall in such cases, which
>> is even worse IMO. So that's really not an argument against ioctls after
>> all. What fundamental difference between a raw syscall and an ioctl,
>> especially with regards to type safety, is there which would help us here?
>
> Type safety helps user-space application developers. System calls have
> it, ioctl() does not.

This is simply not true. There is no type-safety in:
    syscall(__NR_xyz, some, random, arguments)

The only way a syscall gets 'type-safe' is to provide a wrapper
function. The same applies to ioctls. But people tend not to do that for
ioctls, which is, again, not a technical argument against ioctls. It's
a matter of psychology, though.
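
A minimal sketch of that point, using write(2) only because it is easy
to check. typed_write() and demo_typed_syscall() are made-up names; the
sketch assumes Linux with glibc's syscall() and SYS_write. The raw call
is variadic, so the compiler cannot check its arguments; the wrapper's
prototype is what adds the checking.

```c
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical typed wrapper: callers now get compile-time checking of
 * argument count and types, which the variadic syscall() cannot give.
 */
static ssize_t typed_write(int fd, const void *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	return (ssize_t)syscall(SYS_write, fd, buf, len);
}

/* Send "hello" through a pipe, half raw and half via the wrapper. */
static int demo_typed_syscall(void)
{
	int p[2];
	char in[8] = { 0 };
	long n1;
	ssize_t n2, n3;

	if (pipe(p) < 0)
		return -1;

	/* Raw form: any argument list would compile here. */
	n1 = syscall(SYS_write, p[1], "he", (size_t)2);
	/* Wrapped form: a wrong argument type is diagnosed at build time. */
	n2 = typed_write(p[1], "llo", 3);
	n3 = read(p[0], in, sizeof(in) - 1);

	close(p[0]);
	close(p[1]);
	return (n1 == 2 && n2 == 3 && n3 == 5 &&
		strcmp(in, "hello") == 0) ? 0 : -1;
}
```

Both calls reach the same kernel entry point; only the second one is
checked against a prototype.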

I still don't see a technical reason to use syscalls. API proposals welcome!

We're now working on a small kdbus helper library, which provides
type-safe ioctl wrappers, item-iterators and documented examples. But,
like syscalls, nobody is forced to use the wrappers. The API design is
not affected by this.
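
As an illustration of what such a wrapper can look like (a sketch using
the generic FIONREAD ioctl on a pipe, not a kdbus command and not code
from the helper library), consider the following. fd_bytes_readable()
and demo_typed_ioctl() are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * ioctl() takes its third argument through "...", so a pointer to the
 * wrong type would compile without complaint. FIONREAD expects an
 * int *; this hypothetical wrapper pins that down in a prototype.
 */
static int fd_bytes_readable(int fd, int *count)
{
	assert(count != NULL);
	return ioctl(fd, FIONREAD, count);
}

/* Queue four bytes in a pipe and ask how many are pending. */
static int demo_typed_ioctl(void)
{
	int p[2], pending = -1, r;

	if (pipe(p) < 0)
		return -1;
	if (write(p[1], "abcd", 4) != 4) {
		close(p[0]);
		close(p[1]);
		return -1;
	}

	r = fd_bytes_readable(p[0], &pending);
	close(p[0]);
	close(p[1]);
	return r == 0 ? pending : -1;
}
```

Code that bypasses the wrapper and calls ioctl() directly still works,
which is the sense in which nobody is forced to use it.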

>>> And that process seems to be frequent and ongoing even now. (And
>>> it's to your great credit that the API/ABI breaks are clearly and honestly
>>> marked in the kdbus.h changelog.) All of this lightens the burden of API
>>> design for kernel developers, but I'm concerned that the long-term pain
>>> for user-space developers who use an API which (in my estimation) may
>>> come to be widely used will be enormous.
>>
>> Yes, we've jointly reviewed the API details again until just recently to
>> unify structs and enums etc, and added fields to make the ioctls structs
>> more versatile for possible future additions. By that, we effectively
>> broke the ABI, but we did that because we know we can't do such things
>> again in the future.
>>
>> But again - I don't see how this would be different when using syscalls
>> rather than ioctls to transport information between the driver and
>> userspace. Could you elaborate?
>
> My suspicion is that not nearly enough thinking has yet been done about
> the design of the API. That's based on these observations:
>
> * Documentation that is, considering the size of the API, *way* too thin.

Ok, working on that.

> * Some parts of the API not documented at all (various kdbus_item blobs)

All public structures have documentation in kdbus.h. It may need
improvements, though.

> * ABI changes happening even quite recently

Please elaborate on why 'recent ABI-changes' are a sign of a premature API.

I seriously doubt any API can be called 'perfect'. On the contrary, I
believe that all APIs could be improved continuously. The fact that
we, at one point, settle on an API is an admission of
backwards-compatibility. I in no way think it's a sign of
'perfection'.
With kdbus our plan is to give API-guarantees starting with upstream
inclusion. We know that our API will not be perfect; none is. But we
will try hard to fix anything that comes up, as long as we can. And
this effort will manifest in ABI-breaks.

> * API oddities such as the 'kernel_flags' fields. Why do I need to
>   be told what flags the kernel supports on *every* operation?

If we only returned EINVAL on invalid arguments, user-space would have to
probe for each flag to see whether it's supported. By returning the
set of supported flags, user-space can cache those and _reliably_ know
which flags are supported.
We decided that the overhead of a single u64 copy on each ioctl is
preferable to a separate syscall/ioctl to query kernel flags. If you
disagree, please elaborate (preferably with a suggestion of how to do it
better).
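
The caching idea can be sketched in plain user space as follows. The
struct layout, flag names, and functions are all hypothetical and are
not taken from kdbus.h; demo_handle_cmd() merely stands in for the
kernel side of an ioctl. The sketch also assumes that the mask is
written back even when the call fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits; not taken from kdbus.h. */
#define DEMO_FLAG_A	(1ULL << 0)
#define DEMO_FLAG_B	(1ULL << 1)
#define DEMO_FLAG_C	(1ULL << 2)	/* unknown to the simulated callee */
#define DEMO_SUPPORTED	(DEMO_FLAG_A | DEMO_FLAG_B)

/* Hypothetical command layout with an in field and an out field. */
struct demo_cmd {
	uint64_t flags;		/* in: flags requested by the caller */
	uint64_t kernel_flags;	/* out: every flag the callee understands */
};

/* Stand-in for the kernel side of an ioctl; runs in user space. */
static int demo_handle_cmd(struct demo_cmd *cmd)
{
	/* Assumption of this sketch: reported even on failure. */
	cmd->kernel_flags = DEMO_SUPPORTED;
	if (cmd->flags & ~DEMO_SUPPORTED)
		return -EINVAL;
	return 0;
}

/* Caller side: remember the mask from the first call. */
static uint64_t cached_flags;

static int demo_flag_supported(uint64_t flag)
{
	if (!cached_flags) {
		struct demo_cmd probe = { .flags = 0 };

		if (demo_handle_cmd(&probe) < 0)
			return 0;
		cached_flags = probe.kernel_flags;
	}
	return (cached_flags & flag) == flag;
}
```

With the mask cached, the caller can decide up front whether to set
DEMO_FLAG_C instead of issuing a call and interpreting EINVAL.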

> The above is just after a day of looking hard at kdbus.txt. I strongly
> suspect I'd find a lot of other issues if I spent more time on kdbus.

If you find the time, please do! Any hints on how a specific part of the
API could be done better are highly appreciated. A lot of the more or
less recent changes were made in response to reviews from glib developers.
More help is always welcome!

Thanks
David

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-22 13:46               ` David Herrmann
  (?)
@ 2015-01-22 14:49               ` Austin S Hemmelgarn
  2015-01-23 16:08                 ` Greg Kroah-Hartman
  -1 siblings, 1 reply; 143+ messages in thread
From: Austin S Hemmelgarn @ 2015-01-22 14:49 UTC (permalink / raw)
  To: David Herrmann, Michael Kerrisk (man-pages)
  Cc: Daniel Mack, Greg Kroah-Hartman, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Andy Lutomirski, Linux API, linux-kernel,
	Djalal Harouni, Johannes Stezenbach, Theodore T'so

On 2015-01-22 08:46, David Herrmann wrote:
> Hi Michael
>
> On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> * API oddities such as the 'kernel_flags' fields. Why do I need to
>>    be told what flags the kernel supports on *every* operation?
>
> If we only returned EINVAL on invalid arguments, user-space had to
> probe for each flag to see whether it's supported. By returning the
> set of supported flags, user-space can cache those and _reliably_ know
> which flags are supported.
> We decided the overhead of a single u64 copy on each ioctl is
> preferred over a separate syscall/ioctl to query kernel flags. If you
> disagree, please elaborate (preferably with a suggestion how to do it
> better).
>
While I agree that there should be a way for userspace to get the list 
of supported operations, userspace apps will only actually care about 
that once, when they begin talking to kdbus, because (ignoring the live 
kernel patching that people have been working on recently) the list of 
supported operations isn't going to change while the system is running. 
  While a u64 copy has relatively low overhead, it does have overhead, 
and that is very significant when you consider part of the reason some 
people want kdbus is for the performance gain.  Especially for those 
automotive applications that have been mentioned which fire off 
thousands of messages during start-up, every little bit of performance 
is significant.
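
One way to picture the alternative described above is a one-time
handshake that reports the supported flags, after which per-message
calls write nothing back. The following is only a user-space sketch
with made-up names (sketch_hello(), sketch_send(), and so on); it is
not how kdbus works, and it counts copies rather than measuring time.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits for the sketch. */
#define SKETCH_FLAG_X		(1ULL << 0)
#define SKETCH_FLAG_Y		(1ULL << 1)
#define SKETCH_SUPPORTED	(SKETCH_FLAG_X | SKETCH_FLAG_Y)

/* Hypothetical connection state kept by the caller. */
struct sketch_conn {
	uint64_t supported;		/* filled in once by sketch_hello() */
	unsigned long writebacks;	/* times the mask was copied out */
};

/* One-time handshake: the only place the mask is copied out. */
static int sketch_hello(struct sketch_conn *c)
{
	c->supported = SKETCH_SUPPORTED;
	c->writebacks++;
	return 0;
}

/* Per-message path: validates flags but writes nothing back. */
static int sketch_send(const struct sketch_conn *c, uint64_t flags)
{
	return (flags & ~c->supported) ? -EINVAL : 0;
}

/* Send nmsgs messages; return how often the mask was copied out. */
static unsigned long sketch_run(unsigned int nmsgs)
{
	struct sketch_conn c = { 0 };
	unsigned int i;

	if (sketch_hello(&c) < 0)
		return 0;
	for (i = 0; i < nmsgs; i++)
		if (sketch_send(&c, SKETCH_FLAG_X) < 0)
			return 0;
	return c.writebacks;
}
```

In this model the number of mask copies stays at one regardless of how
many messages are sent; whether the saving matters in practice would
have to be measured on a real system.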

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-23  6:28     ` Ahmed S. Darwish
  0 siblings, 0 replies; 143+ messages in thread
From: Ahmed S. Darwish @ 2015-01-23  6:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz,
	Michael Kerrisk (man-pages),
	Linus Torvalds, Daniel Mack

On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
> From: Daniel Mack <daniel@zonque.org>
> 
> kdbus is a system for low-latency, low-overhead, easy to use
> interprocess communication (IPC).
> 
> The interface to all functions in this driver is implemented via ioctls
> on files exposed through a filesystem called 'kdbusfs'. The default
> mount point of kdbusfs is /sys/fs/kdbus.

Pardon my ignorance, but we've always been told that adding
new ioctl()s to the kernel is a very big no-no.  But given
the seniority of the folks stewarding this kdbus effort,
there must be a good rationale ;-)

So, can the rationale behind introducing new ioctl()s be
further explained? It would be even better if it's included
in the documentation patch itself.

Thanks,
Darwish

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-23 11:47                 ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-23 11:47 UTC (permalink / raw)
  To: David Herrmann
  Cc: mtk.manpages, Daniel Mack, Greg Kroah-Hartman, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Andy Lutomirski, Linux API, linux-kernel,
	Djalal Harouni, Johannes Stezenbach, Theodore T'so,
	Christoph Hellwig

Hi David,

On 01/22/2015 02:46 PM, David Herrmann wrote:
> Hi Michael
> 
> On Thu, Jan 22, 2015 at 11:18 AM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> On 01/21/2015 05:58 PM, Daniel Mack wrote:
>>>>> Also, the context the kdbus commands operate on originate from a
>>>>> mountable special-purpose file system.
>>>>
>>>> It's not clear to me how this point implies any particular API design
>>>> choice.
>>>
>>> It emphasizes the fact that our ioctl API can only be used with the
>>> nodes exposed by kdbusfs, and vice versa. I think operations on
>>> driver-specific files do not justify a new 'generic' syscall API.
>>
>> I see your (repeated) use of words like "driver" as just a distraction,
>> I'm sorry. "Driver-specific files" is an implementation detail. The
>> point is that kdbus provides a complex, general-purpose feature for all
>> of userland. It is core infrastructure that will be used by key pieces
>> of the plumbing layer, and likely by many other applications as well.
>> *Please* stop suggesting that it is not core infrastructure: kdbus
>> has the potential to be a great IPC that will be very useful to many,
>> many user-space developers.
> 
> We called it an 'ipc driver' so far. It is in no way meant as
> distraction. Feel free to call it 'core infrastructure'. I think we
> can agree that we want it to be generically useful, like other ipc
> mechanisms, including UDS and netlink.

Yes.

>> (By the way, we have precedents for device/filesystem-specific system
>> calls. Even a recent one, in memfd_create().)
> 
> memfd_create() is in no way file-system specific. Internally, it uses
> shmem, but that's an implementation detail. The API does not expose
> this in any way. If you were referring to sealing, it's implemented as
> fcntl(), not as a separate syscall. Furthermore, sealing is only
> limited to shmem as it's the only volatile storage. It's not an API
> restriction. Other volatile file-systems are free to implement
> sealing.

My bad. I misspoke there.

>>>> ioctl() is a get-out-of-jail free card when it comes to API design.
>>>
>>> And how are syscalls different in that regard when they would both
>>> transport the same data structures?
>>
>> My general impression is that when it comes to system calls,
>> there's usually a lot more up front thought about API design.
> 
> This is no technical reason why to use syscalls over ioctls. Imho,
> it's rather a reason to improve the kernel's ioctl-review process.

Agreed it's not a technical reason. But the distinction I make 
does capture how things usually work.

[...]

>>>> Rather
>>>> than thinking carefully and long about a set of coherent, stable APIs that
>>>> provide some degree of type-safety to user-space, one just adds/changes/removes
>>>> an ioctl.
>>>
>>> Adding another ioctl in the future for cases we didn't think about
>>> earlier would of course be considered a workaround; and even though such
>>> things happen all the time, it's certainly something we'd like to avoid.
>>>
>>> However, we would also need to add another syscall in such cases, which
>>> is even worse IMO. So that's really not an argument against ioctls after
>>> all. What fundamental difference between a raw syscall and a ioctl,
>>> especially with regards to type safety, is there which would help us here?
>>
>> Type safety helps user-space application developers. System calls have
>> it, ioctl() does not.
> 
> This is simply not true. There is no type-safety in:
>     syscall(__NR_xyz, some, random, arguments)
> 
> The only way a syscall gets 'type-safe', is to provide a wrapper
> function. Same applies to ioctls. But people tend to not do that for
> ioctls, which is, again, not a technical argument against ioctls. It's
> a matter of psychology, though.

Yes, I see I wasn't quite clear enough. I should have said:

    Type safety helps user-space application developers. 
    *As typically provided to user-space via libc wrappers*,
    system calls have it, ioctl() does not.

And system call wrappers are generally provided pretty much automatically
by libcs (at least, mostly, there's some to-ing and fro-ing about this in
glibc-land these days as you are aware from the memfd_create() story;
http://thread.gmane.org/gmane.comp.lib.glibc.alpha/45884/focus=46937).

So, in practice user-space programmers typically automatically get type
safety with system calls, but not with ioctl().
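
To make the wrapper point concrete, here is a minimal sketch in C. It
uses FIONREAD (an existing ioctl that reports how many bytes are
readable) purely as a stand-in; it has nothing to do with kdbus. The
wrapper name fd_bytes_readable() is hypothetical and not part of any
library.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * The third argument of ioctl() is variadic, so a call such as
 *     ioctl(fd, FIONREAD, &some_long)
 * compiles without complaint even though FIONREAD expects an int *.
 *
 * Hypothetical typed wrapper: the prototype makes the compiler check
 * that callers really pass an int *.
 */
static inline int fd_bytes_readable(int fd, int *count)
{
	return ioctl(fd, FIONREAD, count);
}
```

With the wrapper, passing a pointer of the wrong type is diagnosed at
compile time, which is the same benefit a libc prototype gives for a
system call. Nothing stops a caller from bypassing the wrapper and
calling ioctl() directly, just as nothing stops a caller from using
syscall() directly.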

> I still don't see a technical reason to use syscalls. API proposals welcome!
> We're now working on a small kdbus helper library, which provides
> type-safe ioctl wrappers, item-iterators and documented examples. But,
> like syscalls, nobody is forced to use the wrappers. The API design is
> not affected by this.
> 
>>>> And that process seems to be frequent and ongoing even now. (And
>>>> it's to your great credit that the API/ABI breaks are clearly and honestly
>>>> marked in the kdbus.h changelog.) All of this lightens the burden of API
>>>> design for kernel developers, but I'm concerned that the long-term pain
>>>> for user-space developers who use an API which (in my estimation) may
>>>> come to be widely used will be enormous.
>>>
>>> Yes, we've jointly reviewed the API details again until just recently to
>>> unify structs and enums etc, and added fields to make the ioctls structs
>>> more versatile for possible future additions. By that, we effectively
>>> broke the ABI, but we did that because we know we can't do such things
>>> again in the future.
>>>
>>> But again - I don't see how this would be different when using syscalls
>>> rather than ioctls to transport information between the driver and
>>> userspace. Could you elaborate?
>>
>> My suspicion is that not nearly enough thinking has yet been done about
>> the design of the API. That's based on these observations:
>>
>> * Documentation that is, considering the size of the API, *way* too thin.
> 
> Ok, working on that.
> 
>> * Some parts of the API not documented at all (various kdbus_item blobs)
> 
> All public structures have documentation in kdbus.h. It may need
> improvements, though.

Again -- that's very thin--one-liners and sentence fragments for the most 
part. (Not that I think that needs to be fixed, just that that doesn't
fit with my definition of "documented", which should be something like
what I've requested for kdbus.txt.)

>> * ABI changes happening even quite recently
> 
> Please elaborate why 'recent ABI-changes' are a sign of a premature API.
> 
> I seriously doubt any API can be called 'perfect'. On the contrary, I
> believe that all APIs could be improved continuously. The fact that
> we, at one point, settle on an API is an admission of
> backwards-compatibility. I in no way think it's a sign of
> 'perfection'.

I agree that no API is perfect. But some emerge from the starting gate 
in much better state than others. kdbus will likely be an important API,
and it's important that it's a well-designed one at the outset.

> With kdbus our plan is to give API-guarantees starting with upstream
> inclusion. We know, that our API will not be perfect, none is. But we
> will try hard to fix anything that comes up, as long as we can. And
> this effort will manifest in ABI-breaks.

Fair enough. My concern is that upstream inclusion is being rushed before
the API design can be well assessed.

>> * API oddities such as the 'kernel_flags' fields. Why do I need to
>>   be told what flags the kernel supports on *every* operation?
> 
> If we only returned EINVAL on invalid arguments, user-space had to
> probe for each flag to see whether it's supported. By returning the
> set of supported flags, user-space can cache those and _reliably_ know

(Not sure why you emphasize "reliably" here...)

> which flags are supported.
> We decided the overhead of a single u64 copy on each ioctl is
> preferred over a separate syscall/ioctl to query kernel flags. If you
> disagree, please elaborate (preferably with a suggestion how to do it
> better).

Well, that's quite an unconventional design choice. Determining the set
of supported flags in an API is a one-time operation. The natural--and
I would say, *obviously* better--approach to this would be either the
traditional EINVAL approach or an API that is called once to retrieve 
the set of supported flags. Instead, kdbus clutters the APIs with a 
mostly unneeded extra piece of information on *every* call, and 
fractionally increases the run time, for no good reason.

Now, you might say that my suggested alternatives are not obviously
better. My response is that, at the very least, in the documentation
I'd expect to see this unconventional approach clearly highlighted 
and accompanied with a sound reason for choosing it. (Note: so far I 
still haven't actually seen a sound reason...)
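
The two conventional alternatives mentioned above can be sketched in C
as follows. Everything here is hypothetical: demo_call() merely
simulates a kernel entry point that rejects unknown flag bits with
EINVAL, and none of the names or flag values come from kdbus.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical set of flags the simulated "kernel" understands. */
#define DEMO_SUPPORTED_FLAGS UINT64_C(0x0b)	/* bits 0, 1 and 3 */

/* Simulated entry point: fails with EINVAL on any unknown flag bit. */
static int demo_call(uint64_t flags)
{
	if (flags & ~DEMO_SUPPORTED_FLAGS) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/* Alternative 1: probe each bit once, treating failure as "unsupported". */
static uint64_t demo_probe_flags(void)
{
	uint64_t supported = 0;

	for (int bit = 0; bit < 64; bit++) {
		uint64_t flag = UINT64_C(1) << bit;

		if (demo_call(flag) == 0)
			supported |= flag;
	}
	return supported;
}

/* Alternative 2: a dedicated query that is called once and cached. */
static int demo_query_flags(uint64_t *supported)
{
	*supported = DEMO_SUPPORTED_FLAGS;
	return 0;
}
```

In this simulation both approaches yield the same mask. Against a real
interface, probing is only practical when a call with a single flag set
has no side effects, which is one argument for the one-time query.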
 
>> The above is just after a day of looking hard at kdbus.txt. I strongly
>> suspect I'd find a lot of other issues if I spent more time on kdbus.
> 
> If you find the time, please do! Any hints how a specific part of the
> API could be done better, is highly appreciated. A lot of the more or
> less recent changes were done due to reviews from glib developers.

Could you point me at those reviews (mailing list archives?)?

So, I'm thinking about things such as the following:

* The odd choice of ioctl() as the API mechanism for what should become
  a key user-space API. (BTW, which other widely used IPC API ever
  took the approach of ioctl() as the mechanism?)

* Weak justifications for unconventional API design choices such
  as the 'kernel_flags' above.

* Thin documentation that doesn't provide nearly enough detail,
  has no worked examples of the use of the APIs (when it should 
  contain a multitude of such examples), and has no rationale 
  for the API design choices [1].

* An API design that consists of 16 ioctl() requests and 20+ 
  structures exchanged between user and kernel space being called 
  "simple". (Clearly it is not.)

Given a list of points like that, I worry that not nearly enough
thought has been put into design of the API, and certainly would be 
very concerned to think that it might be merged into mainline
in the near future. 

At this point, I think the onus is on the kdbus developers to 
provide strong evidence that they have a good API design, one that 
will well serve the needs of thousands of user-space programmers for
the next few decades. Such evidence would include at least:

  * Detailed documentation that fully describes all facets of the API
  * A number of working, well documented example programs that start
    (very) simple, and ramp up to demonstrate more complex pieces
    of the API.
  * Documented rationale for API design choices.

To date, much of that sort of evidence is lacking, and I worry that
the job of proper API design will be left to someone else, someone
who devises a user-space library that provides a suitable 
abstraction on top of the current ioctl() API (but may be forced to
make design compromises because of design failings in the underlying
kernel API).

> More help is always welcome!

That's what I'm trying to do at the moment...

Cheers,

Michael


[1] Elsewhere in this thread, you've said that "with bus-proxy and 
    systemd we have two huge users of kdbus that put a lot of 
    pressure on API design." I would have expected that the results 
    of that pressure should be captured in documentation of the
    rationale for the API design. I haven't found any of that, 
    so far.

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-23  6:28     ` Ahmed S. Darwish
  (?)
@ 2015-01-23 13:19     ` Greg Kroah-Hartman
  2015-01-23 13:29         ` Greg Kroah-Hartman
  2015-01-25  3:30         ` Ahmed S. Darwish
  -1 siblings, 2 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-23 13:19 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz,
	Michael Kerrisk (man-pages),
	Linus Torvalds, Daniel Mack

On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote:
> On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
> > From: Daniel Mack <daniel@zonque.org>
> > 
> > kdbus is a system for low-latency, low-overhead, easy to use
> > interprocess communication (IPC).
> > 
> > The interface to all functions in this driver is implemented via ioctls
> > on files exposed through a filesystem called 'kdbusfs'. The default
> > mount point of kdbusfs is /sys/fs/kdbus.
> 
> Pardon my ignorance, but we've always been told that adding
> new ioctl()s to the kernel is a very big no-no.  But given
> the seniority of the folks stewarding this kdbus effort,
> there must be a good rationale ;-)
> 
> So, can the rationale behind introducing new ioctl()s be
> further explained? It would be even better if it's included
> in the documentation patch itself.

The main reason to use an ioctl is that you want to atomically set
and/or get something "complex" through the user/kernel boundary.  For
simple device attributes, sysfs works great, for configuring devices,
configfs works great, but for data streams / structures / etc. an ioctl
is the correct thing to use.

Examples of new ioctls being added to the kernel are all over the
place, look at all of the special-purpose ioctls the filesystems keep
creating (they aren't adding new syscalls), look at the monstrosity that
is the DRM layer, look at other complex things like openvswitch, or
"simpler" device-specific interfaces like the MEI one, or even more
complex ones like the MMC interface.  These are all valid uses of ioctls
as they are device/filesystem specific ways to interact with the kernel.

The thing is, almost no one pays attention to these new ioctls as they
are domain-specific interfaces, with open userspace programs talking to
them, and they work well.  ioctl is a powerful and useful interface, and
if we were to suddenly require no new ioctls, and require everything to
be a syscall, we would do nothing except make apis more complex (hint,
you now have to do extra validation on your file descriptor passed to
you to determine if it really is what you can properly operate your
ioctl on), and cause no real benefit at all.
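
A small C sketch of the file descriptor point, using existing
interfaces only as a stand-in (it is unrelated to kdbus): an ioctl
request is handed to the handler of whatever file the descriptor refers
to, so a request that file does not implement is rejected by the kernel
without the caller's target having to inspect the descriptor itself.
The helper name try_tty_request() is hypothetical.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Send a terminal-only request (TIOCGWINSZ) to an arbitrary descriptor.
 * Returns 0 if the file behind fd answered it, otherwise the errno value.
 */
static int try_tty_request(int fd)
{
	struct winsize ws;

	if (ioctl(fd, TIOCGWINSZ, &ws) == 0)
		return 0;
	return errno;
}
```

On Linux, calling this on the read end of a pipe returns ENOTTY,
because the pipe's ioctl handler does not know the request. A dedicated
syscall taking a file descriptor would have to perform an equivalent
check on its own.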

Yes, people abuse ioctls at times, but really, they are needed.

And remember, I was one of the people who long ago thought we should not
be adding more ioctls, but I have seen the folly of my ways, and chalk
it up to youthful ignorance :)

Hope this helps explain things better,

greg k-h

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-23 13:29         ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-23 13:29 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz,
	Michael Kerrisk (man-pages),
	Linus Torvalds, Daniel Mack

On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote:
> On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote:
> > On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
> > > From: Daniel Mack <daniel@zonque.org>
> > > 
> > > kdbus is a system for low-latency, low-overhead, easy to use
> > > interprocess communication (IPC).
> > > 
> > > The interface to all functions in this driver is implemented via ioctls
> > > on files exposed through a filesystem called 'kdbusfs'. The default
> > > mount point of kdbusfs is /sys/fs/kdbus.
> > 
> > Pardon my ignorance, but we've always been told that adding
> > new ioctl()s to the kernel is a very big no-no.  But given
> > the seniority of the folks stewarding this kdbus effort,
> > there must be a good rationale ;-)
> > 
> > So, can the rationale behind introducing new ioctl()s be
> > further explained? It would be even better if it's included
> > in the documentation patch itself.
> 
> The main reason to use an ioctl is that you want to atomically set
> and/or get something "complex" through the user/kernel boundary.  For
> simple device attributes, sysfs works great, for configuring devices,
> configfs works great, but for data streams / structures / etc. an ioctl
> is the correct thing to use.
> 
> Examples of new ioctls being added to the kernel are all over the
> place, look at all of the special-purpose ioctls the filesystems keep
> creating (they aren't adding new syscalls), look at the monstrosity that
> is the DRM layer, look at other complex things like openvswitch, or
> "simpler" device-specific interfaces like the MEI one, or even more
> complex ones like the MMC interface.

Oops, I meant, MIC, not MMC, sorry about that.


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-23 15:54               ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-23 15:54 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Daniel Mack, arnd, ebiederm, gnomes, teg, jkosina, luto,
	linux-api, linux-kernel, dh.herrmann, tixxdz,
	Johannes Stezenbach, Theodore T'so

On Thu, Jan 22, 2015 at 11:18:50AM +0100, Michael Kerrisk (man-pages) wrote:
> >> And that process seems to be frequent and ongoing even now. (And 
> >> it's to your great credit that the API/ABI breaks are clearly and honestly 
> >> marked in the kdbus.h changelog.) All of this lightens the burden of API
> >> design for kernel developers, but I'm concerned that the long-term pain
> >> for user-space developers who use an API which (in my estimation) may
> >> come to be widely used will be enormous.
> > 
> > Yes, we've jointly reviewed the API details again until just recently to
> > unify structs and enums etc, and added fields to make the ioctls structs
> > more versatile for possible future additions. By that, we effectively
> > broke the ABI, but we did that because we know we can't do such things
> > again in the future.
> > 
> > But again - I don't see how this would be different when using syscalls
> > rather than ioctls to transport information between the driver and
> > userspace. Could you elaborate?
> 
> My suspicion is that not nearly enough thinking has yet been done about
> the design of the API. That's based on these observations:
> 
> * Documentation that is, considering the size of the API, *way* too thin.
> * Some parts of the API not documented at all (various kdbus_item blobs)
> * ABI changes happening even quite recently
> * API oddities such as the 'kernel_flags' fields. Why do I need to
>   be told what flags the kernel supports on *every* operation?
> 
> The above is just after a day of looking hard at kdbus.txt. I strongly
> suspect I'd find a lot of other issues if I spent more time on kdbus.

"not enough thinking"?

We started working on kdbus 2 years ago this FOSDEM (in a few weeks.)
Before that we had been thinking about this for many years, learning
from the previous attempts to get this type of feature merged into the
kernel, talking with users about what they need for this, and soliciting
kernel developers' opinions on what type of API would be best for this
type of feature.

Since then we have done nothing but constantly revise the API.  My first
mock ups were way too simple, and in discussing things with people much
more knowledgeable about D-Bus, they pointed out the problems, and we
iterated.  And iterated.  And iterated some more.  We have worked with
just about every userspace libdbus developer group, including QtDbus
developers as well as glib developers.  Now not all of them agreed with
some of our decisions in the implementation, which is fair enough, you
can't please everyone, but they _all_ agree that what we have now is the
proper way to implement this type of functionality and have reviewed the
features as being correct and compatible with their needs and users.

Those discussions have happened in emails, presentations, meetings, and
hackfests pretty much continuously for the past 2 years all around the
world.

We have stress-tested the api with both unit tests (which are included
here in the patch set) as well as a real-world implementation (sd-bus in
the systemd source repo.)  That real-world implementation successfully
has been booting many of our daily machines for many months now.

Yes, the documentation can always be better, but please don't confuse
the lack of understanding how D-Bus works and its model with the lack of
understanding this kdbus implementation, the two are not comparable.
For some good primers on what D-Bus is, and the terminology it expects
see:
	http://dbus.freedesktop.org/doc/dbus-tutorial.html
and also:
	http://dbus.freedesktop.org/doc/dbus-faq.html#other-ipc

We are not going to put a basic "here is what D-Bus is and how to use
it" into the kernel tree, that is totally outside the scope here.

I suggest reading the tutorial above, and then going back and reading
the kdbus documentation provided.  If you think we are lacking stuff on
the kdbus side, we will be glad to flesh out any needed areas.

Also, Daniel has said he will work on a basic userspace "example"
library to show how to use this api, which might make the api a bit
easier to understand.

However, I personally don't think this "example code" is necessary at
all.  We don't ask for this type of "simple examples" from other new
kernel apis we create and add to the kernel all the time.  We require
there to be a user of the api, but not one that is so well documented
that others can write a from-scratch raw userspace replacement.
Specific examples of this are my previously mentioned ioctl users
(btrfs, mei, mic, openvswitch, etc.), and the grand-daddy of all horrid
apis, DRM.

Users aren't going to be writing their own "raw kdbus" libraries.  Or if
they are, they are going to start with one of the existing
implementations we have (the test examples and sd-bus, and I think there
is a native Go implementation somewhere as well.)  Users are going to be
using those libraries to write their code, and to be honest, the user
api for sd-bus is a delight to use compared to the "old style" libdbus
interface as we have the benefit of 10 years of experience working with
D-Bus apis in the wild now to learn from past mistakes.

Back to the API.  We have taken review comments on the previous postings
of the code and reworked the API, moving it from a character device to
be a filesystem, which ended up making things a lot easier in the end, a
good example of a review process that is working.  Those changes are
a sign that our development review process works.  People pointed out
problems in our character api that we hadn't thought about from the
kernel implementation side.  And so we changed them and the code is
better and more robust because of it, a success story for our review
process.

Personally, I was the one that started down the character device node
path, so blame that design decision on me, not the other developers
here.  And I was wrong about that, but moving from character to a
filesystem wasn't a huge change, the structures and interactions all
remained almost identical, as the logic behind the API is, in my
opinion, correct for the problem it is addressing.

The 37 different developers who have contributed to this code base are
quite talented and skilled and experienced in user and kernel apis,
having implemented many apis of their own that users rely on every day.

Yes, we all make design mistakes, and you might not agree with some of
them, that's fine.  But it is flat out rude to say that we have not been
thinking about this, when I would guess that this is one of the largest
(in time and contributions) kernel development features to be worked on
in the past few years.

And yes, I'm being very defensive, as I take this very seriously, so
please, don't so lightly dismiss us as not knowing what we are doing, as
frankly, we do.

Thanks for making it this far, I'll go back to technical discussions of
the API now, as that's what we should be doing, not casting aspersions
as to the aptitude of the people involved.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-22 14:49               ` Austin S Hemmelgarn
@ 2015-01-23 16:08                 ` Greg Kroah-Hartman
  2015-01-26 14:46                     ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-01-23 16:08 UTC (permalink / raw)
  To: Austin S Hemmelgarn
  Cc: David Herrmann, Michael Kerrisk (man-pages),
	Daniel Mack, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	Linux API, linux-kernel, Djalal Harouni, Johannes Stezenbach,
	Theodore Ts'o

On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
> While I agree that there should be a way for userspace to get the list of
> supported operations, userspace apps will only actually care about that
> once, when they begin talking to kdbus, because (ignoring the live kernel
> patching that people have been working on recently) the list of supported
> operations isn't going to change while the system is running.  While a u64
> copy has relatively low overhead, it does have overhead, and that is very
> significant when you consider part of the reason some people want kdbus is
> for the performance gain.  Especially for those automotive applications that
> have been mentioned which fire off thousands of messages during start-up,
> every little bit of performance is significant.

A single u64 in a structure is not going to be measurable at all;
processors just copy memory too fast these days for 8 extra bytes to be
noticeable.  So let's make this as easy as possible for userspace, making
it simpler logic there, which is much more important than saving
theoretical time in the kernel.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-25  3:30         ` Ahmed S. Darwish
  0 siblings, 0 replies; 143+ messages in thread
From: Ahmed S. Darwish @ 2015-01-25  3:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: arnd, ebiederm, gnomes, teg, jkosina, luto, linux-api,
	linux-kernel, daniel, dh.herrmann, tixxdz,
	Michael Kerrisk (man-pages),
	Linus Torvalds, Daniel Mack

On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote:
> On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote:
> > On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
> > > From: Daniel Mack <daniel@zonque.org>
> > > 
> > > kdbus is a system for low-latency, low-overhead, easy to use
> > > interprocess communication (IPC).
> > > 
> > > The interface to all functions in this driver is implemented via ioctls
> > > on files exposed through a filesystem called 'kdbusfs'. The default
> > > mount point of kdbusfs is /sys/fs/kdbus.
> > 
> > Pardon my ignorance, but we've always been told that adding
> > new ioctl()s to the kernel is a very big no-no.  But given
> > the seniority of the folks stewarding this kdbus effort,
> > there must be a good rationale ;-)
> > 
> > So, can the rationale behind introducing new ioctl()s be
> > further explained? It would be even better if it's included
> > in the documentation patch itself.
> 
> The main reason to use an ioctl is that you want to atomically set
> and/or get something "complex" through the user/kernel boundary.  For
> simple device attributes, sysfs works great, for configuring devices,
> configfs works great, but for data streams / structures / etc. an ioctl
> is the correct thing to use.
> 
> Examples of new ioctls being added to the kernel are all over the
> place, look at all of the special-purpose ioctls the filesystems keep
> creating (they aren't adding new syscalls), look at the monstrosity that
> is the DRM layer, look at other complex things like openvswitch, or
> "simpler" device-specific interfaces like the MEI one, or even more
> complex ones like the MMC interface.  These are all valid uses of ioctls
> as they are device/filesystem specific ways to interact with the kernel.
> 
> The thing is, almost no one pays attention to these new ioctls as they
> are domain-specific interfaces, with open userspace programs talking to
> them, and they work well.  ioctl is a powerful and useful interface, and
> if we were to suddenly require no new ioctls, and require everything to
> be a syscall, we would do nothing except make apis more complex (hint,
> you now have to do extra validation on your file descriptor passed to
> you to determine if it really is what you can properly operate your
> ioctl on), and cause no real benefit at all.
> 
> Yes, people abuse ioctls at times, but really, they are needed.
> 
> And remember, I was one of the people who long ago thought we should not
> be adding more ioctls, but I have seen the folly of my ways, and chalk
> it up to youthful ignorance :)
> 

Exactly, and that's why I wondered about the sudden change
of heart ;-)

Thanks for taking the time to write this.

Regards,
Darwish

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-26 14:42                 ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-26 14:42 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: mtk.manpages, Daniel Mack, arnd, ebiederm, gnomes, teg, jkosina,
	luto, linux-api, linux-kernel, dh.herrmann, tixxdz,
	Johannes Stezenbach, Theodore Ts'o, Christoph Hellwig

Hi Greg,

First of all, I seem to have offended you. That was not my intention.
It's certainly not my intent to disparage you or your work (or for 
that matter, the other kdbus developers). Insofar as any of the wordings 
I've used suggested otherwise, I do apologize.

I'll comment on various points below, keeping it as technical as I can.
Then I have a couple of general questions at the end with the goal
of ensuring that my comments are not coming from a broken world view.

On 01/23/2015 04:54 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 22, 2015 at 11:18:50AM +0100, Michael Kerrisk (man-pages) wrote:
>>>> And that process seems to be frequent and ongoing even now. (And 
>>>> it's to your great credit that the API/ABI breaks are clearly and honestly 
>>>> marked in the kdbus.h changelog.) All of this lightens the burden of API
>>>> design for kernel developers, but I'm concerned that the long-term pain
>>>> for user-space developers who use an API which (in my estimation) may
>>>> come to be widely used will be enormous.
>>>
>>> Yes, we've jointly reviewed the API details again until just recently to
>>> unify structs and enums etc, and added fields to make the ioctls structs
>>> more versatile for possible future additions. By that, we effectively
>>> broke the ABI, but we did that because we know we can't do such things
>>> again in the future.
>>>
>>> But again - I don't see how this would be different when using syscalls
>>> rather than ioctls to transport information between the driver and
>>> userspace. Could you elaborate?
>>
>> My suspicion is that not nearly enough thinking has yet been done about
>> the design of the API. That's based on these observations:
>>
>> * Documentation that is, considering the size of the API, *way* too thin.
>> * Some parts of the API not documented at all (various kdbus_item blobs)
>> * ABI changes happening even quite recently
>> * API oddities such as the 'kernel_flags' fields. Why do I need to
>>   be told what flags the kernel supports on *every* operation?
>>
>> The above is just after a day of looking hard at kdbus.txt. I strongly
>> suspect I'd find a lot of other issues if I spent more time on kdbus.
> 
> "not enough thinking"?
> 
> We started working on kdbus 2 years ago this FOSDEM (in a few weeks.)
> Before that we had been thinking about this for many years, learning
> from the previous attempts to get this type of feature merged into the
> kernel, talking with users about what they need for this, and soliciting
> kernel developers' opinions on what type of API would be best for this
> type of feature.
> 
> Since then we have done nothing but constantly revise the API.  My first
> mock ups were way too simple, and in discussing things with people much
> more knowledgeable about D-Bus, they pointed out the problems, and we
> iterated.  And iterated.  And iterated some more.  We have worked with
> just about every userspace libdbus developer group, including QtDbus
> developers as well as glib developers.  Now not all of them agreed with
> some of our decisions in the implementation, which is fair enough, you
> can't please everyone, but they _all_ agree that what we have now is the
> proper way to implement this type of functionality and have reviewed the
> features as being correct and compatible with their needs and users.
> 
> Those discussions have happened in emails, presentations, meetings, and
> hackfests pretty much continuously for the past 2 years all around the
> world.
> 
> We have stress-tested the api with both unit tests (which are included
> here in the patch set) as well as a real-world implementation (sd-bus in
> the systemd source repo.)  That real-world implementation successfully
> has been booting many of our daily machines for many months now.

Notwithstanding that I don't see how a unit test stress tests an API 
*design*, I've no reason to doubt that kdbus works. But that's not the 
point of my concern. I worry how usable this API is going to be for the 
world at large.

> Yes, the documentation can always be better, but please don't confuse
> the lack of understanding how D-Bus works and its model with the lack of
> understanding this kdbus implementation, the two are not comparable.
> For some good primers on what D-Bus is, and the terminology it expects
> see:
> 	http://dbus.freedesktop.org/doc/dbus-tutorial.html
> and also:
> 	http://dbus.freedesktop.org/doc/dbus-faq.html#other-ipc
> 
> We are not going to put a basic "here is what D-Bus is and how to use
> it" into the kernel tree, that is totally outside the scope here.

I didn't expect that you should do that. But it does touch on a general 
question that I'll leave to the end of this mail.

> I suggest reading the tutorial above, and then going back and reading
> the kdbus documentation provided.  If you think we are lacking stuff on
> the kdbus side, we will be glad to flesh out any needed areas.
> 
> Also, Daniel has said he will work on a basic userspace "example"
> library to show how to use this api, which might make the api a bit
> easier to understand.
> 
> However, I personally don't think this "example code" is necessary at
> all.  We don't ask for this type of "simple examples" from other new
> kernel apis we create and add to the kernel all the time.   We require
> there to be a user of the api, but not one that is so well documented
> that others can write a from-scratch raw userspace replacement.

What you've just summarized there is how low a bar we've historically 
set in API design. Thus, the API is littered with (for example) dozens 
of system calls that were insufficiently well thought out in their 
design, and have subsequently been superseded by replacements that 
fixed the design mistakes. One of the causes of that problem is the
targeting of "*a* user of the API"--general purpose APIs need to be
considered from the point of view of multiple potentially different 
use cases. And I'm certainly not talking about being able to reimplement
the API. But the API is a contract, and it needs to be well understood
by its creators and consumers in order that they can assess and use that
API. Extensive documentation generally is the best way to do that.

And, anyway, I had understood that there was a rough consensus that we do
want to see more tests/examples and documentation happening in the future.
Certainly, the number of kernel developers who are taking a shot at
writing man pages these days is refreshing. (We are 7/7 for man-page
documented system calls in the 3.17-3.19 frame. That's a trend I've done 
my best to encourage, and hope to see continue in the future.) And now
we have kselftest, in part thanks to your good efforts.

> Specific examples of this are my previously mentioned ioctl users
> (btrfs, mei, mic, openvswitch, etc.), and the grand-daddy of all horrid
> apis, DRM.

You've made this comparison a number of times, but I think it misses
a crucial point. Those examples are all(?) domain-specific APIs with
relatively few users in terms of user-space applications, whereas, 
IIUC, kdbus is intended to be an IPC mechanism that can be employed
by user-space applications in a general-purpose fashion, and upon 
which potentially a multitude of different applications might be built.
That's why I think the decision to use an ioctl()-based interface 
needs to be considered from a (very) skeptical point of view: no other 
general-purpose IPC mechanism employs such an approach. (Again, see
one of my general questions at the end of this mail.)

> Users aren't going to be writing their own "raw kdbus" libraries.  Or if
> they are, they are going to start with one of the existing
> implementations we have (the test examples and sd-bus, and I think there
> is a native Go implementation somewhere as well.)  Users are going to be
> using those libraries to write their code, and to be honest, the user
> api for sd-bus is a delight to use compared to the "old style" libdbus
> interface as we have the benefit of 10 years of experience working with
> D-Bus apis in the wild now to learn from past mistakes.
> 
> Back to the API.  We have taken review comments on the previous postings
> of the code and reworked the API, moving it from a character device to
> be a filesystem, which ended up making things a lot easier in the end, a
> good example of a review process that is working.  Those changes are
> a sign that our development review process works.  People pointed out
> problems in our character api that we hadn't thought about from the
> kernel implementation side.  And so we changed them and the code is
> better and more robust because of it, a success story for our review
> process.
> 
> Personally, I was the one that started down the character device node
> path, so blame that design decision on me, not the other developers
> here.  And I was wrong about that, but moving from character to a
> filesystem wasn't a huge change, the structures and interactions all
> remained almost identical, as the logic behind the API is, in my
> opinion, correct for the problem it is addressing.
> 
> The 37 different developers who have contributed to this code base are
> quite talented and skilled and experienced in user and kernel apis,
> having implemented many apis of their own that users rely on every day.

Yes, but I am not sure that the 15 developers who each made 1 commit
(out of 2816 to date) would have done much work on the API. And probably
the same is true for the 9 more who made just 2 or 3 commits. As one
would expect, the great bulk of the good work has been done by a small
core: just shy of 95% of commits by the top 5 committers.

> Yes, we all make design mistakes, and you might not agree with some of
> them, that's fine.  But it is flat out rude to say that we have not been
> thinking about this, when I would guess that this is one of the largest
> (in time and contributions) kernel development features to be worked on
> in the past few years.
>
> And yes, I'm being very defensive, as I take this very seriously, so
> please, don't so lightly dismiss us as not knowing what we are doing, as
> frankly, we do.

Greg, I did not say you hadn't been thinking about this [API design].
(But I acknowledge my words could have been better chosen.) However,
API design is hard to get right, and causes endless pain when it's wrong.
And by now I've been watching long enough to know that the mistakes
are frequent. Even Davide Libenzi, who once upon a time was one of our
more prolific and talented creators of APIs, made mistakes. Thus, for
example, as we speak, the third iteration of epoll_wait() is in 
development (epoll_wait() ==> epoll_pwait() ==> epoll_pwait1()). 
And epoll is an API that is significantly simpler than kdbus.

In my observation, a good API design is a well-documented API design.
Otherwise, it's virtually impossible to think thoroughly about the API,
and that is especially true as the API gets larger. (And AFAICS, the
kdbus API is bigger than, for example, the epoll API by an order of
magnitude, or so.) Part of that documentation also should include some
concrete examples of the use of the API, again because they help people
to think about and assess the API. (This is especially the case for
the fresh minds that explore the new API for the first time without 
having the preconceptions that are almost inevitable for the creators 
of the API.)

(BTW, I'm not ignoring your comments about the D-Bus spec above. But
kdbus is a free-standing API, IIUC, and as such, it should be assessed 
on its own.)

I would summarize your statement above, as "trust us, we know what 
we're doing". With respect, my default position is not to trust. It's
nothing personal: API design is hard, and mistakes are too often made.
What I want to say in return is "trust us", where "us" is used very
inclusively to mean: the kernel maintainers, and for that matter 
user-space programmers, who need enough information that we can make a
well-informed assessment of the merits of an API that will need to be 
supported forever and may have a multitude of different users. In 
my assessment, the current information is far from sufficient, and 
it's a considerable risk to merge the API lacking such information.

> Thanks for making it this far, I'll go back to technical discussions of
> the API now, as that's what we should be doing, not casting aspersions
> as to the aptitude of the people involved.

Greg, that comment, with its implication that I am not concerned about
technical matters, but rather with something more malicious, was quite
uncalled for.

But, I am happy to return to technical matters. And I think it best to 
start with a couple of fundamental questions, since some of the comments
I've seen to date from different kdbus developers seem to conflict:

1. Is this intended to be a general purpose API that might see a 
   multitude of different users, or is it thought of as an API designed 
   to support a few specific users such as D-Bus and maybe a handful of 
   others? I had thought the former, but when you point me in the 
   direction of the D-Bus spec, I start to have doubts.

2. Is the API to be invoked directly by applications or is it intended to
   be used only behind specific libraries? You seem to be saying that
   the latter is the case (here, I'm referring to your comment above
   about sd-bus). However, when I asked David Herrmann a similar
   question I got this response:

      "kdbus is in no way bound to systemd. There are ongoing efforts 
       to port glib and qt to kdbus natively. The API is pretty simple 
       and I don't see how a libkdbus would simplify things. In fact, 
       even our tests only have slim wrappers around the ioctls to 
       simplify error-handling in test-scenarios."

   To me, that implies that users will employ the raw kernel API.

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-26 14:46                     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-26 14:46 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Austin S Hemmelgarn
  Cc: mtk.manpages, David Herrmann, Daniel Mack, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Theodore T'so, Andy Lutomirski, Linux API, linux-kernel,
	Djalal Harouni, Johannes Stezenbach, Christoph Hellwig

Hello Greg,

On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
>> While I agree that there should be a way for userspace to get the list of
>> supported operations, userspace apps will only actually care about that
>> once, when they begin talking to kdbus, because (ignoring the live kernel
>> patching that people have been working on recently) the list of supported
>> operations isn't going to change while the system is running.  While a u64
>> copy has relatively low overhead, it does have overhead, and that is very
>> significant when you consider part of the reason some people want kdbus is
>> for the performance gain.  Especially for those automotive applications that
>> have been mentioned which fire off thousands of messages during start-up,
>> every little bit of performance is significant.
> 
> A single u64 in a structure is not going to be measurable at all,
> processors just copy memory too fast these days for 4 extra bytes to be
> noticeable.

It depends on the definition of measurable, I suppose, but this statement
appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
about *two* u64 fields (kernel_flags, kernel_msg_flags) being used to pass
back sets of valid flags. That's 16 bytes, and it definitely makes a
difference. Running a naive memcpy() in a tight user-space loop (code
below), I see the following for the execution of 1e9 loops:

    Including the two extra u64 fields: 3.2 sec
    Without the two extra u64 fields:   2.6 sec

On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
simplest syscall, giving us a rough measure of the context switch) takes
68 seconds. In other words, the cost of copying those 16 bytes is about 1%
of the base context switch/syscall cost. I assume the costs of copying 
those 16 bytes across the kernel-user-space boundary would not be cheaper, 
but have not tested that. If my assumption is correct, then 1% seems a
significant figure to me in an API whose raison d'être is speed.
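
The getppid() comparison and the resulting percentage could be reproduced
roughly as follows. This is only a sketch, not the program used for the
figures above: the helper names (elapsed_sec, time_getppid,
extra_copy_share) are made up for illustration, and it assumes a
Linux/POSIX system that provides clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double
elapsed_sec(const struct timespec *start, const struct timespec *end)
{
    return (double) (end->tv_sec - start->tv_sec) +
           (double) (end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Time 'nloops' calls to getppid(); returns the elapsed seconds.
   (Hypothetical helper, used here as the syscall baseline.) */
static double
time_getppid(long nloops)
{
    struct timespec start, end;
    long j;

    assert(nloops >= 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (j = 0; j < nloops; j++)
        (void) getppid();
    clock_gettime(CLOCK_MONOTONIC, &end);

    return elapsed_sec(&start, &end);
}

/* Fraction of the syscall baseline accounted for by the extra copy:
   (time with extra fields - time without) / baseline time. */
static double
extra_copy_share(double with_fields, double without_fields,
                 double syscall_baseline)
{
    assert(syscall_baseline > 0.0);
    return (with_fields - without_fields) / syscall_baseline;
}
```

Calling time_getppid(1000000000) gives the syscall baseline on the local
machine. Plugging in the figures quoted above,
extra_copy_share(3.2, 2.6, 68.0) comes to about 0.0088, that is, roughly 1%.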

> So let's make this as easy as possible for userspace, making
> it simpler logic there, which is much more important than saving
> theoretical time in the kernel.

But this also misses the other part of the point. Copying these fields on
every operation, when in fact they are only needed once, clutters the API,
in my opinion. Good APIs are as simple as they can be to do their job.
Redundancy is an enemy of simplicity. Simplest would have been a one-time
API that returns a structure containing all of the supported flags across
the API. Alternatively, the traditional EINVAL approach is well understood,
and suffices.
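
The two alternatives named above might look something like this. It is a
user-space sketch under stated assumptions, not kdbus code: every name
here (EXAMPLE_FLAG_*, struct example_caps, example_query_caps,
example_send) is hypothetical and exists only to show the shape of each
approach.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; these are NOT kdbus definitions. */
#define EXAMPLE_FLAG_SYNC    (1ULL << 0)
#define EXAMPLE_FLAG_NOWAIT  (1ULL << 1)
#define EXAMPLE_SUPPORTED    (EXAMPLE_FLAG_SYNC | EXAMPLE_FLAG_NOWAIT)

/* Approach 1: a one-time query that reports every supported flag set.
   A caller would fetch this once at startup and cache the result. */
struct example_caps {
    uint64_t send_flags;
};

static void
example_query_caps(struct example_caps *caps)
{
    assert(caps != NULL);
    caps->send_flags = EXAMPLE_SUPPORTED;
}

/* Approach 2: the traditional EINVAL convention. The operation rejects
   any flag bit it does not know, so nothing needs to be copied back to
   the caller on the success path. */
static int
example_send(uint64_t flags)
{
    if (flags & ~EXAMPLE_SUPPORTED)
        return -EINVAL;
    return 0;   /* the real work would happen here */
}
```

With either approach the per-operation structure carries no "supported
flags" fields: a caller either consults its cached example_caps, or simply
tries a flag and falls back when example_send() returns -EINVAL.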

Thanks,

Michael

=========

#include <stdint.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build with -DFIELDS=0, 1, or 2 to select how many of the extra u64
   fields are included; an undefined FIELDS is treated as 0 by #if. */

struct kdbus_msg_info {
    uint64_t offset;
    uint64_t msg_size;
    uint64_t return_flags;
};

struct kdbus_cmd_send {
    uint64_t size;
    uint64_t flags;
#if FIELDS >= 1
    uint64_t kernel_flags;
#endif
#if FIELDS >= 2
    uint64_t kernel_msg_flags;
#endif
    uint64_t return_flags;
    uint64_t msg_address;
    struct kdbus_msg_info reply;
    //struct kdbus_item items[0];
} __attribute__((aligned(8)));

int
main(int argc, char *argv[])
{
    long nloops, j;
    struct kdbus_cmd_send src, dst;

    /* Initialize both buffers so that the copy reads defined data */
    memset(&src, 0, sizeof(struct kdbus_cmd_send));
    memset(&dst, 0, sizeof(struct kdbus_cmd_send));

    printf("struct size: %zu\n", sizeof(struct kdbus_cmd_send));
    nloops = (argc > 1) ? atol(argv[1]) : 1000000000;

    /* Note: an optimizing compiler may elide this loop entirely; build
       without optimization to time the copies themselves */
    for (j = 0; j < nloops; j++) {
        memcpy(&dst, &src, sizeof(struct kdbus_cmd_send));
    }

    exit(EXIT_SUCCESS);
}


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-26 14:42                 ` Michael Kerrisk (man-pages)
  (?)
@ 2015-01-26 15:26                 ` Tom Gundersen
  2015-01-26 16:44                     ` christoph Hellwig
  2015-01-26 16:45                   ` Michael Kerrisk (man-pages)
  -1 siblings, 2 replies; 143+ messages in thread
From: Tom Gundersen @ 2015-01-26 15:26 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Greg Kroah-Hartman, Daniel Mack, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Jiri Kosina,
	Andy Lutomirski, Linux API, LKML, David Herrmann, Djalal Harouni,
	Johannes Stezenbach, Theodore T'so, christoph Hellwig

Hi Michael,

On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> 2. Is the API to be invoked directly by applications or is intended to
>    be used only behind specific libraries? You seem to be saying that
>    the latter is the case (here, I'm referring to your comment above
>    about sd-bus). However, when I asked David Herrmann a similar
>    question I got this response:
>
>       "kdbus is in no way bound to systemd. There are ongoing efforts
>        to port glib and qt to kdbus natively. The API is pretty simple
>        and I don't see how a libkdbus would simplify things. In fact,
>        even our tests only have slim wrappers around the ioctls to
>        simplify error-handling in test-scenarios."
>
>    To me, that implies that users will employ the raw kernel API.

The way I read this is that there will (probably) be a handful of
users, namely the existing dbus libraries: libdbus, sd-bus, glib, Qt,
ell, and maybe a few others. However, third-party developers will not
know/care about the details of kdbus, they'll just be coding against
the dbus libraries as before (might be minor changes, but they
certainly won't need to know anything about the kernel API). Similarly
to how userspace developers now code against their libc of choice,
rather than use kernel syscalls directly.
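
The libc analogy can be made concrete with a small Linux-only sketch: the
same kernel service reached once through the libc wrapper and once through
the raw syscall(2) interface. The helper names (pid_via_libc,
pid_via_raw_syscall, check_pids_agree) are made up for illustration;
getpid(), syscall() and SYS_getpid are the real glibc/Linux interfaces.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for syscall() */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* What application code normally does: call the libc wrapper. */
static pid_t
pid_via_libc(void)
{
    return getpid();
}

/* What the wrapper does on the caller's behalf: issue the raw syscall.
   Only a library author usually needs to care about this level. */
static pid_t
pid_via_raw_syscall(void)
{
    return (pid_t) syscall(SYS_getpid);
}

/* Both paths reach the same kernel service and must agree. */
static void
check_pids_agree(void)
{
    assert(pid_via_libc() == pid_via_raw_syscall());
}
```

In this picture, the dbus libraries would play the role of libc, and the
raw kdbus ioctls the role of syscall().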

HTH,

Tom

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-26 16:44                     ` christoph Hellwig
  0 siblings, 0 replies; 143+ messages in thread
From: christoph Hellwig @ 2015-01-26 16:44 UTC (permalink / raw)
  To: Tom Gundersen
  Cc: Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, Daniel Mack, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Jiri Kosina,
	Andy Lutomirski, Linux API, LKML, David Herrmann, Djalal Harouni,
	Johannes Stezenbach, Theodore T'so, christoph Hellwig

On Mon, Jan 26, 2015 at 04:26:53PM +0100, Tom Gundersen wrote:
> The way I read this is that there will (probably) be a handful of
> users, namely the existing dbus libraries: libdbus, sd-bus, glib, Qt,
> ell, and maybe a few others. However, third-party developers will not
> know/care about the details of kdbus, they'll just be coding against
> the dbus libraries as before (might be minor changes, but they
> certainly won't need to know anything about the kernel API). Similarly
> to how userspace developers now code against their libc of choice,
> rather than use kernel syscalls directly.

Which means we do need proper man pages and detailed documentation for
it, just like for syscalls which just happen to be used by only
a few libcs.  I suspect it really should be implemented as
syscalls anyway, but we can leave that argument aside for now.  Good
documentation certainly helps with making that decision in an educated
way.

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-26 15:26                 ` Tom Gundersen
  2015-01-26 16:44                     ` christoph Hellwig
@ 2015-01-26 16:45                   ` Michael Kerrisk (man-pages)
  2015-01-27 15:23                     ` David Herrmann
  1 sibling, 1 reply; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-26 16:45 UTC (permalink / raw)
  To: Tom Gundersen
  Cc: mtk.manpages, Greg Kroah-Hartman, Daniel Mack, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Jiri Kosina,
	Andy Lutomirski, Linux API, LKML, David Herrmann, Djalal Harouni,
	Johannes Stezenbach, Theodore T'so, christoph Hellwig

On 01/26/2015 04:26 PM, Tom Gundersen wrote:
> Hi Michael,
> 
> On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> 2. Is the API to be invoked directly by applications or is it intended to
>>    be used only behind specific libraries? You seem to be saying that
>>    the latter is the case (here, I'm referring to your comment above
>>    about sd-bus). However, when I asked David Herrmann a similar
>>    question I got this response:
>>
>>       "kdbus is in no way bound to systemd. There are ongoing efforts
>>        to port glib and qt to kdbus natively. The API is pretty simple
>>        and I don't see how a libkdbus would simplify things. In fact,
>>        even our tests only have slim wrappers around the ioctls to
>>        simplify error-handling in test-scenarios."
>>
>>    To me, that implies that users will employ the raw kernel API.
> 
> The way I read this is that there will (probably) be a handful of
> users, namely the existing dbus libraries: libdbus, sd-bus, glib, Qt,
> ell, and maybe a few others. However, third-party developers will not
> know/care about the details of kdbus, they'll just be coding against
> the dbus libraries as before (might be minor changes, but they
> certainly won't need to know anything about the kernel API). Similarly
> to how userspace developers now code against their libc of choice,
> rather than use kernel syscalls directly.

Thanks, Tom, for the input. I'm still confused though, since elsewhere
in this thread David Herrmann said in response to a question of mine:

    I think we can agree that we want it to be generically useful, 
    like other ipc mechanisms, including UDS and netlink.

Again, that sounds to me like the vision is not "a handful of users".
Hopefully Greg and David can clarify.

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v3 00/13] Add kdbus implementation
@ 2015-01-26 21:32               ` One Thousand Gnomes
  0 siblings, 0 replies; 143+ messages in thread
From: One Thousand Gnomes @ 2015-01-26 21:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Johannes Stezenbach, arnd, ebiederm, teg, jkosina, luto,
	linux-api, linux-kernel, daniel, dh.herrmann, tixxdz

On Tue, 20 Jan 2015 09:13:59 +0800
> That's because people have not done anything really needing performance
> on the desktop over D-Bus in the past due to how slow the current
> implementation is. 

The desktop is a performance-critical environment, even though certain
desktop developers think 2GB is a bit small to run an application and
5000 dbus transactions to start something is "ok".

Alan

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-27 15:05                       ` David Herrmann
  0 siblings, 0 replies; 143+ messages in thread
From: David Herrmann @ 2015-01-27 15:05 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Greg Kroah-Hartman, Austin S Hemmelgarn, Daniel Mack,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Theodore T'so, Andy Lutomirski, Linux API,
	linux-kernel, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig

Hi

On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> Hello Greg,
>
> On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
>> On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
>>> While I agree that there should be a way for userspace to get the list of
>>> supported operations, userspace apps will only actually care about that
>>> once, when they begin talking to kdbus, because (ignoring the live kernel
>>> patching that people have been working on recently) the list of supported
>>> operations isn't going to change while the system is running.  While a u64
>>> copy has relatively low overhead, it does have overhead, and that is very
>>> significant when you consider part of the reason some people want kdbus is
>>> for the performance gain.  Especially for those automotive applications that
>>> have been mentioned which fire off thousands of messages during start-up,
>>> every little bit of performance is significant.
>>
>> A single u64 in a structure is not going to be measurable at all,
>> processors just copy memory too fast these days for 4 extra bytes to be
>> noticeable.
>
> It depends on the definition of measurable, I suppose, but this statement
> appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
> about *two* u64 fields (kernel_flags, kernel_msg_flags) being used to pass back
> sets of valid flags. That's 16 bytes, and it definitely makes a difference.
> Simply running a loop that does a naive memcpy() in a tight user-space
> loop (code below), I see the following for the execution of 1e9 loops:
>
>     Including the two extra u64 fields: 3.2 sec
>     Without the two extra u64 fields:   2.6 sec
>
> On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
> simplest syscall, giving us a rough measure of the context switch) takes
> 68 seconds. In other words, the cost of copying those 16 bytes is about 1%
> of the base context switch/syscall cost. I assume the costs of copying
> those 16 bytes across the kernel-user-space boundary would not be cheaper,
> but have not tested that. If my assumption is correct, then 1% seems a
> significant figure to me in an API whose raison d'être is speed.

I have no idea how this is related to any kdbus ioctl?

A 16-byte copy does not affect the performance of kdbus message
transactions in any way that matters.

>> So let's make this as easy as possible for userspace, making
>> it simpler logic there, which is much more important than saving
>> theoretical time in the kernel.
>
> But this also missed the other part of the point. Copying these fields on
> every operation, when in fact they are only needed once, clutters the API,
> in my opinion. Good APIs are as simple as they can be to do their job.
> Redundancy is an enemy of simplicity. Simplest would have been a one time
> API that returns a structure containing all of the supported flags across
> the API. Alternatively, the traditional EINVAL approach is well understood,
> and suffices.

We're going to drop "kernel_flags" in favor of a new
KDBUS_FLAG_NEGOTIATE flag which asks the kernel to do feature
negotiation for this ioctl and return the supported flags/items inline
(overwriting the passed data). The ioctl will not be executed and will
not affect the state of the FD.
I hope this keeps the API simple.
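
A rough user-space model of the negotiation described above may help.
Everything in it is an assumption made for illustration: the DEMO_* names,
the bit values, and the mock handler are hypothetical and do not reflect the
real kdbus structures or ioctl numbers. It only models the two behaviours
mentioned in this thread: a negotiate request overwrites the passed flags
with the supported set and is not executed, while a normal request carrying
an unknown flag fails with EINVAL.

```c
#include <errno.h>
#include <stdint.h>

/* All names and bit values are hypothetical, for illustration only. */
#define DEMO_FLAG_A		(UINT64_C(1) << 0)
#define DEMO_FLAG_B		(UINT64_C(1) << 1)
#define DEMO_FLAG_NEGOTIATE	(UINT64_C(1) << 63)
#define DEMO_SUPPORTED		(DEMO_FLAG_A | DEMO_FLAG_B)

struct demo_cmd {
	uint64_t flags;
};

struct demo_conn {
	unsigned long executed;	/* number of commands that really ran */
};

/* Mock of a command handler; stands in for the kernel side of an ioctl. */
int demo_handle(struct demo_conn *conn, struct demo_cmd *cmd)
{
	if (cmd->flags & DEMO_FLAG_NEGOTIATE) {
		/* Report what is supported, inline, and do nothing else. */
		cmd->flags = DEMO_SUPPORTED;
		return 0;
	}

	/* Traditional approach: reject anything the handler does not know. */
	if (cmd->flags & ~DEMO_SUPPORTED)
		return -EINVAL;

	conn->executed++;
	return 0;
}
```

In this model a caller would issue one negotiate request at start-up,
remember the returned mask, and never pay for the extra fields on ordinary
requests.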

Thanks
David

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-26 16:45                   ` Michael Kerrisk (man-pages)
@ 2015-01-27 15:23                     ` David Herrmann
  2015-01-27 17:53                       ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 143+ messages in thread
From: David Herrmann @ 2015-01-27 15:23 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Tom Gundersen, Greg Kroah-Hartman, Daniel Mack, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Jiri Kosina,
	Andy Lutomirski, Linux API, LKML, Djalal Harouni,
	Johannes Stezenbach, Theodore T'so, christoph Hellwig

Hi

On Mon, Jan 26, 2015 at 5:45 PM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> On 01/26/2015 04:26 PM, Tom Gundersen wrote:
>> On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
>> <mtk.manpages@gmail.com> wrote:
>>> 2. Is the API to be invoked directly by applications or is it intended to
>>>    be used only behind specific libraries? You seem to be saying that
>>>    the latter is the case (here, I'm referring to your comment above
>>>    about sd-bus). However, when I asked David Herrmann a similar
>>>    question I got this response:
>>>
>>>       "kdbus is in no way bound to systemd. There are ongoing efforts
>>>        to port glib and qt to kdbus natively. The API is pretty simple
>>>        and I don't see how a libkdbus would simplify things. In fact,
>>>        even our tests only have slim wrappers around the ioctls to
>>>        simplify error-handling in test-scenarios."
>>>
>>>    To me, that implies that users will employ the raw kernel API.
>>
>> The way I read this is that there will (probably) be a handful of
>> users, namely the existing dbus libraries: libdbus, sd-bus, glib, Qt,
>> ell, and maybe a few others. However, third-party developers will not
>> know/care about the details of kdbus, they'll just be coding against
>> the dbus libraries as before (might be minor changes, but they
>> certainly won't need to know anything about the kernel API). Similarly
>> to how userspace developers now code against their libc of choice,
>> rather than use kernel syscalls directly.
>
> Thanks, Tom, for the input. I'm still confused though, since elsewhere
> in this thread David Herrmann said in response to a question of mine:
>
>     I think we can agree that we want it to be generically useful,
>     like other ipc mechanisms, including UDS and netlink.
>
> Again, that sounds to me like the vision is not "a handful of users".
> Hopefully Greg and David can clarify.

I only expect a handful of users to call the ioctls directly. The
libraries that implement the payload-marshaling, in particular. It's a
similar situation with netlink.

Thanks
David

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-27 16:03                         ` Andy Lutomirski
  0 siblings, 0 replies; 143+ messages in thread
From: Andy Lutomirski @ 2015-01-27 16:03 UTC (permalink / raw)
  To: David Herrmann
  Cc: Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, Austin S Hemmelgarn, Daniel Mack,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Theodore T'so, Linux API, linux-kernel,
	Djalal Harouni, Johannes Stezenbach, Christoph Hellwig

On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
> Hi
>
> On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> Hello Greg,
>>
>> On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
>>> On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
>>>> While I agree that there should be a way for userspace to get the list of
>>>> supported operations, userspace apps will only actually care about that
>>>> once, when they begin talking to kdbus, because (ignoring the live kernel
>>>> patching that people have been working on recently) the list of supported
>>>> operations isn't going to change while the system is running.  While a u64
>>>> copy has relatively low overhead, it does have overhead, and that is very
>>>> significant when you consider part of the reason some people want kdbus is
>>>> for the performance gain.  Especially for those automotive applications that
>>>> have been mentioned which fire off thousands of messages during start-up,
>>>> every little bit of performance is significant.
>>>
>>> A single u64 in a structure is not going to be measurable at all,
>>> processors just copy memory too fast these days for 4 extra bytes to be
>>> noticeable.
>>
>> It depends on the definition of measurable, I suppose, but this statement
>> appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
>> about *two* u64 fields (kernel_flags, kernel_msg_flags) being used to pass back
>> sets of valid flags. That's 16 bytes, and it definitely makes a difference.
>> Simply running a loop that does a naive memcpy() in a tight user-space
>> loop (code below), I see the following for the execution of 1e9 loops:
>>
>>     Including the two extra u64 fields: 3.2 sec
>>     Without the two extra u64 fields:   2.6 sec
>>
>> On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
>> simplest syscall, giving us a rough measure of the context switch) takes
>> 68 seconds. In other words, the cost of copying those 16 bytes is about 1%
>> of the base context switch/syscall cost. I assume the costs of copying
>> those 16 bytes across the kernel-user-space boundary would not be cheaper,
>> but have not tested that. If my assumption is correct, then 1% seems a
>> significant figure to me in an API whose raison d'être is speed.
>
> I have no idea how this is related to any kdbus ioctl?
>
> A 16-byte copy does not affect the performance of kdbus message
> transactions in any way that matters.
>

Sorry for jumping in so late.  Since this version of kdbus was sent,
I've been on vacation for part of the time and I had the flu for the
rest of the time.

What are the performance goals of kdbus?  How fast is it ever intended
to be?  The reason I ask is that, in the current design, kdbus
collects "metadata" (credentials and other identifying information,
collected in kdbus_meta_proc_collect) from the sender of every message
*at send time*. [1]  This is slow, and it will always be slow.  The
slowness of this operation will, in my personal system performance
crystal ball, overshadow the cost of a 16 byte copy by several orders
of magnitude.

[1] After much discussion last time around, I'm at least convinced
that the kdbus people have reasons to like the idea of capturing
metadata for each message.  I still think the design is wrong even
from a security standpoint, but right now I'm talking about
performance.  If you want the data plane to be fast, it should be
separated from the control plane as much as possible, and this design
is the opposite.
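
To make the performance argument concrete, here is a toy user-space
comparison of the two policies: collecting sender credentials on every send
versus collecting them once when the connection is set up. It does not model
kdbus_meta_proc_collect; all demo_* names are hypothetical, and
getpid()/getuid()/getgid() merely stand in for a more expensive collection
step.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical types for illustration; not the kdbus metadata layout. */
struct demo_creds {
	pid_t pid;
	uid_t uid;
	gid_t gid;
};

struct demo_conn {
	struct demo_creds cached;	/* captured once in demo_connect() */
	unsigned long collections;	/* how often credentials were gathered */
};

/* Stand-in for an expensive metadata collection step. */
static void demo_collect(struct demo_conn *conn, struct demo_creds *out)
{
	out->pid = getpid();
	out->uid = getuid();
	out->gid = getgid();
	conn->collections++;
}

/* Control plane: gather credentials once, at connection setup. */
void demo_connect(struct demo_conn *conn)
{
	conn->collections = 0;
	demo_collect(conn, &conn->cached);
}

/* Per-message policy: gather credentials again on every send. */
struct demo_creds demo_send_per_message(struct demo_conn *conn)
{
	struct demo_creds creds;

	demo_collect(conn, &creds);
	return creds;
}

/* Cached policy: the data plane reuses what was captured at setup. */
struct demo_creds demo_send_cached(const struct demo_conn *conn)
{
	return conn->cached;
}
```

The cached variant keeps the send path free of collection work, at the price
that the credentials describe the sender at setup time rather than at send
time.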

--Andy

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-27 15:23                     ` David Herrmann
@ 2015-01-27 17:53                       ` Michael Kerrisk (man-pages)
  2015-01-27 18:14                           ` Daniel Mack
  0 siblings, 1 reply; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-27 17:53 UTC (permalink / raw)
  To: David Herrmann
  Cc: mtk.manpages, Tom Gundersen, Greg Kroah-Hartman, Daniel Mack,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Jiri Kosina, Andy Lutomirski, Linux API, LKML, Djalal Harouni,
	Johannes Stezenbach, Theodore T'so, christoph Hellwig

On 01/27/2015 04:23 PM, David Herrmann wrote:
> Hi
> 
> On Mon, Jan 26, 2015 at 5:45 PM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> On 01/26/2015 04:26 PM, Tom Gundersen wrote:
>>> On Mon, Jan 26, 2015 at 3:42 PM, Michael Kerrisk (man-pages)
>>> <mtk.manpages@gmail.com> wrote:
>>>> 2. Is the API to be invoked directly by applications or is it intended to
>>>>    be used only behind specific libraries? You seem to be saying that
>>>>    the latter is the case (here, I'm referring to your comment above
>>>>    about sd-bus). However, when I asked David Herrmann a similar
>>>>    question I got this response:
>>>>
>>>>       "kdbus is in no way bound to systemd. There are ongoing efforts
>>>>        to port glib and qt to kdbus natively. The API is pretty simple
>>>>        and I don't see how a libkdbus would simplify things. In fact,
>>>>        even our tests only have slim wrappers around the ioctls to
>>>>        simplify error-handling in test-scenarios."
>>>>
>>>>    To me, that implies that users will employ the raw kernel API.
>>>
>>> The way I read this is that there will (probably) be a handful of
>>> users, namely the existing dbus libraries: libdbus, sd-bus, glib, Qt,
>>> ell, and maybe a few others. However, third-party developers will not
>>> know/care about the details of kdbus, they'll just be coding against
>>> the dbus libraries as before (might be minor changes, but they
>>> certainly won't need to know anything about the kernel API). Similarly
>>> to how userspace developers now code against their libc of choice,
>>> rather than use kernel syscalls directly.
>>
>> Thanks, Tom, for the input. I'm still confused though, since elsewhere
>> in this thread David Herrmann said in response to a question of mine:
>>
>>     I think we can agree that we want it to be generically useful,
>>     like other ipc mechanisms, including UDS and netlink.
>>
>> Again, that sounds to me like the vision is not "a handful of users".
>> Hopefully Greg and David can clarify.
> 
> I only expect a handful of users to call the ioctls directly. The
> libraries that implement the payload-marshaling, in particular. It's a
> similar situation with netlink.

Thanks, David, for the clarification. I think it would have been helpful
to have that more clearly stated up front, especially as some comments 
in this thread, such as the above, could be interpreted to mean quite 
the opposite. Can I suggest that some text on this point be added to 
kdbus.txt?

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-27 18:03                         ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-27 18:03 UTC (permalink / raw)
  To: David Herrmann
  Cc: mtk.manpages, Greg Kroah-Hartman, Austin S Hemmelgarn,
	Daniel Mack, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Theodore T'so,
	Andy Lutomirski, Linux API, linux-kernel, Djalal Harouni,
	Johannes Stezenbach, Christoph Hellwig

Hi David,

On 01/27/2015 04:05 PM, David Herrmann wrote:
> Hi
> 
> On Mon, Jan 26, 2015 at 3:46 PM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> Hello Greg,
>>
>> On 01/23/2015 05:08 PM, Greg Kroah-Hartman wrote:
>>> On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
>>>> While I agree that there should be a way for userspace to get the list of
>>>> supported operations, userspace apps will only actually care about that
>>>> once, when they begin talking to kdbus, because (ignoring the live kernel
>>>> patching that people have been working on recently) the list of supported
>>>> operations isn't going to change while the system is running.  While a u64
>>>> copy has relatively low overhead, it does have overhead, and that is very
>>>> significant when you consider part of the reason some people want kdbus is
>>>> for the performance gain.  Especially for those automotive applications that
>>>> have been mentioned which fire off thousands of messages during start-up,
>>>> every little bit of performance is significant.
>>>
>>> A single u64 in a structure is not going to be measurable at all,
>>> processors just copy memory too fast these days for 4 extra bytes to be
>>> noticeable.
>>
>> It depends on the definition of measurable, I suppose, but this statement
>> appears incorrect to me. In some cases (e.g., kdbus_msg_info) we're talking
>> about *two* u64 fields (kernel_flags, kernel_msg_flags) being used to pass back
>> sets of valid flags. That's 16 bytes, and it definitely makes a difference.
>> Simply running a loop that does a naive memcpy() in a tight user-space
>> loop (code below), I see the following for the execution of 1e9 loops:
>>
>>     Including the two extra u64 fields: 3.2 sec
>>     Without the two extra u64 fields:   2.6 sec
>>
>> On the same box, doing 1e9 calls to getppid() (i.e., pretty much the
>> simplest syscall, giving us a rough measure of the context switch) takes
>> 68 seconds. In other words, the cost of copying those 16 bytes is about 1%
>> of the base context switch/syscall cost. I assume the costs of copying
>> those 16 bytes across the kernel-user-space boundary would not be cheaper,
>> but have not tested that. If my assumption is correct, then 1% seems a
>> significant figure to me in an API whose raison d'être is speed.
> 
> I have no idea how this is related to any kdbus ioctl?
> 
> A 16-byte copy does not affect the performance of kdbus message
> transactions in any way that matters.

I'm not sure if it's related/significant or not, since I'm ignorant
of the performance figures for kdbus. I just got curious when Greg
stated that the cost of copying would not be noticeable. (I got curious 
also about my assumption, and did an experiment with a dummy system call
that throws bytes across the fence into user space. The cost of an
extra 16 bytes (56 to 72 bytes) is about 3% of the base syscall/context 
switch cost.)
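
As a rough illustration of the kind of user-space measurement described
above (this is not the original test program, which is not reproduced
here), a loop along the following lines could be used. The struct
layouts, the 56/72-byte sizes borrowed from the dummy-syscall experiment,
and the name time_copies() are assumptions made for this sketch; the
compiler barrier is GCC/Clang-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical payloads: 56 bytes without the two extra u64 fields,
 * 72 bytes with them. These are not the real kdbus structures. */
struct demo_msg_small {
    uint64_t f[7];
};

struct demo_msg_large {
    uint64_t f[7];
    uint64_t kernel_flags;
    uint64_t kernel_msg_flags;
};

/* Copy `size` bytes from src to dst `iters` times and return the
 * elapsed wall-clock time in seconds. */
static double time_copies(const void *src, void *dst, size_t size,
                          unsigned long iters)
{
    struct timespec t0, t1;

    assert(src != NULL && dst != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < iters; i++) {
        memcpy(dst, src, size);
        /* Compiler barrier so the copy is not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_copies() once with sizeof(struct demo_msg_small) and once
with sizeof(struct demo_msg_large), using the same iteration count, gives
the two timings to compare. Absolute numbers depend heavily on the
machine and compiler flags.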

>>> So let's make this as easy as possible for userspace, making
>>> it simpler logic there, which is much more important than saving
>>> theoretical time in the kernel.
>>
>> But this also missed the other part of the point. Copying these fields on
>> every operation, when in fact they are only needed once, clutters the API,
>> in my opinion. Good APIs are as simple as they can be to do their job.
>> Redundancy is an enemy of simplicity. Simplest would have been a one-time
>> API that returns a structure containing all of the supported flags across
>> the API. Alternatively, the traditional EINVAL approach is well understood,
>> and suffices.
> 
> We're going to drop "kernel_flags" in favor of a new
> KDBUS_FLAG_NEGOTIATE flag which asks the kernel to do feature
> negotiation for this ioctl and return the supported flags/items inline
> (overwriting the passed data). The ioctl will not be executed and will
> not affect the state of the FD.
> I hope this keeps the API simple.

Not sure I quite understand the details from your description, but I assume
that it'll end up in the doc, and I'll try to take a look later.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-27 18:14                           ` Daniel Mack
  0 siblings, 0 replies; 143+ messages in thread
From: Daniel Mack @ 2015-01-27 18:14 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages), David Herrmann
  Cc: Tom Gundersen, Greg Kroah-Hartman, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Jiri Kosina,
	Andy Lutomirski, Linux API, LKML, Djalal Harouni,
	Johannes Stezenbach, Theodore T'so, christoph Hellwig

Hi Michael,

On 01/27/2015 06:53 PM, Michael Kerrisk (man-pages) wrote:
> On 01/27/2015 04:23 PM, David Herrmann wrote:

>> I only expect a handful of users to call the ioctls directly. The
>> libraries that implement the payload-marshaling, in particular. It's a
>> similar situation with netlink.
> 
> Thanks, David, for the clarification. I think it would have been helpful
> to have that more clearly stated up front, especially as some comments 
> in this thread, such as the above, could be interpreted to mean quite 
> the opposite. Can I suggest that some text on this point be added to 
> kdbus.txt?

We're currently working on a set of comprehensive man pages to
document all the commands in the API, along with every struct, enum etc.
We do that so that developers are able to actually understand every
detail of the API, even though most people - as David explained - will
not use that interface directly in the first place but let one of the
high-level libraries help them integrate D-Bus functionality into their
applications.

If you want, have a look at the upstream repository for a preliminary
version of the new docs.


Thanks,
Daniel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-27 18:14                           ` Daniel Mack
  (?)
@ 2015-01-28 10:46                           ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 143+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-28 10:46 UTC (permalink / raw)
  To: Daniel Mack, David Herrmann
  Cc: mtk.manpages, Tom Gundersen, Greg Kroah-Hartman, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Jiri Kosina,
	Andy Lutomirski, Linux API, LKML, Djalal Harouni,
	Johannes Stezenbach, Theodore T'so, christoph Hellwig

Hello Daniel,

On 01/27/2015 07:14 PM, Daniel Mack wrote:
> Hi Michael,
> 
> On 01/27/2015 06:53 PM, Michael Kerrisk (man-pages) wrote:
>> On 01/27/2015 04:23 PM, David Herrmann wrote:
> 
>>> I only expect a handful of users to call the ioctls directly. The
>>> libraries that implement the payload-marshaling, in particular. It's a
>>> similar situation with netlink.
>>
>> Thanks, David, for the clarification. I think it would have been helpful
>> to have that more clearly stated up front, especially as some comments 
>> in this thread, such as the above, could be interpreted to mean quite 
>> the opposite. Can I suggest that some text on this point be added to 
>> kdbus.txt?
> 
> We're currently working on a set of comprehensive man pages to
> document all the commands in the API, along with every struct, enum etc.
> We do that so that developers are able to actually understand every
> detail of the API, even though most people - as David explained - will
> not use that interface directly in the first place but let one of the
> high-level libraries help them integrate D-Bus functionality into their
> applications.

(I suggest that some text about this go into the kdbus(7) page.)

> If you want, have a look at the upstream repository for a preliminary
> version of the new docs.

That's at https://code.google.com/p/d-bus/ , right? This looks like a 
good direction to go in. Thanks for tackling that.

I hope to take a longer look sometime soon, but a few general conventions
for man-pages that you might want to consider following:

* When listing errors, I think you should change your 
  language/formatting somewhat. Examples here from kdbus.endpoint.7:
   
  (1) The man page says

          RETURN VALUE
               On success, all mentioned ioctl commands return 0.

       Better to write this from a user-space point of view:

          RETURN VALUE
               On success, all mentioned ioctl commands return 0; on 
               error, -1 is returned, and errno is set to indicate 
               the error.

  (2) I would change the wording in the ERRORS sections from
       "may return the following errors"
       to
       "may fail with the following errors"

  (3) When listing the errors, drop the minus signs; that's not 
      what user-space sees. They see a positive value in errno.

  (4) The usual formatting convention for constants, including 
      error constants in man pages is boldface, rather than 
      underline/emphasis.

  (5) Insofar as it's possible, it would be good to make all
      pages format nicely within 80 columns. Some of the literal
      text and ASCII art could, I think, be narrowed.
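
The user-space view described in points (1) and (3) can be seen with any
ioctl. The sketch below uses FIONREAD purely as a convenient stand-in
(it has nothing to do with kdbus), and try_ioctl() is a name made up for
this example:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/* Issue an ioctl on `fd`. Returns 0 on success, or the (positive)
 * errno value if the call failed. */
static int try_ioctl(int fd)
{
    int pending = 0;

    if (ioctl(fd, FIONREAD, &pending) == -1) {
        /* User space sees -1 from the call and a positive errno. */
        int err = errno;

        assert(err > 0);
        fprintf(stderr, "ioctl failed: %s (errno=%d)\n",
                strerror(err), err);
        return err;
    }
    return 0;
}
```

For example, try_ioctl(-1) fails with EBADF: the call returns -1 and
errno holds the positive constant, never a negative value.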

Thanks,

Michael
-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-01-29  8:53                           ` Daniel Mack
  0 siblings, 0 replies; 143+ messages in thread
From: Daniel Mack @ 2015-01-29  8:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Herrmann, Michael Kerrisk (man-pages),
	Greg Kroah-Hartman, Austin S Hemmelgarn, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Theodore T'so, Linux API, linux-kernel, Djalal Harouni,
	Johannes Stezenbach, Christoph Hellwig

Hi Andy,

On 01/27/2015 05:03 PM, Andy Lutomirski wrote:
> On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann <dh.herrmann@gmail.com> wrote:

>> A 16-byte copy does not affect the performance of kdbus message
>> transactions in any way that matters.

> What are the performance goals of kdbus?  How fast is it ever intended
> to be?

One of the design goals of kdbus is to speed up a typical D-Bus message
turnaround. That is, to minimize the number of context switches it
currently takes to get a message across the bus, and to avoid
unnecessary extra payload copies.

Even though I'm sure there's still room for future improvement, the
benchmark test we provided in the kernel self-tests shows that basic
data transmission performance is roughly comparable to that of UDS for
smaller payloads. For payloads of bigger sizes (>128kb), kdbus is
actually faster due to its zero-copy mechanism.

> The reason I ask is that, in the current design, kdbus
> collects "metadata" (credentials and other identifying information,
> collected in kdbus_meta_proc_collect) from the sender of every message
> *at send time*. [1]  This is slow, and it will always be slow.  The
> slowness of this operation will, in my personal system performance
> crystal ball, overshadow the cost of a 16 byte copy by several orders
> of magnitude.

That's certainly true, but that's not a contradiction to the performance
argument. Please keep in mind that if a receiving peer does not request
any metadata, the kernel doesn't collect and attach any. We know that
gathering of some of the metadata comes at a price, which is why we
split up the information in such fine-grained pieces.

Also note that if a receiving peer opts in for a certain piece of
metadata, it should do that for a good reason, because it needs
that data to process a request. Letting kdbus do the work of providing
such information is still a lot faster than having the receiving peer
gather it itself, as that would involve more syscalls and more context
switches (let alone the fact that doing so is inherently racy, as
explained in earlier threads).

So, yes, collecting metadata can slow down message exchange, but after
all, that's an optional feature that has to be used with sense. I'll add
some words on that to the man-pages.


HTH,
Daniel


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-29  8:53                           ` Daniel Mack
  (?)
@ 2015-01-29 11:25                           ` Andy Lutomirski
  2015-01-29 11:42                             ` Daniel Mack
  -1 siblings, 1 reply; 143+ messages in thread
From: Andy Lutomirski @ 2015-01-29 11:25 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Michael Kerrisk, Arnd Bergmann, Theodore T'so, Linux API,
	One Thousand Gnomes, Austin S Hemmelgarn, Tom Gundersen,
	Greg Kroah-Hartman, linux-kernel, Eric W. Biederman,
	David Herrmann, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig

On Jan 29, 2015 3:53 AM, "Daniel Mack" <daniel@zonque.org> wrote:
>
> Hi Andy,
>
> On 01/27/2015 05:03 PM, Andy Lutomirski wrote:
> > On Tue, Jan 27, 2015 at 7:05 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
>
> >> A 16-byte copy does not affect the performance of kdbus message
> >> transactions in any way that matters.
>
> > What are the performance goals of kdbus?  How fast is it ever intended
> > to be?
>
> One of the design goals of kdbus is to speed up a typical D-Bus message
> turnaround. That is, to minimize the number of context switches it
> currently takes to get a message across the bus, and to avoid
> unnecessary extra payload copies.
>
> Even though I'm sure there's still room for future improvement, the
> benchmark test we provided in the kernel self-tests shows that basic
> data transmission performance is roughly comparable to that of UDS for
> smaller payloads. For payloads of bigger sizes (>128kb), kdbus is
> actually faster due to its zero-copy mechanism.
>
> > The reason I ask is that, in the current design, kdbus
> > collects "metadata" (credentials and other identifying information,
> > collected in kdbus_meta_proc_collect) from the sender of every message
> > *at send time*. [1]  This is slow, and it will always be slow.  The
> > slowness of this operation will, in my personal system performance
> > crystal ball, overshadow the cost of a 16 byte copy by several orders
> > of magnitude.
>
> That's certainly true, but that's not a contradiction to the performance
> argument. Please keep in mind that if a receiving peer does not request
> any metadata, the kernel doesn't collect and attach any. We know that
> gathering of some of the metadata comes at a price, which is why we
> split up the information in such fine-grained pieces.
>
> Also note that if a receiving peer opts in for a certain piece of
> metadata, it should do that for a good reason, because it needs
> that data to process a request. Letting kdbus do the work of providing
> such information is still a lot faster than having the receiving peer
> gather it itself, as that would involve more syscalls and more context
> switches (let alone the fact that doing so is inherently racy, as
> explained in earlier threads).

All this is true, but if you used connect-time metadata, this would be
a non-issue.

Given that I see almost no advantage to send-time metadata, and I see
three disadvantages (slower, inconsistent with the basic POSIX model,
and inconsistent with existing user-space dbus), I still don't see why
you designed it this way.

There's an added disadvantage of the current design: if a kdbus user
is communicating with a traditional d-bus user using the proxy, then
IIUC the credentials at the time of connection get used.

In summary, the current design is (a) unlike almost everything else
that uses file descriptors, (b) much slower, (c) different from
traditional d-bus, and (d) gives inconsistent behavior to new clients
depending on what server they're connecting to.

--Andy

>
> So, yes, collecting metadata can slow down message exchange, but after
> all, that's an optional feature that has to be used with sense. I'll add
> some words on that to the man-pages.
>
>
> HTH,
> Daniel
>

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-29 11:25                           ` Andy Lutomirski
@ 2015-01-29 11:42                             ` Daniel Mack
  2015-01-29 12:09                               ` Andy Lutomirski
  0 siblings, 1 reply; 143+ messages in thread
From: Daniel Mack @ 2015-01-29 11:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Michael Kerrisk, Arnd Bergmann, Theodore T'so, Linux API,
	One Thousand Gnomes, Austin S Hemmelgarn, Tom Gundersen,
	Greg Kroah-Hartman, linux-kernel, Eric W. Biederman,
	David Herrmann, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig

On 01/29/2015 12:25 PM, Andy Lutomirski wrote:
> On Jan 29, 2015 3:53 AM, "Daniel Mack" <daniel@zonque.org> wrote:

>> Also note that if a receiving peer opts in for a certain piece of
>> metadata, it should do that for a good reason, because it needs
>> that data to process a request. Letting kdbus do the work of providing
>> such information is still a lot faster than having the receiving peer
>> gather it itself, as that would involve more syscalls and more context
>> switches (let alone the fact that doing so is inherently racy, as
>> explained in earlier threads).

> Given that I see almost no advantage to send-time metadata, and I see
> three disadvantages (slower, inconsistent with the basic POSIX model,
> and inconsistent with existing user-space dbus), I still don't see why
> you designed it this way.

Because effective information about tasks may change over time, and
D-Bus is a connection-less protocol that has no notion of peer-to-peer
connections.

As we explained before, currently, D-Bus peers do collect the same
information already if they need to have it, but they have to deal
with the inherent races in such cases. kdbus is closing the gap by
optionally providing the same information along with each message, if
requested.

> There's an added disadvantage of the current design: if a kdbus user
> is communicating with a traditional d-bus user using the proxy, then
> IIUC the credentials at the time of connection get used.

That's not quite true any more. After our discussion in v2, we agreed on
dropping this detail. If you're using the proxy, no metadata is attached
to messages any more. Userspace has to gather this information in the
traditional, racy way in such cases. You are right - metadata about the
proxy task is of no interest here, and hence dropping the information
altogether is the most consistent thing we can do.

But again - that metadata thing is just an optional feature. People
developing with the bare kernel-level API are free to ignore all that
and just use kdbus as a low-level protocol for reliable multicast. Note
that in such cases, you would still be able to retrieve the connect-time
metadata if that's needed.


Thanks,
Daniel


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-01-29 11:42                             ` Daniel Mack
@ 2015-01-29 12:09                               ` Andy Lutomirski
  2015-02-02  9:34                                   ` Daniel Mack
  0 siblings, 1 reply; 143+ messages in thread
From: Andy Lutomirski @ 2015-01-29 12:09 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Arnd Bergmann, Ted Ts'o, Linux API, Michael Kerrisk,
	One Thousand Gnomes, Austin S Hemmelgarn, Tom Gundersen,
	Greg Kroah-Hartman, linux-kernel, David Herrmann,
	Eric W. Biederman, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig

On Jan 29, 2015 6:42 AM, "Daniel Mack" <daniel@zonque.org> wrote:
>
> On 01/29/2015 12:25 PM, Andy Lutomirski wrote:
> > On Jan 29, 2015 3:53 AM, "Daniel Mack" <daniel@zonque.org> wrote:
>
> >> Also note that if a receiving peer opts in for a certain piece of
> >> metadata, it should do that for a good reason, because it needs
> >> that data to process a request. Letting kdbus do the work of providing
> >> such information is still a lot faster than having the receiving peer
> >> gather it itself, as that would involve more syscalls and more context
> >> switches (let alone the fact that doing so is inherently racy, as
> >> explained in earlier threads).
>
> > Given that I see almost no advantage to send-time metadata, and I see
> > three disadvantages (slower, inconsistent with the basic POSIX model,
> > and inconsistent with existing user-space dbus), I still don't see why
> > you designed it this way.
>
> Because effective information about tasks may change over time, and
> D-Bus is a connection-less protocol that has no notion of peer-to-peer
> connections.
>
> As we explained before, currently, D-Bus peers do collect the same
> information already if they need to have it, but they have to deal
> with the inherent races in such cases. kdbus is closing the gap by
> optionally providing the same information along with each message, if
> requested.

In all these discussions, no one ever gave a decent example use case.
If a process drops some privilege, it must close all fds it has that
captured its old privilege.  This has nothing to do with kdbus.  With
kdbus, you still need to close and reopen your kdbus fd, unless you've
disabled that bit of metadata, so using send-time metadata hasn't
bought you any benefit that I can see.

I agree that the design seems to have improved to a state of being at
least decent, but that doesn't mean that using send-time metadata is a
good idea for systemd or for anything else.

>
> > There's an added disadvantage of the current design: if a kdbus user
> > is communicating with a traditional d-bus user using the proxy, then
> > IIUC the credentials at the time of connection get used.
>
> That's not quite true any more. After our discussion in v2, we agreed on
> dropping this detail. If you're using the proxy, no metadata is attached
> to messages any more. Userspace has to gather this information in the
> traditional, racy way in such cases. You are right - metadata about the
> proxy task is of no interest here, and hence dropping the information
> altogether is the most consistent thing we can do.
>
> But again - that metadata thing is just an optional feature. People
> developing with the bare kernel-level API are free to ignore all that
> and just use kdbus as a low-level protocol for reliable multicast. Note
> that in such cases, you would still be able to retrieve the connect-time
> metadata if that's needed.
>

It's an optional feature that will get used, non-optionally, thousands
of times on each boot, apparently.  Keep in mind that it's also a
scalability problem because it takes locks.  If it ever gets used
thousands of times per CPU on a big thousand-core machine, it's going
to suck, and you'll have backed yourself into a corner.

--Andy

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-02-02  9:34                                   ` Daniel Mack
  0 siblings, 0 replies; 143+ messages in thread
From: Daniel Mack @ 2015-02-02  9:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Arnd Bergmann, Ted Ts'o, Linux API, Michael Kerrisk,
	One Thousand Gnomes, Austin S Hemmelgarn, Tom Gundersen,
	Greg Kroah-Hartman, linux-kernel, David Herrmann,
	Eric W. Biederman, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig

Hi Andy,

On 01/29/2015 01:09 PM, Andy Lutomirski wrote:
> On Jan 29, 2015 6:42 AM, "Daniel Mack" <daniel@zonque.org> wrote:

>> As we explained before, currently, D-Bus peers do collect the same
>> information already if they need to have it, but they have to deal
>> with the inherent races in such cases. kdbus is closing the gap by
>> optionally providing the same information along with each message, if
>> requested.
> 
> In all these discussions, no one ever gave a decent example use case.
> If a process drops some privilege, it must close all fds it has that
> captured its old privilege.  This has nothing to do with kdbus.

kdbus does not implement any new concept here but sticks to what
SCM_CREDENTIALS does on SOCK_SEQPACKET. An application can get a
file-descriptor from socket() or socketpair() and freely pass it around
between different tasks or threads, but messages will always have the
credentials attached that are valid at *send* time. SO_PEERCRED,
however, still reports the connect-time credentials, and kdbus provides
exactly the same semantics and both ways of retrieving information.
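
For readers unfamiliar with the two socket mechanisms being compared, a
minimal Linux-only sketch follows. It uses only the AF_UNIX socket API
(nothing from kdbus); the helper names peer_cred(), recv_with_cred() and
demo_creds() are made up for this example.

```c
#define _GNU_SOURCE             /* for struct ucred */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Connect-time credentials of the peer, via SO_PEERCRED. */
static int peer_cred(int fd, struct ucred *out)
{
    socklen_t len = sizeof(*out);

    return getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len);
}

/* Receive one message and extract the send-time credentials carried
 * in its SCM_CREDENTIALS control message. SO_PASSCRED must already be
 * enabled on `fd`. Returns the payload length, or -1 on failure. */
static int recv_with_cred(int fd, char *buf, size_t len, struct ucred *out)
{
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = {0};
    ssize_t n;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET &&
            c->cmsg_type == SCM_CREDENTIALS) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return (int)n;
        }
    }
    return -1;
}

/* Fetch both kinds of credentials over a SOCK_SEQPACKET socketpair.
 * Returns 0 on success, -1 on failure. */
static int demo_creds(struct ucred *at_connect, struct ucred *at_send)
{
    int sv[2], on = 1, rc = -1;
    char buf[8];

    assert(at_connect != NULL && at_send != NULL);
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
        return -1;
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0 &&
        peer_cred(sv[1], at_connect) == 0 &&
        send(sv[0], "hi", 2, 0) == 2 &&
        recv_with_cred(sv[1], buf, sizeof(buf), at_send) == 2)
        rc = 0;
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Within a single process both results describe the same task. They can
only diverge when the descriptor is handed to another task, or when the
sender's credentials change between creating the socket and sending.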

> I agree that the design seems to have improved to a state of being at
> least decent,

One reason for that is your feedback. Thanks for that again!

> It's an optional feature that will get used, non-optionally, thousands
> of times on each boot, apparently.  Keep in mind that it's also a
> scalability problem because it takes locks.  If it ever gets used
> thousands of times per CPU on a big thousand-core machine, it's going
> to suck, and you'll have backed yourself into a corner.

That's right, but again - if an application wants to gather this kind of
information about tasks it interacts with, it can do so today by looking
at /proc or similar sources. Desktop machines do exactly that already,
and the kernel code executed in such cases very much resembles that in
metadata.c, and is certainly not cheaper. kdbus just makes such
information more accessible when requested. Which information is
collected is defined by bit-masks on both the sender and the receiver
connection, and most applications will effectively only use a very
limited set by default if they go through one of the more high-level
libraries.
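
As a rough sketch of that negotiation: the flag names and the
effective_metadata() helper below are hypothetical and are not the
kdbus UAPI. The assumption is that an item is only gathered when both
the sender's and the receiver's mask have its bit set:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical metadata item flags, for illustration only. */
enum {
	META_TIMESTAMP = 1u << 0,
	META_CREDS     = 1u << 1,
	META_PIDS      = 1u << 2,
	META_CMDLINE   = 1u << 3,
};

/*
 * Assumed rule: collect an item only if the sender allows it and the
 * receiver asked for it, i.e. the intersection of the two masks.
 */
uint64_t effective_metadata(uint64_t send_mask, uint64_t recv_mask)
{
	return send_mask & recv_mask;
}
```

With a sender that allows META_CREDS | META_PIDS and a receiver that
asks for META_PIDS | META_CMDLINE, only META_PIDS would be collected
under this rule.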

Also, when metadata is collected, the code mostly takes temporary
references on objects like PIDs, namespaces etc. Which operation would
you consider particularly expensive?


Thanks again,
Daniel


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-02-02 20:12                                     ` Andy Lutomirski
  0 siblings, 0 replies; 143+ messages in thread
From: Andy Lutomirski @ 2015-02-02 20:12 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Arnd Bergmann, Ted Ts'o, Michael Kerrisk, Linux API,
	One Thousand Gnomes, Austin S Hemmelgarn, Tom Gundersen,
	Greg Kroah-Hartman, linux-kernel, Eric W. Biederman,
	David Herrmann, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig

On Feb 2, 2015 1:34 AM, "Daniel Mack" <daniel@zonque.org> wrote:
>
> Hi Andy,
>
> On 01/29/2015 01:09 PM, Andy Lutomirski wrote:
> > On Jan 29, 2015 6:42 AM, "Daniel Mack" <daniel@zonque.org> wrote:
>
> >> As we explained before, currently, D-Bus peers do collect the same
> >> information already if they need to have them, but they have to do deal
> >> with the inherit races in such cases. kdbus is closing the gap by
> >> optionally providing the same information along with each message, if
> >> requested.
> >
> > In all these discussions, no one ever gave a decent example use case.
> > If a process drops some privilege, it must close all fds it has that
> > captured its old privilege.  This has nothing to do with kdbus.
>
> kdbus does not implement any new concept here but sticks to what
> SCM_CREDENTIALS does on SOL_SEQPACKET. An application can get a
> file-descriptor from socket() or socketpair() and freely pass it around
> between different tasks or threads, but messages will always have the
> credentials attached that are valid at *send* time. SO_PEERCREDS,
> however, still reports the connect-time credentials, and kdbus provides
> exactly the same semantics and both ways of retrieving information.
>
> > I agree that the design seems to have improved to a state of being at
> > least decent,
>
> One reason for that is your feedback. Thanks for that again!
>
> > It's an optional feature that will get used, non-optionally, thousands
> > of times on each boot, apparently.  Keep in mind that it's also a
> > scalability problem because it takes locks.  If it ever gets used
> > thousands of times per CPU on a big thousand-core machine, it's going
> > to suck, and you'll have backed yourself into a corner.
>
> That's right, but again - if an application wants to gather this kind of
> information about tasks it interacts with, it can do so today by looking
> at /proc or similar sources. Desktop machines do exactly that already,
> and the kernel code executed in such cases very much resembles that in
> metadata.c, and is certainly not cheaper. kdbus just makes such
> information more accessible when requested. Which information is
> collected is defined by bit-masks on both the sender and the receiver
> connection, and most applications will effectively only use a very
> limited set by default if they go through one of the more high-level
> libraries.

I should rephrase a bit.  Kdbus doesn't require use of send-time
metadata.  It does, however, strongly encourage it, and it sounds like
systemd and other major users will use send-time metadata.  Once that
happens, it's ABI (even if it's purely in userspace), and changing it
is asking for security holes to pop up.  So you'll be mostly stuck
with it.

>
> Also, when metadata is collected, the code mostly takes temporary
> references on objects like PIDs, namespaces etc. Which operation would
> you consider particularly expensive?

The refcounting, copies of some of the data, and counting bytes and
allocating space.  The refcounting is the part that will scale
particularly badly on many CPUs.

Do you have some simple benchmark code you can share?  I'd like to
play with it a bit.

--Andy

>
>
> Thanks again,
> Daniel
>

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-02-03 10:09                                       ` Daniel Mack
  0 siblings, 0 replies; 143+ messages in thread
From: Daniel Mack @ 2015-02-03 10:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Arnd Bergmann, Ted Ts'o, Michael Kerrisk, Linux API,
	One Thousand Gnomes, Austin S Hemmelgarn, Tom Gundersen,
	Greg Kroah-Hartman, linux-kernel, Eric W. Biederman,
	David Herrmann, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig

Hi Andy,

On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
> On Feb 2, 2015 1:34 AM, "Daniel Mack" <daniel@zonque.org> wrote:

>> That's right, but again - if an application wants to gather this kind of
>> information about tasks it interacts with, it can do so today by looking
>> at /proc or similar sources. Desktop machines do exactly that already,
>> and the kernel code executed in such cases very much resembles that in
>> metadata.c, and is certainly not cheaper. kdbus just makes such
>> information more accessible when requested. Which information is
>> collected is defined by bit-masks on both the sender and the receiver
>> connection, and most applications will effectively only use a very
>> limited set by default if they go through one of the more high-level
>> libraries.
> 
> I should rephrase a bit.  Kdbus doesn't require use of send-time
> metadata.  It does, however, strongly encourage it, and it sounds like

On the kernel level, kdbus just *offers* that, just like sockets offer
SO_PASSCRED. On the userland level, kdbus helps applications get that
information race-free, easier and faster than they would otherwise.

> systemd and other major users will use send-time metadata.  Once that
> happens, it's ABI (even if it's purely in userspace), and changing it
> is asking for security holes to pop up.  So you'll be mostly stuck
> with it.

We know we can't break the ABI. At most, we could deprecate item types
and introduce new ones, but of course we want to avoid that by all
means. However, I fail to see how that is related to send-time
metadata, or even to kdbus in general, as all ABIs have to be kept stable.

> Do you have some simple benchmark code you can share?  I'd like to
> play with it a bit.

Sure, it's part of the self-test suite. Call it with "-t benchmark" to
run the benchmark as an isolated test with verbose output. The code for
that lives in test-benchmark.c.


Thanks,
Daniel



^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-02-04  0:41                                         ` Andy Lutomirski
  0 siblings, 0 replies; 143+ messages in thread
From: Andy Lutomirski @ 2015-02-04  0:41 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Arnd Bergmann, Ted Ts'o, Michael Kerrisk, Linux API,
	One Thousand Gnomes, Austin S Hemmelgarn, Tom Gundersen,
	Greg Kroah-Hartman, linux-kernel, Eric W. Biederman,
	David Herrmann, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig

On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack <daniel@zonque.org> wrote:
> Hi Andy,
>
> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
>> On Feb 2, 2015 1:34 AM, "Daniel Mack" <daniel@zonque.org> wrote:
>
>>> That's right, but again - if an application wants to gather this kind of
>>> information about tasks it interacts with, it can do so today by looking
>>> at /proc or similar sources. Desktop machines do exactly that already,
>>> and the kernel code executed in such cases very much resembles that in
>>> metadata.c, and is certainly not cheaper. kdbus just makes such
>>> information more accessible when requested. Which information is
>>> collected is defined by bit-masks on both the sender and the receiver
>>> connection, and most applications will effectively only use a very
>>> limited set by default if they go through one of the more high-level
>>> libraries.
>>
>> I should rephrase a bit.  Kdbus doesn't require use of send-time
>> metadata.  It does, however, strongly encourage it, and it sounds like
>
> On the kernel level, kdbus just *offers* that, just like sockets offer
> SO_PASSCRED. On the userland level, kdbus helps applications get that
> information race-free, easier and faster than they would otherwise.
>
>> systemd and other major users will use send-time metadata.  Once that
>> happens, it's ABI (even if it's purely in userspace), and changing it
>> is asking for security holes to pop up.  So you'll be mostly stuck
>> with it.
>
> We know we can't break the ABI. At most, we could deprecate item types
> and introduce new ones, but we want to avoid that by all means of
> course. However, I fail to see how that is related to send time
> metadata, or even to kdbus in general, as all ABIs have to be kept stable.

I should have said it differently.  ABI is the wrong term -- it's more
of a protocol issue.

It looks like, with the current code, the kernel will provide
(optional) send-time metadata, and the sd-bus library will use it.
The result will be that the communication protocol between clients and
udev, systemd, systemd-logind, g-s-d, etc, will likely involve
send-time metadata.  This may end up being a bottleneck.

Once this happens, changing the protocol will be very hard without
introducing security bugs.  If people start switching to
connection-time metadata to gain performance, then they'll break both
the communication protocol and the expectations of client code.  (In
fact, it'll break twice, sort of, since I think that the current
protocols are connect-time.)

To me, this seems like a downside of using send-time metadata, albeit
possibly not a huge downside at least in the near term.  I don't see a
corresponding benefit, though.

>
>> Do you have some simple benchmark code you can share?  I'd like to
>> play with it a bit.
>
> Sure, it's part of the self-test suite. Call it with "-t benchmark" to
> run the benchmark as isolated test with verbose output. The code for
> that lives in test-benchmark.c.
>

I'll try to play with this soon.  Thanks.

--Andy

>
> Thanks,
> Daniel
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-02-04  2:47                                           ` Eric W. Biederman
  0 siblings, 0 replies; 143+ messages in thread
From: Eric W. Biederman @ 2015-02-04  2:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Mack, Arnd Bergmann, Ted Ts'o, Michael Kerrisk,
	Linux API, One Thousand Gnomes, Austin S Hemmelgarn,
	Tom Gundersen, Greg Kroah-Hartman, linux-kernel, David Herrmann,
	Djalal Harouni, Johannes Stezenbach, Christoph Hellwig

Andy Lutomirski <luto@amacapital.net> writes:

> On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack <daniel@zonque.org> wrote:
>> Hi Andy,
>>
>> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
>>> On Feb 2, 2015 1:34 AM, "Daniel Mack" <daniel@zonque.org> wrote:
>>
>>>> That's right, but again - if an application wants to gather this kind of
>>>> information about tasks it interacts with, it can do so today by looking
>>>> at /proc or similar sources. Desktop machines do exactly that already,
>>>> and the kernel code executed in such cases very much resembles that in
>>>> metadata.c, and is certainly not cheaper. kdbus just makes such
>>>> information more accessible when requested. Which information is
>>>> collected is defined by bit-masks on both the sender and the receiver
>>>> connection, and most applications will effectively only use a very
>>>> limited set by default if they go through one of the more high-level
>>>> libraries.
>>>
>>> I should rephrase a bit.  Kdbus doesn't require use of send-time
>>> metadata.  It does, however, strongly encourage it, and it sounds like
>>
>> On the kernel level, kdbus just *offers* that, just like sockets offer
>> SO_PASSCRED. On the userland level, kdbus helps applications get that
>> information race-free, easier and faster than they would otherwise.
>>
>>> systemd and other major users will use send-time metadata.  Once that
>>> happens, it's ABI (even if it's purely in userspace), and changing it
>>> is asking for security holes to pop up.  So you'll be mostly stuck
>>> with it.
>>
>> We know we can't break the ABI. At most, we could deprecate item types
>> and introduce new ones, but we want to avoid that by all means of
>> course. However, I fail to see how that is related to send time
>> metadata, or even to kdbus in general, as all ABIs have to be kept stable.
>
> I should have said it differently.  ABI is the wrong term -- it's more
> of a protocol issue.
>
> It looks like, with the current code, the kernel will provide
> (optional) send-time metadata, and the sd-bus library will use it.
> The result will be that the communication protocol between clients and
> udev, systemd, systemd-logind, g-s-d, etc, will likely involve
> send-time metadata.  This may end up being a bottleneck.

A quick note on a couple of things I have seen in this conversation.

- The reason for kdbus is performance.

- pipes rather than unix domain sockets are likely the standard to meet.
  If you can't equal unix domain sockets for simple things you are
  likely leaving a lot of stops in.  Last I looked pipes in general were
  notably faster than unix domain sockets.

  The performance numbers I saw posted up-thread were horrible.  I have
  seen faster numbers across a network of machines.  If your ping-pong
  latency isn't measured in nano-seconds you are probably doing
  something wrong.

- syscalls remove overhead.  So since performance is kdbus's reason for
  existence, let's remove some ridiculous stops and get a fast path into
  the kernel.

- send-time metadata is a performance nightmare.  SO_PASSCRED is hard
  to implement in a performant way, especially when namespaces
  get involved.  Over the long term if you use send-time metadata
  you will grow the kind of compatibility hacks that the user
  namespace and the pid namespace have on SO_PASSCRED and things will
  slow down.

  A similar effect that is more performant in general is to enforce that
  the sender has the expected attributes.
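
For a pipe baseline to compare against, a ping-pong measurement can be
sketched roughly as follows. This is not the kdbus test-benchmark.c
code; the function name pipe_pingpong_ns() is made up here, and the
result depends heavily on the machine and scheduler, so no numbers are
implied. A child process echoes single bytes back over a second pipe
while the parent times the round trips:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Average round-trip time in nanoseconds, or -1.0 on error. */
double pipe_pingpong_ns(int rounds)
{
	int to_child[2], to_parent[2], i;
	struct timespec start, end;
	char byte = 'x';
	pid_t pid;

	if (rounds <= 0 || pipe(to_child) < 0)
		return -1.0;
	if (pipe(to_parent) < 0) {
		close(to_child[0]);
		close(to_child[1]);
		return -1.0;
	}

	pid = fork();
	if (pid < 0) {
		close(to_child[0]);
		close(to_child[1]);
		close(to_parent[0]);
		close(to_parent[1]);
		return -1.0;
	}

	if (pid == 0) {
		/* Child: echo each byte until the parent closes its end. */
		close(to_child[1]);
		close(to_parent[0]);
		while (read(to_child[0], &byte, 1) == 1)
			if (write(to_parent[1], &byte, 1) != 1)
				break;
		_exit(0);
	}

	/* Parent: keep only the write end to the child and the read end back. */
	close(to_child[0]);
	close(to_parent[1]);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < rounds; i++) {
		if (write(to_child[1], &byte, 1) != 1 ||
		    read(to_parent[0], &byte, 1) != 1)
			break;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	close(to_child[1]);	/* EOF on the pipe makes the child exit */
	close(to_parent[0]);
	waitpid(pid, NULL, 0);

	if (i != rounds)
		return -1.0;

	return ((end.tv_sec - start.tv_sec) * 1e9 +
		(end.tv_nsec - start.tv_nsec)) / rounds;
}
```

Calling pipe_pingpong_ns(100000) and printing the result gives a rough
per-round-trip figure to hold next to whatever the kdbus benchmark
reports on the same machine.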

> Once this happens, changing the protocol will be very hard without
> introducing security bugs.  If people start switching to
> connection-time metadata to gain performance, then they'll break both
> the communication protocol and the expectations of client code.  (In
> fact, it'll break twice, sort of, since I think that the current
> protocols are connect-time.)
>
> To me, this seems like a down-side of using send-time metadata, albeit
> possibly not a huge downside at least in the near term.  I don't see a
> corresponding benefit, though.

I think send-time metadata verification is less bad in this regard.

Eric

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-02-04  2:47                                           ` Eric W. Biederman
  0 siblings, 0 replies; 143+ messages in thread
From: Eric W. Biederman @ 2015-02-04  2:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Mack, Arnd Bergmann, Ted Ts'o, Michael Kerrisk,
	Linux API, One Thousand Gnomes, Austin S Hemmelgarn,
	Tom Gundersen, Greg Kroah-Hartman, linux-kernel, David Herrmann,
	Djalal Harouni, Johannes Stezenbach, Christoph Hellwig

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org> wrote:
>> Hi Andy,
>>
>> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
>>> On Feb 2, 2015 1:34 AM, "Daniel Mack" <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org> wrote:
>>
>>>> That's right, but again - if an application wants to gather this kind of
>>>> information about tasks it interacts with, it can do so today by looking
>>>> at /proc or similar sources. Desktop machines do exactly that already,
>>>> and the kernel code executed in such cases very much resembles that in
>>>> metadata.c, and is certainly not cheaper. kdbus just makes such
>>>> information more accessible when requested. Which information is
>>>> collected is defined by bit-masks on both the sender and the receiver
>>>> connection, and most applications will effectively only use a very
>>>> limited set by default if they go through one of the more high-level
>>>> libraries.
>>>
>>> I should rephrase a bit.  Kdbus doesn't require use of send-time
>>> metadata.  It does, however, strongly encourage it, and it sounds like
>>
>> On the kernel level, kdbus just *offers* that, just like sockets offer
>> SO_PASSCRED. On the userland level, kdbus helps applications get that
>> information race-free, easier and faster than they would otherwise.
>>
>>> systemd and other major users will use send-time metadata.  Once that
>>> happens, it's ABI (even if it's purely in userspace), and changing it
>>> is asking for security holes to pop up.  So you'll be mostly stuck
>>> with it.
>>
>> We know we can't break the ABI. At most, we could deprecate item types
>> and introduce new ones, but we want to avoid that by all means of
>> course. However, I fail to see how that is related to send time
>> metadata, or even to kdbus in general, as all ABIs have to be kept stable.
>
> I should have said it differently.  ABI is the wrong term -- it's more
> of a protocol issue.
>
> It looks like, with the current code, the kernel will provide
> (optional) send-time metadata, and the sd-bus library will use it.
> The result will be that the communication protocol between clients and
> udev, systemd, systemd-logind, g-s-d, etc, will likely involve
> send-time metadata.  This may end up being a bottleneck.

A quick note on a couple of things I have seen in this conversation.

- The reason for kdbus is performance.

- pipes rather than unix domain sockets are likely the standard to meet.
  If you can't equal unix domain sockets for simple things you are
  likely leaving a lot of stops in.  Last I looked pipes in general were
  notiably faster than unix domain sockets.

  The performance numbers I saw posted up-thread were horrible.  I have
  seen faster numbers across a network of machines.  If your ping-pong
  latency isn't measured in nano-seconds you are probably doing
  something wrong.

- syscalls remove overhead.  So since performance is kdbus's reason for existence
  let's remove some ridiculous stops, and get a fast path into the kernel.

- send-time metadata is a performance nightmare.  SO_PASSCRED is hard
  to implement in a performant way, especially when namespaces
  get involved.  Over the long term if you use send-time metadata
  you will grow the kind of compatibility hacks that the user
  namespace and the pid namespace have on SO_PASSCRED and things will
  slow down.

  A similar effect that is more performant in general is to enforce that
  the sender has the expected attributes.
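
For readers unfamiliar with the two models being contrasted here, the
following is a minimal sketch using plain AF_UNIX sockets, not kdbus.
SO_PEERCRED returns credentials recorded when the socket pair was
created (connection-time), while SO_PASSCRED asks the kernel to attach
the sender's credentials to each message (send-time).  It assumes Linux
and Python 3; the helper names are made up for illustration, and it
says nothing about how kdbus itself collects metadata.

```python
import os
import socket
import struct

# struct ucred { pid_t pid; uid_t uid; gid_t gid; } as laid out on Linux.
UCRED = struct.Struct("iII")


def connect_time_creds(sock):
    """Peer credentials recorded when the pair was created (SO_PEERCRED)."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, UCRED.size)
    return UCRED.unpack(raw)


def recv_with_send_time_creds(sock, bufsize=4096):
    """Receive one message plus the credentials attached at send time."""
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(UCRED.size))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            return data, UCRED.unpack(cdata[:UCRED.size])
    return data, None


def demo():
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, b:
        # The receiver opts in; the kernel then adds a ucred to each message.
        b.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        a.send(b"ping")
        data, per_message = recv_with_send_time_creds(b)
        return {
            "payload": data,
            "send_time": per_message,             # (pid, uid, gid) per message
            "connect_time": connect_time_creds(b),  # (pid, uid, gid) once
            "self_pid": os.getpid(),
        }
```

In this single-process demo both tuples name the same process; the two
only diverge when the sender changes credentials between connecting and
sending, which is the case the thread is arguing about.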

> Once this happens, changing the protocol will be very hard without
> introducing security bugs.  If people start switching to
> connection-time metadata to gain performance, then they'll break both
> the communication protocol and the expectations of client code.  (In
> fact, it'll break twice, sort of, since I think that the current
> protocols are connect-time.)
>
> To me, this seems like a down-side of using send-time metadata, albeit
> possibly not a huge downside at least in the near term.  I don't see a
> corresponding benefit, though.

I think send-time metadata verification is less bad in this regard.

Eric

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-02-04  3:14                                             ` Greg Kroah-Hartman
  0 siblings, 0 replies; 143+ messages in thread
From: Greg Kroah-Hartman @ 2015-02-04  3:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Daniel Mack, Arnd Bergmann, Ted Ts'o,
	Michael Kerrisk, Linux API, One Thousand Gnomes,
	Austin S Hemmelgarn, Tom Gundersen, linux-kernel, David Herrmann,
	Djalal Harouni, Johannes Stezenbach, Christoph Hellwig

On Tue, Feb 03, 2015 at 08:47:51PM -0600, Eric W. Biederman wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
> 
> > On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack <daniel@zonque.org> wrote:
> >> Hi Andy,
> >>
> >> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
> >>> On Feb 2, 2015 1:34 AM, "Daniel Mack" <daniel@zonque.org> wrote:
> >>
> >>>> That's right, but again - if an application wants to gather this kind of
> >>>> information about tasks it interacts with, it can do so today by looking
> >>>> at /proc or similar sources. Desktop machines do exactly that already,
> >>>> and the kernel code executed in such cases very much resembles that in
> >>>> metadata.c, and is certainly not cheaper. kdbus just makes such
> >>>> information more accessible when requested. Which information is
> >>>> collected is defined by bit-masks on both the sender and the receiver
> >>>> connection, and most applications will effectively only use a very
> >>>> limited set by default if they go through one of the more high-level
> >>>> libraries.
> >>>
> >>> I should rephrase a bit.  Kdbus doesn't require use of send-time
> >>> metadata.  It does, however, strongly encourage it, and it sounds like
> >>
> >> On the kernel level, kdbus just *offers* that, just like sockets offer
> >> SO_PASSCRED. On the userland level, kdbus helps applications get that
> >> information race-free, easier and faster than they would otherwise.
> >>
> >>> systemd and other major users will use send-time metadata.  Once that
> >>> happens, it's ABI (even if it's purely in userspace), and changing it
> >>> is asking for security holes to pop up.  So you'll be mostly stuck
> >>> with it.
> >>
> >> We know we can't break the ABI. At most, we could deprecate item types
> >> and introduce new ones, but we want to avoid that by all means of
> >> course. However, I fail to see how that is related to send time
> >> metadata, or even to kdbus in general, as all ABIs have to be kept stable.
> >
> > I should have said it differently.  ABI is the wrong term -- it's more
> > of a protocol issue.
> >
> > It looks like, with the current code, the kernel will provide
> > (optional) send-time metadata, and the sd-bus library will use it.
> > The result will be that the communication protocol between clients and
> > udev, systemd, systemd-logind, g-s-d, etc, will likely involve
> > send-time metadata.  This may end up being a bottleneck.
> 
> A quick note on a couple of things I have seen in this conversation.
> 
> - The reason for kdbus is performance.

No, that's not the only reason for kdbus, don't focus only on this.  I
set out a long list of things for why we created kdbus, speed was only
one of the things.  Security is also one, and the ability to gather
these attributes in an atomic and secure way is very important as
userspace wants this.

> - pipes rather than unix domain sockets are likely the standard to meet.
>   If you can't equal unix domain sockets for simple things you are
>   likely leaving a lot of stops in.  Last I looked pipes in general were
>   notably faster than unix domain sockets.
> 
>   The performance numbers I saw posted up-thread were horrible.  I have
>   seen faster numbers across a network of machines.  If your ping-pong
>   latency isn't measured in nano-seconds you are probably doing
>   something wrong.

It all depends on what you are passing on that "ping-pong"; a real
D-Bus connection has real data and metadata that has to be sent.
Trying to make a fake benchmark number isn't going to show anything.

> - syscalls remove overhead.  So since performance is kdbus's reason for existence
>   let's remove some ridiculous stops, and get a fast path into the kernel.

Again, not the only reason, see my first post in this thread for
details.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-02-04  6:30                                               ` Eric W. Biederman
  0 siblings, 0 replies; 143+ messages in thread
From: Eric W. Biederman @ 2015-02-04  6:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Daniel Mack, Arnd Bergmann, Ted Ts'o,
	Michael Kerrisk, Linux API, One Thousand Gnomes,
	Austin S Hemmelgarn, Tom Gundersen, linux-kernel, David Herrmann,
	Djalal Harouni, Johannes Stezenbach, Christoph Hellwig

Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:

> On Tue, Feb 03, 2015 at 08:47:51PM -0600, Eric W. Biederman wrote:
>> Andy Lutomirski <luto@amacapital.net> writes:
>> 
>> > On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack <daniel@zonque.org> wrote:
>> >> Hi Andy,
>> >>
>> >> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
>> >>> On Feb 2, 2015 1:34 AM, "Daniel Mack" <daniel@zonque.org> wrote:
>> >>
>> >>>> That's right, but again - if an application wants to gather this kind of
>> >>>> information about tasks it interacts with, it can do so today by looking
>> >>>> at /proc or similar sources. Desktop machines do exactly that already,
>> >>>> and the kernel code executed in such cases very much resembles that in
>> >>>> metadata.c, and is certainly not cheaper. kdbus just makes such
>> >>>> information more accessible when requested. Which information is
>> >>>> collected is defined by bit-masks on both the sender and the receiver
>> >>>> connection, and most applications will effectively only use a very
>> >>>> limited set by default if they go through one of the more high-level
>> >>>> libraries.
>> >>>
>> >>> I should rephrase a bit.  Kdbus doesn't require use of send-time
>> >>> metadata.  It does, however, strongly encourage it, and it sounds like
>> >>
>> >> On the kernel level, kdbus just *offers* that, just like sockets offer
>> >> SO_PASSCRED. On the userland level, kdbus helps applications get that
>> >> information race-free, easier and faster than they would otherwise.
>> >>
>> >>> systemd and other major users will use send-time metadata.  Once that
>> >>> happens, it's ABI (even if it's purely in userspace), and changing it
>> >>> is asking for security holes to pop up.  So you'll be mostly stuck
>> >>> with it.
>> >>
>> >> We know we can't break the ABI. At most, we could deprecate item types
>> >> and introduce new ones, but we want to avoid that by all means of
>> >> course. However, I fail to see how that is related to send time
>> >> metadata, or even to kdbus in general, as all ABIs have to be kept stable.
>> >
>> > I should have said it differently.  ABI is the wrong term -- it's more
>> > of a protocol issue.
>> >
>> > It looks like, with the current code, the kernel will provide
>> > (optional) send-time metadata, and the sd-bus library will use it.
>> > The result will be that the communication protocol between clients and
>> > udev, systemd, systemd-logind, g-s-d, etc, will likely involve
>> > send-time metadata.  This may end up being a bottleneck.
>> 
>> A quick note on a couple of things I have seen in this conversation.
>> 
>> - The reason for kdbus is performance.
>
> No, that's not the only reason for kdbus, don't focus only on this.  I
> set out a long list of things for why we created kdbus, speed was only
> one of the things.  Security is also one, and the ability to gather
> these attributes in an atomic and secure way is very important as
> userspace wants this.

Perhaps I should have said the predominant reason.  Certainly that seems
to be most of what I have seen talked about.

Regardless, it is worth looking at the performance in the design and
removing any substantial obstacle to making things go fast.

Further, I had this conversation in an earlier round of the review
and I was told that in fact existing dbus applications do not want or
need these attributes.  I think I heard journald wants them for
pretty printing things.

If security is your concern I really think per-message attributes
collected and sent when a message is sent are a bad idea.  It has been a
nasty anti-pattern in the kernel code.  Lots and lots of meta-data
copied from a task and sent to someone else has significant performance,
maintenance, and security impacts.

Code written in that pattern is complex and hard to analyze, and hard to
think about.  Consider debugging why a message does not get the expected
treatment from your suid application because someone changed the euid
over that particular call and had not thought about its consequences.
Frankly I have been there and done that and it is a mess.

So no, I do not think breaking encapsulation and having weird side
effects affecting your new primitive will have any security benefits
whatsoever.  It will just result in brittle, complex code.

If you want to avoid the races, causing sends through a file descriptor
to fail when they don't have the expected attributes (my constructive
suggestion earlier) is a very different thing from a performance and
maintenance standpoint.  That does not increase the code complexity
nearly as much in the implementation or in use, and unexpected failures
happen right away.
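
As a rough illustration of that alternative, the sketch below records
the credentials once, when the sender is set up, and makes any later
send fail immediately if the caller's credentials no longer match.
This is only a userspace analogy under stated assumptions: the class
and field names are hypothetical, and the suggestion in this thread is
for the kernel to perform the check, not a wrapper like this one.

```python
import os
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Creds:
    uid: int
    euid: int
    gid: int
    egid: int

    @classmethod
    def current(cls):
        return cls(os.getuid(), os.geteuid(), os.getgid(), os.getegid())


class CheckedSender:
    """Hypothetical wrapper: remembers the credentials seen at open time
    and refuses any later send whose credentials differ."""

    def __init__(self, sock, get_creds=Creds.current):
        self._sock = sock
        self._get_creds = get_creds
        self._expected = get_creds()   # captured once, at "connect" time

    def send(self, data):
        now = self._get_creds()
        if now != self._expected:
            # Fail right away instead of delivering different metadata.
            raise PermissionError(
                f"sender credentials changed: expected {self._expected}, got {now}")
        return self._sock.send(data)
```

The receiver never has to parse per-message metadata in this model: it
learns the attributes once and can rely on every delivered message
having come from a sender that still had them.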

>> - pipes rather than unix domain sockets are likely the standard to meet.
>>   If you can't equal unix domain sockets for simple things you are
>>   likely leaving a lot of stops in.  Last I looked pipes in general were
>>   notably faster than unix domain sockets.
>> 
>>   The performance numbers I saw posted up-thread were horrible.  I have
>>   seen faster numbers across a network of machines.  If your ping-pong
>>   latency isn't measured in nano-seconds you are probably doing
>>   something wrong.
>
> It all depends on what you are passing on that "ping-pong", a real
> D-Bus connection has real data and meta data that has to be sent.
> Trying to make a fake benchmark number isn't going to show anything.

All that I was intending to convey is that the numbers I have seen have
been orders of magnitude slower than I would expect.  And 10x to 100x
slower than the code should be is a reason to ask why.

In my experience being efficient with small messages is important
because (a) they are the hardest to make go fast and (b) they are
surprisingly common.  Remote X application start-up times are very slow
because of these.

People have a distressing habit of writing applications that
send a small message and synchronously wait for it.  Over time these
small ipc calls build up and you are limited by how fast they will go.
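
The pattern described above (send a small message, block until the
reply arrives) can be sketched as a tiny ping-pong micro-benchmark over
a pair of pipes and over an AF_UNIX socket pair.  This is not the kdbus
test-benchmark.c; it is a hedged illustration with made-up function
names, and because it runs in Python with an echo thread in the same
process, the absolute numbers include interpreter overhead and should
only be compared with each other.

```python
import os
import socket
import threading
import time


def _echo(read, write, rounds, size):
    # Server side: bounce every message straight back.
    for _ in range(rounds):
        write(read(size))


def ping_pong_ns(kind, rounds=2000, size=8):
    """Average round-trip time in nanoseconds for synchronous ping-pongs."""
    if kind == "unix":
        a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
        client = (a.recv, a.send)
        server = (b.recv, b.send)
        closers = [a.close, b.close]
    elif kind == "pipe":
        r1, w1 = os.pipe()   # client -> server
        r2, w2 = os.pipe()   # server -> client
        client = (lambda n: os.read(r2, n), lambda d: os.write(w1, d))
        server = (lambda n: os.read(r1, n), lambda d: os.write(w2, d))
        closers = [lambda fd=fd: os.close(fd) for fd in (r1, w1, r2, w2)]
    else:
        raise ValueError(f"unknown transport: {kind}")

    echo = threading.Thread(target=_echo, args=(*server, rounds, size))
    echo.start()
    recv, send = client
    payload = b"x" * size
    start = time.perf_counter_ns()
    for _ in range(rounds):
        send(payload)   # small request ...
        recv(size)      # ... then block until the reply comes back
    elapsed = time.perf_counter_ns() - start
    echo.join()
    for close in closers:
        close()
    return elapsed / rounds


if __name__ == "__main__":
    for kind in ("pipe", "unix"):
        print(f"{kind}: {ping_pong_ns(kind):.0f} ns per round trip")
```

Since each iteration waits for the previous reply, total time grows
linearly with the number of calls, which is why per-message latency
dominates for chatty applications.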

>> - syscalls remove overhead.  So since performance is kdbus's reason for existence
>>   let's remove some ridiculous stops, and get a fast path into the kernel.
>
> Again, not the only reason, see my first post in this thread for
> details.

But performance is important, and performance is a good reason to use
system calls.

Security is another reason to have real system calls, as there is less
going on (compared to an ioctl multiplexer) so the code is easier to
audit.

Eric

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-02-04 23:03                                         ` Andy Lutomirski
  0 siblings, 0 replies; 143+ messages in thread
From: Andy Lutomirski @ 2015-02-04 23:03 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Arnd Bergmann, Ted Ts'o, Michael Kerrisk, Linux API,
	One Thousand Gnomes, Austin S Hemmelgarn, Tom Gundersen,
	Greg Kroah-Hartman, linux-kernel, Eric W. Biederman,
	David Herrmann, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig

On Tue, Feb 3, 2015 at 2:09 AM, Daniel Mack <daniel@zonque.org> wrote:
> Hi Andy,
>
> On 02/02/2015 09:12 PM, Andy Lutomirski wrote:
>> On Feb 2, 2015 1:34 AM, "Daniel Mack" <daniel@zonque.org> wrote:
>
>>> That's right, but again - if an application wants to gather this kind of
>>> information about tasks it interacts with, it can do so today by looking
>>> at /proc or similar sources. Desktop machines do exactly that already,
>>> and the kernel code executed in such cases very much resembles that in
>>> metadata.c, and is certainly not cheaper. kdbus just makes such
>>> information more accessible when requested. Which information is
>>> collected is defined by bit-masks on both the sender and the receiver
>>> connection, and most applications will effectively only use a very
>>> limited set by default if they go through one of the more high-level
>>> libraries.
>>
>> I should rephrase a bit.  Kdbus doesn't require use of send-time
>> metadata.  It does, however, strongly encourage it, and it sounds like
>
> On the kernel level, kdbus just *offers* that, just like sockets offer
> SO_PASSCRED. On the userland level, kdbus helps applications get that
> information race-free, easier and faster than they would otherwise.
>
>> systemd and other major users will use send-time metadata.  Once that
>> happens, it's ABI (even if it's purely in userspace), and changing it
>> is asking for security holes to pop up.  So you'll be mostly stuck
>> with it.
>
> We know we can't break the ABI. At most, we could deprecate item types
> and introduce new ones, but we want to avoid that by all means of
> course. However, I fail to see how that is related to send time
> metadata, or even to kdbus in general, as all ABIs have to be kept stable.
>
>> Do you have some simple benchmark code you can share?  I'd like to
>> play with it a bit.
>
> Sure, it's part of the self-test suite. Call it with "-t benchmark" to
> run the benchmark as isolated test with verbose output. The code for
> that lives in test-benchmark.c.

I see "latencies" of around 20 microseconds with lockdep and context
tracking off.  For example:

stats  (UNIX): 226730 packets processed, latency (nsecs) min/max/avg
 3845 //   34828 //    4069
stats (KDBUS): 37103 packets processed, latency (nsecs) min/max/avg
19123 //   99660 //   20696

This is IMO not very good.  With memfds off:

stats  (UNIX): 226061 packets processed, latency (nsecs) min/max/avg
 3885 //   32019 //    4079
stats (KDBUS): 83284 packets processed, latency (nsecs) min/max/avg
10525 //   42578 //   10932

With memfds off and the payload set to 8 bytes:

stats (KDBUS): 77669 packets processed, latency (nsecs) min/max/avg
9963 //   64325 //   11645
stats  (UNIX): 253695 packets processed, latency (nsecs) min/max/avg
 2986 //   56094 //    3565

Am I missing something here?  This is slow enough that a lightweight
userspace dbus daemon should be able to outperform kdbus, or at least
come very close.
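
To put the three runs above side by side, the short calculation below
divides the kdbus average latency by the UNIX socket average for each
run.  The figures are copied from the output quoted in this message;
the run labels and the helper name are only for illustration.

```python
# Average latencies in nanoseconds, copied from the three runs above.
RUNS = {
    "lockdep and context tracking off": {"unix": 4069, "kdbus": 20696},
    "memfds off": {"unix": 4079, "kdbus": 10932},
    "memfds off, 8-byte payload": {"unix": 3565, "kdbus": 11645},
}


def slowdown(run):
    """How many times slower the kdbus average is than the UNIX average."""
    return run["kdbus"] / run["unix"]


for name, run in RUNS.items():
    print(f"{name}: kdbus/UNIX = {slowdown(run):.2f}x")
```

That works out to roughly 5.1x, 2.7x and 3.3x respectively, so turning
memfds off roughly halves the gap in these measurements.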

It would be kind of nice to know how long just the send call takes, too.

--Andy

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
  2015-02-04 23:03                                         ` Andy Lutomirski
  (?)
@ 2015-02-05  0:16                                         ` David Herrmann
  2015-02-08 16:54                                             ` Andy Lutomirski
  -1 siblings, 1 reply; 143+ messages in thread
From: David Herrmann @ 2015-02-05  0:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Mack, Arnd Bergmann, Ted Ts'o, Michael Kerrisk,
	Linux API, One Thousand Gnomes, Austin S Hemmelgarn,
	Tom Gundersen, Greg Kroah-Hartman, linux-kernel,
	Eric W. Biederman, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig

Hi

On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> I see "latencies" of around 20 microseconds with lockdep and context
> tracking off.  For example:

Without metadata or memfd transmission, I get 2.5us for kdbus, 1.5us
for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us. I
suspect you enabled metadata transmission, which I think is not a fair
comparison.

A few notes on that:

* kdbus is a bus layer. We don't intend to replace UDS, but improve
dbus. Comparing roundtrip times with UDS is tempting, but in no way
fair. At the very least, a bus layer has to perform peer-lookup, which
UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice.
Compare this to ~77us for dbus1 without marshaling.

* We have not optimized kdbus code-paths for speed, yet. Our main
concerns are algorithmic challenges, and we believe they've been
improved considerably with kdbus. I have constantly measured kdbus
performance with 'perf' and flame-graphs, and there're a lot of
possible optimizations (especially on locking). However, I think this
can be done afterwards just fine. Neither API nor ioctl overhead has
shown up in my measurements. If anyone has counter evidence, please
let us know. But I'm a bit reluctant to change our API solely based on
performance guesses.

* We're about 50% slower than UDS on 1-byte transmissions. With 32k
we're on-par. How can a lightweight user-space daemon even get close
to that?

* Broadcast performance is a completely different story. SEND gets
around 30% faster compared to kdbus unicasts (as most of the
control-paths are only taken once per message, instead of once per
destination).

* test-benchmark.c does performance tests in a single process. If the
bus-layer is implemented in user-space, you need to account for
context-switches and task wakeups. My UDS and pipe round-trip latency
tests got around 3x slower when run across processes (3.7us instead of
1.2us). With a user-space daemon, those slow-downs are incurred twice
as often for each roundtrip.

* Process time is accounted on the sender, instead of a shared process
(dbus-daemon). Broadcasts will thus no longer consume time-slices of
dbus-daemon, but only the sender's.
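
To make the single-process versus cross-process point above concrete,
here is a rough sketch of a UDS ping-pong that can run either way. It
is not the code from test-benchmark.c; the names (lat_stats,
uds_roundtrip, echo_loop), the SOCK_SEQPACKET socketpair and the 8192
byte cap are assumptions for illustration only. With cross_process set,
the echo side lives in a forked child, so every round trip includes two
task wakeups.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

struct lat_stats {
	uint64_t min, max, avg;	/* nanoseconds per round trip */
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Echo every packet back until the peer closes its end. */
static void echo_loop(int fd)
{
	char buf[8192];
	ssize_t n;

	while ((n = recv(fd, buf, sizeof(buf), 0)) > 0)
		if (send(fd, buf, (size_t)n, 0) != n)
			break;
}

/*
 * Hypothetical helper: ping-pong `iters` packets of `payload` bytes
 * (1..8192) over an AF_UNIX socketpair and record round-trip latency.
 * Returns 0 on success, -1 on error.
 */
static int uds_roundtrip(int cross_process, size_t payload,
			 unsigned int iters, struct lat_stats *st)
{
	char buf[8192] = { 0 };
	uint64_t total = 0;
	pid_t pid = -1;
	int fd[2], ret = -1;

	assert(st != NULL);
	if (payload == 0 || payload > sizeof(buf) || iters == 0)
		return -1;
	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fd) < 0)
		return -1;

	if (cross_process) {
		pid = fork();
		if (pid < 0)
			goto out;
		if (pid == 0) {
			close(fd[0]);
			echo_loop(fd[1]);
			_exit(0);
		}
		close(fd[1]);
		fd[1] = -1;
	}

	st->min = UINT64_MAX;
	st->max = 0;
	for (unsigned int i = 0; i < iters; i++) {
		uint64_t t0 = now_ns(), dt;

		if (send(fd[0], buf, payload, 0) != (ssize_t)payload)
			goto out;
		if (!cross_process) {
			/* Same task plays the peer: receive, then echo. */
			if (recv(fd[1], buf, sizeof(buf), 0) != (ssize_t)payload ||
			    send(fd[1], buf, payload, 0) != (ssize_t)payload)
				goto out;
		}
		if (recv(fd[0], buf, sizeof(buf), 0) != (ssize_t)payload)
			goto out;
		dt = now_ns() - t0;
		total += dt;
		if (dt < st->min)
			st->min = dt;
		if (dt > st->max)
			st->max = dt;
	}
	st->avg = total / iters;
	ret = 0;
out:
	close(fd[0]);	/* the child sees EOF here and exits */
	if (fd[1] >= 0)
		close(fd[1]);
	if (pid > 0)
		waitpid(pid, NULL, 0);
	return ret;
}
```

Running uds_roundtrip(0, 8, ...) and uds_roundtrip(1, 8, ...) back to
back and comparing the two avg values shows how much the wakeups cost
on a given machine; the absolute numbers will vary with hardware and
kernel configuration.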


With kdbus, we implement a bus-layer. This is our only target! If your
target environment does not require a bus, then don't use kdbus. We
don't intend to replace UDS. On a bus-layer, we need peer-discovery,
policy-handling, destination-lookups, broadcast-management and more.
Pipes/UDS do not provide any of this.
I cannot see how any other existing bus-implementation comes even
close to kdbus, performance-wise. If someone does, please let us know!

Thanks
David

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH 01/13] kdbus: add documentation
@ 2015-02-08 16:54                                             ` Andy Lutomirski
  0 siblings, 0 replies; 143+ messages in thread
From: Andy Lutomirski @ 2015-02-08 16:54 UTC (permalink / raw)
  To: David Herrmann
  Cc: Arnd Bergmann, Ted Ts'o, Linux API, Michael Kerrisk,
	Daniel Mack, One Thousand Gnomes, Austin S Hemmelgarn,
	Tom Gundersen, Greg Kroah-Hartman, linux-kernel,
	Eric W. Biederman, Djalal Harouni, Johannes Stezenbach,
	Christoph Hellwig

On Feb 4, 2015 4:16 PM, "David Herrmann" <dh.herrmann@gmail.com> wrote:
>
> Hi
>
> On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > I see "latencies" of around 20 microseconds with lockdep and context
> > tracking off.  For example:
>
> Without metadata or memfd transmission, I get 2.5us for kdbus, 1.5us
> for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us. I
> suspect you enabled metadata transmission, which I think is not a fair
> comparison.

I tried to disable metadata.  I may have failed.

Regardless, if metadata is very slow, then that's more reason not to
use it on send.  And if you shouldn't use it, then maybe the kernel
shouldn't provide it.

I assumed there was a context switch in there.  I can try to test
differently.  If UDS is twice as fast *with* a context switch, then a
userspace solution should be faster.

Also, UDS can use memfds, too.

>
> A few notes on that:
>
> * kdbus is a bus layer. We don't intend to replace UDS, but improve
> dbus. Comparing roundtrip times with UDS is tempting, but in no way
> fair. At the very least, a bus layer has to perform peer-lookup, which
> UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice.
> Compare this to ~77us for dbus1 without marshaling.

This makes me wonder what dbus1 is doing wrong.

>
> * We have not optimized kdbus code-paths for speed, yet. Our main
> concerns are algorithmic challenges, and we believe they've been
> improved considerably with kdbus. I have constantly measured kdbus
> performance with 'perf' and flame-graphs, and there're a lot of
> possible optimizations (especially on locking). However, I think this
> can be done afterwards just fine. Neither API nor ioctl overhead has
> shown up in my measurements. If anyone has counter evidence, please
> let us know. But I'm a bit reluctant to change our API solely based on
> performance guesses.

But removal of send-time metadata can't be done after the fact.

--Andy

^ permalink raw reply	[flat|nested] 143+ messages in thread
