linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/11] RFC: KBUS messaging subsystem
@ 2011-03-18 17:21 Tony Ibbs
  2011-03-18 17:21 ` [PATCH 01/11] Documentation for KBUS Tony Ibbs
  2011-03-22 19:36 ` [PATCH 00/11] RFC: KBUS messaging subsystem Jonathan Corbet
  0 siblings, 2 replies; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC
  To: lkml; +Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely

KBUS is a lightweight, Linux kernel mediated messaging system,
particularly intended for use in embedded environments.

It is meant to be simple to use and understand. It is designed to
provide predictable message delivery, deterministic message ordering,
and a guaranteed reply for each request. It is especially aimed at
situations where existing solutions, such as DBUS, cannot be used,
typically because of system constraints.

We have various customers using KBUS in real life, and believe it to
be useful. I had a showcase table for KBUS at the ELCE in Cambridge,
October last year, and there seemed to be interest.

The KBUS project home page is at http://kbus-messaging.org/, from
which there are links to the original Google code repository, more
documentation, and various userspace libraries.

There is a working repository with these patches applied to
Linux 2.6.37, available via:

 git pull git://github.com/crazyscot/linux-2.6-kbus.git kbus-2.6.37

These patches have been applied in branch apply-patchset-20110318

In order to keep the size of individual patches down, the main code
has been split over several patches (0004..0009). With luck this
should also make it easier to understand what KBUS is trying to do.

Tony Ibbs (11):
  Documentation for KBUS
  KBUS external header file.
  KBUS internal header file
  KBUS main source file, basic device support only
  KBUS add support for messages
  KBUS add ability to receive messages only once
  KBUS add ability to add devices at runtime
  KBUS add Replier Bind Events
  KBUS Replier Bind Event set-aside lists
  KBUS report state to userspace
  KBUS configuration and Makefile

 Documentation/Kbus.txt     | 1222 ++++++++++++
 include/linux/kbus_defns.h |  666 +++++++
 init/Kconfig               |    2 +
 ipc/Kconfig                |  117 ++
 ipc/Makefile               |    9 +
 ipc/kbus_internal.h        |  723 +++++++
 ipc/kbus_main.c            | 4690 ++++++++++++++++++++++++++++++++++++++++++++
 ipc/kbus_report.c          |  256 +++
 8 files changed, 7685 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/Kbus.txt
 create mode 100644 include/linux/kbus_defns.h
 create mode 100644 ipc/Kconfig
 create mode 100644 ipc/kbus_internal.h
 create mode 100644 ipc/kbus_main.c
 create mode 100644 ipc/kbus_report.c

-- 
1.7.4.1



* [PATCH 01/11] Documentation for KBUS
  2011-03-18 17:21 [PATCH 00/11] RFC: KBUS messaging subsystem Tony Ibbs
@ 2011-03-18 17:21 ` Tony Ibbs
  2011-03-18 17:21   ` [PATCH 02/11] KBUS external header file Tony Ibbs
  2011-03-22 19:36 ` [PATCH 00/11] RFC: KBUS messaging subsystem Jonathan Corbet
  1 sibling, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC
  To: lkml
  Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely, Tony Ibbs


Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
 Documentation/Kbus.txt | 1222 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 1222 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/Kbus.txt

diff --git a/Documentation/Kbus.txt b/Documentation/Kbus.txt
new file mode 100644
index 0000000..7cf723fd6
--- /dev/null
+++ b/Documentation/Kbus.txt
@@ -0,0 +1,1222 @@
+=============================================
+KBUS -- Lightweight kernel-mediated messaging
+=============================================
+
+Summary
+=======
+KBUS provides lightweight kernel-mediated messaging for Linux.
+
+* "lightweight" means that there is no intent to provide complex or
+  sophisticated mechanisms - if you need something more, consider DBUS or
+  other alternatives.
+
+* "kernel-mediated" means that the actual business of message passing and
+  message synchronisation is handled by a kernel module.
+
+* "for Linux" means what it says, since the Linux kernel is required.
+
+Initial use is expected to be in embedded systems.
+
+There is (at least initially) no intent to aim for a "fast" system - this is
+not aimed at real-time systems.
+
+Although the implementation is kernel-mediated, there is a mechanism
+("Limpets") for communicating KBUS messages between buses and/or systems.
+
+Intentions
+==========
+KBUS is intended:
+
+* To be simple to use and simple to understand.
+* To have a small codebase, written in C.
+* To provide predictable message delivery.
+* To give deterministic message ordering.
+* To guarantee a reply to every request.
+
+It needs to be simple to use and understand because the expected users are
+typically busy with other matters, and do not have time to spend learning
+a complex messaging system.
+
+It needs to have a small codebase, written in C, because embedded systems
+often lack resources, and may not have enough space for C++ libraries, or
+messaging systems supporting more complex protocol stacks.
+
+Our own experience on embedded systems of various sizes indicates that
+the last three points are especially important.
+
+Predictable message delivery means the user can tell in what circumstances
+messages will or will not be received.
+
+Deterministic message ordering means that all recipients of a given set of
+messages will receive them in the same order as all other recipients (and this
+will be the order in which the messages were sent). This is important when
+several parts of (for instance) an audio/video stack are interoperating.
+
+Guaranteeing that a request will always result in a reply means that the user
+will be told if the intended replier has (for instance) crashed. This again
+allows for simpler use of the system.
+
+The basics
+==========
+Python and C
+------------
+Although the KBUS kernel module is written in C, the module tests are written
+in Python, and there is a Python module providing useful interfaces, which is
+expected to be the normal way of using KBUS from Python.
+
+There is also a C library (libkbus) which provides a similar level of
+abstraction, so that C programmers can use KBUS without having to handle the
+low level details of sockets and message datastructures. Note that the C
+programmer using KBUS does need to have some awareness of how KBUS messages
+work in order to get memory management right.
+
+Messages
+========
+Message names
+-------------
+All messages have names - for instance "$.Sensors.Kitchen".
+
+All message names start with "$.", followed by one or more alphanumeric words
+separated by dots. There are two wildcard characters, "*" and "%", which can
+be the last word of a name.
+
+Thus (in some notation or other)::
+
+    name := '$.'  [ word '.' ]*  ( word  | '*' | '%' )
+    word := alphanumerics
+
+Case is significant. There is probably a limit on the maximum size of a
+subname, and also on the maximum length of a message name.
+
+Names form a name hierarchy or tree - so "$.Sensors" might have children
+"$.Sensors.Kitchen" and "$.Sensors.Bedroom".
+
+If the last word of a name is "*", then this is a wildcard name that also
+includes all the child names at that level and below -- i.e., all the names
+that start with the name up to the "*". So "$.Sensors.*" includes
+"$.Sensors.Kitchen", "$.Sensors.Bedroom", "$.Sensors.Kitchen.FireAlarm",
+"$.Sensors.Kitchen.Toaster", "$.Sensors.Bedroom.FireAlarm", and so on.
+
+If the last word of a name is "%", then this is a wildcard name that also
+includes all the child names at that level -- i.e., all the names obtained by
+replacing the "%" by another word. So "$.Sensors.%" includes
+"$.Sensors.Kitchen" and "$.Sensors.Bedroom", but not
+"$.Sensors.Kitchen.Toaster".
+
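The wildcard rules above can be sketched in Python. This is an illustrative sketch only, not the kernel module's matcher: the function names ``is_valid_name`` and ``matches`` are hypothetical, and the sketch assumes that a single-word name such as "$.Fred" is valid, as the later examples in this document suggest.

```python
import re

WORD = r"[A-Za-z0-9]+"
# '$.' then zero or more 'word.' groups, then a final word or wildcard.
NAME_RE = re.compile(rf"\$\.(?:{WORD}\.)*(?:{WORD}|\*|%)")


def is_valid_name(name: str) -> bool:
    """Return True if `name` fits the name grammar (wildcards only at the end)."""
    return NAME_RE.fullmatch(name) is not None


def matches(binding: str, name: str) -> bool:
    """Return True if the concrete message `name` is covered by `binding`."""
    if binding.endswith(".*"):
        # '*' covers every name below the prefix, at any depth.
        prefix = binding[:-1]  # keep the trailing dot
        return name.startswith(prefix) and len(name) > len(prefix)
    if binding.endswith(".%"):
        # '%' covers exactly one further word, and nothing deeper.
        prefix = binding[:-1]
        rest = name[len(prefix):]
        return name.startswith(prefix) and re.fullmatch(WORD, rest) is not None
    # No wildcard: the names must be identical (case is significant).
    return binding == name
```

With these definitions, ``matches("$.Sensors.%", "$.Sensors.Kitchen.Toaster")`` is false, while the same name is covered by ``"$.Sensors.*"``.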
+Message ids
+-----------
+Every message is expected to have a unique id.
+
+A message id is made up of two parts, a network id and a serial number.
+
+The network id is used to carry useful information when a message is
+transferred from one KBUS system to another (for instance, over a bridge). By
+default (for local messages) it is 0.
+
+A serial number is used to identify the particular message within a network.
+
+If a message is sent via KBUS with a network id of 0, then KBUS itself will
+assign a new message id to the message, with the network id (still) 0, and
+with the serial number one more than the last serial number assigned. Thus for
+local messages, message ids ascend, and their order is deterministic.
+
+If a message is sent via KBUS with a non-zero network id, then KBUS does not
+touch its message id.
+
+Message ids are represented textually as ``{n,s}``, where ``n`` is the
+network id and ``s`` is the serial number.
+
+    Message id {0,0} is reserved for use as an invalid message id. Both
+    network id and serial number are unsigned 32-bit integers. Note that this
+    means that local serial numbers will eventually wrap.
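The id assignment rule can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the kernel code: ``MsgId`` and ``IdAllocator`` are hypothetical names, and skipping serial number 0 on wrap (so that a local message never receives the reserved id {0,0}) is an assumption that the text above does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MsgId:
    network_id: int
    serial_num: int

    def __str__(self) -> str:
        # Textual form {n,s}
        return f"{{{self.network_id},{self.serial_num}}}"


class IdAllocator:
    """Hypothetical stand-in for the serial number counter of one KBUS device."""

    def __init__(self) -> None:
        self._last = 0

    def assign(self, msg_id: MsgId) -> MsgId:
        if msg_id.network_id != 0:
            # Non-local message: the id is left untouched.
            return msg_id
        # Local message: next serial number, wrapping at 32 bits.
        self._last = (self._last + 1) & 0xFFFFFFFF
        if self._last == 0:
            # Assumption: never hand out the reserved invalid id {0,0}.
            self._last = 1
        return MsgId(0, self._last)
```

Whatever serial number the sender wrote into a local message is ignored; only the network id decides whether the id is replaced.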
+
+Message content
+---------------
+Messages are made of the following parts:
+
+:start and end guards:
+
+  These are unsigned 32-bit words. 'start_guard' is notionally "Kbus",
+  and 'end_guard' (the 32-bit word after the rest of the message) is
+  notionally "subK". Obviously that depends on how one looks at the 32-bit
+  word. Every message shall start with a start guard and end with an end
+  guard (but see `Message implementation`_ for details).
+
+  These provide some help in checking that a message is well formed, and in
+  particular the end guard helps to check for broken length fields.
+
+  If the message layout changes in an incompatible manner (this has happened
+  once, and is strongly discouraged), then the start and end guards change.
+
+Unset
+~~~~~
+Unset values are 0, or have zero length (as appropriate).
+
+It is not possible for a message name to be unset.
+
+The message header
+~~~~~~~~~~~~~~~~~~
+:message id: identifies this particular message. This is made up of a network
+  id and a serial number, and is discussed in `Message ids`_.
+
+  When replying to a message, copy this value into the reply's 'in_reply_to' field.
+
+:in_reply_to: is the message id of the message that this is a reply to.
+
+  This shall be set to 0 unless this message *is* a reply to a previous
+  message. In other words, if this value is non-0, then the message *is* a
+  reply.
+
+:to: is the Ksock id identifying who the message is to be sent to.
+
+  When writing a new message, this should normally be set to 0, meaning
+  "anyone listening" (but see below if "state" is being maintained).
+
+  When replying to a message, it shall be set to the 'from' value of the
+  original message.
+
+  When constructing a request message (a message wanting a reply), then it can
+  be set to a specific replier's Ksock id. When such a message is sent, if the
+  replier bound (at that time) does not have that specific Ksock id, then the
+  send will fail.
+
+:from: indicates the Ksock id of the message's sender.
+
+  When writing a new message, set this to 0, since KBUS will set it.
+
+  When reading a message, this will have been set by KBUS.
+
+:orig_from: this indicates the original sender of a message, when being
+  transported via Limpet. This will be documented in more detail in the future.
+
+:final_to: this indicates the final target of a message, when being
+  transported via Limpet. This will be documented in more detail in the future.
+
+:extra: this is a zero field, for future expansion. KBUS will always set this
+  field to zero.
+
+:flags: indicates extra information about the message. See `Message Flags`_
+  for detailed information.
+
+  When writing a message, typical uses include:
+
+  * the message is URGENT
+  * a reply is wanted
+
+  When reading a message, typical uses include:
+
+  * the message is URGENT
+  * a reply is wanted
+  * a reply is wanted from the specific reader
+
+  The top 16 bits of the flags field are reserved for use by the user - KBUS
+  will not touch it.
+
+:name_length: is the length of the message name in bytes. This will always be
+  non-zero, as a message name must always be given.
+
+:data_length: is the length of the message data in bytes. It may be zero
+  if there is no data associated with this message.
+
+:name: identifies the message. It must be terminated with a
+  zero byte (as is normal for C - in the Python binding a normal Python string
+  can be used, and this will be done for you). Byte ordering is according
+  to that of the platform.
+
+  In an "entire" message (see `Message implementation`_ below) the name shall
+  be padded out to a multiple of 4 bytes. Neither the terminating zero byte
+  nor the padding are included in the name length.  Padding should be with
+  zero bytes.
+
+:data: is optional. KBUS does not touch the content of the
+  data, but just copies it. Byte ordering is according to that of the
+  platform.
+
+  In an "entire" message (see `Message implementation`_ below) the data shall,
+  if present, be padded out to a multiple of 4 bytes. This padding is not
+  included in the data length, and the padding bytes may be whatever byte
+  values are convenient to the user. KBUS does not guarantee to copy the exact
+  given padding bytes (in fact, current implementations just ignore them).
+
+Message implementation
+~~~~~~~~~~~~~~~~~~~~~~
+There are two ways in which a message may be constructed, "pointy" and
+"entire".
+See the ``kbus_defns.h`` header file for details.
+
+.. note:: The Python binding hides most of the detail of the message
+   implementation from the user, so if you are using Python you may be able to
+   skip this section.
+
+In a "pointy" message, the ``name`` and ``data`` fields in the message header
+are C pointers to the actual name and data. If there is no data, then the
+``data`` field is NULL. This is probably the simplest form of message for a C
+programmer to create. This might be represented as::
+
+        start_guard: 'Kbus'
+        id:          (0,0)
+        in_reply_to: (0,0)
+        to:          0
+        from:        0
+        name_len:    6
+        data_len:    0
+        name:        ---------------------------> "$.Fred"
+        data:        NULL
+        end_guard:   'subK'
+
+or (with data)::
+
+        start_guard: 'Kbus'
+        id:          (0,0)
+        in_reply_to: (0,0)
+        to:          0
+        from:        0
+        name_len:    6
+        data_len:    7
+        name:        ---------------------------> "$.Fred"
+        data:        ---------------------------> "abc1234"
+        end_guard:   'subK'
+
+.. warning:: When writing a "pointy" message in C, be very careful not to
+   free the name and data between the ``write`` and the SEND, as it is
+   only when the message is sent that KBUS actually follows the ``name`` and
+   ``data`` pointers.
+
+   *After* the SEND, KBUS will have taken its own copies of the name and
+   (any) data.
+
+In an "entire" message, both ``name`` and ``data`` fields are required to be
+NULL. The message header is followed by the message name (padded as described
+above), any message data (also padded), and another end guard. This might be
+represented as::
+
+        start_guard: 'Kbus'
+        id:          (0,0)
+        in_reply_to: (0,0)
+        to:          0
+        from:        0
+        name_len:    6
+        data_len:    0
+        name:        NULL
+        data:        NULL
+        end_guard:   'subK'
+        name_data:   '$.Fred\x0\x0'
+        end_guard:   'subK'
+
+or (again with data)::
+
+        start_guard: 'Kbus'
+        id:          (0,0)
+        in_reply_to: (0,0)
+        to:          0
+        from:        0
+        name_len:    6
+        data_len:    7
+        name:        NULL
+        data:        NULL
+        end_guard:   'subK'
+        name_data:   '$.Fred\x0\x0'
+        data_data:   'abc1234\x0'
+        end_guard:   'subK'
+
+Note that in these examples:
+
+1. The message name occupies 8 bytes: 6 bytes of name, plus one terminating
+   zero byte, plus another zero byte of padding, but the message's ``name_len``
+   is still 6.
+2. When there is no data, there is no "data data" after the name data.
+3. When there is data, the data is presented after the name, and is padded out
+   to a multiple of 4 bytes (but without the necessity for a terminating zero
+   byte, so it is possible to have no pad bytes if the data length is already
+   a multiple of 4). Again, the ``data_len`` always reflects the "real" data
+   length.
+4. Although the data shown is presented as ASCII strings for these examples,
+   it really is just bytes, with no assumption of its content/meaning.
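The padding arithmetic in the notes above can be checked with a short Python sketch. The helper names are hypothetical (they are not taken from ``kbus_defns.h`` or the Python binding), and padding the data with zero bytes is just one convenient choice, since the text allows any pad values there.

```python
def padded_name_len(name_len: int) -> int:
    """Bytes occupied by the name: name plus terminating zero, rounded up to 4."""
    return 4 * ((name_len + 1 + 3) // 4)


def padded_data_len(data_len: int) -> int:
    """Bytes occupied by the data: rounded up to 4, with no terminator."""
    return 4 * ((data_len + 3) // 4)


def pack_name_and_data(name: bytes, data: bytes = b"") -> bytes:
    """Build the bytes that sit between the header and the final end guard
    of an "entire" message."""
    out = name.ljust(padded_name_len(len(name)), b"\0")
    out += data.ljust(padded_data_len(len(data)), b"\0")
    return out
```

For the "$.Fred" / "abc1234" example this yields 8 bytes of name followed by 8 bytes of data, while ``name_len`` and ``data_len`` in the header stay at 6 and 7.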
+
+When writing/sending messages, either form may be used (again, the "pointy"
+form may be simpler for C programmers).
+
+When reading messages, however, the "entire" form is always returned - this
+removes questions about needing to free multiple returned datastructures (for
+instance, what to do if the user were to ask for the NEXTMSG, read a few
+bytes, and then DISCARD the rest).
+
+Limits
+~~~~~~
+Message names may not be shorter than 3 characters (since they must be at
+least "$." plus another character). An arbitrary limit is also placed on the
+maximum message name length - this is currently 1000 characters, but may be
+reviewed in the future.
+
+Message data may, of course, be of zero length.
+
+When reading a message, an "entire" message is always returned.
+
+    .. note:: When using C to work with KBUS messages, it is generally
+       ill-advised to reference the message name and data "directly"::
+
+            char    *name = msg->name;
+            uint8_t *data = msg->data;
+
+       since this will work for "pointy" messages, but not for "entire"
+       messages (where the ``name`` field will be NULL). Instead, it
+       is always better to do::
+
+            char    *name = kbus_msg_name_ptr(msg);
+            uint8_t *data = kbus_msg_data_ptr(msg);
+
+       regardless of the message type.
+
+Message flags
+-------------
+KBUS reserves the bottom 16 bits of the flags word for predefined purposes
+(although not all of those bits are yet used), and guarantees not to touch the
+top 16 bits, which are available for use by the programmer as a particular
+application may wish.
+
+The WANT_A_REPLY bit is set by the sender to indicate that a
+reply is wanted. This makes the message into a request.
+
+    Note that setting the WANT_A_REPLY bit (i.e., a request) and
+    setting 'in_reply_to' (i.e., a reply) is bound to lead to
+    confusion, and the results are undefined (i.e., don't do it).
+
+The WANT_YOU_TO_REPLY bit is set by KBUS on a particular message
+to indicate that the particular recipient is responsible for replying
+to (this instance of the) message. Otherwise, KBUS clears it.
+
+The SYNTHETIC bit is set by KBUS when it generates a Status message, for
+instance when a replier has gone away and will therefore not be sending a
+reply to a request that has already been queued.
+
+    Note that KBUS does not check that a sender has not set this
+    flag on a message, but doing so may lead to confusion.
+
+The URGENT bit is set by the sender if this message is to be
+treated as urgent - i.e., it should be added to the *front* of the
+recipient's message queue, not the back.
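The effect of the URGENT bit on a recipient's queue, and the split of the flags word into KBUS and user halves, can be illustrated as follows. This is a sketch only: the bit position chosen for ``URGENT`` is hypothetical (the real values live in the KBUS header file), and the queue is modelled with a plain ``deque``.

```python
from collections import deque

URGENT = 1 << 2            # hypothetical bit position, for illustration only
KBUS_BITS = 0x0000FFFF     # bottom 16 bits: reserved for KBUS
USER_BITS = 0xFFFF0000     # top 16 bits: left untouched for the application


def enqueue(queue: deque, msg_name: str, flags: int = 0) -> None:
    """Add a message to a recipient's queue, honouring the URGENT bit."""
    if flags & URGENT:
        queue.appendleft(msg_name)   # urgent: goes to the front
    else:
        queue.append(msg_name)       # normal: goes to the back


def user_flags(flags: int) -> int:
    """Extract the application-defined half of the flags word."""
    return (flags & USER_BITS) >> 16
```

Queuing "a", then "b", then an urgent "c" leaves the recipient reading "c" first, followed by "a" and "b".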
+
+Send flags
+~~~~~~~~~~
+There are two "send" flags, ALL_OR_WAIT and ALL_OR_FAIL.
+Either one may be set, or both may be unset.
+
+   If both are set, the message will be rejected as invalid.
+
+   Both flags are ignored in reply messages (i.e., messages with the
+   'in_reply_to' field set).
+
+If a message has ALL_OR_FAIL set, then a SEND will only succeed if the message
+could be added to all the (intended) recipients' message queues. Otherwise,
+SEND returns -EBUSY.
+
+If a message has ALL_OR_WAIT set, then a SEND will only succeed if the message
+could be added to all the (intended) recipients' message queues. Otherwise
+SEND returns -EAGAIN. In this case, the message is still being sent, and the
+caller should either call DISCARD (to drop it), or else use poll/select to
+wait for the send to finish. It will not be possible to call "write" until the
+send has completed or been discarded.
+
+These are primarily intended for use in debugging systems. In particular, note
+that the mechanisms dealing with ALL_OR_WAIT internally are unlikely to be
+very efficient.
+
+.. note:: The send flags will be less effective when messages are being
+   mediated via Limpets, as remote systems are involved.
+
+Things KBUS changes in a message
+--------------------------------
+In general, KBUS leaves the content of a message alone - mostly so that an
+individual KBUS module can "pass through" messages from another domain.
+However, it does change:
+
+- the message id's serial number (but only if its network id is unset)
+- the 'from' id (to indicate the Ksock this message was sent from)
+- the WANT_YOU_TO_REPLY bit in the flags (set or cleared as appropriate)
+- the SYNTHETIC bit, which will always be unset in a message sent by a
+  Sender
+
+KBUS will always set the 'extra' field to zero.
+
+Limpets will change:
+
+- the network id in any field that has one.
+- the 'orig_from' and 'final_to' fields (which in general should only be
+  manipulated by Limpets).
+
+Types of message
+================
+There are four basic message types:
+
+* Announcement -- a message aimed at any listeners, expecting no reply
+* Request -- a message aimed at a replier, who is expected to reply
+* Reply -- a reply to a request
+* Status -- a message generated by KBUS
+
+The Python interface provides a Message base class, and subclasses thereof for
+each of the "user" message types (but not currently for Status).
+
+Announcements
+-------------
+An announcement is the "plain" message type. It is a message that is being
+sent for all bound listeners to "hear".
+
+When creating a new announcement message, it has:
+
+        :message id:   see `Message ids`_
+        :in reply to:  unset (it's not a reply)
+        :to:           unset (all announcements are broadcast to any listeners)
+        :from:         unset (KBUS will set it)
+        :flags:        typically unset, see `Message flags`_
+        :message name: as appropriate
+        :message data: as appropriate
+
+The Python interface provides an ``Announcement`` class to help in creating an
+announcement message.
+
+Request message
+---------------
+A request message is a message that wants a reply.
+
+Since only one Ksock may bind as a replier for a given message name, a
+request message wants a reply from a single Ksock. By default, this is
+whichever Ksock has bound to the message name at the moment of sending, but
+see `Stateful transactions`_.
+
+When creating a new request message, it has:
+
+        :message id:   see `Message ids`_
+        :in reply to:  unset (it's not a reply)
+        :to:           either unset, or a specific Ksock id if the request
+                       should fail if that Ksock is (no longer) the replier
+                       for this message name
+        :from:         unset (KBUS will set it)
+        :flags:        the "needs a reply" flag should be set.
+                       KBUS will set the "you need to reply" flag in the
+                       copy of the message delivered to its replier.
+        :message name: as appropriate
+        :message data: as appropriate
+
+When receiving a request message, the WANT_YOU_TO_REPLY flag will be set if it
+is this recipient's responsibility to reply.
+
+The Python interface provides a ``Request`` class to help in creating a
+request message.
+
+When a request message is sent, it is an error if there is no replier bound to
+that message name.
+
+The message will, as normal, be delivered to all listeners, and will have the
+"needs a reply" flag set wherever it is received. However, only the copy of
+the message received by the replier will be marked with the WANT_YOU_TO_REPLY
+flag.
+
+    So, if a particular file descriptor is bound as listener and replier
+    for '$.Fred', it will receive two copies of the original message (one
+    marked as needing reply from that file descriptor). However, when the
+    reply is sent, only the "plain" listener will receive a copy of the reply
+    message.
+
+Reply message
+-------------
+A reply message is the expected response after reading a request message.
+
+A reply message is distinguished by having a non-zero 'in reply to' value.
+
+Each reply message is in response to a specific request, as indicated by the
+'in reply to' field in the message.
+
+The replier is helped to remember that it needs to reply to a request, because
+the request has the WANT_YOU_TO_REPLY flag set.
+
+When a reply is sent, all listeners for that message name will receive it.
+However, the original replier will not.
+
+When creating a new reply message, it has:
+
+        :message id:   see `Message ids`_
+        :in reply to:  the request message's 'message id'
+        :to:           the request message's 'from' id
+        :from:         unset (KBUS will set it)
+        :flags:        typically unset, see `Message flags`_
+        :message name: the request message's 'message name'
+        :message data: as appropriate
+
+The Python interface provides a ``Reply`` class to help in creating a reply
+message, but more usefully there is also a ``reply_to`` function that creates
+a Reply Message from the original Request.
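The field rules for a reply can be made concrete with a small Python model. This is not the KBUS Python binding: the ``Msg`` class and the ``make_reply`` helper are hypothetical stand-ins, written only to show which fields are copied from the request and which are left unset.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    name: str
    data: bytes = b""
    id: tuple = (0, 0)            # (network id, serial number)
    in_reply_to: tuple = (0, 0)   # unset unless this is a reply
    to: int = 0
    from_: int = 0                # KBUS fills this in on sending
    flags: int = 0


def make_reply(request: Msg, data: bytes = b"") -> Msg:
    """Build a reply: same name, 'in_reply_to' and 'to' taken from the request."""
    return Msg(
        name=request.name,
        data=data,
        in_reply_to=request.id,
        to=request.from_,
    )


def is_reply(msg: Msg) -> bool:
    """A message is a reply exactly when 'in_reply_to' is set."""
    return msg.in_reply_to != (0, 0)
```

The reply's own ``id``, ``from_`` and ``flags`` are left unset, matching the table above, since KBUS assigns the first two when the reply is sent.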
+
+Status message
+--------------
+KBUS generates Status messages (also sometimes referred to as "synthetic"
+messages) when a request message has been successfully sent, but the replier
+is unable to reply (for instance, because it has closed its Ksock). KBUS thus
+uses a Status message to provide the "reply" that it guarantees the sender
+will get.
+
+As you might expect, a KBUS status message is thus (technically) a reply
+message.
+
+A status message looks like:
+
+        :message id:   as normal
+        :in reply to:  the 'message id' of the message whose sending or
+                       processing caused this message.
+        :to:           the Ksock id of the recipient of the message
+        :from:         the Ksock id of the sender of the message - this will
+                       be 0 if the sender is KBUS itself (which is assumed for
+                       most exceptions)
+        :flags:        typically unset, see `Message flags`_
+        :message name: for KBUS exceptions, a message name in '$.KBUS.*'
+        :message data: for KBUS exceptions, normally absent
+
+KBUS status messages always have '$.KBUS.<something>' names (this may be a
+multi-level <something>), and are always in response to a previous message, so
+always have an 'in reply to'.
+
+Requests and Replies
+--------------------
+KBUS guarantees that each Request will (eventually) be matched by a consequent
+Reply (or Status [1]_) message, and only one such.
+
+The "normal" case is when the replier reads the request, and sends its own
+reply back.
+
+If a Request message has been successfully SENT, there are the following other
+cases to consider:
+
+1. The replier unbinds from that message name before reading the request
+   message from its queue. In this case, KBUS removes the message from the
+   replier's queue, and issues a "$.KBUS.Replier.Unbound" message.
+
+2. The replier closes itself (closes the Ksock), but has not yet read the
+   message. In this case, KBUS issues a "$.KBUS.Replier.GoneAway" message.
+
+3. The replier closes itself (closes the Ksock) after reading the message, but
+   before replying to it. In this case, KBUS issues a
+   "$.KBUS.Replier.Ignored" message.
+
+4. SEND did not complete, and the replier closes itself before the message can
+   be added to its message queue (by the POLL mechanism). In this case, KBUS
+   issues a "$.KBUS.Replier.Disappeared" message.
+
+5. SEND did not complete, and an error occurs when the POLL mechanism tries to
+   send the message. In this case, KBUS issues a "$.KBUS.ErrorSending"
+   message.
+
+In all these cases, the 'in_reply_to' field is set to the original request's
+message id. In the first three cases, the 'from' field will be set to the
+Ksock id of the (originally intended) replier. In the last two cases, that
+information is not available, and a 'from' of 0 (indicating KBUS itself) is
+used.
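The five cases above amount to a small lookup, sketched here in Python. The function name, the event labels and the dictionary layout are all hypothetical and exist only to summarise the list; the message names and the rule for the 'from' field are taken from the text.

```python
KBUS_ITSELF = 0  # Ksock id reserved for KBUS

# Key: (what happened, had the request reached the replier's queue?,
#       had the replier read it?)
STATUS_NAMES = {
    ("unbind", True, False): "$.KBUS.Replier.Unbound",
    ("close", True, False): "$.KBUS.Replier.GoneAway",
    ("close", True, True): "$.KBUS.Replier.Ignored",
    ("close", False, False): "$.KBUS.Replier.Disappeared",
    ("error", False, False): "$.KBUS.ErrorSending",
}


def status_for(request_id, replier_id, event, queued, was_read):
    """Describe the Status message KBUS would issue for one of the five cases."""
    name = STATUS_NAMES[(event, queued, was_read)]
    # The replier's id is only known once the request reached its queue.
    sender = replier_id if queued else KBUS_ITSELF
    return {"name": name, "in_reply_to": request_id, "from": sender}
```

For example, a replier with Ksock id 7 that reads request {0,12} and then closes produces "$.KBUS.Replier.Ignored" with a 'from' of 7.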
+
+.. [1] Remember that a Status message is essentially a specialisation of a
+       Reply message.
+
+.. note:: Limpets introduce some extra messages, which will be documented when
+   the proper Limpet documentation is written.
+
+KBUS end points - Ksocks
+========================
+The KBUS devices
+----------------
+Message interactions happen via the KBUS devices. Installing the KBUS kernel
+module always creates ``/dev/kbus0``; it may also create ``/dev/kbus1``, and
+so on.
+
+    The number of devices to create is indicated by an argument at module
+    installation, for instance::
+
+        # insmod kbus.ko num_kbus_devices=10
+
+Messages are sent by writing to a KBUS device, and received by reading from
+the same device. A variety of useful ioctls are also provided. Each KBUS
+device is independent - messages cannot be sent from ``/dev/kbus0`` to
+``/dev/kbus1``, since there is no shared information.
+
+Ksocks
+------
+Specifically, messages are written to and read from KBUS device file
+descriptors. Each such is termed a *Ksock* - this is a simpler term than "file
+descriptor", and has some resonance with "socket".
+
+Each Ksock may be any (one or more) of:
+
+* a Sender (opening the device for read/write)
+* a Listener (only needing to open the device for read)
+* a Replier (opening the device for read/write)
+
+Every Ksock has an id. This is a 32-bit unsigned number assigned by KBUS when
+the device is opened. The value 0 is reserved for KBUS itself.
+
+    The terms "listener id", "sender id", "replier id", etc., thus all refer
+    to a Ksock id, depending on what it is being used for.
+
+Senders
+-------
+Message senders are called "senders". A sender should open a Ksock for read
+and write, as it may need to read replies and error/status messages.
+
+A message is sent by:
+
+1. Writing the message to the Ksock (using the standard ``write`` function)
+2. Calling the SEND ioctl on the Ksock, to actually send the message. This
+   returns (via its arguments) the message id of the message sent. It also
+   returns status information about the send
+
+        The status information is to be documented.
+
+The DISCARD ioctl can be used to "throw away" a partially written message,
+before SEND has been called on it.
+
+If there are no listeners (of any type) bound to that message name, then the
+message will be ignored.
+
+If the message is flagged as needing a reply, and there are no repliers bound
+to that message name, then an error message will be sent to the sender, by
+KBUS.
+
+It is not possible to send a message with a wildcard message name.
+
+    As a restriction this makes the life of the implementor and documentor
+    easier. I believe it would also be confusing if provided.
+
+The sender does not need to bind to any message names in order to receive
+error and status messages from KBUS.
+
+When a sender sends a Request, an internal note is made that it expects a
+corresponding Reply (or possibly a Status message from KBUS if the Replier
+goes away or unbinds from that message name, before replying). A place for
+that Reply is reserved in the sender's message queue. If the message queue
+fills up (either with messages waiting to be read, or with reserved slots for
+Replies), then the sender will not be able to send another Request until there
+is room on the message queue again.
+
+    Hopefully, this can be resolved by the sender reading a message off its
+    queue. However, if there are no messages to be read, and the queue is all
+    reserved for replies, the only solution is for the sender to wait for a
+    replier to send it something that it can then read.
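+
+The reservation rule can be sketched as a toy model. This is not the kernel
+code: the class ``SenderQueue`` and its method names are invented for
+illustration, and it assumes that each unread message and each reserved
+Reply slot takes exactly one place in the queue.
+

```python
class SenderQueue:
    """Toy model of a sender's message queue with slots reserved for Replies."""

    def __init__(self, max_messages):
        self.max_messages = max_messages
        self.unread = []        # messages waiting to be read
        self.reserved = 0       # slots held for Replies not yet received

    def can_send_request(self):
        # A Request needs one free slot for its eventual Reply.
        return len(self.unread) + self.reserved < self.max_messages

    def send_request(self):
        if not self.can_send_request():
            return False        # the real SEND fails here (see ENOLCK)
        self.reserved += 1
        return True

    def reply_arrives(self, reply):
        # The Reply takes over the slot that was reserved for it.
        self.reserved -= 1
        self.unread.append(reply)

    def read_one(self):
        return self.unread.pop(0) if self.unread else None


queue = SenderQueue(max_messages=2)
results = [queue.send_request() for _ in range(3)]
print(results)                          # the third Request is refused

queue.reply_arrives("reply-1")
still_full = queue.can_send_request()   # one unread plus one reserved slot
queue.read_one()
room_again = queue.can_send_request()   # reading freed a slot
print(still_full, room_again)
```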
+
+.. note:: What order do we describe things in? Don't forget:
+
+  If the message being sent is a request, then the replier bound to that
+  message name will (presumably) write a reply to the request. Thus the normal
+  sequence for a request is likely to be:
+
+  1. write the request message
+  2. read the reply
+
+  The sender does *not* need to bind to anything in order to receive a reply to
+  a request it has sent.
+
+      Of course, if a sender binds to listen to the name it uses for its
+      request, then it will get a copy of the request as sent, and it will
+      also get (an extra) copy of the reply. But see `Receiving messages once
+      only`_.
+
+Listeners
+---------
+Message recipients are called "listeners".
+
+Listeners indicate that they want to receive particular messages, by using the
+BIND ioctl on a Ksock to specify the name of the message that is to be
+listened for. If the binding is to a wildcarded message name, then the
+listener will receive all messages with names that match the wildcard.
+
+An ordinary listener will receive all messages with that name (sent to the
+relevant Ksock). A listener may make more than one binding on the same Ksock
+(indeed, it is allowed to bind to the same name more than once).
+
+Messages are received by:
+
+1. Using the NEXTMSG ioctl to request the next message (this also returns the
+   message's length in bytes)
+2. Calling the standard ``read`` function to read the message data.
+
+If NEXTMSG is called again, the next message will be readied for reading,
+whether the previous message has been read (or partially read) or not.
+
+If a listener no longer wants to receive a particular message name, then they
+can unbind from it, using the UNBIND ioctl. The message name and flags used in
+an UNBIND must match those in the corresponding BIND. Any messages in the
+listener's message queue which match that unbinding will be removed from the
+queue (i.e., the listener will not actually receive them). This does *not*
+affect the message currently being read.
+
+    Note that this has implications for binding and unbinding wildcards,
+    which must also match.
+
+Closing the Ksock also unbinds all the message bindings made on it.
+It does not affect message bindings made on other Ksocks.
+
+Repliers
+--------
+Repliers are a special sort of listener.
+
+For each message name, there may be a single "replier". A replier binds to a
+message name in the same way as any other listener, but sets the "replier"
+flag. If someone else has already bound to the same KBUS device as a replier
+for that message name, the bind will fail (with EADDRINUSE).
+
+Repliers only receive Requests (messages that are marked as wanting a reply).
+
+A replier may (should? must?) reply to the request - this is done by sending
+a Reply message through the Ksock from which the Request was read.
+
+It is perfectly legitimate to bind to a message as both replier and listener,
+in which case two copies of the message will be read, once as replier, and
+once as (just) listener (but see `Receiving messages once only`_).
+
+
+When a request message is read by the appropriate replier, KBUS will mark
+*that particular message* with the "you must reply" flag. This will not be set
+on copies of that message read by any (non-replier) listeners.
+
+    So, in the case where a Ksock is bound as replier and listener for the
+    same message name, only one of the two copies of the message received will
+    be marked as "you must reply".
+
+If a replier binds to a wildcarded message name, then they are the *default*
+replier for any message names satisfying that wildcard. If another replier
+binds to a more specific message name (matching that wildcard),
+then the specific message name binding "wins" - the wildcard replier will no
+longer receive that message name.
+
+    In particular '$.Fred.Jim' is more specific than '$.Fred.%' which in turn
+    is more specific than '$.Fred.*'
+
+This means that if a wildcard replier wants to guarantee to see all the
+messages matching their wildcard, they also need to bind as a listener for the
+same wildcarded name.
+
+For example:
+
+    Assume message names are of the form '$.Sensors.<Room>' or
+    '$.Sensors.<Room>.<Measurement>'.
+
+    Replier 1 binds to '$.Sensors.*'. They will be the default replier for
+    all sensor requests.
+
+    Replier 2 binds to '$.Sensors.%'. They will take over as the default
+    replier for any room specific requests.
+
+    Replier 3 binds to '$.Sensors.Kitchen.Temperature'. They will take over as
+    the replier for the kitchen temperature.
+
+    So:
+
+    - A message named '$.Sensors.Kitchen.Temperature' will go to replier 3.
+    - A message named '$.Sensors.Kitchen' or '$.Sensors.LivingRoom' will go to
+      replier 2.
+    - A message named '$.Sensors.LivingRoom.Temperature' will go to replier 1.
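+
+The precedence rule in this example can be sketched in a few lines. The
+function names are invented for illustration, and the matching rules are an
+assumption inferred from the example itself: ``*`` is taken to match one or
+more further name parts, and ``%`` exactly one. How KBUS ranks two wildcards
+of the same kind is not modelled.
+

```python
def matches(binding, name):
    """Return True if a bound name (possibly wildcarded) matches a message name."""
    prefix = binding[:-1]               # e.g. '$.Sensors.' for '$.Sensors.*'
    rest = name[len(prefix):]
    if binding.endswith(".*"):
        # Assumption: '*' matches one or more further name parts.
        return name.startswith(prefix) and rest != ""
    if binding.endswith(".%"):
        # Assumption: '%' matches exactly one further name part.
        return name.startswith(prefix) and rest != "" and "." not in rest
    return binding == name


def specificity(binding):
    """Exact names beat '%' wildcards, which beat '*' wildcards."""
    if binding.endswith(".*"):
        return 0
    if binding.endswith(".%"):
        return 1
    return 2


def choose_replier(bindings, name):
    """bindings maps a bound name to a replier label; return the winning label."""
    candidates = [b for b in bindings if matches(b, name)]
    if not candidates:
        return None
    return bindings[max(candidates, key=specificity)]


repliers = {
    "$.Sensors.*": "replier 1",
    "$.Sensors.%": "replier 2",
    "$.Sensors.Kitchen.Temperature": "replier 3",
}
for name in ("$.Sensors.Kitchen.Temperature", "$.Sensors.Kitchen",
             "$.Sensors.LivingRoom", "$.Sensors.LivingRoom.Temperature"):
    print(name, "->", choose_replier(repliers, name))
```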
+
+When a Replier is closed (technically, when its ``release`` function is
+called by the kernel) KBUS traverses its outstanding message queue, and for
+each Request that has not been answered, generates a Status message saying
+that the Replier has "GoneAway".
+
+Similarly, if a Replier unbinds from replying to a message, KBUS traverses its
+outstanding message queue, and for each Request that has not been answered, it
+generates a Status message saying that it has "Unbound" from being a replier
+for that message name. It also forgets the message, which it is now not going
+to reply to.
+
+Lastly, when a Replier is closed, if it has read any Requests (technically,
+called NEXTMSG to pop them from the message queue), but not actually replied
+to them, then KBUS will send an "Ignored" Status message for each such
+Request.
+
+More information
+================
+Stateful transactions
+---------------------
+It is possible to make stateful message transactions, by:
+
+1. sending a Request
+2. receiving the Reply, and noting the Ksock id of the replier
+3. sending another Request to that specific replier
+4. and so on
+
+Sending a request to a particular Ksock will fail if that Ksock is no longer
+bound as replier to the relevant message name. This allows a sender to
+guarantee that it is communicating with a particular instance of the replier
+for a message name.
+
+Queues filling up
+-----------------
+Messages are sent by a mechanism which:
+
+1. Checks the message is plausible (it has a plausible message name,
+   and the right sort of "shape")
+2. If the message is a Request, checks that the sender has room on its message
+   queue for the (eventual) Reply.
+3. Finds the Ksock ids of all the listeners and repliers bound to that
+   message's name
+4. Adds the message to the queue for each such listener/replier
+
+This can cause problems if one of the queues is already full (allowing
+infinite expansion of queues would also cause problems, of course).
+
+If a *sender* attempts to send a Request, but does not have room on its
+message queue for the (corresponding) Reply, then the message will not be
+sent, and the send will fail. Note that the message id will not be set, and
+the blocking behaviours defined below do not occur.
+
+If a *replier* cannot receive a particular message, because its queue is full,
+then the message will not be sent, and the send will fail with an error. This
+does, however, set the message id (and thus the "last message id" on the
+sender).
+
+Moreover, a sender can indicate if it wants a message to be:
+
+1. Added to all the listener queues, regardless, in which case it will block
+   until that can be done (ALL_OR_WAIT, sender blocks)
+2. Added to all the listener queues, and fail if that can't be done
+   (ALL_OR_FAIL)
+3. Added to all the listener queues that have room (the default)
+
+See `Message flags`_ for more details.
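+
+The three choices can be compared with a small model. The function
+``deliver`` and its return values are invented for illustration. In
+particular, the text above does not say whether ALL_OR_WAIT delivers to the
+queues that do have room before waiting; this sketch assumes it delivers to
+none of them.
+

```python
def deliver(queues, targets, policy="default"):
    """Try to add one message to each target listener queue.

    queues maps a Ksock id to a (current length, maximum length) pair.
    Returns (status, ids actually delivered to).
    """
    full = [k for k in targets if queues[k][0] >= queues[k][1]]
    if full and policy == "ALL_OR_FAIL":
        return "fail", []        # the send fails and nothing is delivered
    if full and policy == "ALL_OR_WAIT":
        return "wait", []        # assumption: nothing is delivered yet
    delivered = [k for k in targets if k not in full]
    for k in delivered:
        length, maximum = queues[k]
        queues[k] = (length + 1, maximum)
    return "sent", delivered


def fresh():
    return {1: (0, 2), 2: (2, 2)}    # Ksock 2's queue is already full


default_result = deliver(fresh(), [1, 2])
fail_result = deliver(fresh(), [1, 2], "ALL_OR_FAIL")
wait_result = deliver(fresh(), [1, 2], "ALL_OR_WAIT")
print(default_result, fail_result, wait_result)
```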
+
+Urgent messages
+---------------
+Messages may be flagged urgent. In this case they will be added to the front
+of the destination message queue, rather than the end - in other words, they
+will be the next message to be "popped" by NEXTMSG.
+
+Note that this means that if two urgent messages are sent to the same target,
+and *then* a NEXTMSG/read occurs, the second urgent message will be popped and
+read first.
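+
+This ordering is what one would expect from a double-ended queue. The sketch
+below uses Python's ``collections.deque`` as a stand-in for the Ksock's
+message queue; ``add_message`` is an invented helper, not a KBUS call.
+

```python
from collections import deque

queue = deque()


def add_message(queue, name, urgent=False):
    # Urgent messages go to the front of the queue, everything else to the end.
    if urgent:
        queue.appendleft(name)
    else:
        queue.append(name)


add_message(queue, "ordinary")
add_message(queue, "urgent-1", urgent=True)
add_message(queue, "urgent-2", urgent=True)

# Three NEXTMSG calls pop from the front: the second urgent message comes first.
popped = [queue.popleft() for _ in range(3)]
print(popped)
```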
+
+Select, write/send and "next message", blocking
+-----------------------------------------------
+.. warning:: At the moment, ``read`` and ``write`` are always non-blocking.
+
+``read`` returns more of the currently selected message, or EOF if there is no
+more of that message to read (and thus also if there is no currently selected
+message). The NEXTMSG ioctl is used to select ("pop") the next message.
+
+``write`` writes to the end of the currently-being-written message. The
+DISCARD ioctl can be used to discard the data written so far, and the SEND
+ioctl to send the (presumably completed) message. Whilst the message is being
+sent, it is not possible to use ``write``.
+
+Note that if SEND is used to send a Request, then KBUS ensures that there will
+always be either a Reply or a Status message in response to that request.
+
+Specifically, if:
+
+1. The Replier "goes away" (and its "release" function is called) before
+   reading the Request (specifically, before calling NEXTMSG to pop it from
+   the message queue)
+2. The Replier "goes away" (and its "release" function is called) before
+   replying to a Request that it has already read (i.e., used NEXTMSG to pop
+   from the message queue)
+3. The Replier unbinds from that Request message name before reading the
+   Request (with the same caveat on what that means)
+4. Select/poll attempts to send the Request, and discovers that the
+   Replier has disappeared since the initial SEND
+5. Select/poll attempts to send the Request, and some other error occurs
+
+then KBUS will "reply" with an appropriate Status message.
+
+--------------------------------------------------
+
+KBUS supports its own particular variation on blocking of message sending.
+
+First of all, it supports use of "select" to determine if there are any
+messages waiting to be read. So, for instance (in Python)::
+
+        with Ksock(0,'rw') as sender:
+            with Ksock(0,'r') as listener:
+                (r,w,x) = select.select([listener],[],[],0)
+                assert r == []
+
+                listener.bind('$.Fred')
+                msg = Announcement('$.Fred','data')
+                sender.send_msg(msg)
+
+                (r,w,x) = select.select([listener],[],[],0)
+                assert r == [listener]
+
+This simply checks if there is a message in the Ksock's message list, waiting
+to be "popped" with NEXTMSG.
+
+Secondly, ``write``, SEND and DISCARD interact in what is hoped to be a
+sensible manner. Specifically:
+
+* When SEND (i.e., the SEND ioctl) is called, one of three things happens:
+
+  1. KBUS succeeds in sending the message. The Ksock is now ready for
+     ``write`` to be called on it again.
+  2. KBUS fails to send the message (possibly, if the message was a Request,
+     with EADDRNOTAVAIL, indicating that there is no Replier for that
+     Request). The Ksock is now ready for ``write`` to be called on it again.
+  3. If the message was marked ALL_OR_WAIT, then it may fail with EAGAIN.
+     In this case, the Ksock is still in sending state, and an attempt to
+     call ``write`` will fail (with EALREADY). The caller can either use
+     DISCARD to discard the message, or use select/poll to wait for the
+     message to finish sending.
+
+Thus "select" for the write case checks whether it is allowed to call
+"write" - for instance::
+
+        with Ksock(0,'rw') as sender:
+            write_list = [sender]
+            with Ksock(0,'r') as listener1:
+                write_list= [sender,listener1]
+                read_list = [listener1]
+
+                (r,w,x) = select.select(read_list,write_list,[],0)
+                assert r == []
+                assert w == [sender]
+                assert x == []
+
+                with Ksock(0,'rw') as listener2:
+                    write_list.append(listener2)
+                    read_list.append(listener2)
+
+                    (r,w,x) = select.select(read_list,write_list,[],0)
+                    assert r == []
+                    assert len(w) == 2
+                    assert sender in w
+                    assert listener2 in w
+                    assert x == []
+
+Receiving messages once only
+----------------------------
+In normal usage (and by default), if a Ksock binds to a message name multiple
+times, it will receive multiple copies of a message. This can happen:
+
+* explicitly (the Ksock deliberately and explicitly binds to the same name
+  more than once, seeking this effect).
+* as a result of binding to a message name and a wildcard that includes the
+  same name, or two overlapping wildcards.
+* as a result of binding as Replier to a name, and also as Listener to the
+  same name (possibly via a wildcard). In this case, multiple copies will
+  only be received when a Request with that name is made.
+
+Several programmers have complained that the last case, in particular, is very
+inconvenient, and thus the "receive a message once only" facility has been
+added.
+
+Using the MSGONLYONCE ioctl, it is possible to tell a Ksock that only one copy
+of a particular message should be received, even if multiple are "due". In the
+case of the Replier/Listener copies, it will always be the message to which
+the Replier should reply (the one with WANT_YOU_TO_REPLY set) that will be
+received.
+
+Please use this facility with care, and only if you really need it.
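+
+The effect on a single Ksock can be sketched as follows. The function
+``copies_received`` is invented for illustration, and it handles only exact
+message names (wildcards are left out).
+

```python
def copies_received(bindings, name, is_request, only_once=False):
    """bindings is a list of (bound_name, is_replier) pairs made on one Ksock.

    Returns one label per copy queued for that Ksock, saying whether the copy
    carries the WANT_YOU_TO_REPLY marking.
    """
    copies = []
    for bound_name, is_replier in bindings:
        if bound_name != name:          # wildcards are not modelled here
            continue
        if is_replier and not is_request:
            continue                    # repliers only receive Requests
        copies.append("WANT_YOU_TO_REPLY" if is_replier else "plain")
    if only_once and copies:
        # Keep a single copy, preferring the one the Replier must answer.
        if "WANT_YOU_TO_REPLY" in copies:
            copies = ["WANT_YOU_TO_REPLY"]
        else:
            copies = ["plain"]
    return copies


bindings = [("$.Fred", True), ("$.Fred", False), ("$.Fred", False)]
print(copies_received(bindings, "$.Fred", is_request=True))
print(copies_received(bindings, "$.Fred", is_request=True, only_once=True))
```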
+
+IOCTLS
+------
+The KBUS ioctls are defined (with explanatory comments) in the kernel module
+header file (``kbus_defns.h``). They are:
+
+:RESET:         Currently has no effect
+:BIND:          Bind to a particular message name (possibly as replier).
+:UNBIND:        Unbind from a binding - must match exactly.
+:KSOCKID:       Determine the Ksock id of the Ksock used
+:REPLIER:       Determine who is bound as replier to a particular message
+                name. This returns 0 or the Ksock id of the replier.
+:NEXTMSG:       Pop the next message from the Ksock's message queue, ready
+                for reading (with ``read``), and return its length (in bytes).
+                If there is no next message, return a length of 0.
+                The length is always the length of an "entire" message (see
+                `Message implementation`_).
+:LENLEFT:       Determine how many bytes of the message currently being read
+                are still to read.
+:SEND:          Send the current outstanding message for this Ksock (i.e., the
+                bytes written to the Ksock since the last SEND or DISCARD).
+                Return the message id of the message, and maybe other status
+                information.
+:DISCARD:       Discard (throw away) the current outstanding message for this
+                Ksock (i.e., any bytes written to the Ksock since the last
+                SEND or DISCARD).
+:LASTSENT:      Determine the message id of the last message SENT on this
+                Ksock.
+:MAXMSGS:       Set the maximum length of the (read) message queue for this
+                Ksock, and return the actual length that is set. An attempt
+                to set the queue length to 0 will just return the current
+                queue length.
+:NUMMSGS:       Determine how many messages are outstanding in this Ksock's
+                read queue.
+:UNREPLIEDTO:   Determines how many Requests (marked "WANT_YOU_TO_REPLY")
+                this Ksock still needs to reply to. This is primarily a
+                development tool.
+:MSGONLYONCE:   Determines whether only one copy of a message will be
+                received, even if the message name is bound to multiple times.
+                May also be used to query the current state.
+:VERBOSE:       Determines whether verbose kernel messages should be output or
+                not. Affects the *device* (the entire Ksock).
+                May also be used to query the current state.
+:NEWDEVICE:     Requests another KBUS device (``/dev/kbus<n>``). The next
+                KBUS device number (up to a maximum of 255) will be allocated.
+                Returns the new device number.
+:REPORTREPLIERBINDS: Request synthetic messages announcing Replier BIND/UNBIND
+                events. These are messages named "$.KBUS.ReplierBindEvent",
+                and are the only predefined messages with data.
+                Both Python and C bindings provide a useful function to
+                extract the ``is_bind``, ``binder`` and ``name`` values from
+                the data.
+
+/proc/kbus/bindings
+-------------------
+``/proc/kbus/bindings`` is a debugging aid for reporting the listener id,
+exclusive flag and message name for each binding, for each kbus device.
+
+An example might be::
+
+   $ cat /proc/kbus/bindings
+   # <device> is bound to <Ksock-ID> in <process-PID> as <Replier|Listener> for <message-name>
+     1:        1    22158  R  $.Sensors.*
+     1:        2    22158  R  $.Sensors.Kitchen.Temperature
+     1:        3    22158  L  $.Sensors.*
+    13:        4    22159  L  $.Jim.*
+    13:        1    22159  R  $.Fred
+    13:        1    22159  L  $.Jim
+    13:       14    23021  L  $.Jim.*
+
+This describes two KBUS devices (``/dev/kbus1`` and ``/dev/kbus13``).
+
+The first has bindings on Ksock ids 1, 2 and 3, for the given message names. The
+"R" indicates a replier binding, the "L" indicates a listener (non-replier)
+binding.
+
+The second has bindings on Ksock ids 4, 1 and 14. The order of the bindings
+reported is *not* particularly significant.
+
+Note that there is no communication between the two devices, so Ksock id 1 on
+device 1 is not related to (and has no commonality with) Ksock id 1 on device
+13.
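+
+A script that wants to inspect this file might parse it along the following
+lines. This is only a sketch: ``Binding`` and ``parse_bindings`` are invented
+names, and it assumes the five whitespace-separated columns shown in the
+example above, with no whitespace inside message names.
+

```python
from collections import namedtuple

Binding = namedtuple("Binding", "device ksock_id pid is_replier name")


def parse_bindings(text):
    """Parse text laid out like the /proc/kbus/bindings example."""
    bindings = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                    # skip blank lines and the header
        device, ksock_id, pid, kind, name = line.split()
        bindings.append(Binding(int(device.rstrip(":")), int(ksock_id),
                                int(pid), kind == "R", name))
    return bindings


sample = """\
# <device> is bound to <Ksock-ID> in <process-PID> as <Replier|Listener> for <message-name>
  1:        1    22158  R  $.Sensors.*
  1:        2    22158  R  $.Sensors.Kitchen.Temperature
 13:        4    22159  L  $.Jim.*
"""
parsed = parse_bindings(sample)
repliers_on_1 = [b.name for b in parsed if b.device == 1 and b.is_replier]
print(repliers_on_1)
```

To read the live file rather than the sample, pass
``open('/proc/kbus/bindings').read()`` to ``parse_bindings``.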
+
+/proc/kbus/stats
+----------------
+``/proc/kbus/stats`` is a debugging aid for reporting various statistics about
+the KBUS devices and the Ksocks open on them.
+
+An example might be::
+
+  $ cat /proc/kbus/stats
+  dev  0: next file 5 next msg 8 unsent unbindings 0
+          ksock 4 last msg 0:7 queue 1 of 100
+              read byte 0 of 0, wrote byte 52 (max 60), sending
+              outstanding requests 0 (size 16, max 0), unsent replies 0 (max 0)
+          ksock 3 last msg 0:5 queue 0 of 1
+              read byte 0 of 0, wrote byte 0 (max 0), not sending
+              outstanding requests 1 (size 16, max 0), unsent replies 0 (max 0)
+
+or::
+
+  $ cat /proc/kbus/stats
+  dev  0: next file 4 next msg 101 unsent unbindings 0
+          ksock 3 last msg 0:0 queue 100 of 100
+                read byte 0 of 0, wrote byte 0 (max 0), not sending
+                outstanding requests 0 (size 16, max 0), unsent replies 0 (max 0)
+          ksock 2 last msg 0:100 queue 0 of 100
+                read byte 0 of 0, wrote byte 0 (max 0), not sending
+                outstanding requests 100 (size 102, max 92), unsent replies 0 (max 0)
+
+
+Error numbers
+-------------
+The following error numbers have KBUS-specific meanings. In Python, they are
+all returned as values inside the IOError exception.
+
+    Since we're trying to fit into the normal Un*x convention that negative
+    values are error numbers, and since Un*x defines many of these for us,
+    it is natural to make use of the relevant definitions. However, this also
+    means that we are often using them in an unnatural sense. I've tried to
+    make the error numbers used bear at least a vague relationship to their
+    (mis)use in KBUS.
+
+:EADDRINUSE:    On attempting to bind a message name as replier: There is
+                already a replier bound for this message
+:EADDRNOTAVAIL: On attempting to send a Request message: There is no replier
+                bound for this message's name.
+
+                On attempting to send a Reply message: The sender of the
+                original request (i.e., the Ksock mentioned as the ``to``
+                in the Reply) is no longer connected.
+:EALREADY:      On attempting to write to a Ksock, when a previous send has
+                returned EAGAIN. Either DISCARD the message, or use
+                select/poll to wait for the send to complete, and write to be
+                allowed.
+:EBADMSG:       On attempting to bind, unbind or send a message: The message
+                name is not valid. On sending, this can also be because the
+                message name is a wildcard.
+:EBUSY:         On attempting to send:
+
+                1. For a request, the replier's message queue is full.
+                2. For any message, with ALL_OR_FAIL set, one of the
+                   targeted listener/replier queues was full.
+
+:ECONNREFUSED:  On attempting to send a Reply, the intended recipient (the
+                notional original sender of the Request) is not expecting
+                a Reply with that message id in its 'in_reply_to'. Or, in
+                other words, this appears to be an attempt to reply to the
+                wrong message id or the wrong Ksock.
+:EINVAL:        Something went wrong (generic error).
+:EMSGSIZE:      On attempting to write a message: Data was written after
+                the end of the message (i.e., after the final end guard
+                of the message).
+:ENAMETOOLONG:  On attempting to bind, unbind or send a message: The message
+                name is too long.
+:ENOENT:        On attempting to open a Ksock: There is no such device
+                (normally because one has tried to open, for instance,
+                '/dev/kbus9' when there are only 3 KBUS devices).
+:ENOLCK:        On attempting to send a Request, when there is not enough room
+                in the sender's message queue to guarantee that it can
+                receive a reply for every Request already sent, *plus* this
+                one. If there are outstanding messages in the sender's message
+                queue, then the solution is to read some of them. Otherwise,
+                the sender will have to wait until one of the Repliers
+                replies to a previous Request (or goes away and KBUS replies
+                for it).
+
+                When this error is received, the send has failed (just as if
+                the message was invalid). The sender is not left in "sending"
+                state, nor has the message been assigned a message id.
+
+                Note that this is *not* EAGAIN, since we do not want to block
+                the sender (in the SEND) if it is up to the sender to perform
+                a read to sort things out.
+
+:ENOMSG:        On attempting to send, when there is no message waiting to be
+                sent (either because there has been no write since the last
+                send, or because the message being written has been
+                discarded).
+:EPIPE:         On attempting to send 'to' a specific replier, the replier
+                with that id is no longer bound to the given message's name.
+
+:EFAULT:    A memory allocation, copy from user space, or similar operation
+            failed. This is normally very bad and should not happen, unless it
+            is the result of calling an ioctl, in which case it indicates that
+            the ioctl argument cannot be accessed.
+
+:ENOMEM:    Memory allocation failed (the allocator returned NULL). This is
+            normally very bad and should not happen.
+
+:EAGAIN:    On attempting to send, the message being sent had ALL_OR_WAIT set,
+            and one of the targeted listener/replier queues was full.
+
+            On attempting to unbind when Replier Bind Events have been
+            requested, one or more of the Ksocks bound to receive
+            "$.KBUS.ReplierBindEvent" messages has a full message queue,
+            and thus cannot receive the unbind event. The unbind has not been
+            done.
+
+In the ``utils`` directory of the KBUS sources, there is a script called
+``errno.py`` which takes an ``errno`` integer or name and prints out both the
+"normal" meaning of that error number, and also (if there is one) the KBUS use
+of it. For instance::
+
+    $ errno.py 1
+    Error 1 (0x1) is EPERM: Operation not permitted
+    $
+    $ errno.py EPIPE
+    EPIPE is error 32 (0x20): Broken pipe
+
+    KBUS:
+    On attempting to send 'to' a specific replier, the replier with that id
+    is no longer bound to the given message's name.
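+
+The same kind of lookup can be sketched with Python's standard ``errno`` and
+``os`` modules. This is not the ``errno.py`` script from the KBUS sources:
+``KBUS_MEANINGS`` and ``describe`` are invented names, and only three of the
+KBUS meanings listed above are included.
+

```python
import errno
import os

# A few of the KBUS-specific meanings listed above, keyed by errno name.
KBUS_MEANINGS = {
    "EADDRINUSE": "There is already a replier bound for this message.",
    "ENOMSG": "There is no message waiting to be sent.",
    "EPIPE": "The replier with that id is no longer bound to the given "
             "message's name.",
}


def describe(value):
    """Accept an errno integer or name; return (name, number, text, kbus)."""
    if isinstance(value, str):
        name, number = value, getattr(errno, value)
    else:
        name, number = errno.errorcode[value], value
    return name, number, os.strerror(number), KBUS_MEANINGS.get(name)


for item in (errno.EPERM, "EPIPE"):
    name, number, text, kbus = describe(item)
    print("%s is error %d (0x%x): %s" % (name, number, number, text))
    if kbus:
        print("KBUS: " + kbus)
```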
+
-- 
1.7.4.1


* [PATCH 02/11] KBUS external header file.
  2011-03-18 17:21 ` [PATCH 01/11] Documentation for KBUS Tony Ibbs
@ 2011-03-18 17:21   ` Tony Ibbs
  2011-03-18 17:21     ` [PATCH 03/11] KBUS internal " Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC (permalink / raw)
  To: lkml
  Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely, Tony Ibbs

This defines the message datastructures, the IOCTLs, etc.

Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
 include/linux/kbus_defns.h |  613 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 613 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/kbus_defns.h

diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
new file mode 100644
index 0000000..d43c498
--- /dev/null
+++ b/include/linux/kbus_defns.h
@@ -0,0 +1,613 @@
+/* Kbus kernel module external headers
+ *
+ * This file provides the definitions (datastructures and ioctls) needed to
+ * communicate with the KBUS character device driver.
+ */
+
+/*
+ * ***** BEGIN LICENSE BLOCK *****
+ * Version: MPL 1.1
+ *
+ * The contents of this file are subject to the Mozilla Public License Version
+ * 1.1 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.mozilla.org/MPL/
+ *
+ * Software distributed under the License is distributed on an "AS IS" basis,
+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+ * for the specific language governing rights and limitations under the
+ * License.
+ *
+ * The Original Code is the KBUS Lightweight Linux-kernel mediated
+ * message system
+ *
+ * The Initial Developer of the Original Code is Kynesim, Cambridge UK.
+ * Portions created by the Initial Developer are Copyright (C) 2009
+ * the Initial Developer. All Rights Reserved.
+ *
+ * Contributor(s):
+ *   Kynesim, Cambridge UK
+ *   Tony Ibbs <tibs@tonyibbs.co.uk>
+ *
+ * Alternatively, the contents of this file may be used under the terms of the
+ * GNU Public License version 2 (the "GPL"), in which case the provisions of
+ * the GPL are applicable instead of the above.  If you wish to allow the use
+ * of your version of this file only under the terms of the GPL and not to
+ * allow others to use your version of this file under the MPL, indicate your
+ * decision by deleting the provisions above and replace them with the notice
+ * and other provisions required by the GPL.  If you do not delete the
+ * provisions above, a recipient may use your version of this file under either
+ * the MPL or the GPL.
+ *
+ * ***** END LICENSE BLOCK *****
+ */
+
+#ifndef _kbus_defns
+#define _kbus_defns
+
+#if !__KERNEL__ && defined(__cplusplus)
+extern "C" {
+#endif
+
+#if __KERNEL__
+#include <linux/kernel.h>
+#include <linux/ioctl.h>
+#else
+#include <stdint.h>
+#include <sys/ioctl.h>
+#endif
+
+/*
+ * A message id is made up of two fields.
+ *
+ * If the network id is 0, then it is up to us (KBUS) to assign the
+ * serial number. In other words, this is a local message.
+ *
+ * If the network id is non-zero, then this message is presumed to
+ * have originated from another "network", and we preserve both the
+ * network id and the serial number.
+ *
+ * The message id {0,0} is special and reserved (for use by KBUS).
+ */
+struct kbus_msg_id {
+	__u32 network_id;
+	__u32 serial_num;
+};
+
+/*
+ * kbus_orig_from is used for the "originally from" and "finally to" ids
+ * in the message header. These in turn are used when messages are
+ * being sent between KBUS systems (via KBUS "Limpets"). KBUS the kernel
+ * module transmits them, unaltered, but does not use them (although
+ * debug messages may report them).
+ *
+ * An "originally from" or "finally to" id is made up of two fields, the
+ * network id (which indicates the Limpet, if any, that originally gated the
+ * message), and a local id, which is the Ksock id of the original sender
+ * of the message, on its local KBUS.
+ *
+ * If the network id is 0, then the "originally from" id is not being used.
+ *
+ * Limpets and these fields are discussed in more detail in the userspace
+ * KBUS documentation - see http://kbus-messaging.org/ for pointers to
+ * more information.
+ */
+struct kbus_orig_from {
+	__u32 network_id;
+	__u32 local_id;
+};
+
+/* When the user asks to bind a message name to an interface, they use: */
+struct kbus_bind_request {
+	__u32 is_replier;	/* are we a replier? */
+	__u32 name_len;
+	char *name;
+};
+
+/* When the user requests the id of the replier to a message, they use: */
+struct kbus_bind_query {
+	__u32 return_id;
+	__u32 name_len;
+	char *name;
+};
+
+/* When the user writes/reads a message, they use: */
+struct kbus_message_header {
+	/*
+	 * The guards
+	 * ----------
+	 *
+	 * - 'start_guard' is notionally "Kbus", and 'end_guard' (the 32 bit
+	 *   word after the rest of the message datastructure) is notionally
+	 *   "subK". Obviously that depends on how one looks at the 32-bit
+	 *   word. Every message datastructure shall start with a start guard
+	 *   and end with an end guard.
+	 *
+	 * These provide some help in checking that a message is well formed,
+	 * and in particular the end guard helps to check for broken length
+	 * fields.
+	 *
+	 * - 'id' identifies this particular message.
+	 *
+	 *   When a user writes a new message, they should set this to {0,0}.
+	 *   KBUS will then set a new message id for the message.
+	 *
+	 *   When a user reads a message, this will have been set by KBUS.
+	 *
+	 *   When a user replies to a message, they should copy this value
+	 *   into the 'in_reply_to' field, so that the recipient will know
+	 *   what message this was a reply to.
+	 *
+	 * - 'in_reply_to' identifies the message this is a reply to.
+	 *
+	 *   This shall be set to {0,0} unless this message *is* a reply to a
+	 *   previous message. In other words, if this value is non-0, then
+	 *   the message *is* a reply.
+	 *
+	 * - 'to' is who the message is to be sent to.
+	 *
+	 *   When a user writes a new message, this should normally be set
+	 *   to {0,0}, meaning "anyone listening" (but see below if "state"
+	 *   is being maintained).
+	 *
+	 *   When replying to a message, it shall be set to the 'from' value
+	 *   of the original message.
+	 *
+	 *   When constructing a request message (a message wanting a reply),
+	 *   the user can set it to a specific replier id, to produce a stateful
+	 *   request. This is normally done by copying the 'from' of a previous
+	 *   Reply from the appropriate replier. When such a message is sent,
+	 *   if the replier bound (at that time) does not have that specific
+	 *   id, then the send will fail.
+	 *
+	 *   Note that if 'to' is set, then 'orig_from' should also be set.
+	 *
+	 * - 'from' indicates who sent the message.
+	 *
+	 *   When a user is writing a new message, they should set this
+	 *   to {0,0}.
+	 *
+	 *   When a user is reading a message, this will have been set
+	 *   by KBUS.
+	 *
+	 *   When a user replies to a message, the reply should have its
+	 *   'to' set to the original message's 'from', and its 'from' set
+	 *   to {0,0} (see the "hmm" caveat under 'to' above, though).
+	 *
+	 * - 'orig_from' and 'final_to' are used when Limpets are mediating
+	 *   KBUS messages between KBUS devices (possibly on different
+	 *   machines). See the description by the datastructure definition
+	 *   above. The KBUS kernel preserves and propagates their values,
+	 *   but does not alter or use them.
+	 *
+	 * - 'extra' is currently unused, and KBUS will set it to zero.
+	 *   Future versions of KBUS may treat it differently.
+	 *
+	 * - 'flags' indicates the type of message.
+	 *
+	 *   When a user writes a message, this can be used to indicate
+	 *   that:
+	 *
+	 *   * the message is URGENT
+	 *   * a reply is wanted
+	 *
+	 *   When a user reads a message, this indicates if:
+	 *
+	 *   * the message is URGENT
+	 *   * a reply is wanted
+	 *
+	 *   When a user writes a reply, this field should be set to 0.
+	 *
+	 *   The top half of the 'flags' is not touched by KBUS, and may
+	 *   be used for any purpose the user wishes.
+	 *
+	 * - 'name_len' is the length of the message name in bytes.
+	 *
+	 *   This must be non-zero.
+	 *
+	 * - 'data_len' is the length of the message data in bytes. It may be
+	 *   zero if there is no data.
+	 *
+	 * - 'name' is a pointer to the message name. This should be null
+	 *   terminated, as is the normal case for C strings.
+	 *
+	 *   NB: If this is zero, then the name will be present, but after
+	 *   the end of this datastructure, and padded out to a multiple of
+	 *   four bytes (see kbus_entire_message). When doing this padding,
+	 *   remember to allow for the terminating null byte. If this field is
+	 *   zero, then 'data' shall also be zero.
+	 *
+	 * - 'data' is a pointer to the data. If there is no data (if
+	 *   'data_len' is zero), then this shall also be zero. The data is
+	 *   not touched by KBUS, and may include any values.
+	 *
+	 *   NB: If this is zero, then the data will occur immediately
+	 *   after the message name, padded out to a multiple of four bytes.
+	 *   See the note for 'name' above.
+	 *
+	 */
+	__u32 start_guard;
+	struct kbus_msg_id id;	/* Unique to this message */
+	struct kbus_msg_id in_reply_to;	/* Which message this is a reply to */
+	__u32 to;		/* 0 (empty) or a replier id */
+	__u32 from;		/* 0 (KBUS) or the sender's id */
+	struct kbus_orig_from orig_from;/* Cross-network linkage */
+	struct kbus_orig_from final_to;	/* Cross-network linkage */
+	__u32 extra;	/* ignored field - future proofing */
+	__u32 flags;	/* Message type/flags */
+	__u32 name_len;	/* Message name's length, in bytes */
+	__u32 data_len;	/* Message data's length, also in bytes */
+	char *name;
+	void *data;
+	__u32 end_guard;
+};
+
+#define KBUS_MSG_START_GUARD	0x7375624B
+#define KBUS_MSG_END_GUARD	0x4B627573
+
+/*
+ * When a message is returned by 'read', it is actually returned using the
+ * following datastructure, in which:
+ *
+ * - 'header.name' will point to 'rest[0]'
+ * - 'header.data' will point to 'rest[(header.name_len+1+3)/4]'
+ *
+ * followed by the name (padded to 4 bytes, remembering to allow for the
+ * terminating null byte), followed by the data (padded to 4 bytes) followed by
+ * (another) end_guard.
+ */
+struct kbus_entire_message {
+	struct kbus_message_header header;
+	__u32 rest[];
+};
+
+/*
+ * We limit a message name to at most 1000 characters (some limit seems
+ * sensible, after all)
+ */
+#define KBUS_MAX_NAME_LEN	1000
+
+/*
+ * The length (in bytes) of the name after padding, allowing for a terminating
+ * null byte.
+ */
+#define KBUS_PADDED_NAME_LEN(name_len)   (4 * (((name_len) + 1 + 3) / 4))
+
+/*
+ * The length (in bytes) of the data after padding
+ */
+#define KBUS_PADDED_DATA_LEN(data_len)   (4 * (((data_len) + 3) / 4))
+
+/*
+ * Given name_len (in bytes) and data_len (in bytes), return the
+ * length of the appropriate struct kbus_entire_message, in bytes
+ *
+ * Note that we're allowing for a zero byte after the end of the message name.
+ *
+ * Remember that "sizeof" doesn't count the 'rest' field in our message
+ * structure.
+ */
+#define KBUS_ENTIRE_MSG_LEN(name_len, data_len)    \
+	(sizeof(struct kbus_entire_message) + \
+	 KBUS_PADDED_NAME_LEN(name_len) + \
+	 KBUS_PADDED_DATA_LEN(data_len) + 4)
+
+/*
+ * The message name starts at entire->rest[0].
+ * The message data starts after the message name - given the message
+ * name's length (in bytes), that is at index:
+ */
+#define KBUS_ENTIRE_MSG_DATA_INDEX(name_len)     (((name_len)+1+3)/4)
+/*
+ * Given the message name length (in bytes) and the message data length (also
+ * in bytes), the index of the entire message end guard is thus:
+ */
+#define KBUS_ENTIRE_MSG_END_GUARD_INDEX(name_len, data_len)  \
+	(((name_len)+1+3)/4 + ((data_len)+3)/4)
+
+/*
+ * Find a pointer to the message's name.
+ *
+ * It's either the given name pointer, or just after the header (if the pointer
+ * is NULL)
+ */
+static inline char *kbus_msg_name_ptr(const struct kbus_message_header
+				      *hdr)
+{
+	if (hdr->name) {
+		return hdr->name;
+	} else {
+		struct kbus_entire_message *entire;
+		entire = (struct kbus_entire_message *)hdr;
+		return (char *)&entire->rest[0];
+	}
+}
+
+/*
+ * Find a pointer to the message's data.
+ *
+ * It's either the given data pointer, or just after the name (if the pointer
+ * is NULL)
+ */
+static inline void *kbus_msg_data_ptr(const struct kbus_message_header
+				      *hdr)
+{
+	if (hdr->data) {
+		return hdr->data;
+	} else {
+		struct kbus_entire_message *entire;
+		__u32 data_idx;
+
+		entire = (struct kbus_entire_message *)hdr;
+		data_idx = KBUS_ENTIRE_MSG_DATA_INDEX(hdr->name_len);
+		return (void *)&entire->rest[data_idx];
+	}
+}
+
+/*
+ * Find a pointer to the message's (second/final) end guard.
+ */
+static inline __u32 *kbus_msg_end_ptr(struct kbus_entire_message
+					 *entire)
+{
+	__u32 end_guard_idx =
+		KBUS_ENTIRE_MSG_END_GUARD_INDEX(entire->header.name_len,
+						entire->header.data_len);
+	return (__u32 *) &entire->rest[end_guard_idx];
+}
+
+/*
+ * Things KBUS changes in a message
+ * --------------------------------
+ * In general, KBUS leaves the content of a message alone. However, it does
+ * change:
+ *
+ * - the message id (if id.network_id is unset - it assigns a new serial
+ *   number unique to this message)
+ * - the from id (if from.network_id is unset - it sets the local_id to
+ *   indicate the Ksock this message was sent from)
+ * - the KBUS_BIT_WANT_YOU_TO_REPLY bit in the flags (set or cleared
+ *   as appropriate)
+ * - the SYNTHETIC bit, which KBUS will always unset in a user message
+ */
+
+/*
+ * Flags for the message 'flags' word
+ * ----------------------------------
+ * The KBUS_BIT_WANT_A_REPLY bit is set by the sender to indicate that a
+ * reply is wanted. This makes the message into a request.
+ *
+ *     Note that setting the WANT_A_REPLY bit (i.e., a request) and
+ *     setting 'in_reply_to' (i.e., a reply) is bound to lead to
+ *     confusion, and the results are undefined (i.e., don't do it).
+ *
+ * The KBUS_BIT_WANT_YOU_TO_REPLY bit is set by KBUS on a particular message
+ * to indicate that the particular recipient is responsible for replying
+ * to (this instance of the) message. Otherwise, KBUS clears it.
+ *
+ * The KBUS_BIT_SYNTHETIC bit is set by KBUS when it generates a synthetic
+ * message (an exception, if you will), for instance when a replier has
+ * gone away and therefore a reply will never be generated for a request
+ * that has already been queued.
+ *
+ *     Note that KBUS does not check that a sender has not set this
+ *     on a message, but doing so may lead to confusion.
+ *
+ * The KBUS_BIT_URGENT bit is set by the sender if this message is to be
+ * treated as urgent - i.e., it should be added to the *front* of the
+ * recipient's message queue, not the back.
+ *
+ * Send flags
+ * ==========
+ * There are two "send" flags, KBUS_BIT_ALL_OR_WAIT and KBUS_BIT_ALL_OR_FAIL.
+ * Either one may be set, or both may be unset.
+ *
+ *    If both bits are set, the message will be rejected as invalid.
+ *
+ *    Both flags are ignored in reply messages (i.e., messages with the
+ *    'in_reply_to' field set).
+ *
+ * If both are unset, then a send will behave in the default manner. That is,
+ * the message will be added to a listener's queue if there is room but
+ * otherwise the listener will (silently) not receive the message.
+ *
+ *     (Obviously, if the listener is a replier, and the message is a request,
+ *     then a KBUS message will be synthesised in the normal manner when a
+ *     request is lost.)
+ *
+ * If the KBUS_BIT_ALL_OR_WAIT bit is set, then a send should block until
+ * all recipients can be sent the message. Specifically, before the message is
+ * sent, all recipients must have room on their message queues for this
+ * message, and if they do not, the send will block until there is room for the
+ * message on all the queues.
+ *
+ * If the KBUS_BIT_ALL_OR_FAIL bit is set, then a send should fail if all
+ * recipients cannot be sent the message. Specifically, before the message is
+ * sent, all recipients must have room on their message queues for this
+ * message, and if they do not, the send will fail.
+ */
+
+/*
+ * When a $.KBUS.ReplierBindEvent message is constructed, we use the
+ * following to encapsulate its data.
+ *
+ * This indicates whether it is a bind or unbind event, who is doing the
+ * bind or unbind, and for what message name. The message name is padded
+ * out to a multiple of four bytes, allowing for a terminating null byte,
+ * but the name length is the length without said padding (so, in C terms,
+ * strlen(name)).
+ *
+ * As for the message header data structure, the actual data "goes off the end"
+ * of the datastructure.
+ */
+struct kbus_replier_bind_event_data {
+	__u32 is_bind;	/* 1=bind, 0=unbind */
+	__u32 binder;	/* Ksock id of binder */
+	__u32 name_len;	/* Length of name */
+	__u32 rest[];	/* Message name */
+};
+
+#if !__KERNEL__
+#define BIT(num)                 (((unsigned)1) << (num))
+#endif
+
+#define	KBUS_BIT_WANT_A_REPLY		BIT(0)
+#define KBUS_BIT_WANT_YOU_TO_REPLY	BIT(1)
+#define KBUS_BIT_SYNTHETIC		BIT(2)
+#define KBUS_BIT_URGENT			BIT(3)
+
+#define KBUS_BIT_ALL_OR_WAIT		BIT(8)
+#define KBUS_BIT_ALL_OR_FAIL		BIT(9)
+
+/*
+ * Standard message names
+ * ======================
+ * KBUS itself has some predefined message names.
+ *
+ * Synthetic Replies with no data
+ * ------------------------------
+ * These are sent to the original Sender of a Request when KBUS knows that the
+ * Replier is not going to Reply. In all cases, you can identify which message
+ * they concern by looking at the "in_reply_to" field:
+ *
+ * * Replier.GoneAway - the Replier has gone away before reading the Request.
+ * * Replier.Ignored - the Replier has gone away after reading a Request, but
+ *   before replying to it.
+ * * Replier.Unbound - the Replier has unbound (as Replier) from the message
+ *   name, and is thus not going to reply to this Request in its unread message
+ *   queue.
+ * * Replier.Disappeared - the Replier has disappeared when an attempt is made
+ *   to send a Request whilst polling (i.e., after EAGAIN was returned from an
+ *   earlier attempt to send a message). This typically means that the Ksock
+ *   bound as Replier closed.
+ * * ErrorSending - an unexpected error occurred when trying to send a Request
+ *   to its Replier whilst polling.
+ */
+#define KBUS_MSG_NAME_REPLIER_GONEAWAY		"$.KBUS.Replier.GoneAway"
+#define KBUS_MSG_NAME_REPLIER_IGNORED		"$.KBUS.Replier.Ignored"
+#define KBUS_MSG_NAME_REPLIER_UNBOUND		"$.KBUS.Replier.Unbound"
+#define KBUS_MSG_NAME_REPLIER_DISAPPEARED	"$.KBUS.Replier.Disappeared"
+#define KBUS_MSG_NAME_ERROR_SENDING		"$.KBUS.ErrorSending"
+
+#define KBUS_IOC_MAGIC	'k'	/* 0x6b - which seems fair enough for now */
+/*
+ * RESET: reserved for future use
+ */
+#define KBUS_IOC_RESET	    _IO(KBUS_IOC_MAGIC,  1)
+/*
+ * BIND - bind a Ksock to a message name
+ * arg: struct kbus_bind_request, indicating what to bind to
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_BIND	   _IOW(KBUS_IOC_MAGIC,  2, char *)
+/*
+ * UNBIND - unbind a Ksock from a message name
+ * arg: struct kbus_bind_request, indicating what to unbind from
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_UNBIND	   _IOW(KBUS_IOC_MAGIC,  3, char *)
+/*
+ * KSOCKID - determine a Ksock's Ksock id
+ *
+ * The network_id for the current Ksock is, by definition, 0, so we don't need
+ * to return it.
+ *
+ * arg (out): __u32, indicating this Ksock's local_id
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_KSOCKID   _IOR(KBUS_IOC_MAGIC,  4, char *)
+/*
+ * REPLIER - determine the Ksock id of the replier for a message name
+ * arg: struct kbus_bind_query
+ *
+ *    - on input, specify the message name to ask about.
+ *    - on output, KBUS fills in the relevant Ksock id in the return_value,
+ *      or 0 if there is no bound replier
+ *
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_REPLIER  _IOWR(KBUS_IOC_MAGIC,  5, char *)
+/*
+ * NEXTMSG - pop the next message from the read queue
+ * arg (out): __u32, number of bytes in the next message, 0 if there is no
+ *            next message
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_NEXTMSG   _IOR(KBUS_IOC_MAGIC,  6, char *)
+/*
+ * LENLEFT - determine how many bytes are left to read of the current message
+ * arg (out): __u32, number of bytes left, 0 if there is no current read
+ *            message
+ * retval: 1 if there was a message, 0 if there wasn't, negative for failure
+ */
+#define KBUS_IOC_LENLEFT   _IOR(KBUS_IOC_MAGIC,  7, char *)
+/*
+ * SEND - send the current message
+ * arg (out): struct kbus_msg_id, the message id of the sent message
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_SEND	   _IOR(KBUS_IOC_MAGIC,  8, char *)
+/*
+ * DISCARD - discard the message currently being written (if any)
+ * arg: none
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_DISCARD    _IO(KBUS_IOC_MAGIC,  9)
+/*
+ * LASTSENT - determine the message id of the last message SENT
+ * arg (out): struct kbus_msg_id, {0,0} if there was no last message
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_LASTSENT  _IOR(KBUS_IOC_MAGIC, 10, char *)
+/*
+ * MAXMSGS - set the maximum number of messages on a Ksock read queue
+ * arg (in): __u32, the requested maximum length of the read queue, or 0 to
+ *           just query the current maximum
+ * arg (out): __u32, the length of the read queue after this call has
+ *            succeeded
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_MAXMSGS  _IOWR(KBUS_IOC_MAGIC, 11, char *)
+/*
+ * NUMMSGS - determine how many messages are in the read queue for this Ksock
+ * arg (out): __u32, the number of messages in the read queue.
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_NUMMSGS   _IOR(KBUS_IOC_MAGIC, 12, char *)
+/*
+ * UNREPLIEDTO - determine the number of requests (marked "WANT_YOU_TO_REPLY")
+ * which we still need to reply to.
+ * arg(out): __u32, said number
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_UNREPLIEDTO _IOR(KBUS_IOC_MAGIC, 13, char *)
+
+/*
+ * IOCTL 14 is not used, because it is introduced in the next revision
+ * (obviously, in real history this was done in a different order), and
+ * I don't want to alter the number for VERBOSE.
+ */
+
+/*
+ * VERBOSE - should KBUS output verbose "printk" messages (for this device)?
+ *
+ * This IOCTL tells a Ksock whether it should output debugging messages. It is
+ * only effective if the kernel module has been built with the VERBOSE_DEBUGGING
+ * flag set.
+ *
+ * arg(in): __u32, 1 to change to "verbose", 0 to change to "quiet",
+ * 0xFFFFFFFF to just return the current/previous state.
+ * arg(out): __u32, the previous state.
+ * retval: 0 for success, negative for failure (-EINVAL if arg in was not one
+ * of the specified values)
+ */
+#define KBUS_IOC_VERBOSE  _IOWR(KBUS_IOC_MAGIC, 15, char *)
+
+/* If adding another IOCTL, remember to increment the next number! */
+#define KBUS_IOC_MAXNR	15
+
+#if !__KERNEL__ && defined(__cplusplus)
+}
+#endif
+
+#endif /* _kbus_defns */
-- 
1.7.4.1

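A note on the 'rest[]' layout arithmetic in kbus_defns.h above. The
following is a minimal user-space sketch, not part of the patch, which
restates the padding macros under shortened local names so that the
indices can be checked by hand. The macro names and the rest_bytes()
helper are hypothetical; only the arithmetic is taken from the header.
For a 6-byte name and 5 bytes of data, the name (plus its terminating
zero byte) occupies 8 bytes, the data starts at rest[2], and the final
end guard sits at rest[4].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local restatements of the header's arithmetic (names are hypothetical) */
#define PADDED_NAME_LEN(name_len)  (4 * (((name_len) + 1 + 3) / 4))
#define PADDED_DATA_LEN(data_len)  (4 * (((data_len) + 3) / 4))
#define DATA_INDEX(name_len)       (((name_len) + 1 + 3) / 4)
#define END_GUARD_INDEX(name_len, data_len) \
	(DATA_INDEX(name_len) + ((data_len) + 3) / 4)

/*
 * Hypothetical helper: the number of bytes needed for 'rest[]' in an
 * "entire" message, i.e. padded name, padded data and the final guard.
 */
static inline size_t rest_bytes(uint32_t name_len, uint32_t data_len)
{
	assert(name_len > 0);	/* the header says name_len must be non-zero */
	return PADDED_NAME_LEN(name_len) + PADDED_DATA_LEN(data_len) + 4;
}
```

Adding sizeof(struct kbus_entire_message) to rest_bytes() should give the
same value as KBUS_ENTIRE_MSG_LEN for the same lengths.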

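The "send flags" rules described in kbus_defns.h above can be sketched as
a small classifier. This is an illustration only, not the kernel code: the
enum, the function name and the order of the checks are assumptions (in
particular, the header does not say whether a reply with both bits set is
rejected or has the bits ignored; this sketch rejects it). The bit
positions are copied from the header.

```c
#include <stdint.h>

/* Bit positions as in the header: BIT(8) and BIT(9) */
#define ALL_OR_WAIT	(1u << 8)
#define ALL_OR_FAIL	(1u << 9)

/* Hypothetical result type for this sketch */
enum send_mode {
	SEND_DEFAULT,		/* queue if there is room, else drop silently */
	SEND_ALL_OR_WAIT,	/* block until every recipient has room */
	SEND_ALL_OR_FAIL,	/* fail unless every recipient has room */
	SEND_INVALID		/* both bits set */
};

/* 'is_reply' is non-zero when the message's 'in_reply_to' is not {0,0} */
static enum send_mode classify_send(uint32_t flags, int is_reply)
{
	/* Assumption: the "both set" check applies before anything else */
	if ((flags & ALL_OR_WAIT) && (flags & ALL_OR_FAIL))
		return SEND_INVALID;
	if (is_reply)
		return SEND_DEFAULT;	/* both flags are ignored in replies */
	if (flags & ALL_OR_WAIT)
		return SEND_ALL_OR_WAIT;
	if (flags & ALL_OR_FAIL)
		return SEND_ALL_OR_FAIL;
	return SEND_DEFAULT;
}
```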

* [PATCH 03/11] KBUS internal header file
  2011-03-18 17:21   ` [PATCH 02/11] KBUS external header file Tony Ibbs
@ 2011-03-18 17:21     ` Tony Ibbs
  2011-03-18 17:21       ` [PATCH 04/11] KBUS main source file, basic device support only Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC (permalink / raw)
  To: lkml
  Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely, Tony Ibbs

Various internal datastructures, and communication between source
files.

Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
 ipc/kbus_internal.h |  626 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 626 insertions(+), 0 deletions(-)
 create mode 100644 ipc/kbus_internal.h
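
For readers of the header below: struct kbus_msg_id_mem is described as a
simple array of message ids in which unused slots hold {0:0}. A minimal
user-space sketch of that idea follows. It is not the kbus.c code; the
function names, the doubling growth policy and the use of realloc() are
assumptions made for illustration, and struct msg_id stands in for struct
kbus_msg_id, assumed here to be a {network_id, local_id} pair.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INIT_MSG_ID_MEMSIZE 16	/* mirrors KBUS_INIT_MSG_ID_MEMSIZE */

struct msg_id {
	uint32_t network_id;
	uint32_t local_id;
};

struct msg_id_mem {
	uint32_t count;		/* Number of entries in use */
	uint32_t size;		/* Actual size of the array */
	uint32_t max_count;	/* Max 'count' we've had */
	struct msg_id *ids;	/* Unused slots hold {0,0} */
};

static int id_is_unset(const struct msg_id *id)
{
	return id->network_id == 0 && id->local_id == 0;
}

/* Remember 'id' in the first free slot, growing the array if it is full */
static int mem_remember(struct msg_id_mem *mem, struct msg_id id)
{
	uint32_t slot;

	if (id_is_unset(&id))
		return -1;	/* {0,0} marks a free slot, so cannot be stored */

	for (slot = 0; slot < mem->size; slot++)
		if (id_is_unset(&mem->ids[slot]))
			break;

	if (slot == mem->size) {	/* no free slot: grow the array */
		uint32_t new_size = mem->size ? 2 * mem->size : INIT_MSG_ID_MEMSIZE;
		struct msg_id *p = realloc(mem->ids, new_size * sizeof(*p));

		if (!p)
			return -1;
		memset(p + mem->size, 0, (new_size - mem->size) * sizeof(*p));
		mem->ids = p;
		mem->size = new_size;
	}
	mem->ids[slot] = id;
	if (++mem->count > mem->max_count)
		mem->max_count = mem->count;
	return 0;
}

/* Forget 'id' by resetting its slot to {0,0}; -1 if it was not stored */
static int mem_forget(struct msg_id_mem *mem, struct msg_id id)
{
	uint32_t i;

	if (id_is_unset(&id))
		return -1;

	for (i = 0; i < mem->size; i++) {
		if (mem->ids[i].network_id == id.network_id &&
		    mem->ids[i].local_id == id.local_id) {
			assert(mem->count > 0);
			memset(&mem->ids[i], 0, sizeof(mem->ids[i]));
			mem->count--;
			return 0;
		}
	}
	return -1;
}
```

Both operations are linear in the array size, which matches the header's
own remark that an array is simple but only suitable for a smallish number
of message ids.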

diff --git a/ipc/kbus_internal.h b/ipc/kbus_internal.h
new file mode 100644
index 0000000..2d9e737
--- /dev/null
+++ b/ipc/kbus_internal.h
@@ -0,0 +1,626 @@
+/* KBUS kernel module - internal definitions
+ *
+ * This is a character device driver, providing the messaging support
+ * for KBUS.
+ *
+ * This header contains the definitions used internally by kbus.c.
+ * At the moment nothing else is expected to include this file.
+ *
+ * KBUS clients should include (at least) kbus_defns.h.
+ */
+
+/*
+ * ***** BEGIN LICENSE BLOCK *****
+ * Version: MPL 1.1
+ *
+ * The contents of this file are subject to the Mozilla Public License Version
+ * 1.1 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.mozilla.org/MPL/
+ *
+ * Software distributed under the License is distributed on an "AS IS" basis,
+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+ * for the specific language governing rights and limitations under the
+ * License.
+ *
+ * The Original Code is the KBUS Lightweight Linux-kernel mediated
+ * message system
+ *
+ * The Initial Developer of the Original Code is Kynesim, Cambridge UK.
+ * Portions created by the Initial Developer are Copyright (C) 2009
+ * the Initial Developer. All Rights Reserved.
+ *
+ * Contributor(s):
+ *   Kynesim, Cambridge UK
+ *   Tony Ibbs <tibs@tonyibbs.co.uk>
+ *
+ * Alternatively, the contents of this file may be used under the terms of the
+ * GNU Public License version 2 (the "GPL"), in which case the provisions of
+ * the GPL are applicable instead of the above.  If you wish to allow the use
+ * of your version of this file only under the terms of the GPL and not to
+ * allow others to use your version of this file under the MPL, indicate your
+ * decision by deleting the provisions above and replace them with the notice
+ * and other provisions required by the GPL.  If you do not delete the
+ * provisions above, a recipient may use your version of this file under either
+ * the MPL or the GPL.
+ *
+ * ***** END LICENSE BLOCK *****
+ */
+
+#ifndef _kbus_internal
+#define _kbus_internal
+
+/*
+ * KBUS can support multiple devices, as /dev/kbus<N>. These all have
+ * the same major device number, and map to differing minor device
+ * numbers. <N> will also be the minor device number, but don't rely
+ * on that for anything.
+ *
+ * When KBUS starts up, it will always setup a single device (/dev/kbus0),
+ * but it can be asked to setup more - for instance:
+ *
+ *     # insmod kbus.ko kbus_num_devices=5
+ *
+ * There is also an IOCTL to allow user-space to request a new device as
+ * necessary. The hot plugging mechanisms should cause the device to appear "as
+ * if by magic".
+ *
+ *     (This last means that we *could* default to setting up zero devices
+ *     at module startup, and leave the user to ask for the first one, but
+ *     that seems rather cruel.)
+ *
+ * We need to set a maximum number of KBUS devices (corresponding to a limit on
+ * minor device numbers). The obvious limit (corresponding to what we'd have
+ * got if we used the deprecated "register_chrdev" to setup our device) is 256,
+ * so we'll go with that.
+ */
+#define KBUS_MIN_NUM_DEVICES		  1
+
+#ifdef CONFIG_KBUS_MAX_NUM_DEVICES
+#define KBUS_MAX_NUM_DEVICES		CONFIG_KBUS_MAX_NUM_DEVICES
+#else
+#define KBUS_MAX_NUM_DEVICES		256
+#endif
+
+#ifndef CONFIG_KBUS_DEF_NUM_DEVICES
+#define CONFIG_KBUS_DEF_NUM_DEVICES	1
+#endif
+
+/*
+ * Our initial array sizes could arguably be made configurable
+ * for tuning, if we discover this is useful
+ */
+#define KBUS_INIT_MSG_ID_MEMSIZE	16
+#define KBUS_INIT_LISTENER_ARRAY_SIZE	8
+
+/*
+ * Setting CONFIG_KBUS_DEBUG will cause the Makefile
+ * to define DEBUG for us
+ */
+#ifdef DEBUG
+#define kbus_maybe_dbg(kbus_dev, format, args...) do { \
+	if ((kbus_dev)->verbose) \
+		(void) dev_dbg((kbus_dev)->dev, format, ## args); \
+} while (0)
+#else
+#define kbus_maybe_dbg(kbus_dev, format, args...) ((void)0)
+#endif
+
+/*
+ * This is really only directly useful if CONFIG_KBUS_DEBUG is on
+ */
+#ifdef CONFIG_KBUS_DEBUG_DEFAULT_VERBOSE
+#define KBUS_DEFAULT_VERBOSE_SETTING true
+#else
+#define KBUS_DEFAULT_VERBOSE_SETTING false
+#endif
+
+/* ========================================================================= */
+
+/* We need a way of remembering message bindings */
+struct kbus_message_binding {
+	struct list_head list;
+	struct kbus_private_data *bound_to;	/* who we're bound to */
+	u32 bound_to_id;	/* but the id is often useful */
+	u32 is_replier;		/* bound as a replier */
+	u32 name_len;
+	char *name;		/* the message name */
+};
+
+/*
+ * For both keeping track of requests sent (to which we still want replies)
+ * and requests read (to which we haven't yet sent a reply), we need some
+ * means of remembering message ids. Since I'd rather not worry the rest of
+ * the code with how this is implemented (which is code for "I'll implement
+ * it very simply and worry about making it efficient/scalable later"), and
+ * since we always want to remember both the message ids and also how many
+ * there are, it seems sensible to bundle this up in its own datastructure.
+ */
+struct kbus_msg_id_mem {
+	u32 count;	/* Number of entries in use */
+	u32 size;	/* Actual size of the array */
+	u32 max_count;	/* Max 'count' we've had */
+	/*
+	 * An array is probably the worst way to store a list of message ids,
+	 * but it's *very simple*, and should work OK for a smallish number of
+	 * message ids. So it's a place to start...
+	 *
+	 * Note the array may have "unused" slots, signified by message id {0:0}
+	 */
+	struct kbus_msg_id *ids;
+};
+
+/* An item in the list of requests that a Ksock has not yet replied to */
+struct kbus_unreplied_item {
+	struct list_head list;
+	struct kbus_msg_id id;	/* the request's id */
+	u32 from;		/* the sender's id */
+	struct kbus_name_ptr *name_ref;	/* and its name... */
+	u32 name_len;
+};
+
+/*
+ * The parts of a message being written to KBUS (via kbus_write[_parts]),
+ * or read by the user (via kbus_read) are:
+ *
+ * * the user-space message header - as in 'struct kbus_message_header'
+ *
+ * from which we copy various items into our own internal message header.
+ *
+ * For a "pointy" message, that is all there is.
+ *
+ * For an "entire" message, this is then followed by:
+ *
+ * * the message name
+ * * padding to bring that up to a NUL terminator and then a 4-byte boundary.
+ *
+ * If the "entire" message has data, then this is followed by:
+ *
+ * * N data parts (all but the last of size PART_LEN)
+ * * padding to bring that up to a 4-byte boundary.
+ *
+ * and finally, whether there was data or not:
+ *
+ * * the final end guard.
+ *
+ * Remember that kbus_read always delivers an "entire" message.
+ */
+enum kbus_msg_parts {
+	KBUS_PART_HDR = 0,
+	KBUS_PART_NAME,
+	KBUS_PART_NPAD,
+	KBUS_PART_DATA,
+	KBUS_PART_DPAD,
+	KBUS_PART_FINAL_GUARD
+};
+/* N.B. New message parts require switch cases in kbus_msg_part_name()
+ * and kbus_write_parts().
+ */
+
+#define KBUS_NUM_PARTS (KBUS_PART_FINAL_GUARD+1)
+
+/*
+ * Replier typing.
+ * The higher the replier type, the more specific it is.
+ * We trust the binding mechanisms not to have created two replier
+ * bindings of the same type for the same name (so we shan't, for
+ * example, get '$.Fred.*' bound as replier twice).
+ */
+
+enum kbus_replier_type {
+	UNSET = 0,
+	WILD_STAR,
+	WILD_PERCENT,
+	SPECIFIC
+};
+
+/*
+ * A reference counting wrapper for message data
+ *
+ * If 'as_pages' is false, then the data is stored as a single kmalloc'd
+ * entity, pointed to by 'parts[0]'. In this case, 'num_parts' will be 1,
+ * and 'last_page_len' will be the size of the allocated data.
+ *
+ * If 'as_pages' is true, then the data is stored as 'num_parts' pages, each
+ * pointed to by 'parts[n]'. The last page should be treated as being size
+ * 'last_page_len' (even if the implementation is not enforcing this). All
+ * other pages are of size PART_LEN.
+ *
+ * In either case, 'lengths[n]' is a "fill counter" for how many bytes of data
+ * are actually being stored in page 'n'. Once the data is all in place, this
+ * should be equal to PART_LEN or 'last_page_len' as appropriate.
+ *
+ * 'refcount' is a standard kernel reference count for the data - when it reaches
+ * 0, everything (the parts, the arrays and the datastructure) gets freed.
+ */
+struct kbus_data_ptr {
+	int as_pages;
+	unsigned num_parts;
+	unsigned long *parts;
+	unsigned *lengths;
+	unsigned last_page_len;
+	struct kref refcount;
+};
+
+/*
+ * A reference counting wrapper for message names
+ *
+ * RATIONALE:
+ *
+ * When a message name is copied from user space, we take our first copy.
+ *
+ * In order to send a message to a Ksock, we use kbus_push_message(), which
+ * takes a copy of the message for each recipient.
+ *
+ * We also take a copy of the message name for our list of messages that have
+ * been read but not replied to.
+ *
+ * It makes sense to copy the message header, because the contents thereof are
+ * changed according to the particular recipient.
+ *
+ * Copying the message data (if any) is handled by the kbus_data_ptr, above,
+ * which provides reference counting. This makes sense because message data may
+ * be large.
+ *
+ * If we only have one recipient, copying the message name is not a big issue,
+ * but if there are many, we would prefer not to make many copies of the
+ * string. It is, perhaps, worth keeping a dictionary of message names, and
+ * referring to the name in that - but that's not an incremental change from
+ * the "simple copying" state we start from.
+ *
+ * The simplest change to make, which may have some benefit, is to reference
+ * count the names for an individual message, as is done for the message data.
+ *
+ * If we have a single recipient, we will have copied the string from user
+ * space, and also created the kbus_name_ptr datastructure - an overhead of 8
+ * bytes. However, when we copy the message for the recipient, we do not need
+ * to copy the message name, so if the message name is more than 8 bytes, we
+ * have immediately made a gain (and experience shows that message names tend
+ * to be at least that long).
+ *
+ * As soon as we have more than one recipient, it becomes extremely likely that
+ * we have saved space, and we will definitely have saved allocations which
+ * could be fragmenting memory. So it sounds like a good thing to try.
+ *
+ * Also, if I later on want to store a hash code for the string (hoping to
+ * speed up comparisons), the new datastructure gives me somewhere to put it...
+ */
+struct kbus_name_ptr {
+	char *name;
+	struct kref refcount;
+};
+
+/*
+ * When the user reads a message from us, they receive a kbus_entire_message
+ * structure.
+ *
+ * When the user writes a message to us, they write a "pointy" message, using
+ * the kbus_message_header structure, or an "entire" message, using the
+ * kbus_entire_message structure.
+ *
+ * Within the kernel, all messages are held as "pointy" messages, but instead
+ * of direct pointers to the message name and data, we use reference counted
+ * pointers.
+ *
+ * Rather than overload the 'name' and 'data' pointer fields, with all the
+ * danger of getting it wrong that that implies, it seems simpler to have our
+ * own, internal to the kernel, clone of the datastructure, but with these
+ * fields defined correctly...
+ *
+ * Hmm. If we have the name and data references, perhaps we should move the
+ * name and data *lengths* into those same.
+ */
+
+struct kbus_msg {
+	struct kbus_msg_id id;	/* Unique to this message */
+	struct kbus_msg_id in_reply_to;	/* Which message this is a reply to */
+	u32 to;		/* 0 (empty) or a replier id */
+	u32 from;	/* 0 (KBUS) or the sender's id */
+	struct kbus_orig_from orig_from;	/* Cross-network linkage */
+	struct kbus_orig_from final_to;	/* Cross-network linkage */
+	u32 extra;	/* ignored field - future proofing */
+	u32 flags;	/* Message type/flags */
+	u32 name_len;	/* Message name's length, in bytes */
+	u32 data_len;	/* Message data's length, also in bytes */
+	struct kbus_name_ptr *name_ref;
+	struct kbus_data_ptr *data_ref;
+};
+
+/*
+ * The current message that the user is reading (with kbus_read())
+ *
+ * If 'msg' is NULL, then the data structure is "empty" (i.e., there is no
+ * message being read).
+ */
+struct kbus_read_msg {
+	struct kbus_entire_message user_hdr;	/* the header for user space */
+
+	struct kbus_msg *msg;	/* the internal message */
+	char *parts[KBUS_NUM_PARTS];
+	unsigned lengths[KBUS_NUM_PARTS];
+	int which;		/* The current item */
+	u32 pos;		/* How far they've read in it */
+	/*
+	 * If the current item is KBUS_PART_DATA then 'ref_data_index' is
+	 * which part of the data we're in, and 'pos' is how far we are through
+	 * that particular item.
+	 */
+	u32 ref_data_index;
+};
+
+/*
+ * See kbus_write_parts() for how this data structure is actually used.
+ *
+ * If 'msg' is NULL, then the data structure is "empty" (i.e., there is no
+ * message being written).
+ *
+ * * 'is_finished' is true when we've got all the bytes for our message,
+ *   and thus don't want any more. It's an error for the user to try to
+ *   write more message after it is finished.
+ *
+ *   For a "pointy" message, this is set immediately after the message header
+ *   end guard is finished (the message name and any data aren't "pulled in"
+ *   until the user does SEND). For an "entire" message, this is set after the
+ *   final end guard is finished (so we will have the message name and any data
+ *   in memory).
+ *
+ * * 'pointers_are_local' is true if the message's name and data have been
+ *   transferred to kernel space (as reference counted entities), and false
+ *   if they are (still) in user space.
+ *
+ * * 'msg' is the message header, the shorter in-kernel version.
+ *
+ * * 'which' indicates which part of the message we think we're being given
+ *   bytes for, from KBUS_PART_HDR through to (for an "entire" message)
+ *   KBUS_PART_FINAL_GUARD.
+ * * 'pos' is the index of the next byte within the current part of whatever
+ *   we're working on, as indicated by 'which'. Note that for message data,
+ *   this is the index within the whole of the data (not the index within a
+ *   data part).
+ *
+ * * If we're reading an "entire" message, then the message name gets written
+ *   to 'ref_name', which is a reference-counted string. This is allocated to
+ *   the correct size/shape for the entire message name, after the header has
+ *   been read.
+ *
+ *   The intention is that, if 'ref_name' is non-NULL, it should be legal
+ *   to call 'kbus_lower_name_ref()' on it, to free its contents.
+ *
+ * * Similarly, 'ref_data' is reference-counted data, again allocated to the
+ *   correct size/shape for the entire message data length, after the header
+ *   has been read. The 'length' for each part is used to indicate how far
+ *   through that part we have populated with bytes.
+ *
+ *   The intention is that, if 'ref_data' is non-NULL, it should be legal
+ *   to call 'kbus_lower_data_ref()' on it, to free its contents.
+ *
+ *  'ref_data_index' is then the index (starting at 0) of the referenced
+ *   data part that we are populating.
+ */
+struct kbus_write_msg {
+	struct kbus_entire_message user_msg;	/* from user space */
+	struct kbus_msg *msg;	/* our version of it */
+
+	u32 is_finished;
+	u32 pointers_are_local;
+	u32 guard;		/* Whichever guard we're reading */
+	char *user_name_ptr;	/* User space name */
+	void *user_data_ptr;	/* User space data */
+	enum kbus_msg_parts which;
+	u32 pos;
+	struct kbus_name_ptr *ref_name;
+	struct kbus_data_ptr *ref_data;
+	u32 ref_data_index;
+};
+
+/*
+ * This is the data for an individual Ksock
+ *
+ * Each time we open /dev/kbus<n>, we need to remember a unique id for
+ * our file-instance. Using 'filp' might work, but it's not something
+ * we have control over, and in particular, if the file is closed and
+ * then reopened, there's no guarantee that a particular value of 'filp'
+ * won't be used again. A simple serial number is safer.
+ *
+ * Each such "opening" also has a message queue associated with it. Any
+ * messages this "opening" has declared itself a listener (or replier)
+ * for will be added to that queue.
+ *
+ * 'id' is the unique id for this file descriptor - it enables stateful message
+ * transactions, etc. It is local to the particular KBUS device.
+ *
+ * 'last_msg_id_sent' is the message id of the last message that was
+ * (successfully) written to this file descriptor. It is needed when
+ * constructing a reply.
+ *
+ * We have a queue of messages waiting for us to read them, in 'message_queue'.
+ * 'message_count' is how many messages are in the queue, and 'max_messages'
+ * is an indication of how many messages we shall allow in the queue.
+ *
+ * Note that, however a message was originally sent to us, messages held
+ * internally are always a message header plus pointers to a message name and
+ * (optionally) message data. See kbus_send() for details.
+ */
+struct kbus_private_data {
+	struct list_head list;
+	struct kbus_dev *dev;	/* Which device we are on */
+	u32 id;		/* Our own id */
+	struct kbus_msg_id last_msg_id_sent;	/* As it says - see above */
+	u32 message_count;	/* How many messages for us */
+	u32 max_messages;	/* How many messages allowed */
+	struct list_head message_queue;	/* Messages for us */
+
+	/*
+	 * It's useful (for /proc/kbus/bindings) to remember the PID of the
+	 * current process
+	 */
+	pid_t pid;
+
+	/* Wait for something to appear in the message_queue */
+	wait_queue_head_t read_wait;
+
+	/* The message currently being read by the user */
+	struct kbus_read_msg read;
+
+	/* The message currently being written by the user */
+	struct kbus_write_msg write;
+
+	/* Are we currently sending that message? */
+	int sending;
+
+	/*
+	 * Each request we send should (eventually) generate us a reply, or
+	 * at worst a status message from KBUS itself telling us there isn't
+	 * going to be one. So we need to ensure that there is room in our
+	 * (as the sender) message queue to receive all/any such.
+	 *
+	 * Note that this *also* allows SEND to forbid sending a Reply to a
+	 * Request that we did not receive (or to which we have already
+	 * replied)
+	 */
+	struct kbus_msg_id_mem outstanding_requests;
+
+	/*
+	 * If we are a replier for a message, then KBUS wants to ensure
+	 * that a reply is *definitely* made. If we release ourselves, then
+	 * we're clearly not going to reply to any requests that we have
+	 * read but not replied to, and KBUS would like to generate a status
+	 * message for each such. So we need a list of the information needed
+	 * to form such Status/Reply messages.
+	 *
+	 *     (Thus we don't need the whole of the original message, since
+	 *     we're only *really* needing its name, its id and who its
+	 *     from -- given which its easiest just to keep the parts we
+	 *     *do* need, and ignore the data.)
+	 *
+	 * It was decided not to place a limit on the size of this list.
+	 * Its size is limited by the ability of sender(s) to send
+	 * requests, which in turn is limited by the number of slots
+	 * they can reserve for the replies to those requests in their
+	 * own message queues.
+	 *
+	 * If a limit was imposed, then we would also need to stop a sender
+	 * sending a request because the replier has too many replies
+	 * outstanding (for instance, because it has gone to sleep). But
+	 * then we'd assume that it is not responding to messages in
+	 * general, and so its message queue would fill up, and that
+	 * should be sufficient protection.
+	 */
+	struct list_head replies_unsent;
+	u32 num_replies_unsent;
+	u32 max_replies_unsent;
+
+	/*
+	 * Managing which messages a replier may reply to
+	 * ----------------------------------------------
+	 * We need to police replying, such that a replier may only reply
+	 * to requests that it has received (where "received" means "had
+	 * placed into its message queue", because KBUS must reply for us
+	 * if the particular Ksock is not going to).
+	 *
+	 * It is possible to do this using either the 'outstanding_requests'
+	 * or the 'replies_unsent' list.
+	 *
+	 * Using the 'outstanding_requests' list means that when a replier
+	 * wants to send a reply, it needs to look up who the original-sender
+	 * is (from its Ksock id, in the "from" field of the message), and
+	 * check against that. This is a bit inefficient.
+	 *
+	 * Using the 'replies_unsent' list means that when a replier wants
+	 * to send a reply, it just needs to find the right message stub
+	 * in said 'replies_unsent' list, and check that the reply *does*
+	 * match the original request. This may be more efficient, depending.
+	 *
+	 * In fact, the 'outstanding_requests' list is used, simply because
+	 * it was implemented first.
+	 */
+};
+
+/* What is a sensible number for the default maximum number of messages? */
+#ifndef CONFIG_KBUS_DEF_MAX_MESSAGES
+#define CONFIG_KBUS_DEF_MAX_MESSAGES	100
+#endif
+
+/* Information belonging to each /dev/kbus<N> device */
+struct kbus_dev {
+	struct cdev cdev;	/* Character device data */
+	struct device *dev;	/* Our very selves */
+
+	u32 index;		/* Which /dev/kbus<n> device we are */
+
+	/*
+	 * The Big Lock
+	 * We use a single mutex for all purposes, and all locking is done
+	 * at the "top level", i.e., in the externally called functions.
+	 * This simplifies the design of the internal (list processing,
+	 * etc.) functions, at the possible cost of making interaction
+	 * with KBUS, in general, slower.
+	 *
+	 * On the other hand, we favour reliable over fast.
+	 */
+	struct mutex mux;
+
+	/* Who has bound to receive which messages in what manner */
+	struct list_head bound_message_list;
+
+	/*
+	 * The actual Ksock entries (one per 'open("/dev/kbus<n>")')
+	 * This is to allow us to find the 'kbus_private_data' instances,
+	 * so that we can get at all the message queues. The details of
+	 * how we do this are *definitely* going to change...
+	 */
+	struct list_head open_ksock_list;
+
+	/* Has one of our Ksocks made space available in its message queue? */
+	wait_queue_head_t write_wait;
+
+	/*
+	 * Each open file descriptor needs an internal id - this is used
+	 * when binding messages to listeners, but is also needed when we
+	 * want to reply. We reserve the id 0 as a special value ("none").
+	 */
+	u32 next_ksock_id;
+
+	/*
+	 * Every message sent has a unique id (again, unique per device).
+	 */
+	u32 next_msg_serial_num;
+
+	/* Are we wanting debugging messages? */
+	u32 verbose;
+};
+
+/*
+ * Each entry in a message queue holds a single message, and a pointer to
+ * the message name binding that caused it to be added to the list. This
+ * makes it simple to remove messages from the queue if the message name
+ * binding is unbound. The binding shall be NULL for:
+ *
+ *  * Replies
+ *  * KBUS "synthetic" messages, which are also (essentially) Replies
+ */
+struct kbus_message_queue_item {
+	struct list_head list;
+	struct kbus_msg *msg;
+	struct kbus_message_binding *binding;
+};
+
+/* The sizes of the parts in our reference counted data */
+#define KBUS_PART_LEN		PAGE_SIZE
+#define KBUS_PAGE_THRESHOLD	(PAGE_SIZE >> 1)
+
+/* Manage the files used to report KBUS internal state */
+/* From kbus_internal.c */
+#ifndef CONFIG_PROC_FS
+static inline void kbus_setup_reporting(void) {}
+static inline void kbus_remove_reporting(void) {}
+#else
+extern void kbus_setup_reporting(void);
+extern void kbus_remove_reporting(void);
+#endif
+/* From kbus.c itself */
+extern void kbus_get_device_data(int *num_devices,
+				 struct kbus_dev ***devices);
+extern u32 kbus_lenleft(struct kbus_private_data *priv);
+
+#endif /* _kbus_internal */
-- 
1.7.4.1



* [PATCH 04/11] KBUS main source file, basic device support only
  2011-03-18 17:21     ` [PATCH 03/11] KBUS internal " Tony Ibbs
@ 2011-03-18 17:21       ` Tony Ibbs
  2011-03-18 17:21         ` [PATCH 05/11] KBUS add support for messages Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC (permalink / raw)
  To: lkml
  Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely, Tony Ibbs

This first version of the main source file just provides device
open/close and various basic IOCTLs, including bind/unbind.

Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
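
Note for readers: the message-name rules that kbus_bad_message_name()
applies in this patch can be tried out in user space. What follows is
a minimal sketch of the same checks, not the kernel code itself. The
name name_is_bad() is made up for illustration, and the cast passed to
isalnum() is an addition that the patch does not have. It keeps the
patch's convention of returning 0 for an acceptable name and 1
otherwise.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical user-space mirror of the checks in kbus_bad_message_name().
 * Returns 0 if the name is acceptable, 1 if it is not.
 */
static int name_is_bad(const char *name, size_t name_len)
{
	size_t ii;
	size_t dot_at = 1;	/* index of the most recent '.' seen */

	/* Shortest possible name is "$." plus one more character */
	if (name == NULL || name_len < 3)
		return 1;

	/* Every name starts with "$." */
	if (name[0] != '$' || name[1] != '.')
		return 1;

	/* A trailing ".*" or ".%" wildcard is allowed, so skip over it */
	if (name[name_len - 2] == '.' &&
	    (name[name_len - 1] == '*' || name[name_len - 1] == '%'))
		name_len -= 2;

	/* What remains must not end with a dot */
	if (name[name_len - 1] == '.')
		return 1;

	for (ii = 2; ii < name_len; ii++) {
		if (name[ii] == '.') {
			/* Two adjacent dots mean an empty name component */
			if (dot_at == ii - 1)
				return 1;
			dot_at = ii;
		} else if (!isalnum((unsigned char)name[ii])) {
			return 1;
		}
	}
	return 0;
}
```

With these rules "$.Fred", "$.Fred.*" and "$.Fred.%" are accepted,
while "Fred", "$.a..b", "$.a." and "$.a-b" are rejected. The bare
wildcard "$.*" is also accepted, because the wildcard suffix is
stripped before the remaining checks run.
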
 ipc/kbus_main.c | 1061 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 1061 insertions(+), 0 deletions(-)
 create mode 100644 ipc/kbus_main.c
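
A second note for readers: the binding rule implemented by
kbus_remember_binding() and kbus_find_replier() is that any number of
listeners may bind to a message name, but at most one replier. The
sketch below models that rule in user space with a fixed-size array
instead of the kernel's linked list. The struct and function names,
and the MAX_BINDINGS limit, are assumptions made for illustration
only; the -EADDRINUSE result matches what the patch returns.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 16		/* arbitrary limit for this sketch */

struct binding {
	unsigned int ksock_id;	/* who is bound */
	int is_replier;		/* replier (1) or plain listener (0) */
	const char *name;	/* message name, not copied */
	size_t name_len;
};

struct binding_table {
	struct binding items[MAX_BINDINGS];
	size_t count;
};

/* Returns 1 and sets *id_out if a replier is bound to 'name', else 0 */
static int find_replier(const struct binding_table *t,
			const char *name, size_t name_len,
			unsigned int *id_out)
{
	size_t ii;

	for (ii = 0; ii < t->count; ii++) {
		const struct binding *b = &t->items[ii];

		/* Cheap tests first, the string comparison last */
		if (!b->is_replier || b->name_len != name_len ||
		    strncmp(name, b->name, name_len))
			continue;
		*id_out = b->ksock_id;
		return 1;
	}
	return 0;
}

/* Returns 0 on success, -EADDRINUSE for a second replier on a name */
static int remember_binding(struct binding_table *t, unsigned int ksock_id,
			    int is_replier, const char *name)
{
	size_t name_len = strlen(name);
	unsigned int existing;

	if (is_replier && find_replier(t, name, name_len, &existing))
		return -EADDRINUSE;
	if (t->count == MAX_BINDINGS)
		return -ENOMEM;	/* table full, a limit of this sketch only */

	t->items[t->count].ksock_id = ksock_id;
	t->items[t->count].is_replier = is_replier;
	t->items[t->count].name = name;
	t->items[t->count].name_len = name_len;
	t->count++;
	return 0;
}
```

So if Ksock 1 binds as replier to "$.a", a later attempt by Ksock 2
to bind as replier to the same name fails with -EADDRINUSE, while
Ksock 2 binding as a listener to "$.a" succeeds.
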

diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
new file mode 100644
index 0000000..87c7506
--- /dev/null
+++ b/ipc/kbus_main.c
@@ -0,0 +1,1061 @@
+/* Kbus kernel module
+ *
+ * This is a character device driver, providing the messaging support
+ * for kbus.
+ */
+
+/*
+ * ***** BEGIN LICENSE BLOCK *****
+ * Version: MPL 1.1
+ *
+ * The contents of this file are subject to the Mozilla Public License Version
+ * 1.1 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.mozilla.org/MPL/
+ *
+ * Software distributed under the License is distributed on an "AS IS" basis,
+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+ * for the specific language governing rights and limitations under the
+ * License.
+ *
+ * The Original Code is the KBUS Lightweight Linux-kernel mediated
+ * message system
+ *
+ * The Initial Developer of the Original Code is Kynesim, Cambridge UK.
+ * Portions created by the Initial Developer are Copyright (C) 2009
+ * the Initial Developer. All Rights Reserved.
+ *
+ * Contributor(s):
+ *   Kynesim, Cambridge UK
+ *   Tony Ibbs <tibs@tonyibbs.co.uk>
+ *
+ * Alternatively, the contents of this file may be used under the terms of the
+ * GNU Public License version 2 (the "GPL"), in which case the provisions of
+ * the GPL are applicable instead of the above.  If you wish to allow the use
+ * of your version of this file only under the terms of the GPL and not to
+ * allow others to use your version of this file under the MPL, indicate your
+ * decision by deleting the provisions above and replace them with the notice
+ * and other provisions required by the GPL.  If you do not delete the
+ * provisions above, a recipient may use your version of this file under either
+ * the MPL or the GPL.
+ *
+ * ***** END LICENSE BLOCK *****
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+
+#include <linux/fs.h>
+#include <linux/device.h>	/* device classes (for hotplugging), &c */
+#include <linux/cdev.h>		/* registering character devices */
+#include <linux/list.h>
+#include <linux/ctype.h>	/* for isalnum */
+#include <linux/poll.h>
+#include <linux/slab.h>		/* for kmalloc, etc. */
+#include <linux/sched.h>	/* for current->pid */
+#include <linux/uaccess.h>	/* copy_*_user() functions */
+#include <asm/page.h>		/* PAGE_SIZE */
+
+#include <linux/kbus_defns.h>
+#include "kbus_internal.h"
+
+static int kbus_num_devices = CONFIG_KBUS_DEF_NUM_DEVICES;
+
+/* Who we are -- devices */
+static int kbus_major;	/* 0 => We'll go for dynamic allocation */
+static int kbus_minor;	/* 0 => We're happy to start with device 0 */
+
+/* Our actual devices, 0 through kbus_num_devices-1 */
+static struct kbus_dev **kbus_devices;
+
+static struct class *kbus_class_p;
+
+/* ========================================================================= */
+
+/* I really want this function where it is in the code, so need to foreshadow */
+static int kbus_setup_new_device(int which);
+
+/* ========================================================================= */
+
+/* What's the symbolic name of a replier type? */
+__maybe_unused
+static const char *kbus_replier_type_name(enum kbus_replier_type t)
+{
+	switch (t) {
+	case UNSET:		return "UNSET";
+	case WILD_STAR:		return "WILD_STAR";
+	case WILD_PERCENT:	return "WILD_PERCENT";
+	case SPECIFIC:		return "SPECIFIC";
+	}
+	pr_err("kbus: unhandled enum lookup %d in "
+		   "kbus_replier_type_name - memory corruption?", t);
+	return "???";
+}
+
+/*
+ * Given a message name, is it valid?
+ *
+ * We have nothing to say on maximum length.
+ *
+ * Returns 0 if it's OK, 1 if it's naughty
+ */
+static int kbus_bad_message_name(char *name, size_t name_len)
+{
+	size_t ii;
+	int dot_at = 1;
+
+	if (name_len < 3)
+		return 1;
+
+	if (name == NULL || name[0] != '$' || name[1] != '.')
+		return 1;
+
+	if (name[name_len - 2] == '.' && name[name_len - 1] == '*')
+		name_len -= 2;
+	else if (name[name_len - 2] == '.' && name[name_len - 1] == '%')
+		name_len -= 2;
+
+	if (name[name_len - 1] == '.')
+		return 1;
+
+	for (ii = 2; ii < name_len; ii++) {
+		if (name[ii] == '.') {
+			if (dot_at == ii - 1)
+				return 1;
+			dot_at = ii;
+		} else if (!isalnum(name[ii]))
+			return 1;
+	}
+	return 0;
+}
+
+/*
+ * Find out who, if anyone, is bound as a replier to the given message name.
+ *
+ * Returns 1 if we found a replier, 0 if we did not (but all went well), and
+ * a negative value if something went wrong.
+ */
+static int kbus_find_replier(struct kbus_dev *dev,
+			     struct kbus_private_data **bound_to,
+			     u32 name_len, char *name)
+{
+	struct kbus_message_binding *ptr;
+	struct kbus_message_binding *next;
+
+	list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+		/*
+		 * We are only interested in a replier binding to the name.
+		 * We *could* check for the name and then check for
+		 * reply-ness - if we found a name match that was *not* a
+		 * replier, then we'd have finished. However, checking the
+		 * name is expensive, and I rather assume that a caller is
+		 * only checking if they expect a positive result, so it's
+		 * simpler to do a lazier check.
+		 */
+		if (!ptr->is_replier || ptr->name_len != name_len ||
+		    strncmp(name, ptr->name, name_len))
+			continue;
+
+		kbus_maybe_dbg(dev, "  '%.*s' has replier %u\n",
+			       ptr->name_len, ptr->name, ptr->bound_to_id);
+		*bound_to = ptr->bound_to;
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * Add a new binding.
+ *
+ * Doesn't allow more than one replier to be bound for a message name.
+ *
+ * NB: If it succeeds, then it wants to keep hold of 'name', so don't
+ *     free it...
+ *
+ * Returns 0 if all went well, a negative value if it did not. Specifically,
+ * -EADDRINUSE if an attempt was made to bind as a replier to a message name
+ * that already has a replier bound.
+ */
+static int kbus_remember_binding(struct kbus_dev *dev,
+				 struct kbus_private_data *priv,
+				 u32 replier,
+				 u32 name_len, char *name)
+{
+	int retval = 0;
+	struct kbus_message_binding *new;
+
+	/* If we want a replier, and there already is one, we lose */
+	if (replier) {
+		struct kbus_private_data *reply_to;
+		retval = kbus_find_replier(dev, &reply_to, name_len, name);
+		/*
+		 * "Address in use" isn't quite right, but lets the caller
+		 * have some hope of telling what went wrong, and this is a
+		 * useful case to distinguish.
+		 */
+		if (retval == 1) {
+			kbus_maybe_dbg(dev,
+				       "%u CANNOT BIND '%.*s' as "
+				       "replier, already bound\n",
+				       priv->id, name_len, name);
+			return -EADDRINUSE;
+		}
+	}
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	new->bound_to = priv;
+	new->bound_to_id = priv->id;	/* Useful shorthand? */
+	new->is_replier = replier;
+	new->name_len = name_len;
+	new->name = name;
+
+	list_add(&new->list, &dev->bound_message_list);
+	return 0;
+}
+
+/*
+ * Find a particular binding.
+ *
+ * Return a pointer to the binding, or NULL if it was not found.
+ */
+static struct kbus_message_binding
+*kbus_find_binding(struct kbus_dev *dev,
+		   struct kbus_private_data *priv,
+		   u32 replier, u32 name_len, char *name)
+{
+	struct kbus_message_binding *ptr;
+	struct kbus_message_binding *next;
+
+	list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+		if (priv != ptr->bound_to)
+			continue;
+		if (replier != ptr->is_replier)
+			continue;
+		if (name_len != ptr->name_len)
+			continue;
+		if (strncmp(name, ptr->name, name_len))
+			continue;
+
+		kbus_maybe_dbg(priv->dev, "  %u Found %c '%.*s'\n",
+			       priv->id, (ptr->is_replier ? 'R' : 'L'),
+			       ptr->name_len, ptr->name);
+		return ptr;
+	}
+	return NULL;
+}
+
+/*
+ * Remove an existing binding.
+ *
+ * Returns 0 if all went well, a negative value if it did not.
+ */
+static int kbus_forget_binding(struct kbus_dev *dev,
+			       struct kbus_private_data *priv,
+			       u32 replier, u32 name_len, char *name)
+{
+	struct kbus_message_binding *binding;
+
+	binding = kbus_find_binding(dev, priv, replier, name_len, name);
+	if (binding == NULL) {
+		kbus_maybe_dbg(priv->dev,
+			       "  %u Could not find/unbind "
+			       "%u %c '%.*s'\n",
+			       priv->id, priv->id,
+			       (replier ? 'R' : 'L'), name_len, name);
+		return -EINVAL;
+	}
+
+	kbus_maybe_dbg(priv->dev, "  %u Unbound %u %c '%.*s'\n",
+		       priv->id, binding->bound_to_id,
+		       (binding->is_replier ? 'R' : 'L'),
+		       binding->name_len, binding->name);
+
+	/*
+	 * If we supported sending messages (yet), we'd need to forget
+	 * any messages in our queue that match this binding.
+	 */
+
+	/* And remove the binding once that has been done. */
+	list_del(&binding->list);
+	kfree(binding->name);
+	kfree(binding);
+	return 0;
+}
+
+/*
+ * Remove all bindings for a particular listener.
+ *
+ * Called from kbus_release, which will itself handle removing messages
+ * (that *were* bound) from the message queue.
+ */
+static void kbus_forget_my_bindings(struct kbus_private_data *priv)
+{
+	struct kbus_dev *dev = priv->dev;
+	u32 bound_to_id = priv->id;
+
+	struct kbus_message_binding *ptr;
+	struct kbus_message_binding *next;
+
+	kbus_maybe_dbg(dev, "%u Forgetting my bindings\n", priv->id);
+
+	list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+		if (bound_to_id != ptr->bound_to_id)
+			continue;
+
+		kbus_maybe_dbg(dev, "  Unbound %u %c '%.*s'\n",
+			       ptr->bound_to_id, (ptr->is_replier ? 'R' : 'L'),
+			       ptr->name_len, ptr->name);
+
+		list_del(&ptr->list);
+		kfree(ptr->name);
+		kfree(ptr);
+	}
+}
+
+/*
+ * Remove all bindings.
+ *
+ * Assumed to be called because the device is closing, and thus doesn't lock,
+ * nor does it worry about generating synthetic messages as requests are doomed
+ * not to get replies.
+ */
+static void kbus_forget_all_bindings(struct kbus_dev *dev)
+{
+	struct kbus_message_binding *ptr;
+	struct kbus_message_binding *next;
+
+	kbus_maybe_dbg(dev, "Forgetting bindings\n");
+
+	list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+
+		kbus_maybe_dbg(dev, "  Unbinding %u %c '%.*s'\n",
+			       ptr->bound_to_id,
+			       (ptr->is_replier ? 'R' : 'L'),
+			       ptr->name_len, ptr->name);
+
+		list_del(&ptr->list);
+		kfree(ptr->name);
+		kfree(ptr);
+	}
+}
+
+/*
+ * Add a new open file to our remembrances.
+ *
+ * Returns 0 if all went well, a negative value if it did not.
+ */
+static int kbus_remember_open_ksock(struct kbus_dev *dev,
+				    struct kbus_private_data *priv)
+{
+	list_add(&priv->list, &dev->open_ksock_list);
+
+	kbus_maybe_dbg(priv->dev, "Remembered 'open file' id %u\n",
+		       priv->id);
+	return 0;
+}
+
+/*
+ * Remove an open file remembrance.
+ *
+ * Returns 0 if all went well, -EINVAL if we couldn't find the open Ksock
+ */
+static int kbus_forget_open_ksock(struct kbus_dev *dev, u32 id)
+{
+	struct kbus_private_data *ptr;
+	struct kbus_private_data *next;
+
+	list_for_each_entry_safe(ptr, next, &dev->open_ksock_list, list) {
+		if (id != ptr->id)
+			continue;
+
+		kbus_maybe_dbg(dev, "  Forgetting open Ksock %u\n", id);
+
+		/* So remove it from our list */
+		list_del(&ptr->list);
+		/* But *we* mustn't free the actual datastructure! */
+		return 0;
+	}
+	kbus_maybe_dbg(dev, "  Could not forget open Ksock %u\n", id);
+
+	return -EINVAL;
+}
+
+/*
+ * Forget all our "open file" remembrances.
+ *
+ * Assumed to be called because the device is closing, and thus doesn't lock.
+ */
+static void kbus_forget_all_open_ksocks(struct kbus_dev *dev)
+{
+	struct kbus_private_data *ptr;
+	struct kbus_private_data *next;
+
+	list_for_each_entry_safe(ptr, next, &dev->open_ksock_list, list) {
+
+		kbus_maybe_dbg(dev, "  Forgetting open Ksock %u\n", ptr->id);
+
+		/* So remove it from our list */
+		list_del(&ptr->list);
+		/* But *we* mustn't free the actual datastructure! */
+	}
+}
+
+static int kbus_open(struct inode *inode, struct file *filp)
+{
+	struct kbus_private_data *priv;
+	struct kbus_dev *dev;
+
+	priv = kmalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	/*
+	 * Use the official magic to retrieve our actual device data
+	 * so we can remember it for other file operations.
+	 */
+	dev = container_of(inode->i_cdev, struct kbus_dev, cdev);
+
+	if (mutex_lock_interruptible(&dev->mux)) {
+		kfree(priv);
+		return -ERESTARTSYS;
+	}
+
+	/*
+	 * Our file descriptor id ("listener id") needs to be unique for this
+	 * device, and thus we want to be carefully inside our lock.
+	 *
+	 * We shall (for now at least) ignore wrap-around - 32 bits is big
+	 * enough that it shouldn't cause non-unique ids in our target
+	 * applications.
+	 *
+	 * Listener id 0 is reserved, and we'll use that (on occasion) to mean
+	 * kbus itself.
+	 */
+	if (dev->next_ksock_id == 0)
+		dev->next_ksock_id++;
+
+	memset(priv, 0, sizeof(*priv));
+	priv->dev = dev;
+	priv->id = dev->next_ksock_id++;
+	priv->pid = current->pid;
+	priv->max_messages = CONFIG_KBUS_DEF_MAX_MESSAGES;
+	priv->sending = false;
+	priv->num_replies_unsent = 0;
+	priv->max_replies_unsent = 0;
+
+	INIT_LIST_HEAD(&priv->message_queue);
+	INIT_LIST_HEAD(&priv->replies_unsent);
+
+	(void)kbus_remember_open_ksock(dev, priv);
+
+	filp->private_data = priv;
+
+	mutex_unlock(&dev->mux);
+
+	kbus_maybe_dbg(dev, "%u OPEN\n", priv->id);
+
+	return 0;
+}
+
+static int kbus_release(struct inode *inode __always_unused, struct file *filp)
+{
+	int retval2 = 0;
+	struct kbus_private_data *priv = filp->private_data;
+	struct kbus_dev *dev = priv->dev;
+
+	if (mutex_lock_interruptible(&dev->mux))
+		return -ERESTARTSYS;
+
+	kbus_maybe_dbg(dev, "%u RELEASE\n", priv->id);
+
+	kbus_forget_my_bindings(priv);
+	retval2 = kbus_forget_open_ksock(dev, priv->id);
+	kfree(priv);
+
+	mutex_unlock(&dev->mux);
+
+	return retval2;
+}
+
+static int kbus_bind(struct kbus_private_data *priv,
+		     struct kbus_dev *dev, unsigned long arg)
+{
+	int retval = 0;
+	struct kbus_bind_request *bind;
+	char *name = NULL;
+
+	bind = kmalloc(sizeof(*bind), GFP_KERNEL);
+	if (!bind)
+		return -ENOMEM;
+	if (copy_from_user(bind, (void __user *)arg, sizeof(*bind))) {
+		retval = -EFAULT;
+		goto done;
+	}
+
+	if (bind->name_len == 0) {
+		kbus_maybe_dbg(dev, "bind name is length 0\n");
+		retval = -EBADMSG;
+		goto done;
+	} else if (bind->name_len > KBUS_MAX_NAME_LEN) {
+		kbus_maybe_dbg(dev, "bind name is length %d\n",
+			       bind->name_len);
+		retval = -ENAMETOOLONG;
+		goto done;
+	}
+
+	name = kmalloc(bind->name_len + 1, GFP_KERNEL);
+	if (!name) {
+		retval = -ENOMEM;
+		goto done;
+	}
+	if (copy_from_user(name, (char __user *) bind->name, bind->name_len)) {
+		retval = -EFAULT;
+		goto done;
+	}
+	name[bind->name_len] = 0;
+
+	if (kbus_bad_message_name(name, bind->name_len)) {
+		retval = -EBADMSG;
+		goto done;
+	}
+
+	kbus_maybe_dbg(priv->dev, "%u BIND %c '%.*s'\n", priv->id,
+		       (bind->is_replier ? 'R' : 'L'), bind->name_len, name);
+
+	retval = kbus_remember_binding(dev, priv,
+				       bind->is_replier, bind->name_len, name);
+	if (retval == 0)
+		/* The binding will use our copy of the message name */
+		name = NULL;
+
+done:
+	kfree(name);
+	kfree(bind);
+	return retval;
+}
+
+static int kbus_unbind(struct kbus_private_data *priv,
+		       struct kbus_dev *dev, unsigned long arg)
+{
+	int retval = 0;
+	struct kbus_bind_request *bind;
+	char *name = NULL;
+
+	bind = kmalloc(sizeof(*bind), GFP_KERNEL);
+	if (!bind)
+		return -ENOMEM;
+	if (copy_from_user(bind, (void __user *)arg, sizeof(*bind))) {
+		retval = -EFAULT;
+		goto done;
+	}
+
+	if (bind->name_len == 0) {
+		kbus_maybe_dbg(priv->dev, "unbind name is length 0\n");
+		retval = -EBADMSG;
+		goto done;
+	} else if (bind->name_len > KBUS_MAX_NAME_LEN) {
+		kbus_maybe_dbg(priv->dev, "unbind name is length %d\n",
+			       bind->name_len);
+		retval = -ENAMETOOLONG;
+		goto done;
+	}
+
+	name = kmalloc(bind->name_len + 1, GFP_KERNEL);
+	if (!name) {
+		retval = -ENOMEM;
+		goto done;
+	}
+	if (copy_from_user(name, (char __user *) bind->name, bind->name_len)) {
+		retval = -EFAULT;
+		goto done;
+	}
+	name[bind->name_len] = 0;
+
+	if (kbus_bad_message_name(name, bind->name_len)) {
+		retval = -EBADMSG;
+		goto done;
+	}
+
+	kbus_maybe_dbg(priv->dev, "%u UNBIND %c '%.*s'\n", priv->id,
+		       (bind->is_replier ? 'R' : 'L'), bind->name_len, name);
+
+	retval = kbus_forget_binding(dev, priv,
+				     bind->is_replier, bind->name_len, name);
+
+done:
+	kfree(name);
+	kfree(bind);
+	return retval;
+}
+
+static int kbus_replier(struct kbus_private_data *priv __maybe_unused,
+			struct kbus_dev *dev, unsigned long arg)
+{
+	struct kbus_private_data *replier;
+	struct kbus_bind_query *query;
+	char *name = NULL;
+	int retval = 0;
+
+	query = kmalloc(sizeof(*query), GFP_KERNEL);
+	if (!query)
+		return -ENOMEM;
+	if (copy_from_user(query, (void __user *)arg, sizeof(*query))) {
+		retval = -EFAULT;
+		goto done;
+	}
+
+	if (query->name_len == 0 || query->name_len > KBUS_MAX_NAME_LEN) {
+		kbus_maybe_dbg(priv->dev, "Replier name is length %d\n",
+			       query->name_len);
+		retval = -ENAMETOOLONG;
+		goto done;
+	}
+
+	name = kmalloc(query->name_len + 1, GFP_KERNEL);
+	if (!name) {
+		retval = -ENOMEM;
+		goto done;
+	}
+	if (copy_from_user(name, (char __user *) query->name,
+						 query->name_len)) {
+		retval = -EFAULT;
+		goto done;
+	}
+	name[query->name_len] = 0;
+
+	kbus_maybe_dbg(priv->dev, "%u REPLIER for '%.*s'\n",
+		       priv->id, query->name_len, name);
+
+	retval = kbus_find_replier(dev, &replier, query->name_len, name);
+	if (retval < 0)
+		goto done;
+
+	if (retval)
+		query->return_id = replier->id;
+	else
+		query->return_id = 0;
+	/*
+	 * Copy the whole structure back, rather than try to work out (in a
+	 * guaranteed-safe manner) where the 'id' actually lives
+	 */
+	if (copy_to_user((void __user *)arg, query, sizeof(*query))) {
+		retval = -EFAULT;
+		goto done;
+	}
+done:
+	kfree(name);
+	kfree(query);
+	return retval;
+}
+
+/* How much of the current message is left to read? */
+extern u32 kbus_lenleft(struct kbus_private_data *priv)
+{
+	return 0; /* no message => nothing to read */
+}
+
+static int kbus_maxmsgs(struct kbus_private_data *priv,
+			unsigned long arg)
+{
+	int retval = 0;
+	u32 requested_max;
+
+	retval = __get_user(requested_max, (u32 __user *) arg);
+	if (retval)
+		return retval;
+
+	kbus_maybe_dbg(priv->dev, "%u MAXMSGS requests %u (was %u)\n",
+		       priv->id, requested_max, priv->max_messages);
+
+	/* A value of 0 is just a query for what the current length is */
+	if (requested_max > 0)
+		priv->max_messages = requested_max;
+
+	return __put_user(priv->max_messages, (u32 __user *) arg);
+}
+
+static int kbus_nummsgs(struct kbus_private_data *priv,
+			struct kbus_dev *dev __maybe_unused, unsigned long arg)
+{
+	u32 count = priv->message_count;
+
+	kbus_maybe_dbg(dev, "%u NUMMSGS %u\n", priv->id, count);
+
+	return __put_user(count, (u32 __user *) arg);
+}
+
+static int kbus_set_verbosity(struct kbus_private_data *priv,
+			      unsigned long arg)
+{
+	int retval = 0;
+	u32 verbose;
+	int old_value = priv->dev->verbose;
+
+	retval = __get_user(verbose, (u32 __user *) arg);
+	if (retval)
+		return retval;
+
+	/*
+	 * If we're *leaving* verbose mode, we should say so.
+	 * However, we also want to say if we're *entering* verbose
+	 * mode, and that means we can't use kbus_maybe_dbg (since
+	 * we're not yet in verbose mode)
+	 */
+#ifdef DEBUG
+	dev_dbg(priv->dev->dev,
+		"%u VERBOSE requests %u (was %d)\n",
+		priv->id, verbose, old_value);
+#endif
+
+	switch (verbose) {
+	case 0:
+		priv->dev->verbose = false;
+		break;
+	case 1:
+		priv->dev->verbose = true;
+		break;
+	case 0xFFFFFFFF:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return __put_user(old_value, (u32 __user *) arg);
+}
+
+static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	int err = 0;
+	int retval = 0;
+	struct kbus_private_data *priv = filp->private_data;
+	struct kbus_dev *dev = priv->dev;
+	u32 id = priv->id;
+
+	if (_IOC_TYPE(cmd) != KBUS_IOC_MAGIC)
+		return -ENOTTY;
+	if (_IOC_NR(cmd) > KBUS_IOC_MAXNR)
+		return -ENOTTY;
+	/*
+	 * Check our arguments at least vaguely match. Note that VERIFY_WRITE
+	 * allows R/W transfers. Remember that 'type' is user-oriented, while
+	 * access_ok is kernel-oriented, so the concept of "read" and "write"
+	 * is reversed
+	 */
+	if (_IOC_DIR(cmd) & _IOC_READ)
+		err = !access_ok(VERIFY_WRITE, (void __user *)arg,
+				_IOC_SIZE(cmd));
+	else if (_IOC_DIR(cmd) & _IOC_WRITE)
+		err = !access_ok(VERIFY_READ, (void __user *)arg,
+				_IOC_SIZE(cmd));
+	if (err)
+		return -EFAULT;
+
+	if (mutex_lock_interruptible(&dev->mux))
+		return -ERESTARTSYS;
+
+	switch (cmd) {
+
+	case KBUS_IOC_RESET:
+		/* This is currently a no-op, but may be useful later */
+		kbus_maybe_dbg(priv->dev, "%u RESET\n", id);
+		break;
+
+	case KBUS_IOC_BIND:
+		/*
+		 * BIND: indicate that a file wants to receive messages of a
+		 * given name
+		 */
+		retval = kbus_bind(priv, dev, arg);
+		break;
+
+	case KBUS_IOC_UNBIND:
+		/*
+		 * UNBIND: indicate that a file no longer wants to receive
+		 * messages of a given name
+		 */
+		retval = kbus_unbind(priv, dev, arg);
+		break;
+
+	case KBUS_IOC_KSOCKID:
+		/*
+		 * What is the "Ksock id" for this file descriptor
+		 */
+		kbus_maybe_dbg(priv->dev, "%u KSOCKID %u\n", id, id);
+		retval = __put_user(id, (u32 __user *) arg);
+		break;
+
+	case KBUS_IOC_REPLIER:
+		/*
+		 * Who (if anyone) is bound to reply to this message?
+		 * arg in: message name
+		 * arg out: listener id
+		 * return: 0 means no-one, 1 means someone
+		 * We can't just return the id as the return value of the ioctl,
+		 * because it's an unsigned int, and the ioctl return must be
+		 * signed...
+		 */
+		retval = kbus_replier(priv, dev, arg);
+		break;
+
+	case KBUS_IOC_MAXMSGS:
+		/*
+		 * Set (and/or query) maximum number of messages in this
+		 * interface's queue.
+		 *
+		 * arg in: 0 (for query) or maximum number wanted
+		 * arg out: maximum number allowed
+		 * return: 0 means OK, otherwise not OK
+		 */
+		retval = kbus_maxmsgs(priv, arg);
+		break;
+
+	case KBUS_IOC_NUMMSGS:
+		/* How many messages are in our queue?
+		 *
+		 * arg out: number of messages currently in our queue
+		 * return: 0 means OK, otherwise not OK
+		 */
+		retval = kbus_nummsgs(priv, dev, arg);
+		break;
+
+	case KBUS_IOC_VERBOSE:
+		/*
+		 * Should we output verbose/debug messages?
+		 *
+		 * arg in: 0 (for no), 1 (for yes), 0xFFFFFFFF (for query)
+		 * arg out: the previous value, before we were called
+		 * return: 0 means OK, otherwise not OK
+		 */
+		retval = kbus_set_verbosity(priv, arg);
+		break;
+
+	default:
+		/* *Should* be redundant, if we got our range checks right */
+		retval = -ENOTTY;
+		break;
+	}
+
+	mutex_unlock(&dev->mux);
+	return retval;
+}
+
+/* File operations for /dev/kbus<n> */
+static const struct file_operations kbus_fops = {
+	.owner = THIS_MODULE,
+	.unlocked_ioctl = kbus_ioctl,
+	.open = kbus_open,
+	.release = kbus_release,
+};
+
+static void kbus_setup_cdev(struct kbus_dev *dev, int devno)
+{
+	int err;
+
+	/*
+	 * Remember to initialise the mutex *before* making the device
+	 * available!
+	 */
+	mutex_init(&dev->mux);
+
+	/*
+	 * This seems like a sensible place to setup other device specific
+	 * stuff, too.
+	 */
+	INIT_LIST_HEAD(&dev->bound_message_list);
+	INIT_LIST_HEAD(&dev->open_ksock_list);
+
+	dev->next_ksock_id = 0;
+	dev->next_msg_serial_num = 0;
+
+	cdev_init(&dev->cdev, &kbus_fops);
+	dev->cdev.owner = THIS_MODULE;
+
+	err = cdev_add(&dev->cdev, devno, 1);
+	if (err)
+		pr_err("Error %d adding kbus0 as a character device\n",
+		       err);
+}
+
+static void kbus_teardown_cdev(struct kbus_dev *dev)
+{
+	kbus_forget_all_bindings(dev);
+	kbus_forget_all_open_ksocks(dev);
+
+	cdev_del(&dev->cdev);
+}
+
+
+/*
+ * Actually setup /dev/kbus<which>.
+ *
+ * Returns <which> or a negative error code.
+ */
+static int kbus_setup_new_device(int which)
+{
+	struct kbus_dev *new = NULL;
+	dev_t this_devno;
+
+	if (which < 0 || which > (KBUS_MAX_NUM_DEVICES - 1)) {
+		pr_err("kbus: next device index %d not %d..%d\n",
+		       which, 0, KBUS_MAX_NUM_DEVICES - 1);
+		return -EINVAL;
+	}
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	memset(new, 0, sizeof(*new));
+
+	/* Connect the device up with its operations */
+	this_devno = MKDEV(kbus_major, kbus_minor + which);
+	kbus_setup_cdev(new, this_devno);
+	new->index = which;
+
+	new->verbose = KBUS_DEFAULT_VERBOSE_SETTING;
+
+	new->dev = device_create(kbus_class_p, NULL,
+				 this_devno, NULL, "kbus%d", which);
+
+	kbus_devices[which] = new;
+	return which;
+}
+
+/* Allow the reporting infrastructure to "see" our internals */
+extern void kbus_get_device_data(int *num_devices,
+				 struct kbus_dev ***devices)
+{
+	*num_devices = kbus_num_devices;
+	*devices = kbus_devices;
+}
+
+static int __init kbus_init(void)
+{
+	int result;
+	int ii;
+	dev_t devno = 0;
+
+#ifdef DEBUG
+	pr_notice("Initialising KBUS module (%d device%s)\n",
+		  kbus_num_devices, kbus_num_devices == 1 ? "" : "s");
+#endif
+
+	if (kbus_num_devices < KBUS_MIN_NUM_DEVICES ||
+	    kbus_num_devices > KBUS_MAX_NUM_DEVICES) {
+		pr_err("kbus: requested number of devices %d not %d..%d\n",
+		       kbus_num_devices,
+		       KBUS_MIN_NUM_DEVICES, KBUS_MAX_NUM_DEVICES);
+		return -EINVAL;
+	}
+
+	/* ================================================================= */
+	/*
+	 * Our main purpose is to provide /dev/kbus
+	 * We wish to start our device numbering with device 0, and device 0
+	 * should always be present.
+	 */
+	result = alloc_chrdev_region(&devno, kbus_minor, KBUS_MAX_NUM_DEVICES,
+				     "kbus");
+	if (result < 0) {
+		pr_warn("kbus: Cannot allocate character device region "
+		       "(error %d)\n", -result);
+		return result;
+	}
+	/* We're quite happy with dynamic allocation of our major number */
+	kbus_major = MAJOR(devno);
+
+	kbus_devices = kmalloc(KBUS_MAX_NUM_DEVICES * sizeof(struct kbus_dev *),
+			       GFP_KERNEL);
+	if (!kbus_devices) {
+		pr_warn("kbus: Cannot allocate devices\n");
+		unregister_chrdev_region(devno, KBUS_MAX_NUM_DEVICES);
+		return -ENOMEM;
+	}
+	memset(kbus_devices, 0, KBUS_MAX_NUM_DEVICES * sizeof(*kbus_devices));
+
+	/*
+	 * To make the user's life as simple as possible, let's make our device
+	 * hot pluggable -- this means that on a modern system it *should* just
+	 * appear, as if by magic (and go away again when the module is
+	 * removed).
+	 */
+	kbus_class_p = class_create(THIS_MODULE, "kbus");
+	if (IS_ERR(kbus_class_p)) {
+		long err = PTR_ERR(kbus_class_p);
+		if (err == -EEXIST) {
+			pr_warn("kbus: Cannot create kbus class, "
+			       "it already exists\n");
+		} else {
+			pr_err("kbus: Error creating kbus class\n");
+			unregister_chrdev_region(devno, KBUS_MAX_NUM_DEVICES);
+			return err;
+		}
+	}
+
+	/* And connect up the number of devices we've been asked for */
+	for (ii = 0; ii < kbus_num_devices; ii++) {
+		int res = kbus_setup_new_device(ii);
+		if (res < 0) {
+			unregister_chrdev_region(devno, KBUS_MAX_NUM_DEVICES);
+			class_destroy(kbus_class_p);
+			return res;
+		}
+	}
+
+	/* Set up the files that allow users to see something of our state */
+	kbus_setup_reporting();
+
+	return 0;
+}
+
+static void __exit kbus_exit(void)
+{
+	/* No locking done, as we're standing down */
+
+	int ii;
+	dev_t devno = MKDEV(kbus_major, kbus_minor);
+
+#ifdef DEBUG
+	pr_notice("Standing down kbus module\n");
+#endif
+
+	for (ii = 0; ii < kbus_num_devices; ii++) {
+		kbus_teardown_cdev(kbus_devices[ii]);
+		kfree(kbus_devices[ii]);
+	}
+	unregister_chrdev_region(devno, KBUS_MAX_NUM_DEVICES);
+
+	/*
+	 * If I'm destroying the class, do I actually need to destroy the
+	 * individual device therein? Best safe...
+	 */
+	for (ii = 0; ii < kbus_num_devices; ii++) {
+		dev_t this_devno = MKDEV(kbus_major, kbus_minor + ii);
+		device_destroy(kbus_class_p, this_devno);
+	}
+	class_destroy(kbus_class_p);
+
+	kbus_remove_reporting();
+}
+
+module_param(kbus_num_devices, int, S_IRUGO);
+MODULE_PARM_DESC(kbus_num_devices,
+		"Number of KBUS device nodes to create initially");
+module_init(kbus_init);
+module_exit(kbus_exit);
+
+MODULE_DESCRIPTION("KBUS lightweight messaging system");
+MODULE_AUTHOR("tibs@tonyibbs.co.uk, tony.ibbs@gmail.com");
+/*
+ * All well-behaved Linux kernel modules should be licensed under GPL v2.
+ * So shall it be.
+ *
+ * (According to the comments in <linux/module.h>, the "v2" is implicit here)
+ *
+ * We also license under the MPL, to allow free use outwith Linux if anyone
+ * wishes.
+ */
+MODULE_LICENSE("Dual MPL/GPL");
-- 
1.7.4.1
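
A note on the argument check at the top of kbus_ioctl() in the patch
above: the direction bits of an ioctl number are named from userspace's
point of view, so _IOC_READ means the kernel writes to the user buffer.
The following is only a minimal userspace sketch of that mapping, and
it assumes the Linux <linux/ioctl.h> UAPI header is available. The enum,
needed_access() and check_direction_mapping() are hypothetical names for
illustration and are not part of KBUS; the 'k' magic and command numbers
are arbitrary.

```c
#include <assert.h>
#include <linux/ioctl.h>	/* _IOC_DIR, _IOC_READ, _IOC_WRITE, _IO, _IOR, ... */

/* Hypothetical flags: what the kernel must do with the user buffer */
enum buf_access {
	BUF_UNUSED        = 0,
	KERNEL_READS_BUF  = 1,	/* kernel copies data in from userspace */
	KERNEL_WRITES_BUF = 2,	/* kernel copies data out to userspace */
};

/* Map the user-oriented direction bits to the kernel-side access */
static int needed_access(unsigned int cmd)
{
	int need = BUF_UNUSED;

	if (_IOC_DIR(cmd) & _IOC_READ)	/* user reads => kernel writes */
		need |= KERNEL_WRITES_BUF;
	if (_IOC_DIR(cmd) & _IOC_WRITE)	/* user writes => kernel reads */
		need |= KERNEL_READS_BUF;
	return need;
}

/* Self-check using made-up command numbers (not KBUS ioctls) */
static void check_direction_mapping(void)
{
	assert(needed_access(_IO('k', 0)) == BUF_UNUSED);
	assert(needed_access(_IOR('k', 1, int)) == KERNEL_WRITES_BUF);
	assert(needed_access(_IOW('k', 2, int)) == KERNEL_READS_BUF);
	assert(needed_access(_IOWR('k', 3, int)) ==
	       (KERNEL_READS_BUF | KERNEL_WRITES_BUF));
}
```

Unlike the patch, which relies on VERIFY_WRITE also covering reads, the
sketch reports the two directions separately; that is a presentation
choice, not a suggested change to the driver.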



* [PATCH 05/11] KBUS add support for messages
  2011-03-18 17:21       ` [PATCH 04/11] KBUS main source file, basic device support only Tony Ibbs
@ 2011-03-18 17:21         ` Tony Ibbs
  2011-03-18 17:21           ` [PATCH 06/11] KBUS add ability to receive messages only once Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC (permalink / raw)
  To: lkml
  Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely, Tony Ibbs

This patch adds the code that actually allows KBUS to send
and receive messages.

Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
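
The wildcard rule implemented by kbus_message_name_matches() in this
patch can be tried out in userspace. What follows is a minimal sketch
rather than the kernel code: name_matches() is a hypothetical stand-in,
it assumes both strings are NUL-terminated, non-empty and already valid
KBUS names, and it uses strchr() where the kernel code uses strnchr().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Does 'name' match 'binding'?
 *
 * A binding ending in '*' matches anything after the prefix; one ending
 * in '%' matches only if no further '.' follows the prefix. Any other
 * binding must match the name exactly.
 */
static bool name_matches(const char *name, const char *binding)
{
	size_t name_len, bind_len;
	char last;

	assert(name != NULL && binding != NULL && binding[0] != '\0');

	name_len = strlen(name);
	bind_len = strlen(binding);
	last = binding[bind_len - 1];

	if (last != '*' && last != '%')
		return name_len == bind_len &&
		       memcmp(name, binding, name_len) == 0;

	/* "$.Fred.*" needs at least "$.Fred.X" to match */
	if (name_len < bind_len)
		return false;

	/* Everything before the wildcard character must be identical */
	if (strncmp(binding, name, bind_len - 1) != 0)
		return false;

	if (last == '*')
		return true;

	/* '%': the rest of the name must not contain another dot */
	return strchr(name + bind_len - 1, '.') == NULL;
}
```

With this sketch, "$.Fred.Jim.Bob" matches the binding "$.Fred.*" but not
"$.Fred.%", and "$.Fred" on its own matches neither.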
 ipc/kbus_main.c | 2766 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 2743 insertions(+), 23 deletions(-)

diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index 87c7506..944b60c 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -65,6 +65,10 @@ static int kbus_num_devices = CONFIG_KBUS_DEF_NUM_DEVICES;
 static int kbus_major;	/* 0 => We'll go for dynamic allocation */
 static int kbus_minor;	/* 0 => We're happy to start with device 0 */
 
+/* We can't need more than 8 characters of padding, by definition! */
+static char *static_zero_padding = "\0\0\0\0\0\0\0\0";
+static u32 static_end_guard = KBUS_MSG_END_GUARD;
+
 /* Our actual devices, 0 through kbus_num_devices-1 */
 static struct kbus_dev **kbus_devices;
 
@@ -72,9 +76,21 @@ static struct class *kbus_class_p;
 
 /* ========================================================================= */
 
+/* As few foreshadowings as I can get away with */
+static struct kbus_private_data *kbus_find_open_ksock(struct kbus_dev *dev,
+						      u32 id);
+
 /* I really want this function where it is in the code, so need to foreshadow */
 static int kbus_setup_new_device(int which);
 
+/* More or less ditto */
+static int kbus_write_to_recipients(struct kbus_private_data *priv,
+				    struct kbus_dev *dev,
+				    struct kbus_msg *msg);
+
+static int kbus_alloc_ref_data(struct kbus_private_data *priv,
+			       u32 data_len,
+			       struct kbus_data_ptr **ret_ref_data);
 /* ========================================================================= */
 
 /* What's the symbolic name of a replier type? */
@@ -93,6 +109,376 @@ static const char *kbus_replier_type_name(enum kbus_replier_type t)
 }
 
 /*
+ * Wrap a set of data pointers and lengths in a reference
+ */
+static struct kbus_data_ptr *kbus_wrap_data_in_ref(int as_pages,
+						   unsigned num_parts,
+						   unsigned long *parts,
+						   unsigned *lengths,
+						   unsigned last_page_len)
+{
+	struct kbus_data_ptr *new = NULL;
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return NULL;
+
+	new->as_pages = as_pages;
+	new->parts = parts;
+	new->lengths = lengths;
+	new->num_parts = num_parts;
+	new->last_page_len = last_page_len;
+
+	kref_init(&new->refcount);
+	return new;
+}
+
+/*
+ * Increment the reference count for our pointer.
+ *
+ * Returns the (same) reference, for convenience.
+ */
+static struct kbus_data_ptr *kbus_raise_data_ref(struct kbus_data_ptr *refdata)
+{
+	if (refdata != NULL)
+		kref_get(&refdata->refcount);
+	return refdata;
+}
+
+/*
+ * Data release callback for data reference pointers. Called when the reference
+ * count says to...
+ */
+static void kbus_release_data_ref(struct kref *ref)
+{
+	struct kbus_data_ptr *refdata = container_of(ref,
+						     struct kbus_data_ptr,
+						     refcount);
+	if (refdata->parts == NULL) {
+		/* Not that I think this can happen */
+		pr_err("kbus: Removing data reference,"
+		       " but data ptr already freed\n");
+	} else {
+		int jj;
+		if (refdata->as_pages)
+			for (jj = 0; jj < refdata->num_parts; jj++)
+				free_page((unsigned long)refdata->parts[jj]);
+		else
+			for (jj = 0; jj < refdata->num_parts; jj++)
+				kfree((void *)refdata->parts[jj]);
+		kfree(refdata->parts);
+		kfree(refdata->lengths);
+		refdata->parts = NULL;
+		refdata->lengths = NULL;
+	}
+	kfree(refdata);
+}
+
+/*
+ * Forget a reference to our pointer, and if no-one cares anymore, free it and
+ * its contents.
+ */
+static void kbus_lower_data_ref(struct kbus_data_ptr *refdata)
+{
+	if (refdata == NULL)
+		return;
+	kref_put(&refdata->refcount, kbus_release_data_ref);
+}
+
+/*
+ * Wrap a string in a reference. Does not take a copy of the string,
+ * but note that the release mechanism (triggered when there are no more
+ * references to the string) *will* free it.
+ */
+static struct kbus_name_ptr *kbus_wrap_name_in_ref(char *str)
+{
+	struct kbus_name_ptr *new = NULL;
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return NULL;
+
+	new->name = str;
+	kref_init(&new->refcount);
+	return new;
+}
+
+/*
+ * Increment the reference count for a string reference
+ *
+ * Returns the (same) reference, for convenience.
+ */
+static struct kbus_name_ptr *kbus_raise_name_ref(struct kbus_name_ptr *refname)
+{
+	if (refname != NULL)
+		kref_get(&refname->refcount);
+	return refname;
+}
+
+/*
+ * Data release callback for string reference pointers.
+ * Called when the reference count says to...
+ */
+static void kbus_release_name_ref(struct kref *ref)
+{
+	struct kbus_name_ptr *refname = container_of(ref,
+						     struct kbus_name_ptr,
+						     refcount);
+	if (refname->name == NULL) {
+		/* Not that I think this can happen */
+		pr_err("kbus: Removing name reference,"
+		       " but name ptr already freed\n");
+	} else {
+		kfree(refname->name);
+		refname->name = NULL;
+	}
+	kfree(refname);
+}
+
+/*
+ * Forget a reference to our string, and if no-one cares anymore, free it and
+ * its contents.
+ */
+static void kbus_lower_name_ref(struct kbus_name_ptr *refname)
+{
+	if (refname == NULL)
+		return;
+
+	kref_put(&refname->refcount, kbus_release_name_ref);
+}
+
+/*
+ * Return a stab at the next size for an array
+ */
+static u32 kbus_next_size(u32 old_size)
+{
+	if (old_size < 16)
+		/* For very small numbers, just double */
+		return old_size << 1;
+	/* Otherwise, try something like the mechanism used for Python
+	 * lists - doubling feels a bit over the top */
+	return old_size + (old_size >> 3);
+}
+
+/* Determine (and return) the next message serial number */
+static u32 kbus_next_serial_num(struct kbus_dev *dev)
+{
+	if (dev->next_msg_serial_num == 0)
+		dev->next_msg_serial_num++;
+	return dev->next_msg_serial_num++;
+}
+
+static int kbus_same_message_id(struct kbus_msg_id *msg_id,
+				u32 network_id, u32 serial_num)
+{
+	return msg_id->network_id == network_id &&
+	    msg_id->serial_num == serial_num;
+}
+
+static int kbus_init_msg_id_memory(struct kbus_private_data *priv)
+{
+	struct kbus_msg_id_mem *mem = &priv->outstanding_requests;
+	struct kbus_msg_id *ids;
+
+	ids = kmalloc(sizeof(*ids) * KBUS_INIT_MSG_ID_MEMSIZE, GFP_KERNEL);
+	if (!ids)
+		return -ENOMEM;
+
+	memset(ids, 0, sizeof(*ids) * KBUS_INIT_MSG_ID_MEMSIZE);
+
+	mem->count = 0;
+	mem->max_count = 0;
+	mem->ids = ids;
+	mem->size = KBUS_INIT_MSG_ID_MEMSIZE;
+	return 0;
+}
+
+static void kbus_empty_msg_id_memory(struct kbus_private_data *priv)
+{
+	struct kbus_msg_id_mem *mem = &priv->outstanding_requests;
+
+	if (mem->ids == NULL)
+		return;
+
+	kfree(mem->ids);
+	mem->ids = NULL;
+	mem->size = 0;
+	mem->max_count = 0;
+	mem->count = 0;
+}
+
+/*
+ * Note we don't worry about whether the id is already in there - if
+ * the user cares, that's up to them (I don't think I do)
+ */
+static int kbus_remember_msg_id(struct kbus_private_data *priv,
+				struct kbus_msg_id *id)
+{
+	struct kbus_msg_id_mem *mem = &priv->outstanding_requests;
+	int ii, which;
+
+	kbus_maybe_dbg(priv->dev, "  %u Remembering outstanding"
+		       " request %u:%u (count->%d)\n",
+		       priv->id, id->network_id, id->serial_num, mem->count+1);
+
+	/* First, try for an empty slot we can re-use */
+	for (ii = 0; ii < mem->size; ii++) {
+		if (kbus_same_message_id(&mem->ids[ii], 0, 0)) {
+			which = ii;
+			goto done;
+		}
+
+	}
+	/* Otherwise, give in and use a new one */
+	if (mem->count == mem->size) {
+		u32 old_size = mem->size;
+		u32 new_size = kbus_next_size(old_size);
+		struct kbus_msg_id *new_ids;
+
+		kbus_maybe_dbg(priv->dev, "  %u XXX outstanding"
+			       " request array size %u -> %u\n",
+			       priv->id, old_size, new_size);
+		new_ids = krealloc(mem->ids,
+				   new_size * sizeof(struct kbus_msg_id),
+				   GFP_KERNEL);
+		if (!new_ids)
+			return -ENOMEM;
+		mem->ids = new_ids;
+		for (ii = old_size; ii < new_size; ii++) {
+			mem->ids[ii].network_id = 0;
+			mem->ids[ii].serial_num = 0;
+		}
+		mem->size = new_size;
+	}
+	which = mem->count;
+done:
+	mem->ids[which] = *id;
+	mem->count++;
+	if (mem->count > mem->max_count)
+		mem->max_count = mem->count;
+	return 0;
+}
+
+/* Returns 0 if we found it, -1 if we couldn't find it */
+static int kbus_find_msg_id(struct kbus_private_data *priv,
+			    struct kbus_msg_id *id)
+{
+	struct kbus_msg_id_mem *mem = &priv->outstanding_requests;
+	int ii;
+	for (ii = 0; ii < mem->size; ii++) {
+		if (!kbus_same_message_id(&mem->ids[ii],
+					  id->network_id, id->serial_num))
+			continue;
+		kbus_maybe_dbg(priv->dev, "  %u Found outstanding "
+			       "request %u:%u (count=%d)\n",
+			       priv->id, id->network_id,
+			       id->serial_num, mem->count);
+		return 0;
+	}
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Could not find outstanding "
+		       "request %u:%u (count=%d)\n",
+		       priv->id, id->network_id,
+		       id->serial_num, mem->count);
+	return -1;
+}
+
+/* Returns 0 if we found and forgot it, -1 if we couldn't find it */
+static int kbus_forget_msg_id(struct kbus_private_data *priv,
+			      struct kbus_msg_id *id)
+{
+	struct kbus_msg_id_mem *mem = &priv->outstanding_requests;
+	int ii;
+	for (ii = 0; ii < mem->size; ii++) {
+		if (!kbus_same_message_id(&mem->ids[ii],
+					 id->network_id, id->serial_num))
+			continue;
+
+		mem->ids[ii].network_id = 0;
+		mem->ids[ii].serial_num = 0;
+		mem->count--;
+		kbus_maybe_dbg(priv->dev,
+			       "  %u Forgot outstanding "
+			       "request %u:%u (count<-%d)\n",
+			       priv->id, id->network_id,
+			       id->serial_num, mem->count);
+
+		return 0;
+	}
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Could not forget outstanding "
+		       "request %u:%u (count<-%d)\n",
+		       priv->id, id->network_id,
+		       id->serial_num, mem->count);
+	return -1;
+}
+
+/* A message is a reply iff 'in_reply_to' is non-zero */
+static int kbus_message_is_reply(struct kbus_msg *msg)
+{
+	return !kbus_same_message_id(&msg->in_reply_to, 0, 0);
+}
+
+/*
+ * Build a KBUS synthetic message/exception. We assume no data.
+ *
+ * The message built is a 'pointy' message.
+ *
+ * 'msg_name' is copied.
+ *
+ * Use kbus_free_message() to free this message when it is finished with.
+ */
+static struct kbus_msg
+*kbus_build_kbus_message(struct kbus_dev *dev,
+			 char *msg_name,
+			 u32 from,
+			 u32 to, struct kbus_msg_id in_reply_to)
+{
+	struct kbus_msg *new_msg;
+	struct kbus_name_ptr *name_ref;
+
+	size_t msg_name_len = strlen(msg_name);
+	char *msg_name_copy;
+
+	new_msg = kmalloc(sizeof(*new_msg), GFP_KERNEL);
+	if (!new_msg) {
+		dev_err(dev->dev, "Cannot kmalloc synthetic message\n");
+		return NULL;
+	}
+
+	msg_name_copy = kmalloc(msg_name_len + 1, GFP_KERNEL);
+	if (!msg_name_copy) {
+		dev_err(dev->dev, "Cannot kmalloc synthetic message's name\n");
+		kfree(new_msg);
+		return NULL;
+	}
+
+	strncpy(msg_name_copy, msg_name, msg_name_len);
+	msg_name_copy[msg_name_len] = '\0';
+
+	name_ref = kbus_wrap_name_in_ref(msg_name_copy);
+	if (!name_ref) {
+		dev_err(dev->dev, "Cannot kmalloc synthetic message's string ref\n");
+		kfree(new_msg);
+		kfree(msg_name_copy);
+		return NULL;
+	}
+
+	memset(new_msg, 0, sizeof(*new_msg));
+
+	new_msg->from = from;
+	new_msg->to = to;
+	new_msg->in_reply_to = in_reply_to;
+	new_msg->flags = KBUS_BIT_SYNTHETIC;
+	new_msg->name_ref = name_ref;
+	new_msg->name_len = msg_name_len;
+
+	new_msg->id.serial_num = kbus_next_serial_num(dev);
+
+	return new_msg;
+}
+
+/*
  * Given a message name, is it valid?
  *
  * We have nothing to say on maximum length.
@@ -130,6 +516,682 @@ static int kbus_bad_message_name(char *name, size_t name_len)
 }
 
 /*
+ * Is a message name wildcarded?
+ *
+ * We assume it is already checked to be a valid name
+ *
+ * Returns 1 if it is, 0 if not. In other words, returns 1
+ * if the name is not a valid destination.
+ */
+static int kbus_wildcarded_message_name(char *name, size_t name_len)
+{
+	return name[name_len - 1] == '*' || name[name_len - 1] == '%';
+}
+
+/*
+ * Is a message name legitimate for writing/sending?
+ *
+ * This is an omnibus call of the last two checks, with error output.
+ *
+ * Returns 0 if it's OK, 1 if it's naughty
+ */
+static int kbus_invalid_message_name(struct kbus_dev *dev,
+				     char *name, size_t name_len)
+{
+	if (kbus_bad_message_name(name, name_len)) {
+		dev_err(dev->dev, "pid %u [%s]"
+		       " (send) message name '%.*s' is not allowed\n",
+		       current->pid, current->comm, (int)name_len, name);
+		return 1;
+	}
+	if (kbus_wildcarded_message_name(name, name_len)) {
+		dev_err(dev->dev, "pid %u [%s]"
+		       " (send) sending to wildcards not allowed, "
+		       "message name '%.*s'\n",
+		       current->pid, current->comm, (int)name_len, name);
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * Does this message name match the given binding?
+ *
+ * The binding may be a normal message name, or a wildcard.
+ *
+ * We assume that both names are legitimate.
+ */
+static int kbus_message_name_matches(char *name, size_t name_len, char *other)
+{
+	size_t other_len = strlen(other);
+
+	if (other[other_len - 1] == '*' || other[other_len - 1] == '%') {
+		char *rest = name + other_len - 1;
+		size_t rest_len = name_len - other_len + 1;
+
+		/*
+		 * If we have '$.Fred.*', then we need at least '$.Fred.X'
+		 * to match
+		 */
+		if (name_len < other_len)
+			return false;
+		/*
+		 * Does the name match all of the wildcard except the
+		 * last character?
+		 */
+		if (strncmp(other, name, other_len - 1))
+			return false;
+
+		/* '*' matches anything at all, so we're done */
+		if (other[other_len - 1] == '*')
+			return true;
+
+		/* '%' only matches if we don't have another dot */
+		if (strnchr(rest, rest_len, '.'))
+			return false;
+		else
+			return true;
+	} else {
+		if (name_len != other_len)
+			return false;
+		else
+			return !strncmp(name, other, name_len);
+	}
+}
+
+/*
+ * Check if a message read by kbus_write() is well formed
+ *
+ * Return 0 if a message is well-formed, negative otherwise.
+ */
+static int kbus_check_message_written(struct kbus_dev *dev,
+				      struct kbus_write_msg *this)
+{
+	struct kbus_message_header *user_msg =
+	    (struct kbus_message_header *)&this->user_msg;
+
+	if (this == NULL) {
+		dev_err(dev->dev, "pid %u [%s]"
+		       " Tried to check NULL message\n",
+		       current->pid, current->comm);
+		return -EINVAL;
+	}
+
+	if (user_msg->start_guard != KBUS_MSG_START_GUARD) {
+		dev_err(dev->dev, "pid %u [%s]"
+		       " message start guard is %08x, not %08x\n",
+		       current->pid, current->comm,
+		       user_msg->start_guard, KBUS_MSG_START_GUARD);
+		return -EINVAL;
+	}
+	if (user_msg->end_guard != KBUS_MSG_END_GUARD) {
+		dev_err(dev->dev, "pid %u [%s]"
+		       " message end guard is %08x, not %08x\n",
+		       current->pid, current->comm,
+		       user_msg->end_guard, KBUS_MSG_END_GUARD);
+		return -EINVAL;
+	}
+
+	if (user_msg->name_len == 0) {
+		dev_err(dev->dev, "pid %u [%s]"
+		       " Message name length is 0\n",
+		       current->pid, current->comm);
+		return -EINVAL;
+	}
+	if (user_msg->name_len > KBUS_MAX_NAME_LEN) {
+		dev_err(dev->dev, "pid %u [%s]"
+		       " Message name length is %u, more than %u\n",
+		       current->pid, current->comm,
+		       user_msg->name_len, KBUS_MAX_NAME_LEN);
+		return -ENAMETOOLONG;
+	}
+
+	if (user_msg->name == NULL) {
+		if (user_msg->data != NULL) {
+			dev_err(dev->dev, "pid %u [%s]"
+			       " Message name is inline, data is not\n",
+			       current->pid, current->comm);
+			return -EINVAL;
+		}
+	} else {
+		if (user_msg->data == NULL && user_msg->data_len != 0) {
+			dev_err(dev->dev, "pid %u [%s]"
+			       " Message data is inline, name is not\n",
+			       current->pid, current->comm);
+			return -EINVAL;
+		}
+	}
+
+	if (user_msg->data_len == 0 && user_msg->data != NULL) {
+		dev_err(dev->dev, "pid %u [%s]"
+		       " Message data length is 0, but data pointer is set\n",
+		       current->pid, current->comm);
+		return -EINVAL;
+	}
+
+	/* It's not legal to set both ALL_OR_WAIT and ALL_OR_FAIL */
+	if ((user_msg->flags & KBUS_BIT_ALL_OR_WAIT) &&
+	    (user_msg->flags & KBUS_BIT_ALL_OR_FAIL)) {
+		dev_err(dev->dev, "pid %u [%s]"
+		       " Message cannot have both ALL_OR_WAIT and "
+		       "ALL_OR_FAIL set\n",
+		       current->pid, current->comm);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/*
+ * Output a description of an in-kernel message
+ */
+static void kbus_maybe_report_message(struct kbus_dev *dev __maybe_unused,
+				      struct kbus_msg *msg __maybe_unused)
+{
+	if (msg->data_len) {
+		struct kbus_data_ptr *data_p = msg->data_ref;
+		uint8_t *part0 __maybe_unused = (uint8_t *) data_p->parts[0];
+		kbus_maybe_dbg(dev, "=== %u:%u '%.*s'"
+		       " to %u from %u in-reply-to %u:%u orig %u,%u "
+		       "final %u,%u flags %04x:%04x"
+		       " data/%u<in%u> %02x.%02x.%02x.%02x\n",
+		       msg->id.network_id, msg->id.serial_num,
+		       msg->name_len, msg->name_ref->name,
+		       msg->to, msg->from,
+		       msg->in_reply_to.network_id, msg->in_reply_to.serial_num,
+		       msg->orig_from.network_id, msg->orig_from.local_id,
+		       msg->final_to.network_id, msg->final_to.local_id,
+		       (msg->flags & 0xFFFF0000) >> 16,
+		       (msg->flags & 0x0000FFFF), msg->data_len,
+		       data_p->num_parts, part0[0], part0[1], part0[2],
+		       part0[3]);
+	} else {
+		kbus_maybe_dbg(dev, "=== %u:%u '%.*s'"
+		       " to %u from %u in-reply-to %u:%u orig %u,%u "
+		       "final %u,%u flags %04x:%04x\n",
+		       msg->id.network_id, msg->id.serial_num,
+		       msg->name_len, msg->name_ref->name,
+		       msg->to, msg->from,
+		       msg->in_reply_to.network_id, msg->in_reply_to.serial_num,
+		       msg->orig_from.network_id, msg->orig_from.local_id,
+		       msg->final_to.network_id, msg->final_to.local_id,
+		       (msg->flags & 0xFFFF0000) >> 16,
+		       (msg->flags & 0x0000FFFF));
+	}
+}
+
+/*
+ * Copy a message, doing whatever is deemed necessary.
+ *
+ * Copies the message header, and also copies the message name and any
+ * data. The message must be a 'pointy' message with reference counted
+ * name and data.
+ */
+static struct kbus_msg *kbus_copy_message(struct kbus_dev *dev,
+					  struct kbus_msg *old_msg)
+{
+	struct kbus_msg *new_msg;
+
+	new_msg = kmalloc(sizeof(*new_msg), GFP_KERNEL);
+	if (!new_msg) {
+		dev_err(dev->dev, "Cannot kmalloc copy of message header\n");
+		return NULL;
+	}
+	if (!memcpy(new_msg, old_msg, sizeof(*new_msg))) {
+		dev_err(dev->dev, "Cannot copy message header\n");
+		kfree(new_msg);
+		return NULL;
+	}
+
+	/* In case of error before we're finished... */
+	new_msg->name_ref = NULL;
+	new_msg->data_ref = NULL;
+
+	new_msg->name_ref = kbus_raise_name_ref(old_msg->name_ref);
+
+	if (new_msg->data_len)
+		/* Take a new reference to the data */
+		new_msg->data_ref = kbus_raise_data_ref(old_msg->data_ref);
+	return new_msg;
+}
+
+/*
+ * Free a message.
+ *
+ * Also dereferences the message name and any message data.
+ */
+static void kbus_free_message(struct kbus_msg *msg)
+{
+	if (msg->name_ref)
+		kbus_lower_name_ref(msg->name_ref);
+	msg->name_len = 0;
+	msg->name_ref = NULL;
+
+	if (msg->data_len && msg->data_ref)
+		kbus_lower_data_ref(msg->data_ref);
+
+	msg->data_len = 0;
+	msg->data_ref = NULL;
+	kfree(msg);
+}
+
+static void kbus_empty_read_msg(struct kbus_private_data *priv)
+{
+	struct kbus_read_msg *this = &(priv->read);
+	int ii;
+
+	if (this->msg) {
+		kbus_free_message(this->msg);
+		this->msg = NULL;
+	}
+
+	for (ii = 0; ii < KBUS_NUM_PARTS; ii++) {
+		this->parts[ii] = NULL;
+		this->lengths[ii] = 0;
+	}
+	this->which = 0;
+	this->pos = 0;
+	this->ref_data_index = 0;
+}
+
+static void kbus_empty_write_msg(struct kbus_private_data *priv)
+{
+	struct kbus_write_msg *this = &priv->write;
+	if (this->msg) {
+		kbus_free_message(this->msg);
+		this->msg = NULL;
+	}
+
+	if (this->ref_name) {
+		kbus_lower_name_ref(this->ref_name);
+		this->ref_name = NULL;
+	}
+
+	if (this->ref_data) {
+		kbus_lower_data_ref(this->ref_data);
+		this->ref_data = NULL;
+	}
+
+	this->is_finished = false;
+	this->pos = 0;
+	this->which = 0;
+}
+
+/*
+ * Copy the given message, and add it to the end of the queue.
+ *
+ * This is the *only* way of adding a message to a queue. It shall remain so.
+ *
+ * We assume the message has been checked for sanity.
+ *
+ * 'msg' is the message to add to the queue.
+ *
+ * 'binding' is a pointer to the KBUS message name binding that caused the
+ * message to be added.
+ *
+ * 'for_replier' is true if this particular message is being pushed to the
+ * message's replier's queue. Specifically, it's true if this is a Reply
+ * to this Ksock, or a Request aimed at this Ksock (as Replier).
+ *
+ * Returns 0 if all goes well, or -EFAULT/-ENOMEM if we can't allocate
+ * data structures.
+ *
+ * May also return negative values if the message is mis-named or malformed,
+ * at least at the moment.
+ */
+static int kbus_push_message(struct kbus_private_data *priv,
+			     struct kbus_msg *msg,
+			     struct kbus_message_binding *binding,
+			     int for_replier)
+{
+	struct list_head *queue = &priv->message_queue;
+	struct kbus_msg *new_msg = NULL;
+	struct kbus_message_queue_item *item;
+
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Pushing message onto queue (%s)\n",
+		       priv->id, for_replier ? "replier" : "listener");
+
+	new_msg = kbus_copy_message(priv->dev, msg);
+	if (!new_msg)
+		return -EFAULT;
+
+	item = kmalloc(sizeof(*item), GFP_KERNEL);
+	if (!item) {
+		dev_err(priv->dev->dev, "Cannot kmalloc new message item\n");
+		kbus_free_message(new_msg);
+		return -ENOMEM;
+	}
+	kbus_maybe_report_message(priv->dev, new_msg);
+
+	if (for_replier && (KBUS_BIT_WANT_A_REPLY & msg->flags)) {
+		/*
+		 * This message wants a reply, and is for the message's
+		 * replier, so they need to be told that they are to reply to
+		 * this message
+		 */
+		new_msg->flags |= KBUS_BIT_WANT_YOU_TO_REPLY;
+		kbus_maybe_dbg(priv->dev,
+			       "  Setting WANT_YOU_TO_REPLY, "
+			       "flags %08x\n",
+			       new_msg->flags);
+	} else {
+		/*
+		 * The recipient is *not* the replier for this message,
+		 * so it is not responsible for replying.
+		 */
+		new_msg->flags &= ~KBUS_BIT_WANT_YOU_TO_REPLY;
+	}
+
+	/* And join it up... */
+	item->msg = new_msg;
+	item->binding = binding;
+
+	/* By default, we're using the list as a FIFO, so we want to add our
+	 * new message to the end (just before the first item). However, if the
+	 * URGENT flag is set, then we instead want to add it to the start.
+	 */
+	if (msg->flags & KBUS_BIT_URGENT) {
+		kbus_maybe_dbg(priv->dev, "  Message is URGENT\n");
+		list_add(&item->list, queue);
+	} else {
+		list_add_tail(&item->list, queue);
+	}
+
+	priv->message_count++;
+
+	if (!kbus_same_message_id(&msg->in_reply_to, 0, 0)) {
+		/*
+		 * If it's a reply (and this will include a synthetic reply,
+		 * since we're checking the "in_reply_to" field) then the
+		 * original sender has now had its request satisfied.
+		 */
+		int retval = kbus_forget_msg_id(priv, &msg->in_reply_to);
+
+		if (retval)
+			/* But there's not much we can do about it */
+			dev_err(priv->dev->dev,
+			       "%u Error forgetting "
+			       "outstanding request %u:%u\n",
+			       priv->id, msg->in_reply_to.network_id,
+			       msg->in_reply_to.serial_num);
+	}
+
+	/* And indicate that there is something available to read */
+	wake_up_interruptible(&priv->read_wait);
+
+	kbus_maybe_dbg(priv->dev,
+		       "%u Leaving %d message%s in queue\n",
+		       priv->id, priv->message_count,
+		       priv->message_count == 1 ? "" : "s");
+
+	return 0;
+}
+
+/*
+ * Generate a synthetic message, and add it to the recipient's message queue.
+ *
+ * This is to be used when a Reply is not going to be generated
+ * by the intended Replier. Since we don't want KBUS itself to block on
+ * (trying to) SEND a message to someone not expecting it, I don't think
+ * there are any other occasions when it is useful.
+ *
+ * 'from' is the id of the recipient who has gone away, not received the
+ * message, or whatever.
+ *
+ * 'to' is the 'from' for the message we're bouncing (or whatever). This
+ * needs to be local (it cannot be on another network), so we don't specify
+ * the network id.
+ *
+ * 'in_reply_to' should be the message id of that same message.
+ *
+ * Note that the message is essentially a Reply, so it only goes to the
+ * original Sender.
+ *
+ * Doesn't return anything since I can't think of anything useful to do if it
+ * goes wrong.
+ */
+static void kbus_push_synthetic_message(struct kbus_dev *dev,
+					u32 from,
+					u32 to,
+					struct kbus_msg_id in_reply_to,
+					char *name)
+{
+	struct kbus_private_data *priv = NULL;
+	struct kbus_msg *new_msg;
+
+	/* Who *was* the original message to? */
+	priv = kbus_find_open_ksock(dev, to);
+	if (!priv) {
+		dev_err(dev->dev,
+		       "pid %u [%s] Cannot send synthetic reply to %u,"
+		       " as they are gone\n", current->pid, current->comm, to);
+		return;
+	}
+
+	kbus_maybe_dbg(priv->dev, "  Pushing synthetic message '%s'"
+		       " onto queue for %u\n", name, to);
+
+	/*
+	 * Note that we do not check if the destination queue is full
+	 * - we're going to trust that the "keep enough room in the
+	 * message queue for a reply to each request" mechanism does
+	 * its job properly.
+	 */
+
+	new_msg = kbus_build_kbus_message(dev, name, from, to, in_reply_to);
+	if (!new_msg)
+		return;
+
+	(void)kbus_push_message(priv, new_msg, NULL, false);
+	/* ignore retval; we can't do anything useful if this goes wrong */
+
+	/* kbus_push_message takes a copy of our message */
+	kbus_free_message(new_msg);
+}
+
+/*
+ * Pop the next message off our queue.
+ *
+ * Returns a pointer to the message, or NULL if there is no next message.
+ */
+static struct kbus_msg *kbus_pop_message(struct kbus_private_data *priv)
+{
+	struct list_head *queue = &priv->message_queue;
+	struct kbus_message_queue_item *item;
+	struct kbus_msg *msg = NULL;
+
+	kbus_maybe_dbg(priv->dev, "  %u Popping message from queue\n",
+				   priv->id);
+
+	if (list_empty(queue))
+		return NULL;
+
+	/* Retrieve the next message */
+	item = list_first_entry(queue, struct kbus_message_queue_item, list);
+
+	/* And straightway remove it from the list */
+	list_del(&item->list);
+
+	priv->message_count--;
+
+	msg = item->msg;
+	kfree(item);
+
+	/* If doing that made us go from no-room to some-room, wake up */
+	if (priv->message_count == (priv->max_messages - 1))
+		wake_up_interruptible(&priv->dev->write_wait);
+
+	kbus_maybe_report_message(priv->dev, msg);
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Leaving %d message%s in queue\n",
+		       priv->id, priv->message_count,
+		       priv->message_count == 1 ? "" : "s");
+
+	return msg;
+}
+
+/*
+ * Empty a message queue. Send synthetic messages for any outstanding
+ * request messages that are now not going to be delivered/replied to.
+ */
+static void kbus_empty_message_queue(struct kbus_private_data *priv)
+{
+	struct list_head *queue = &priv->message_queue;
+	struct kbus_message_queue_item *ptr;
+	struct kbus_message_queue_item *next;
+
+	kbus_maybe_dbg(priv->dev, "  %u Emptying message queue\n", priv->id);
+
+	list_for_each_entry_safe(ptr, next, queue, list) {
+		struct kbus_msg *msg = ptr->msg;
+		int is_OUR_request = (KBUS_BIT_WANT_YOU_TO_REPLY & msg->flags);
+
+		kbus_maybe_report_message(priv->dev, msg);
+
+		/*
+		 * If it wanted a reply (from us), let the sender know we are
+		 * going away (but take care not to send a message to
+		 * ourselves by accident!)
+		 */
+		if (is_OUR_request && msg->to != priv->id)
+			kbus_push_synthetic_message(priv->dev, priv->id,
+					    msg->from, msg->id,
+					    KBUS_MSG_NAME_REPLIER_GONEAWAY);
+
+		list_del(&ptr->list);
+		kbus_free_message(ptr->msg);
+
+		priv->message_count--;
+	}
+
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Leaving %d message%s in queue\n",
+		       priv->id, priv->message_count,
+		       priv->message_count == 1 ? "" : "s");
+}
+
+/*
+ * Add a message to the list of messages read by the replier, but still needing
+ * a reply.
+ */
+static int kbus_reply_needed(struct kbus_private_data *priv,
+			     struct kbus_msg *msg)
+{
+	struct list_head *queue = &priv->replies_unsent;
+	struct kbus_unreplied_item *item;
+
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Adding message %u:%u to unsent "
+		       "replies list\n",
+		       priv->id, msg->id.network_id,
+		       msg->id.serial_num);
+
+	item = kmalloc(sizeof(*item), GFP_KERNEL);
+	if (!item) {
+		dev_err(priv->dev->dev, "Cannot kmalloc reply-needed item\n");
+		return -ENOMEM;
+	}
+
+	item->id = msg->id;
+	item->from = msg->from;
+	item->name_len = msg->name_len;
+	/*
+	 * It seems sensible to use a reference to the name. I believe
+	 * we are safe to do this because we have the message "in hand".
+	 */
+	item->name_ref = kbus_raise_name_ref(msg->name_ref);
+
+	list_add(&item->list, queue);
+
+	priv->num_replies_unsent++;
+
+	if (priv->num_replies_unsent > priv->max_replies_unsent)
+		priv->max_replies_unsent = priv->num_replies_unsent;
+
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Leaving %d message%s unreplied-to\n",
+		       priv->id, priv->num_replies_unsent,
+		       priv->num_replies_unsent == 1 ? "" : "s");
+
+	return 0;
+}
+
+/*
+ * Remove a message from the list of (read) messages needing a reply
+ *
+ * Returns 0 on success, -1 if it could not find the message
+ */
+static int kbus_reply_now_sent(struct kbus_private_data *priv,
+			       struct kbus_msg_id *msg_id)
+{
+	struct list_head *queue = &priv->replies_unsent;
+	struct kbus_unreplied_item *ptr;
+	struct kbus_unreplied_item *next;
+
+	list_for_each_entry_safe(ptr, next, queue, list) {
+		if (!kbus_same_message_id(&ptr->id,
+					 msg_id->network_id,
+					 msg_id->serial_num))
+			continue;
+
+		kbus_maybe_dbg(priv->dev,
+		       "  %u Reply to %u:%u %.*s now sent\n",
+		       priv->id, msg_id->network_id,
+		       msg_id->serial_num, ptr->name_len, ptr->name_ref->name);
+
+		list_del(&ptr->list);
+		kbus_lower_name_ref(ptr->name_ref);
+		kfree(ptr);
+
+		priv->num_replies_unsent--;
+
+		kbus_maybe_dbg(priv->dev,
+		       "  %u Leaving %d message%s unreplied-to\n",
+		       priv->id, priv->num_replies_unsent,
+		       priv->num_replies_unsent == 1 ? "" : "s");
+
+		return 0;
+	}
+
+	dev_err(priv->dev->dev, "%u Could not find message %u:%u in unsent "
+	       "replies list\n",
+	       priv->id, msg_id->network_id, msg_id->serial_num);
+	return -1;
+}
+
+/*
+ * Empty our "replies unsent" queue. Send synthetic messages for any
+ * request messages that are now not going to be replied to.
+ */
+static void kbus_empty_replies_unsent(struct kbus_private_data *priv)
+{
+	struct list_head *queue = &priv->replies_unsent;
+	struct kbus_unreplied_item *ptr;
+	struct kbus_unreplied_item *next;
+
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Emptying unreplied messages list\n", priv->id);
+
+	list_for_each_entry_safe(ptr, next, queue, list) {
+
+		kbus_push_synthetic_message(priv->dev, priv->id,
+					    ptr->from, ptr->id,
+					    KBUS_MSG_NAME_REPLIER_IGNORED);
+
+		list_del(&ptr->list);
+		kbus_lower_name_ref(ptr->name_ref);
+		kfree(ptr);
+
+		priv->num_replies_unsent--;
+	}
+
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Leaving %d message%s unreplied-to\n",
+		       priv->id, priv->num_replies_unsent,
+		       priv->num_replies_unsent == 1 ? "" : "s");
+}
+
+/*
  * Find out who, if anyone, is bound as a replier to the given message name.
  *
  * Returns 1 if we found a replier, 0 if we did not (but all went well), and
@@ -156,12 +1218,139 @@ static int kbus_find_replier(struct kbus_dev *dev,
 		    strncmp(name, ptr->name, name_len))
 			continue;
 
-		kbus_maybe_dbg(dev, "  '%.*s' has replier %u\n",
-			       ptr->name_len, ptr->name, ptr->bound_to_id);
-		*bound_to = ptr->bound_to;
-		return 1;
+		kbus_maybe_dbg(dev, "  '%.*s' has replier %u\n",
+			       ptr->name_len, ptr->name, ptr->bound_to_id);
+		*bound_to = ptr->bound_to;
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * Find out who, if anyone, is bound as listener/replier to this message name.
+ *
+ * 'listeners' is an array of (pointers to) listener bindings. It may be NULL
+ * (if there are no listeners or if there was an error). It is up to the caller
+ * to free it. It does not include (pointers to) any replier binding.
+ *
+ * If there is also a replier for this message, then 'replier' will be (a
+ * pointer to) its binding, otherwise it will be NULL. The replier will not be
+ * in the 'listeners' array, so the caller must check both.
+ *
+ * Note that a particular listener may be present more than once, if that
+ * particular listener has bound to the message more than once (but no
+ * *binding* will be represented more than once).
+ *
+ * Returns the number of listeners found (i.e., the length of the array), or a
+ * negative value if something went wrong. This is a bit clumsy, because the
+ * caller needs to check the return value *and* the 'replier' value, but there
+ * is only one caller, so...
+ */
+static int kbus_find_listeners(struct kbus_dev *dev,
+			       struct kbus_message_binding **listeners[],
+			       struct kbus_message_binding **replier,
+			       u32 name_len, char *name)
+{
+	int count = 0;
+	int array_size = KBUS_INIT_LISTENER_ARRAY_SIZE;
+	struct kbus_message_binding *ptr;
+	struct kbus_message_binding *next;
+
+	enum kbus_replier_type replier_type = UNSET;
+	enum kbus_replier_type new_replier_type = UNSET;
+
+	kbus_maybe_dbg(dev,
+		       "  Looking for listeners/repliers for '%.*s'\n",
+		       name_len, name);
+
+	*listeners =
+	    kmalloc(array_size * sizeof(struct kbus_message_binding *),
+		    GFP_KERNEL);
+	if (!(*listeners))
+		return -ENOMEM;
+
+	*replier = NULL;
+
+	list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+
+		if (!kbus_message_name_matches(name, name_len, ptr->name))
+			continue;
+
+		kbus_maybe_dbg(dev, "     Name '%.*s' matches "
+			       "'%s' for %s %u\n",
+			       name_len, name, ptr->name,
+			       ptr->is_replier ? "replier" : "listener",
+			       ptr->bound_to_id);
+
+		if (ptr->is_replier) {
+			/* It *may* be the replier for this message */
+			size_t last_char = strlen(ptr->name) - 1;
+			if (ptr->name[last_char] == '*')
+				new_replier_type = WILD_STAR;
+			else if (ptr->name[last_char] == '%')
+				new_replier_type = WILD_PERCENT;
+			else
+				new_replier_type = SPECIFIC;
+
+			kbus_maybe_dbg(dev,
+				"     ..previous replier was %u "
+				"(%s), looking at %u (%s)\n",
+				((*replier) == NULL ? 0 :
+					(*replier)->bound_to_id),
+				kbus_replier_type_name(replier_type),
+				ptr->bound_to_id,
+				kbus_replier_type_name(new_replier_type));
+
+			/*
+			 * If this is the first replier, just remember
+			 * it. Otherwise, if it's more specific than
+			 * our previous replier, remember it instead.
+			 */
+			if (*replier == NULL ||
+			    new_replier_type > replier_type) {
+
+				if (*replier)
+					kbus_maybe_dbg(dev,
+					   "     ..new replier %u (%s)\n",
+					   ptr->bound_to_id,
+					   kbus_replier_type_name(
+					   new_replier_type));
+
+				*replier = ptr;
+				replier_type = new_replier_type;
+			} else {
+			    if (*replier)
+				kbus_maybe_dbg(dev,
+				       "     ..keeping replier %u (%s)\n",
+				       (*replier)->bound_to_id,
+				       kbus_replier_type_name(replier_type));
+			}
+		} else {
+			/* It is a listener */
+			if (count == array_size) {
+				u32 new_size = kbus_next_size(array_size);
+
+				kbus_maybe_dbg(dev, "     XXX listener "
+				       "array size %d -> %d\n",
+				       array_size, new_size);
+
+				array_size = new_size;
+				*listeners = krealloc(*listeners,
+				      sizeof(**listeners) * array_size,
+				      GFP_KERNEL);
+				if (!(*listeners))
+					return -ENOMEM;
+			}
+			(*listeners)[count++] = ptr;
+		}
 	}
-	return 0;
+
+	kbus_maybe_dbg(dev, "     Found %d listener%s%s for '%.*s'\n",
+		       count, (count == 1 ? "" : "s"),
+		       (*replier == NULL ? "" : " and a replier"),
+		       name_len, name);
+
+	return count;
 }
 
 /*
@@ -248,6 +1437,69 @@ static struct kbus_message_binding
 }
 
 /*
+ * Forget any messages (in our queue) that were only in the queue because of
+ * the binding we're removing.
+ *
+ * If the message was a request (needing a reply) generate an appropriate
+ * synthetic message.
+ */
+static void kbus_forget_matching_messages(struct kbus_private_data *priv,
+					  struct kbus_message_binding *binding)
+{
+	struct list_head *queue = &priv->message_queue;
+	struct kbus_message_queue_item *ptr;
+	struct kbus_message_queue_item *next;
+
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Forgetting matching messages\n", priv->id);
+
+	list_for_each_entry_safe(ptr, next, queue, list) {
+		struct kbus_msg *msg = ptr->msg;
+		int is_OUR_request = (KBUS_BIT_WANT_YOU_TO_REPLY & msg->flags);
+
+		/*
+		 * If this message was not added to the queue because of this
+		 * binding, then we are not interested in it...
+		 */
+		if (ptr->binding != binding)
+			continue;
+
+		kbus_maybe_dbg(priv->dev,
+				"  Deleting message from queue\n");
+		kbus_maybe_report_message(priv->dev, msg);
+
+		/*
+		 * If it wanted a reply (from us), let the sender know we
+		 * have unbound (but take care not to send a message to
+		 * ourselves by accident!)
+		 */
+		if (is_OUR_request && msg->to != priv->id) {
+
+			kbus_maybe_dbg(priv->dev, "  >>> is_OUR_request,"
+				       " sending fake reply\n");
+			kbus_maybe_report_message(priv->dev, msg);
+			kbus_push_synthetic_message(priv->dev, priv->id,
+					    msg->from, msg->id,
+					    KBUS_MSG_NAME_REPLIER_UNBOUND);
+		}
+
+		list_del(&ptr->list);
+		kbus_free_message(ptr->msg);
+
+		priv->message_count--;
+
+		/* If that made us go from no-room to some-room, wake up */
+		if (priv->message_count == (priv->max_messages - 1))
+			wake_up_interruptible(&priv->dev->write_wait);
+	}
+
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Leaving %d message%s in queue\n",
+		       priv->id, priv->message_count,
+		       priv->message_count == 1 ? "" : "s");
+}
+
+/*
  * Remove an existing binding.
  *
  * Returns 0 if all went well, a negative value if it did not.
@@ -273,9 +1525,15 @@ static int kbus_forget_binding(struct kbus_dev *dev,
 		       (binding->is_replier ? 'R' : 'L'),
 		       binding->name_len, binding->name);
 
+	/* And forget any messages we now shouldn't receive */
+	kbus_forget_matching_messages(priv, binding);
+
 	/*
-	 * If we supported sending messages (yet), we'd need to forget
-	 * any messages in our queue that match this binding.
+	 * We carefully don't try to do anything about requests that
+	 * have already been read - the fact that the user has unbound
+	 * from receiving new messages with this name doesn't imply
+	 * anything about whether they're going to reply to requests
+	 * (with that name) which they've already read.
 	 */
 
 	/* And remove the binding once that has been done. */
@@ -358,6 +1616,27 @@ static int kbus_remember_open_ksock(struct kbus_dev *dev,
 }
 
 /*
+ * Retrieve the pointer to an open file's data
+ *
+ * Return NULL if we can't find it.
+ */
+static struct kbus_private_data *kbus_find_open_ksock(struct kbus_dev *dev,
+						      u32 id)
+{
+	struct kbus_private_data *ptr;
+	struct kbus_private_data *next;
+
+	list_for_each_entry_safe(ptr, next, &dev->open_ksock_list, list) {
+		if (id == ptr->id) {
+			kbus_maybe_dbg(dev, "  Found open Ksock %u\n", id);
+			return ptr;
+		}
+	}
+	kbus_maybe_dbg(dev, "  Could not find open Ksock %u\n", id);
+	return NULL;
+}
+
+/*
  * Remove an open file remembrance.
  *
  * Returns 0 if all went well, -EINVAL if we couldn't find the open Ksock
@@ -446,38 +1725,820 @@ static int kbus_open(struct inode *inode, struct file *filp)
 	priv->num_replies_unsent = 0;
 	priv->max_replies_unsent = 0;
 
+	if (kbus_init_msg_id_memory(priv)) {
+		kbus_empty_read_msg(priv);
+		kfree(priv);
+		return -EFAULT;
+	}
 	INIT_LIST_HEAD(&priv->message_queue);
 	INIT_LIST_HEAD(&priv->replies_unsent);
 
-	(void)kbus_remember_open_ksock(dev, priv);
+	init_waitqueue_head(&priv->read_wait);
+
+	/* Note that we immediately have a space available for a message */
+	wake_up_interruptible(&dev->write_wait);
+
+	(void)kbus_remember_open_ksock(dev, priv);
+
+	filp->private_data = priv;
+
+	mutex_unlock(&dev->mux);
+
+	kbus_maybe_dbg(dev, "%u OPEN\n", priv->id);
+
+	return 0;
+}
+
+static int kbus_release(struct inode *inode __always_unused, struct file *filp)
+{
+	int retval2 = 0;
+	struct kbus_private_data *priv = filp->private_data;
+	struct kbus_dev *dev = priv->dev;
+
+	if (mutex_lock_interruptible(&dev->mux))
+		return -ERESTARTSYS;
+
+	kbus_maybe_dbg(dev, "%u RELEASE\n", priv->id);
+
+	kbus_empty_read_msg(priv);
+	kbus_empty_write_msg(priv);
+
+	kbus_empty_msg_id_memory(priv);
+
+	kbus_empty_message_queue(priv);
+	kbus_forget_my_bindings(priv);
+	kbus_empty_replies_unsent(priv);
+	retval2 = kbus_forget_open_ksock(dev, priv->id);
+	kfree(priv);
+
+	mutex_unlock(&dev->mux);
+
+	return retval2;
+}
+
+/*
+ * Determine the private data for the given listener/replier id.
+ *
+ * Return NULL if we can't find it.
+ */
+static struct kbus_private_data
+*kbus_find_private_data(struct kbus_private_data *our_priv,
+			struct kbus_dev *dev, u32 id)
+{
+	struct kbus_private_data *l_priv;
+	if (id == our_priv->id) {
+		/* Heh, it's us, we know who we are! */
+		kbus_maybe_dbg(dev, "  -- Id %u is us\n", id);
+
+		l_priv = our_priv;
+	} else {
+		/* OK, look it up */
+		kbus_maybe_dbg(dev, "  -- Looking up id %u\n", id);
+
+		l_priv = kbus_find_open_ksock(dev, id);
+	}
+	return l_priv;
+}
+
+/*
+ * Determine if the specified recipient has room for a message in their queue
+ *
+ * - 'priv' is the recipient
+ * - 'what' is a string describing them (e.g., "sender", "replier"), just
+ *   for use in debugging/grumbling
+ * - if 'is_reply' is true, then we're checking for a Reply message,
+ *   which we already know is expected by the specified recipient.
+ */
+static int kbus_queue_is_full(struct kbus_private_data *priv,
+			      char *what __maybe_unused, int is_reply)
+{
+	/*
+	 * When figuring out how "full" the message queue is, we need
+	 * to take account of the messages already in the queue (!),
+	 * and also the replies that still need to be written to the
+	 * queue.
+	 *
+	 * Of course, if we're checking because we want to send one
+	 * of the Replies that we are keeping room for, we need to
+	 * remember to account for that!
+	 */
+	int already_accounted_for = priv->message_count +
+	    priv->outstanding_requests.count;
+
+	if (is_reply)
+		already_accounted_for--;
+
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Message queue: count %d + "
+		       "outstanding %d %s= %d, max %d\n",
+		       priv->id, priv->message_count,
+		       priv->outstanding_requests.count,
+		       (is_reply ? "-1 " : ""), already_accounted_for,
+		       priv->max_messages);
+
+	if (already_accounted_for < priv->max_messages) {
+		return false;
+	} else {
+		kbus_maybe_dbg(priv->dev,
+			       "  Message queue for %s %u is full"
+			       " (%u+%u%s > %u messages)\n", what, priv->id,
+			       priv->message_count,
+			       priv->outstanding_requests.count,
+			       (is_reply ? "-1" : ""), priv->max_messages);
+		return true;
+	}
+}
+
+/*
+ * Actually write to anyone interested in this message.
+ *
+ * Remember that the caller is going to free the message data after
+ * calling us, on the assumption that we're taking a copy...
+ *
+ * Returns 0 on success.
+ *
+ * If the message is a Request, and there is no replier for it, then we return
+ * -EADDRNOTAVAIL.
+ *
+ * If the message is a Reply, and the sender is no longer connected (it has
+ * released its Ksock), then we return -EADDRNOTAVAIL.
+ *
+ * If the message couldn't be sent because some of the targets (those that we
+ * *have* to deliver to) had full queues, then it will return -EAGAIN or
+ * -EBUSY. If -EAGAIN is returned, then the caller should try again later, if
+ * -EBUSY then it should not.
+ *
+ * Otherwise, it returns a negative value for error.
+ */
+static int kbus_write_to_recipients(struct kbus_private_data *priv,
+				    struct kbus_dev *dev,
+				    struct kbus_msg *msg)
+{
+	struct kbus_message_binding **listeners = NULL;
+	struct kbus_message_binding *replier = NULL;
+	struct kbus_private_data *reply_to = NULL;
+	ssize_t retval = 0;
+	int num_listeners;
+	int ii;
+	int num_sent = 0;	/* # successfully "sent" */
+
+	int all_or_fail = msg->flags & KBUS_BIT_ALL_OR_FAIL;
+	int all_or_wait = msg->flags & KBUS_BIT_ALL_OR_WAIT;
+
+	kbus_maybe_dbg(priv->dev, "  all_or_fail %d, all_or_wait %d\n",
+		       all_or_fail, all_or_wait);
+
+	/*
+	 * Remember that
+	 * (a) a listener may occur more than once in our array, and
+	 * (b) we have 0 or 1 repliers, but
+	 * (c) the replier is *not* one of the listeners.
+	 */
+	num_listeners = kbus_find_listeners(dev, &listeners, &replier,
+					    msg->name_len, msg->name_ref->name);
+	if (num_listeners < 0) {
+		kbus_maybe_dbg(priv->dev,
+			       "  Error %d finding listeners\n",
+			       num_listeners);
+
+		retval = num_listeners;
+		goto done_sending;
+	}
+
+	/*
+	 * In general, we don't mind if no-one is listening, but
+	 *
+	 * a. If we want a reply, we want there to be a replier
+	 * b. If we *are* a reply, we want there to be an original sender
+	 * c. If we have the "to" field set, and we want a reply, then we
+	 *    want that specific replier to exist
+	 *
+	 * We can check the first of those immediately.
+	 */
+
+	if (msg->flags & KBUS_BIT_WANT_A_REPLY && replier == NULL) {
+		kbus_maybe_dbg(priv->dev,
+			       "  Message wants a reply, "
+			       "but no replier\n");
+		retval = -EADDRNOTAVAIL;
+		goto done_sending;
+	}
+
+	/* And we need to add it to the queue for each interested party */
+
+	/*
+	 * ===================================================================
+	 * Check if the proposed recipients *can* receive
+	 * ===================================================================
+	 */
+
+	/*
+	 * Are we replying to a sender's request?
+	 * Replies are unusual in that the recipient will not normally have
+	 * bound to the appropriate message name.
+	 */
+	if (kbus_message_is_reply(msg)) {
+		kbus_maybe_dbg(priv->dev,
+			       "  Considering sender-of-request %u\n",
+			       msg->to);
+
+		reply_to = kbus_find_private_data(priv, dev, msg->to);
+		if (reply_to == NULL) {
+			kbus_maybe_dbg(priv->dev,
+				       "  Can't find sender-of-request"
+				       " %u\n", msg->to);
+
+			/* We can't find the original Sender */
+			retval = -EADDRNOTAVAIL;
+			goto done_sending;
+		}
+
+		/* Are they expecting this reply? */
+		if (kbus_find_msg_id(reply_to, &msg->in_reply_to)) {
+			/* No, so we aren't allowed to send it */
+			retval = -ECONNREFUSED;
+			goto done_sending;
+		}
+
+		if (kbus_queue_is_full(reply_to, "sender-of-request", true)) {
+			if (all_or_wait)
+				retval = -EAGAIN;	/* try again later */
+			else
+				retval = -EBUSY;
+			goto done_sending;
+		}
+	}
+
+	/* Repliers only get request messages */
+	if (replier && !(msg->flags & KBUS_BIT_WANT_A_REPLY))
+		replier = NULL;
+
+	/*
+	 * And even then, only if they have room in their queue
+	 * Note that it is *always* fatal (to this send) if we can't
+	 * add a Request to a Replier's queue -- we just need to figure
+	 * out what sort of error to return
+	 */
+	if (replier) {
+		kbus_maybe_dbg(priv->dev, "  Considering replier %u\n",
+			       replier->bound_to_id);
+		/*
+		 * If the 'to' field was set, then we only want to send it if
+		 * it is *that* specific replier (and otherwise we want to fail
+		 * with "that's the wrong person for this (stateful) request").
+		 */
+		if (msg->to && (replier->bound_to_id != msg->to)) {
+
+			kbus_maybe_dbg(priv->dev, "  ..Request to %u,"
+				       " but replier is %u\n", msg->to,
+				       replier->bound_to_id);
+
+			retval = -EPIPE;	/* Well, sort of */
+			goto done_sending;
+		}
+
+		if (kbus_queue_is_full(replier->bound_to, "replier", false)) {
+			if (all_or_wait)
+				retval = -EAGAIN;	/* try again later */
+			else
+				retval = -EBUSY;
+			goto done_sending;
+		}
+	}
+
+	for (ii = 0; ii < num_listeners; ii++) {
+
+		kbus_maybe_dbg(priv->dev, "  Considering listener %u\n",
+			       listeners[ii]->bound_to_id);
+
+		if (kbus_queue_is_full
+		    (listeners[ii]->bound_to, "listener", false)) {
+			if (all_or_wait) {
+				retval = -EAGAIN;	/* try again later */
+				goto done_sending;
+			} else if (all_or_fail) {
+				retval = -EBUSY;
+				goto done_sending;
+			} else {
+				/* For now, just ignore *this* listener */
+				listeners[ii] = NULL;
+				continue;
+			}
+		}
+	}
+
+	/*
+	 * ===================================================================
+	 * Actually send the messages
+	 * ===================================================================
+	 */
+
+	/*
+	 * Remember that kbus_push_message takes a copy of the message for us.
+	 *
+	 * This is inefficient, since otherwise we could keep a single copy of
+	 * the message (or at least the message header) and just bump a
+	 * reference count for each "use" of the message name/data.
+	 *
+	 * However, it also allows us to easily set the "needs a reply" flag
+	 * (and associated data) when sending a "needs a reply" message to a
+	 * replier, and *unset* the same when sending said message to "just"
+	 * listeners...
+	 *
+	 * Be careful if altering this...
+	 */
+
+	/*
+	 * We know that kbus_push_message() can return 0 or -EFAULT.
+	 * It seems sensible to treat the latter as a "local" error, as it
+	 * means that our internals have gone wrong. Thus we don't need to
+	 * generate a message for it.
+	 */
+
+	/* If it's a reply message and we've got someone to reply to, send it */
+	if (reply_to) {
+		retval = kbus_push_message(reply_to, msg, NULL, true);
+		if (retval == 0) {
+			num_sent++;
+			/*
+			 * In which case, we *have* sent this reply,
+			 * and can forget about needing to do so
+			 * (there's not much we can do with an error
+			 * in this, so just ignore it)
+			 */
+			(void)kbus_reply_now_sent(priv, &msg->in_reply_to);
+		} else {
+			goto done_sending;
+		}
+	}
+
+	/* If it's a request, and we've got a replier for it, send it */
+	if (replier) {
+		retval =
+		    kbus_push_message(replier->bound_to, msg, replier, true);
+		if (retval)
+			goto done_sending;
+
+		num_sent++;
+		/* And we'll need a reply for that, thank you */
+		retval = kbus_remember_msg_id(priv, &msg->id);
+		if (retval)
+			/*
+			 * Out of memory - what *can* we do?
+			 * (basically, nothing, it's all gone horribly
+			 * wrong)
+			 */
+			goto done_sending;
+	}
+
+	/* For each listener, if they're still interested, send it */
+	for (ii = 0; ii < num_listeners; ii++) {
+		struct kbus_message_binding *listener = listeners[ii];
+		if (listener) {
+			retval = kbus_push_message(listener->bound_to, msg,
+						   listener, false);
+			if (retval == 0)
+				num_sent++;
+			else
+				goto done_sending;
+		}
+	}
+
+	retval = 0;
+
+done_sending:
+	kfree(listeners);
+	return retval;
+}
+
+/*
+ * Handle moving over the next chunk of data bytes from the user.
+ */
+static int kbus_write_data_parts(struct kbus_private_data *priv,
+				 const char __user *buf,
+				 size_t buf_pos, size_t bytes_to_use)
+{
+	struct kbus_write_msg *this = &(priv->write);
+
+	u32 num_parts = this->ref_data->num_parts;
+	size_t local_count = bytes_to_use;
+	size_t local_buf_pos = 0;
+
+	while (local_count) {
+		unsigned ii = this->ref_data_index;
+		unsigned this_part_len;
+		size_t sofar, needed, to_use;
+
+		unsigned *lengths = this->ref_data->lengths;
+		unsigned long *parts = this->ref_data->parts;
+
+		if (ii == num_parts - 1)
+			this_part_len = this->ref_data->last_page_len;
+		else
+			this_part_len = KBUS_PART_LEN;
+
+		sofar = lengths[ii];
+
+		needed = this_part_len - sofar;
+		to_use = min(needed, local_count);
+
+		if (copy_from_user((char *)parts[ii] + sofar,
+				   buf + buf_pos + local_buf_pos, to_use)) {
+			dev_err(priv->dev->dev, "copy from data failed"
+			       " (part %d: %u of %u to %p + %u)\n",
+			       this->ref_data_index,
+			       (unsigned)to_use, (unsigned)local_count,
+			       (void *)parts[ii], (unsigned)sofar);
+			return -EFAULT;
+		}
+
+		lengths[ii] += to_use;
+		local_count -= to_use;
+		local_buf_pos += to_use;
+
+		if (lengths[ii] == this_part_len) {
+			/* This part is full */
+			this->ref_data_index++;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Handle moving over the next chunk of bytes from the user to our message.
+ *
+ * 'buf' is the buffer of data the user gave us.
+ *
+ * 'buf_pos' is the offset in that buffer from which we are to take bytes.
+ * We alter that by how many bytes we do take.
+ *
+ * 'count' is the number of bytes we're still to take from 'buf'. We also
+ * alter 'count' by how many bytes we do take (downwards).
+ */
+static int kbus_write_parts(struct kbus_private_data *priv,
+			    const char __user *buf,
+			    size_t *buf_pos, size_t *count)
+{
+	struct kbus_write_msg *this = &(priv->write);
+	ssize_t retval = 0;
+
+	size_t bytes_needed;	/* ...to fill the current part */
+	size_t bytes_to_use;	/* ...from the user's data */
+
+	struct kbus_msg *msg = this->msg;
+	struct kbus_message_header *user_msg =
+	    (struct kbus_message_header *)&this->user_msg;
+
+	if (this->is_finished) {
+		dev_err(priv->dev->dev, "pid %u [%s]"
+		       " Attempt to write data after the end guard in a"
+		       " message (%u extra byte%s) - did you forget to"
+		       " 'send'?\n",
+		       current->pid, current->comm,
+		       (unsigned)*count, *count == 1 ? "" : "s");
+		return -EMSGSIZE;
+	}
+
+	switch (this->which) {
+
+	case KBUS_PART_HDR:
+		bytes_needed = sizeof(*user_msg) - this->pos;
+		bytes_to_use = min(bytes_needed, *count);
+
+		if (copy_from_user((char *)user_msg + this->pos,
+				   buf + *buf_pos, bytes_to_use)) {
+			dev_err(priv->dev->dev,
+			       "copy from user failed (msg hdr: "
+			       "%u of %u to %p + %u)\n",
+			       (unsigned)bytes_to_use, (unsigned)*count, msg,
+			       this->pos);
+			return -EFAULT;
+		}
+		if (bytes_needed == bytes_to_use) {
+			/*
+			 * At this point, we can check the message header makes
+			 * sense
+			 */
+			retval = kbus_check_message_written(priv->dev, this);
+			if (retval)
+				return retval;
+
+			msg->id = user_msg->id;
+			msg->in_reply_to = user_msg->in_reply_to;
+			msg->to = user_msg->to;
+			msg->from = user_msg->from;
+			msg->orig_from = user_msg->orig_from;
+			msg->final_to = user_msg->final_to;
+			msg->extra = user_msg->extra;
+			msg->flags = user_msg->flags;
+			msg->name_len = user_msg->name_len;
+			msg->data_len = user_msg->data_len;
+			/* Leaving msg->name|data_ref still unset */
+
+			this->user_name_ptr = user_msg->name;
+			this->user_data_ptr = user_msg->data;
+
+			if (user_msg->name)
+				/*
+				 * If we're reading a "pointy" message header,
+				 * then that's all we need - we shan't try to
+				 * copy the message name and any data until the
+				 * user says to SEND.
+				 */
+				this->is_finished = true;
+			else
+				this->pointers_are_local = true;
+		}
+		break;
+
+	case KBUS_PART_NAME:
+		if (this->ref_name == NULL) {
+			char *name = kmalloc(msg->name_len + 1, GFP_KERNEL);
+			if (!name) {
+				dev_err(priv->dev->dev,
+					"Cannot kmalloc message name\n");
+				return -ENOMEM;
+			}
+			name[msg->name_len] = 0;	/* always */
+			name[0] = 0;	/* we don't know the name yet */
+			this->ref_name = kbus_wrap_name_in_ref(name);
+			if (!this->ref_name) {
+				kfree(name);
+				dev_err(priv->dev->dev,
+					"Cannot kmalloc ref to message name\n");
+				return -ENOMEM;
+			}
+		}
+		bytes_needed = msg->name_len - this->pos;
+		bytes_to_use = min(bytes_needed, *count);
+
+		if (copy_from_user(this->ref_name->name + this->pos,
+				   buf + *buf_pos, bytes_to_use)) {
+			dev_err(priv->dev->dev, "copy from user failed"
+			       " (name: %u of %u to %p + %u)\n",
+			       (unsigned)bytes_to_use, (unsigned)*count,
+			       this->ref_name->name, this->pos);
+			return -EFAULT;
+		}
+		if (bytes_needed == bytes_to_use) {
+			/*
+			 * We can check the name now it is in kernel space - we
+			 * want to do this before we sort out the data, since
+			 * that can involve a *lot* of copying...
+			 */
+			if (kbus_invalid_message_name(priv->dev,
+						      this->ref_name->name,
+						      msg->name_len))
+				return -EBADMSG;
+
+			this->msg->name_ref = this->ref_name;
+			this->ref_name = NULL;
+		}
+		break;
+
+	case KBUS_PART_NPAD:
+		bytes_needed = KBUS_PADDED_NAME_LEN(msg->name_len) -
+		    msg->name_len - this->pos;
+		bytes_to_use = min(bytes_needed, *count);
+		break;
+
+	case KBUS_PART_DATA:
+		if (msg->data_len == 0) {
+			bytes_needed = 0;
+			bytes_to_use = 0;
+			break;
+		}
+		if (this->ref_data == NULL) {
+			if (kbus_alloc_ref_data(priv, msg->data_len,
+						&this->ref_data))
+				return -ENOMEM;
+			this->ref_data_index = 0;	/* current part index */
+		}
+		/* Overall, how far are we through the message's data? */
+		bytes_needed = msg->data_len - this->pos;
+		bytes_to_use = min(bytes_needed, *count);
+		/* So let's add 'bytes_to_use' bytes to our message data */
+		retval = kbus_write_data_parts(priv, buf, *buf_pos,
+					       bytes_to_use);
+		if (retval) {
+			kbus_lower_data_ref(this->ref_data);
+			this->ref_data = NULL;
+			return retval;
+		}
+		if (bytes_needed == bytes_to_use) {
+			/* Hooray - we've finished our data */
+			this->msg->data_ref = this->ref_data;
+			this->ref_data = NULL;
+		}
+		break;
+
+	case KBUS_PART_DPAD:
+		bytes_needed = KBUS_PADDED_DATA_LEN(msg->data_len) -
+		    msg->data_len - this->pos;
+		bytes_to_use = min(bytes_needed, *count);
+		break;
+
+	case KBUS_PART_FINAL_GUARD:
+		bytes_needed = 4 - this->pos;
+		bytes_to_use = min(bytes_needed, *count);
+		if (copy_from_user((char *)(&this->guard) + this->pos,
+				   buf + *buf_pos, bytes_to_use)) {
+			dev_err(priv->dev->dev, "copy from user failed"
+			       " (final guard: %u of %u to %p + %u)\n",
+			       (unsigned)bytes_to_use, (unsigned)*count,
+			       &this->guard, this->pos);
+			return -EFAULT;
+		}
+		if (bytes_needed == bytes_to_use) {
+			if (this->guard != KBUS_MSG_END_GUARD) {
+				dev_err(priv->dev->dev, "pid %u [%s]"
+				       " (entire) message end guard is "
+				       "%08x, not %08x\n",
+				       current->pid, current->comm,
+				       this->guard, KBUS_MSG_END_GUARD);
+				return -EINVAL;
+			}
+			this->is_finished = true;
+		}
+		break;
+
+	default:
+		dev_err(priv->dev->dev, "Internal error in write: unexpected"
+		       " message part %d\n", this->which);
+		return -EFAULT;	/* what *should* it be? */
+	}
+
+	*count -= bytes_to_use;
+	*buf_pos += bytes_to_use;
+
+	if (bytes_needed == bytes_to_use) {
+		this->which++;
+		this->pos = 0;
+	} else {
+		this->pos += bytes_to_use;
+	}
+	return 0;
+}
+
+static ssize_t kbus_write(struct file *filp, const char __user *buf,
+			  size_t count, loff_t *f_pos __maybe_unused)
+{
+	struct kbus_private_data *priv = filp->private_data;
+	struct kbus_dev *dev = priv->dev;
+	ssize_t retval = 0;
+	size_t bytes_left = count;
+	size_t buf_pos = 0;
+
+	struct kbus_write_msg *this = &priv->write;
+
+	if (mutex_lock_interruptible(&dev->mux))
+		return -EAGAIN;
+
+	kbus_maybe_dbg(priv->dev, "%u WRITE count %u, pos %d\n",
+				   priv->id, (unsigned)count, (int)*f_pos);
+
+	/*
+	 * If we've already started to try sending a message, we don't
+	 * want to continue appending to it
+	 */
+	if (priv->sending) {
+		retval = -EALREADY;
+		goto done;
+	}
+
+	if (this->msg == NULL) {
+		/* Clearly, the start of a new message */
+		memset(this, 0, sizeof(*this));
 
-	filp->private_data = priv;
+		/* This is the new (internal) message we're preparing */
+		this->msg = kmalloc(sizeof(*(this->msg)), GFP_KERNEL);
+		if (!this->msg) {
+			retval = -ENOMEM;
+			goto done;
+		}
+		memset(this->msg, 0, sizeof(*(this->msg)));
+	}
 
-	mutex_unlock(&dev->mux);
+	while (bytes_left) {
+		retval = kbus_write_parts(priv, buf, &buf_pos, &bytes_left);
+		if (retval)
+			goto done;
+	}
 
-	kbus_maybe_dbg(dev, "%u OPEN\n", priv->id);
+done:
+	kbus_maybe_dbg(priv->dev, "%u WRITE ends with retval %d\n",
+		       priv->id, (int)retval);
 
-	return 0;
+	if (retval)
+		kbus_empty_write_msg(priv);
+	mutex_unlock(&dev->mux);
+	if (retval)
+		return retval;
+	else
+		return count;
 }
 
-static int kbus_release(struct inode *inode __always_unused, struct file *filp)
+static ssize_t kbus_read(struct file *filp, char __user *buf, size_t count,
+			 loff_t *f_pos __maybe_unused)
 {
-	int retval2 = 0;
 	struct kbus_private_data *priv = filp->private_data;
 	struct kbus_dev *dev = priv->dev;
+	struct kbus_read_msg *this = &(priv->read);
+	ssize_t retval = 0;
+	u32 len, left;
+	u32 which = this->which;
 
 	if (mutex_lock_interruptible(&dev->mux))
-		return -ERESTARTSYS;
+		return -EAGAIN;	/* Just try again later */
 
-	kbus_maybe_dbg(dev, "%u RELEASE\n", priv->id);
+	kbus_maybe_dbg(priv->dev, "%u READ count %u, pos %d\n",
+		       priv->id, (unsigned)count, (int)*f_pos);
 
-	kbus_forget_my_bindings(priv);
-	retval2 = kbus_forget_open_ksock(dev, priv->id);
-	kfree(priv);
+	if (this->msg == NULL) {
+		/* No message to read at the moment */
+		kbus_maybe_dbg(priv->dev, "  Nothing to read\n");
+		retval = 0;
+		goto done;
+	}
 
-	mutex_unlock(&dev->mux);
+	/*
+	 * Read each of the parts of a message until we've read 'count'
+	 * characters, or run off the end of the message.
+	 */
+	while (which < KBUS_NUM_PARTS && count > 0) {
+		if (this->lengths[which] == 0) {
+			kbus_maybe_dbg(priv->dev,
+				       "  xx which %d, read_len[%d] %u\n",
+				       which, which, this->lengths[which]);
+			this->pos = 0;
+			which++;
+			continue;
+		}
 
-	return retval2;
+		if (which == KBUS_PART_DATA) {
+			struct kbus_data_ptr *dp = this->msg->data_ref;
+
+			left = dp->lengths[this->ref_data_index] - this->pos;
+			len = min(left, (u32) count);
+			if (len) {
+				if (copy_to_user(buf,
+						 (void *)
+						 dp->parts[this->ref_data_index]
+							 + this->pos, len)) {
+					dev_err(priv->dev->dev,
+					       "error reading from %u\n",
+					       priv->id);
+					retval = -EFAULT;
+					goto done;
+				}
+				buf += len;
+				retval += len;
+				count -= len;
+				this->pos += len;
+			}
+
+			if (this->pos == dp->lengths[this->ref_data_index]) {
+				this->pos = 0;
+				this->ref_data_index++;
+			}
+			if (this->ref_data_index == dp->num_parts) {
+				this->pos = 0;
+				which++;
+			}
+		} else {
+			left = this->lengths[which] - this->pos;
+			len = min(left, (u32) count);
+			if (len) {
+				if (copy_to_user(buf,
+						 this->parts[which] + this->pos,
+						 len)) {
+					dev_err(priv->dev->dev,
+					       "error reading from %u\n",
+					       priv->id);
+					retval = -EFAULT;
+					goto done;
+				}
+				buf += len;
+				retval += len;
+				count -= len;
+				this->pos += len;
+			}
+
+			if (this->pos == this->lengths[which]) {
+				this->pos = 0;
+				which++;
+			}
+		}
+	}
+
+	if (which < KBUS_NUM_PARTS)
+		this->which = which;
+	else
+		kbus_empty_read_msg(priv);
+
+done:
+	mutex_unlock(&dev->mux);
+	return retval;
 }
 
 static int kbus_bind(struct kbus_private_data *priv,
@@ -651,12 +2712,507 @@ done:
 	return retval;
 }
 
+/*
+ * Make the next message ready for reading by the user.
+ *
+ * Returns 0 if there is no next message, 1 if there is, and a negative value
+ * if there's an error.
+ */
+static int kbus_nextmsg(struct kbus_private_data *priv,
+			unsigned long arg)
+{
+	int retval = 0;
+	struct kbus_msg *msg;
+	struct kbus_read_msg *this = &(priv->read);
+	struct kbus_message_header *user_msg;
+
+	kbus_maybe_dbg(priv->dev, "%u NEXTMSG\n", priv->id);
+
+	/* If we were partway through a message, lose it */
+	if (this->msg) {
+		kbus_maybe_dbg(priv->dev, "  Dropping partial message\n");
+		kbus_empty_read_msg(priv);
+	}
+
+	/* Have we got a next message? */
+	msg = kbus_pop_message(priv);
+	if (msg == NULL) {
+		kbus_maybe_dbg(priv->dev, "  No next message\n");
+		/*
+		 * A return value of 0 means no message, and that's
+		 * what __put_user returns for success.
+		 */
+		return __put_user(0, (u32 __user *) arg);
+	}
+
+	user_msg = (struct kbus_message_header *)&this->user_hdr;
+	user_msg->start_guard = KBUS_MSG_START_GUARD;
+	user_msg->id = msg->id;
+	user_msg->in_reply_to = msg->in_reply_to;
+	user_msg->to = msg->to;
+	user_msg->from = msg->from;
+	user_msg->orig_from = msg->orig_from;
+	user_msg->final_to = msg->final_to;
+	user_msg->extra = msg->extra;
+	user_msg->flags = msg->flags;
+	user_msg->name_len = msg->name_len;
+	user_msg->data_len = msg->data_len;
+	user_msg->name = NULL;
+	user_msg->data = NULL;
+	user_msg->end_guard = KBUS_MSG_END_GUARD;
+
+	this->msg = msg;	/* Remember it so we can free it later */
+
+	this->parts[KBUS_PART_HDR] = (char *)user_msg;
+	this->parts[KBUS_PART_NAME] = msg->name_ref->name;
+	/* direct to the string */
+
+	this->parts[KBUS_PART_NPAD] = static_zero_padding;
+
+	/* The data is treated specially - see kbus_read() */
+	this->parts[KBUS_PART_DATA] = (char *)msg->data_ref;
+
+	this->parts[KBUS_PART_DPAD] = static_zero_padding;
+	this->parts[KBUS_PART_FINAL_GUARD] = (char *)&static_end_guard;
+
+	this->lengths[KBUS_PART_HDR] = sizeof(*user_msg);
+	this->lengths[KBUS_PART_NAME] = msg->name_len;
+	this->lengths[KBUS_PART_NPAD] =
+	    KBUS_PADDED_NAME_LEN(msg->name_len) - msg->name_len;
+
+	/* The data is treated specially - see kbus_read() */
+	this->lengths[KBUS_PART_DATA] = msg->data_len;
+	this->lengths[KBUS_PART_DPAD] =
+	    KBUS_PADDED_DATA_LEN(msg->data_len) - msg->data_len;
+
+	this->lengths[KBUS_PART_FINAL_GUARD] = 4;
+
+	/* And we'll be starting by writing out the first thing first */
+	this->which = 0;
+	this->pos = 0;
+	this->ref_data_index = 0;
+
+	/*
+	 * If the message is a request (to us), then this is the appropriate
+	 * point to add it to our list of "requests we've read but not yet
+	 * replied to" -- although that *sounds* as if we should be doing it in
+	 * kbus_read, we might never get round to reading the content of the
+	 * message (we might call NEXTMSG again, or DISCARD), and also
+	 * kbus_read can get called multiple times for a single message body.
+	 * If we do our remembering here, then we guarantee to get one memory
+	 * for each request, as it leaves the message queue and is (in whatever
+	 * way) dealt with.
+	 */
+	if (msg->flags & KBUS_BIT_WANT_YOU_TO_REPLY) {
+		retval = kbus_reply_needed(priv, msg);
+		/* If it couldn't malloc, there's not much we can do,
+		 * it's fairly fatal */
+		if (retval)
+			return retval;
+	}
+
+	retval = __put_user(KBUS_ENTIRE_MSG_LEN(msg->name_len, msg->data_len),
+			    (u32 __user *) arg);
+	if (retval)
+		return retval;
+	return 1;	/* We had a message */
+}
+
 /* How much of the current message is left to read? */
 extern u32 kbus_lenleft(struct kbus_private_data *priv)
 {
+	struct kbus_read_msg *this = &(priv->read);
+	if (this->msg) {
+		int ii, jj;
+		u32 sofar = 0;
+		u32 total = KBUS_ENTIRE_MSG_LEN(this->msg->name_len,
+						     this->msg->data_len);
+		/* Add up the items we've read all of, so far */
+		for (ii = 0; ii < this->which; ii++) {
+			if (ii == KBUS_PART_DATA &&
+			    this->msg->data_len > 0) {
+				struct kbus_data_ptr *dp = this->msg->data_ref;
+				for (jj = 0; jj < this->ref_data_index; jj++)
+					sofar += dp->lengths[jj];
+				if (this->ref_data_index < dp->num_parts)
+					sofar += this->pos;
+			} else {
+				sofar += this->lengths[ii];
+			}
+		}
+		/* Plus what we've read of the last one */
+		if (this->which < KBUS_NUM_PARTS) {
+			if (this->which == KBUS_PART_DATA &&
+			    this->msg->data_len > 0) {
+				struct kbus_data_ptr *dp = this->msg->data_ref;
+				for (jj = 0; jj < this->ref_data_index; jj++)
+					sofar += dp->lengths[jj];
+				if (this->ref_data_index < dp->num_parts)
+					sofar += this->pos;
+			} else {
+				sofar += this->pos;
+			}
+		}
+		return total - sofar;
+	}
 	return 0; /* no message => nothing to read */
 }
 
+/*
+ * Allocate the data arrays we need to hold reference-counted data, possibly
+ * spread over multiple pages. 'data_len' is from the message header.
+ *
+ * Note that the 'lengths[n]' entry for each page 'n' will be set to zero.
+ */
+static int kbus_alloc_ref_data(struct kbus_private_data *priv __maybe_unused,
+			       u32 data_len,
+			       struct kbus_data_ptr **ret_ref_data)
+{
+	int num_parts = 0;
+	unsigned long *parts = NULL;
+	unsigned *lengths = NULL;
+	unsigned last_page_len = 0;
+	struct kbus_data_ptr *ref_data = NULL;
+	int as_pages;
+	int ii;
+
+	*ret_ref_data = NULL;
+
+	num_parts = (data_len + KBUS_PART_LEN - 1) / KBUS_PART_LEN;
+
+	/*
+	 * To save recalculating the length of the last page every time
+	 * we're interested, get it right once and for all.
+	 */
+	last_page_len = data_len - (num_parts - 1) * KBUS_PART_LEN;
+
+	kbus_maybe_dbg(priv->dev,
+		       "%u Allocate ref data: part=%lu, "
+		       "threshold=%lu, data_len %u -> num_parts %d\n",
+		       priv->id, KBUS_PART_LEN,
+		       KBUS_PAGE_THRESHOLD, data_len, num_parts);
+
+	parts = kmalloc(sizeof(*parts) * num_parts, GFP_KERNEL);
+	if (!parts)
+		return -ENOMEM;
+	lengths = kmalloc(sizeof(*lengths) * num_parts, GFP_KERNEL);
+	if (!lengths) {
+		kfree(parts);
+		return -ENOMEM;
+	}
+
+	if (num_parts == 1 && data_len < KBUS_PAGE_THRESHOLD) {
+		/* A single part in "simple" memory */
+		as_pages = false;
+		parts[0] = (unsigned long)kmalloc(data_len, GFP_KERNEL);
+		if (!parts[0]) {
+			kfree(lengths);
+			kfree(parts);
+			return -ENOMEM;
+		}
+		lengths[0] = 0;
+	} else {
+		/*
+		 * One or more pages
+		 *
+		 * For simplicity, we make all of our pages be full pages.
+		 * In theory, we could use the same rules for the last page
+		 * as we do if we only have a single page - but for the
+		 * moment, we're not bothering.
+		 *
+		 * This means that the 'last_page_len' is strictly theoretical
+		 * for the moment...
+		 */
+		as_pages = true;
+		for (ii = 0; ii < num_parts; ii++) {
+			parts[ii] = __get_free_page(GFP_KERNEL);
+			if (!parts[ii]) {
+				int jj;
+				for (jj = 0; jj < ii; jj++)
+					free_page(parts[jj]);
+				kfree(lengths);
+				kfree(parts);
+				return -ENOMEM;
+			}
+			lengths[ii] = 0;
+		}
+	}
+	ref_data = kbus_wrap_data_in_ref(as_pages, num_parts, parts, lengths,
+					 last_page_len);
+	if (!ref_data) {
+		int jj;
+		if (as_pages)
+			for (jj = 0; jj < num_parts; jj++)
+				free_page(parts[jj]);
+		else
+			kfree((void *)parts[0]);
+		kfree(lengths);
+		kfree(parts);
+		return -ENOMEM;
+	}
+	*ret_ref_data = ref_data;
+	return 0;
+}
+
+/*
+ * Does what it says on the box - take the user data and promote it to kernel
+ * space, as a reference counted quantity, possibly spread over multiple pages.
+ */
+static int kbus_wrap_user_data(struct kbus_private_data *priv,
+			       u32 data_len,
+			       void *user_data_ptr,
+			       struct kbus_data_ptr **new_data)
+{
+	struct kbus_data_ptr *ref_data = NULL;
+	int num_parts;
+	unsigned long *parts;
+	unsigned *lengths;
+	int ii;
+	uint8_t __user *data_ptr;
+
+	int retval = kbus_alloc_ref_data(priv, data_len, &ref_data);
+	if (retval)
+		return retval;
+
+	num_parts = ref_data->num_parts;
+	lengths = ref_data->lengths;
+	parts = ref_data->parts;
+
+	kbus_maybe_dbg(priv->dev, "  @@ copying %s\n",
+		       ref_data->as_pages ? "as pages" : "as kmalloc'ed data");
+
+	/* Given all of the *space* for our data, populate it */
+	data_ptr = (void __user *) user_data_ptr;
+	for (ii = 0; ii < num_parts; ii++) {
+		unsigned len;
+		if (ii == num_parts - 1)
+			len = ref_data->last_page_len;
+		else
+			len = KBUS_PART_LEN;
+
+		kbus_maybe_dbg(priv->dev,
+			       "  @@ %d: copy %u bytes "
+			       "to kernel address %lx\n",
+			       ii, len, parts[ii]);
+
+		if (copy_from_user((void *)parts[ii], data_ptr, len)) {
+			kbus_lower_data_ref(ref_data);
+			return -EFAULT;
+		}
+		lengths[ii] = len;
+		data_ptr += len;
+	}
+	*new_data = ref_data;
+	return 0;
+}
+
+/*
+ * Given a "pointy" message header, copy the message name and data from
+ * user space into kernel space.
+ *
+ * The message name is copied as a reference-counted string.
+ *
+ * The message data (if any) is copied as reference-counted data.
+ *
+ * Also checks the legality of the message name, since we need the name in
+ * kernel space to do that, but prefer to do the check before copying any
+ * data (which can be expensive).
+ */
+static int kbus_copy_pointy_parts(struct kbus_private_data *priv,
+				  struct kbus_write_msg *this)
+{
+	struct kbus_msg *msg = this->msg;
+	char *new_name = NULL;
+	struct kbus_name_ptr *name_ref;
+	struct kbus_data_ptr *new_data = NULL;
+
+	/* First, let's deal with the name */
+	new_name = kmalloc(msg->name_len + 1, GFP_KERNEL);
+	if (!new_name)
+		return -ENOMEM;
+	if (copy_from_user
+	    (new_name, (void __user *)this->user_name_ptr, msg->name_len + 1)) {
+		kfree(new_name);
+		return -EFAULT;
+	}
+
+	/*
+	 * We can check the name now it is in kernel space - we want
+	 * to do this before we sort out the data, since that can involve
+	 * a *lot* of copying...
+	 */
+	if (kbus_invalid_message_name(priv->dev, new_name, msg->name_len)) {
+		kfree(new_name);
+		return -EBADMSG;
+	}
+	name_ref = kbus_wrap_name_in_ref(new_name);
+	if (!name_ref) {
+		kfree(new_name);
+		return -ENOMEM;
+	}
+
+	/* Now for the data. */
+	if (msg->data_len) {
+		int retval = kbus_wrap_user_data(priv, msg->data_len,
+						 this->user_data_ptr,
+						 &new_data);
+		if (retval) {
+			kbus_lower_name_ref(name_ref);
+			return retval;
+		}
+	}
+
+	kbus_maybe_dbg(priv->dev, "  'pointy' message normalised\n");
+
+	msg->name_ref = name_ref;
+	msg->data_ref = new_data;
+
+	this->user_name_ptr = NULL;
+	this->user_data_ptr = NULL;
+	this->pointers_are_local = true;
+
+	return 0;
+}
+
+static void kbus_discard(struct kbus_private_data *priv)
+{
+	kbus_empty_write_msg(priv);
+	priv->sending = false;
+}
+
+/*
+ * Returns 0 for success, and a negative value if there's an error.
+ */
+static int kbus_send(struct kbus_private_data *priv,
+		     struct kbus_dev *dev, unsigned long arg)
+{
+	ssize_t retval = 0;
+	struct kbus_msg *msg = priv->write.msg;
+
+	kbus_maybe_dbg(priv->dev, "%u SEND\n", priv->id);
+
+	if (priv->write.msg == NULL)
+		return -ENOMSG;
+
+	if (!priv->write.is_finished) {
+		dev_err(priv->dev->dev, "pid %u [%s]"
+		       " message not finished (in part %d of message)\n",
+		       current->pid, current->comm, priv->write.which);
+		retval = -EINVAL;
+		goto done;
+	}
+
+	/*
+	 * Users are not allowed to send messages marked as "synthetic"
+	 * (since, after all, if the user sends it, it is not). However,
+	 * it's possible that, in good faith, they re-sent a synthetic
+	 * message that they received earlier, so we shall take care to
+	 * unset the bit, if necessary.
+	 */
+	if (KBUS_BIT_SYNTHETIC & msg->flags)
+		msg->flags &= ~KBUS_BIT_SYNTHETIC;
+
+	/*
+	 * The "extra" field is reserved for future expansion, so for the
+	 * moment we always zero it (this stops anyone from trying to take
+	 * advantage of it, and getting caught out when we decide WE want it)
+	 */
+	msg->extra = 0;
+
+	/*
+	 * The message header is already in kernel space (thanks to kbus_write),
+	 * but if it's a "pointy" message, the name and data are not. So let's
+	 * fix that.
+	 *
+	 * Note that we *always* end up with a message header containing
+	 * pointers to (copies of) the name and (if given) data, and the
+	 * data reference counted, and maybe split over multiple pages.
+	 *
+	 *     Note that if this is a message we already tried to send
+	 *     earlier, any "pointy" parts would have been copied earlier,
+	 *     hence the check we actually make.
+	 */
+	if (!priv->write.pointers_are_local) {
+		retval = kbus_copy_pointy_parts(priv, &priv->write);
+		if (retval)
+			goto done;
+	}
+
+	/* ================================================================= */
+	/*
+	 * If this message is a Request, then we can't send it until/unless
+	 * we've got room in our message queue to receive the Reply.
+	 *
+	 * We do this check here, rather than in kbus_write_to_recipients,
+	 * because:
+	 *
+	 * a) kbus_write_to_recipients gets (re)called by the POLL interface,
+	 *    and at that stage KBUS *knows* that there is room for the
+	 *    message concerned (so the checking code would need to know not
+	 *    to check)
+	 *
+	 * b) If the check fails, we do not want to consider ourselves in
+	 *    "sending" state, since we can't afford to block, because it's
+	 *    *this Ksock* that needs to do some reading to clear the relevant
+	 *    queue, and it can't do that if it's blocking. So we'd either
+	 *    need to handle that (somehow), or just do the check here.
+	 *
+	 * Similarly, we don't finalise the message (put in its "from" and "id"
+	 * fields) until we pass this test.
+	 */
+	if ((msg->flags & KBUS_BIT_WANT_A_REPLY) &&
+	    kbus_queue_is_full(priv, "sender", false)) {
+		dev_err(priv->dev->dev, "%u Unable to send Request because"
+			" no room for a Reply in sender's message queue\n",
+			priv->id);
+		retval = -ENOLCK;
+		goto done;
+	}
+	/* ================================================================= */
+
+	/* So, we're actually ready to SEND! */
+
+	/* The message needs to say it is from us */
+	msg->from = priv->id;
+
+	/*
+	 * If we've already tried to send this message earlier (and
+	 * presumably failed with -EAGAIN), then we don't need to give
+	 * it a message id, because it already has one...
+	 */
+	if (!priv->sending) {
+		/* The message seems well formed, give it an id if necessary */
+		if (msg->id.network_id == 0)
+			msg->id.serial_num = kbus_next_serial_num(dev);
+	}
+
+	/* Also, remember this as the "message we last (tried to) send" */
+	priv->last_msg_id_sent = msg->id;
+
+	/*
+	 * Figure out who should receive this message, and write it to them
+	 */
+	retval = kbus_write_to_recipients(priv, dev, msg);
+
+done:
+	/*
+	 * -EAGAIN means we were blocked from sending, and the caller
+	 *  should try again (as one might expect).
+	 */
+	if (retval == -EAGAIN)
+		/* Remember we're still trying to send this message */
+		priv->sending = true;
+	else
+		/* We've now finished with our copy of the message header */
+		kbus_discard(priv);
+
+	if (retval == 0 || retval == -EAGAIN)
+		if (copy_to_user((void __user *)arg, &priv->last_msg_id_sent,
+				 sizeof(priv->last_msg_id_sent)))
+			retval = -EFAULT;
+	return retval;
+}
+
 static int kbus_maxmsgs(struct kbus_private_data *priv,
 			unsigned long arg)
 {
@@ -800,6 +3356,60 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		retval = kbus_replier(priv, dev, arg);
 		break;
 
+	case KBUS_IOC_NEXTMSG:
+		/*
+		 * Get the next message ready to be read, and return its
+		 * length.
+		 *
+		 * arg in:  none
+		 * arg out: number of bytes in next message
+		 * retval:  0 if no next message, 1 if there is a next message,
+		 *          negative value if there's an error.
+		 */
+		retval = kbus_nextmsg(priv, arg);
+		break;
+
+	case KBUS_IOC_LENLEFT:
+		/* How many bytes are left to read in the current message? */
+		{
+			u32 left = kbus_lenleft(priv);
+			kbus_maybe_dbg(priv->dev, "%u LENLEFT %u\n",
+				       id, left);
+			retval = __put_user(left, (u32 __user *) arg);
+		}
+		break;
+
+	case KBUS_IOC_SEND:
+		/*
+		 * Send the current message, we've finished writing it.
+		 *
+		 * arg in: <ignored>
+		 * arg out: the message id of said message
+		 * retval: negative for bad message, etc., 0 otherwise
+		 */
+		retval = kbus_send(priv, dev, arg);
+		break;
+
+	case KBUS_IOC_DISCARD:
+		/* Throw away the message we're currently writing. */
+		kbus_maybe_dbg(priv->dev, "%u DISCARD\n", id);
+		kbus_discard(priv);
+		break;
+
+	case KBUS_IOC_LASTSENT:
+		/*
+		 * What was the message id of the last message written to this
+		 * file descriptor? Before any messages have been written to
+		 * this file descriptor, this ioctl will return {0,0}.
+		 */
+		kbus_maybe_dbg(priv->dev, "%u LASTSENT %u:%u\n", id,
+			       priv->last_msg_id_sent.network_id,
+			       priv->last_msg_id_sent.serial_num);
+		if (copy_to_user((void __user *)arg, &priv->last_msg_id_sent,
+					sizeof(priv->last_msg_id_sent)))
+			retval = -EFAULT;
+		break;
+
 	case KBUS_IOC_MAXMSGS:
 		/*
 		 * Set (and/or query) maximum number of messages in this
@@ -821,6 +3431,14 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		retval = kbus_nummsgs(priv, dev, arg);
 		break;
 
+	case KBUS_IOC_UNREPLIEDTO:
+		/* How many Requests (to us) do we still owe Replies to? */
+		kbus_maybe_dbg(priv->dev, "%u UNREPLIEDTO %d\n",
+			       id, priv->num_replies_unsent);
+		retval = __put_user(priv->num_replies_unsent,
+				(u32 __user *) arg);
+		break;
+
 	case KBUS_IOC_VERBOSE:
 		/*
 		 * Should we output verbose/debug messages?
@@ -842,10 +3460,110 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	return retval;
 }
 
+/*
+ * Try sending the (currently waiting to be sent) message
+ *
+ * Returns true if the message has either been successfully sent, or an error
+ * occurred (which has been dealt with) and there is no longer a current
+ * message.
+ *
+ * Returns false if we hit EAGAIN (again) and we're still trying to send the
+ * current message.
+ */
+static int kbus_poll_try_send_again(struct kbus_private_data *priv,
+				    struct kbus_dev *dev)
+{
+	int retval;
+	struct kbus_msg *msg = priv->write.msg;
+
+	retval = kbus_write_to_recipients(priv, dev, msg);
+
+	switch (-retval) {
+	case 0:		/* All is well, nothing to do */
+		break;
+	case EAGAIN:		/* Still blocked by *someone* - nowt to do */
+		break;
+	case EADDRNOTAVAIL:
+		/*
+		 * It's a Request and there's no Replier (presumably there was
+		 * when the initial SEND was done, but now they've gone away).
+		 * A Request *needs* a Reply...
+		 */
+		kbus_push_synthetic_message(dev, 0, msg->from, msg->id,
+					    KBUS_MSG_NAME_REPLIER_DISAPPEARED);
+		retval = 0;
+		break;
+	default:
+		/*
+		 * Send *failed* - what can we do?
+		 * Not much, perhaps, but we must ensure that a Request gets
+		 * (some sort of) reply
+		 */
+		if (msg->flags & KBUS_BIT_WANT_A_REPLY)
+			kbus_push_synthetic_message(dev, 0, msg->from, msg->id,
+					    KBUS_MSG_NAME_ERROR_SENDING);
+		retval = 0;
+		break;
+	}
+
+	if (retval == 0) {
+		kbus_discard(priv);
+		return true;
+	}
+	return false;
+}
+
+static unsigned int kbus_poll(struct file *filp, poll_table *wait)
+{
+	struct kbus_private_data *priv = filp->private_data;
+	struct kbus_dev *dev = priv->dev;
+	unsigned mask = 0;
+
+	mutex_lock(&dev->mux);
+
+	kbus_maybe_dbg(priv->dev, "%u POLL\n", priv->id);
+
+	/*
+	 * Did I wake up because there's a message available to be read?
+	 */
+	if (priv->message_count != 0)
+		mask |= POLLIN | POLLRDNORM;	/* readable */
+
+	/*
+	 * Did I wake up because someone said they had space for a message on
+	 * their message queue (where there wasn't space before)?
+	 *
+	 * And if that is the case, if we're opened for write and have a
+	 * message waiting to be sent, can we now send it?
+	 *
+	 * The simplest way to find out is just to try again.
+	 */
+	if (filp->f_mode & FMODE_WRITE) {
+		int writable = true;
+		if (priv->sending)
+			writable = kbus_poll_try_send_again(priv, dev);
+		if (writable)
+			mask |= POLLOUT | POLLWRNORM;
+	}
+
+	/* Wait until someone has a message waiting to be read */
+	poll_wait(filp, &priv->read_wait, wait);
+
+	/* Wait until someone has a space into which a message can be pushed */
+	if (priv->sending)
+		poll_wait(filp, &dev->write_wait, wait);
+
+	mutex_unlock(&dev->mux);
+	return mask;
+}
+
 /* File operations for /dev/kbus<n> */
 static const struct file_operations kbus_fops = {
 	.owner = THIS_MODULE,
+	.read = kbus_read,
+	.write = kbus_write,
 	.unlocked_ioctl = kbus_ioctl,
+	.poll = kbus_poll,
 	.open = kbus_open,
 	.release = kbus_release,
 };
@@ -867,6 +3585,8 @@ static void kbus_setup_cdev(struct kbus_dev *dev, int devno)
 	INIT_LIST_HEAD(&dev->bound_message_list);
 	INIT_LIST_HEAD(&dev->open_ksock_list);
 
+	init_waitqueue_head(&dev->write_wait);
+
 	dev->next_ksock_id = 0;
 	dev->next_msg_serial_num = 0;
 
-- 
1.7.4.1
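
An aside for readers of kbus_alloc_ref_data() above: the part arithmetic
can be checked in isolation. This is only a userspace sketch and not part
of the patch. PART_LEN and PAGE_THRESHOLD are assumed stand-ins for
KBUS_PART_LEN and KBUS_PAGE_THRESHOLD (whose real values are defined
elsewhere in the series), and struct split and split_data() are
hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only; see the KBUS headers for the
 * real KBUS_PART_LEN and KBUS_PAGE_THRESHOLD. */
#define PART_LEN        4096u
#define PAGE_THRESHOLD  2048u

/* Hypothetical result type: how a payload would be split up. */
struct split {
	uint32_t num_parts;      /* number of parts needed */
	uint32_t last_part_len;  /* bytes used in the final part */
	bool as_pages;           /* false: one kmalloc'ed buffer */
};

/* Mirrors the calculation in kbus_alloc_ref_data(). As in the kernel
 * code, callers are expected to handle data_len == 0 beforehand. */
static struct split split_data(uint32_t data_len)
{
	struct split s;

	assert(data_len > 0);

	/* Round up to a whole number of parts */
	s.num_parts = (data_len + PART_LEN - 1) / PART_LEN;

	/* Whatever does not fill the earlier parts lands in the last one */
	s.last_part_len = data_len - (s.num_parts - 1) * PART_LEN;

	/* A single small payload stays in "simple" memory */
	s.as_pages = !(s.num_parts == 1 && data_len < PAGE_THRESHOLD);

	return s;
}
```

With these assumed values, a 10000 byte payload needs three parts with
1808 bytes in the last one, and a 100 byte payload needs a single
kmalloc'ed part.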



* [PATCH 06/11] KBUS add ability to receive messages only once
  2011-03-18 17:21         ` [PATCH 05/11] KBUS add support for messages Tony Ibbs
@ 2011-03-18 17:21           ` Tony Ibbs
  2011-03-18 17:21             ` [PATCH 07/11] KBUS add ability to add devices at runtime Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC (permalink / raw)
  To: lkml
  Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely, Tony Ibbs

Sometimes a recipient has bound to a message name more than once,
but only wants to receive one copy of each message matching those
bindings.

Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
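
Not part of the patch: a minimal userspace model of the check this adds
to kbus_push_message(), in case it helps review. It is a sketch under
assumptions, not the kernel code. struct ksock_model, same_id() and
model_push() are hypothetical names; only the rule itself (skip a
non-Replier push whose message id matches the id just pushed) is taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified copy of the message id pair used by KBUS */
struct msg_id {
	uint32_t network_id;
	uint32_t serial_num;
};

/* Hypothetical model of the per-Ksock state the patch adds */
struct ksock_model {
	bool only_once;             /* models 'messages_only_once' */
	struct msg_id just_pushed;  /* models 'msg_id_just_pushed' */
	unsigned int queued;        /* models 'message_count' */
};

static bool same_id(const struct msg_id *a, const struct msg_id *b)
{
	return a->network_id == b->network_id &&
	       a->serial_num == b->serial_num;
}

/* Returns true if the message was queued, false if it was skipped
 * under the "only once" rule. */
static bool model_push(struct ksock_model *k, struct msg_id id,
		       bool for_replier)
{
	assert(k != NULL);

	/* A push for the Replier is never skipped */
	if (k->only_once && !for_replier && same_id(&k->just_pushed, &id))
		return false;

	k->queued++;
	k->just_pushed = id;	/* remember for the next push */
	return true;
}
```

In this model a Ksock bound as Replier and as two Listeners sees the
Replier copy queued and both Listener copies skipped, which relies on
the Replier copy being pushed first, as the comment in kbus_internal.h
states.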
 include/linux/kbus_defns.h |   18 +++++++---
 ipc/kbus_internal.h        |   30 ++++++++++++++++
 ipc/kbus_main.c            |   82 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 125 insertions(+), 5 deletions(-)

diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
index d43c498..9da72e4 100644
--- a/include/linux/kbus_defns.h
+++ b/include/linux/kbus_defns.h
@@ -581,13 +581,21 @@ struct kbus_replier_bind_event_data {
  * retval: 0 for success, negative for failure
  */
 #define KBUS_IOC_UNREPLIEDTO _IOR(KBUS_IOC_MAGIC, 13, char *)
-
 /*
- * IOCTL 14 is not used, because it is introduced in the next revision,
- * (obviously, in real history this was done in a different order) and
- * I don't want to alter the number for VERBOSE.
+ * MSGONLYONCE - should we receive a message only once?
+ *
+ * This IOCTL tells a Ksock whether it should only receive a particular message
+ * once, even if it is both a Replier and Listener for the message (in which
+ * case it will always get the message as Replier, if appropriate), or if it is
+ * registered as multiple Listeners for the message.
+ *
+ * arg(in): __u32, 1 to change to "only once", 0 to change to the default,
+ * 0xFFFFFFFF to just return the current/previous state.
+ * arg(out): __u32, the previous state.
+ * retval: 0 for success, negative for failure (-EINVAL if arg in was not one
+ * of the specified values)
  */
-
+#define KBUS_IOC_MSGONLYONCE  _IOWR(KBUS_IOC_MAGIC, 14, char *)
 /*
  * VERBOSE - should KBUS output verbose "printk" messages (for this device)?
  *
diff --git a/ipc/kbus_internal.h b/ipc/kbus_internal.h
index 2d9e737..28c153c 100644
--- a/ipc/kbus_internal.h
+++ b/ipc/kbus_internal.h
@@ -534,6 +534,36 @@ struct kbus_private_data {
 	 * In fact, the 'outstanding_requests' list is used, simply because
 	 * it was implemented first.
 	 */
+
+	/*
+	 * By default, if a Ksock binds to a message name as both Replier and
+	 * Listener (typically by binding to a specific message name as Replier
+	 * and to a wildcard including it as Listener), and a Request of that
+	 * name is sent to that Ksock, it will get the message once as Replier
+	 * (marked "WANT_YOU_TO_REPLY"), and once as Listener.
+	 *
+	 * This is technically sensible, but can be irritating to the end user
+	 * who really often only wants to receive the message once.
+	 *
+	 * If "messages_only_once" is set, then when a message is about to be
+	 * put onto a Ksock's message queue, it will only be added if it (i.e.,
+	 * a message with the same id) has not already just been added. This
+	 * is safe because Requests to the specific Replier are always dealt
+	 * with first.
+	 *
+	 * As a side-effect, which I think also makes sense, this will also
+	 * mean that if a Listener has bound to the same message name multiple
+	 * times (as a Listener), then they will only get the message once.
+	 */
+	int messages_only_once;
+	/*
+	 * Messages can be added to either end of our message queue (i.e.,
+	 * depending on whether they're urgent or not). This means that the
+	 * "only once" mechanism needs to check both ends of the queue (which
+	 * is a pain). Or we can just remember the message id of the last
+	 * message pushed onto the queue. Which is much simpler.
+	 */
+	struct kbus_msg_id msg_id_just_pushed;
 };
 
 /* What is a sensible number for the default maximum number of messages? */
diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index 944b60c..a75d2e1 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -851,6 +851,46 @@ static int kbus_push_message(struct kbus_private_data *priv,
 		       "  %u Pushing message onto queue (%s)\n",
 		       priv->id, for_replier ? "replier" : "listener");
 
+	/*
+	 * 1. Check to see if this Ksock has the "only one copy
+	 *    of a message" flag set.
+	 * 2. If it does, check if our message (id) is already on
+	 *    the queue, and if it is, just skip adding it.
+	 *
+	 * (this means if the Ksock was destined to get the message
+	 * several times, either as Replier and Listener, or as
+	 * multiple Listeners to the same message name, it will only
+	 * get it once, for this "push")
+	 *
+	 * If "for_replier" is set we necessarily push the message - see below.
+	 */
+	if (priv->messages_only_once && !for_replier) {
+		/*
+		 * 1. We've been asked to only send one copy of a message
+		 *    to each Ksock that should receive it.
+		 * 2. This is not a Reply (to our Ksock) or a Request (to
+		 *    our Ksock as Replier)
+		 *
+		 * So, given that, has a message with that id already been
+		 * added to the message queue?
+		 *
+		 * (Note that if a message would be included because of
+		 * multiple message name bindings, we do not say anything
+		 * about which binding we will actually add the message
+		 * for - so unbinding later on may or may not cause a
+		 * message to go away, in this case.)
+		 */
+		if (kbus_same_message_id(&priv->msg_id_just_pushed,
+					 msg->id.network_id,
+					 msg->id.serial_num)) {
+			kbus_maybe_dbg(priv->dev,
+				       "  %u Ignoring message "
+				       "under 'once only' rule\n",
+				       priv->id);
+			return 0;
+		}
+	}
+
 	new_msg = kbus_copy_message(priv->dev, msg);
 	if (!new_msg)
 		return -EFAULT;
@@ -898,6 +938,7 @@ static int kbus_push_message(struct kbus_private_data *priv,
 	}
 
 	priv->message_count++;
+	priv->msg_id_just_pushed = msg->id;
 
 	if (!kbus_same_message_id(&msg->in_reply_to, 0, 0)) {
 		/*
@@ -3243,6 +3284,36 @@ static int kbus_nummsgs(struct kbus_private_data *priv,
 	return __put_user(count, (u32 __user *) arg);
 }
 
+static int kbus_onlyonce(struct kbus_private_data *priv,
+			 unsigned long arg)
+{
+	int retval = 0;
+	u32 only_once;
+	int old_value = priv->messages_only_once;
+
+	retval = __get_user(only_once, (u32 __user *) arg);
+	if (retval)
+		return retval;
+
+	kbus_maybe_dbg(priv->dev, "%u ONLYONCE requests %u (was %d)\n",
+		       priv->id, only_once, old_value);
+
+	switch (only_once) {
+	case 0:
+		priv->messages_only_once = false;
+		break;
+	case 1:
+		priv->messages_only_once = true;
+		break;
+	case 0xFFFFFFFF:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return __put_user(old_value, (u32 __user *) arg);
+}
+
 static int kbus_set_verbosity(struct kbus_private_data *priv,
 			      unsigned long arg)
 {
@@ -3439,6 +3510,17 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 				(u32 __user *) arg);
 		break;
 
+	case KBUS_IOC_MSGONLYONCE:
+		/*
+		 * Should we receive a given message only once?
+		 *
+		 * arg in: 0 (for no), 1 (for yes), 0xFFFFFFFF (for query)
+		 * arg out: the previous value, before we were called
+		 * return: 0 means OK, otherwise not OK
+		 */
+		retval = kbus_onlyonce(priv, arg);
+		break;
+
 	case KBUS_IOC_VERBOSE:
 		/*
 		 * Should we output verbose/debug messages?
-- 
1.7.4.1
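
For anyone skimming the hunk above, the once-only rule can be modelled
in a few lines of userspace C. This is only a sketch under stated
assumptions: struct msg_id, struct ksock_model and push_message() are
hypothetical stand-ins for kbus_msg_id, kbus_private_data and
kbus_push_message(), and the message queue is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kbus_msg_id. */
struct msg_id {
	uint32_t network_id;
	uint32_t serial_num;
};

/* Hypothetical stand-in for the relevant parts of a Ksock. */
struct ksock_model {
	bool messages_only_once;
	struct msg_id msg_id_just_pushed;
	unsigned int message_count;	/* the queue, reduced to a count */
};

static bool same_message_id(const struct msg_id *a, const struct msg_id *b)
{
	return a->network_id == b->network_id &&
	       a->serial_num == b->serial_num;
}

/*
 * Returns true if the message was queued, false if it was skipped
 * under the once-only rule. A message for the Replier is always queued.
 */
static bool push_message(struct ksock_model *ksock, const struct msg_id *id,
			 bool for_replier)
{
	assert(ksock && id);

	if (ksock->messages_only_once && !for_replier &&
	    same_message_id(&ksock->msg_id_just_pushed, id))
		return false;

	ksock->message_count++;
	ksock->msg_id_just_pushed = *id;
	return true;
}
```

Remembering only the last id pushed is enough here because the copy
for the Replier is dealt with first, so any duplicate for the same
Ksock arrives immediately afterwards.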



* [PATCH 07/11] KBUS add ability to add devices at runtime
  2011-03-18 17:21           ` [PATCH 06/11] KBUS add ability to receive messages only once Tony Ibbs
@ 2011-03-18 17:21             ` Tony Ibbs
  2011-03-18 17:21               ` [PATCH 08/11] KBUS add Replier Bind Events Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC (permalink / raw)
  To: lkml
  Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely, Tony Ibbs

Users do not always know how many KBUS devices will be needed when
the system starts. This patch adds the KBUS_IOC_NEWDEVICE ioctl, which
allows a normal user to request an extra device at runtime.

Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
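A rough idea of how userspace might call the new ioctl. This is a
sketch, not tested against a real KBUS device: the two macros are
copied from this patch so that it builds without the KBUS headers, and
kbus_request_new_device() is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Copied from this patch so the sketch builds on its own; real code
 * would include <linux/kbus_defns.h> instead.
 */
#define KBUS_IOC_MAGIC		'k'
#define KBUS_IOC_NEWDEVICE	_IOR(KBUS_IOC_MAGIC, 16, char *)

/*
 * Hypothetical helper: ask KBUS for another device via an already open
 * Ksock. Returns 0 and stores the new device number in *devno, or
 * returns a negative errno value.
 */
static int kbus_request_new_device(int ksock_fd, uint32_t *devno)
{
	assert(devno);

	if (ioctl(ksock_fd, KBUS_IOC_NEWDEVICE, devno) < 0)
		return -errno;
	return 0;
}
```

The caller would open any existing /dev/kbus<n>, call the helper, and
then open /dev/kbus<devno> once the device node has appeared.
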
 include/linux/kbus_defns.h |   12 +++++++++++-
 ipc/kbus_main.c            |   17 +++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
index 9da72e4..4d3fcd5 100644
--- a/include/linux/kbus_defns.h
+++ b/include/linux/kbus_defns.h
@@ -611,8 +611,18 @@ struct kbus_replier_bind_event_data {
  */
 #define KBUS_IOC_VERBOSE  _IOWR(KBUS_IOC_MAGIC, 15, char *)
 
+/*
+ * NEWDEVICE - request another KBUS device (/dev/kbus<n>).
+ *
+ * The next device number (up to a maximum of 255) will be allocated.
+ *
+ * arg(out): __u32, the new device number (<n>)
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_NEWDEVICE _IOR(KBUS_IOC_MAGIC, 16, char *)
+
 /* If adding another IOCTL, remember to increment the next number! */
-#define KBUS_IOC_MAXNR	15
+#define KBUS_IOC_MAXNR	16
 
 #if !__KERNEL__ && defined(__cplusplus)
 }
diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index a75d2e1..48ba4b2 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -3532,6 +3532,23 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		retval = kbus_set_verbosity(priv, arg);
 		break;
 
+	case KBUS_IOC_NEWDEVICE:
+		/*
+		 * Request a new device
+		 *
+		 * arg out: the new device number
+		 * return: 0 means OK, otherwise not OK.
+		 */
+		kbus_maybe_dbg(priv->dev, "%u NEWDEVICE %d\n",
+			       id, kbus_num_devices);
+		retval = kbus_setup_new_device(kbus_num_devices);
+		if (retval > 0) {
+			kbus_num_devices++;
+			retval = __put_user(kbus_num_devices - 1,
+				       (u32 __user *) arg);
+		}
+		break;
+
 	default:
 		/* *Should* be redundant, if we got our range checks right */
 		retval = -ENOTTY;
-- 
1.7.4.1



* [PATCH 08/11] KBUS add Replier Bind Events
  2011-03-18 17:21             ` [PATCH 07/11] KBUS add ability to add devices at runtime Tony Ibbs
@ 2011-03-18 17:21               ` Tony Ibbs
  2011-03-18 17:21                 ` [PATCH 09/11] KBUS Replier Bind Event set-aside lists Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC (permalink / raw)
  To: lkml
  Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely, Tony Ibbs

It is useful to be able to write userspace proxies which allow
messages sent on one KBUS device to be received on another. In
order to do this, the userspace program needs to be reliably
informed of Replier bind and unbind events.

These proxies are referred to as Limpets in the KBUS documentation.
There is a brief introduction to the concept here:
http://presentations.kbus.googlecode.com/hg/talks/europython2010.html#limpets

Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
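To make the event payload easier to picture, here is a userspace
sketch of how the data for one Replier Bind Event is laid out. The
struct layout and the padding rule (room for a terminating NUL, rounded
up to a multiple of 4 bytes) are assumptions inferred from how
kbus_add_bind_message_data() uses them; the authoritative definitions
are in kbus_defns.h. The names bind_event_data, padded_name_len() and
make_bind_event() are hypothetical, and calloc() is used here in place
of the patch's "zero the last word, then strncpy" approach.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout, inferred from kbus_add_bind_message_data(). */
struct bind_event_data {
	uint32_t is_bind;	/* 1 = bind, 0 = unbind */
	uint32_t binder;	/* id of the Ksock that (un)bound */
	uint32_t name_len;	/* length of the name, without padding */
	uint32_t rest[];	/* the name, NUL padded to 4 bytes */
};

/* Assumed padding rule: room for a NUL, rounded up to 4 bytes. */
static uint32_t padded_name_len(uint32_t name_len)
{
	return 4 * ((name_len + 1 + 3) / 4);
}

/*
 * Build the payload for a bind/unbind event. Returns a malloc'd block
 * (the caller frees it) and stores its size in *data_len, or returns
 * NULL if the allocation fails.
 */
static struct bind_event_data *make_bind_event(uint32_t is_bind,
					       uint32_t binder,
					       const char *name,
					       uint32_t *data_len)
{
	uint32_t name_len, padded;
	struct bind_event_data *data;

	assert(name && data_len);
	name_len = (uint32_t)strlen(name);
	padded = padded_name_len(name_len);

	/* calloc zeroes the padding, so the name is always terminated */
	data = calloc(1, sizeof(*data) + padded);
	if (!data)
		return NULL;

	data->is_bind = is_bind;
	data->binder = binder;
	data->name_len = name_len;
	memcpy(data->rest, name, name_len);

	*data_len = (uint32_t)(sizeof(*data) + padded);
	return data;
}
```

A Limpet reading such a message can treat the data as this struct and
take name_len bytes from rest[] as the message name.
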
 include/linux/kbus_defns.h |   30 ++++-
 ipc/kbus_internal.h        |    5 +
 ipc/kbus_main.c            |  376 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 410 insertions(+), 1 deletions(-)

diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
index 4d3fcd5..c646a1d 100644
--- a/include/linux/kbus_defns.h
+++ b/include/linux/kbus_defns.h
@@ -489,6 +489,19 @@ struct kbus_replier_bind_event_data {
 #define KBUS_MSG_NAME_REPLIER_DISAPPEARED	"$.KBUS.Replier.Disappeared"
 #define KBUS_MSG_NAME_ERROR_SENDING		"$.KBUS.ErrorSending"
 
+/*
+ * Replier Bind Event
+ * ------------------
+ * This is the only message name for which KBUS generates data -- see
+ * kbus_replier_bind_event_data. It is also the only message name which KBUS
+ * does not allow binding to as a Replier.
+ *
+ * This is the message that is sent when a Replier binds or unbinds to another
+ * message name, if the KBUS_IOC_REPORTREPLIERBINDS ioctl has been used to
+ * request such notification.
+ */
+#define KBUS_MSG_NAME_REPLIER_BIND_EVENT	"$.KBUS.ReplierBindEvent"
+
 #define KBUS_IOC_MAGIC	'k'	/* 0x6b - which seems fair enough for now */
 /*
  * RESET: reserved for future use
@@ -621,8 +634,23 @@ struct kbus_replier_bind_event_data {
  */
 #define KBUS_IOC_NEWDEVICE _IOR(KBUS_IOC_MAGIC, 16, char *)
 
+/*
+ * REPORTREPLIERBINDS - request synthetic messages announcing Replier
+ * bind/unbind events.
+ *
+ * If this flag is set, then when someone binds or unbinds to a message name as
+ * a Replier, KBUS will send out a synthetic Announcement of this fact.
+ *
+ * arg(in): __u32, 1 to change to "report", 0 to change to "do not report",
+ * 0xFFFFFFFF to just return the current/previous state.
+ * arg(out): __u32, the previous state.
+ * retval: 0 for success, negative for failure (-EINVAL if arg in was not one
+ * of the specified values)
+ */
+#define KBUS_IOC_REPORTREPLIERBINDS  _IOWR(KBUS_IOC_MAGIC, 17, char *)
+
 /* If adding another IOCTL, remember to increment the next number! */
-#define KBUS_IOC_MAXNR	16
+#define KBUS_IOC_MAXNR	17
 
 #if !__KERNEL__ && defined(__cplusplus)
 }
diff --git a/ipc/kbus_internal.h b/ipc/kbus_internal.h
index 28c153c..13dc896 100644
--- a/ipc/kbus_internal.h
+++ b/ipc/kbus_internal.h
@@ -618,6 +618,11 @@ struct kbus_dev {
 
 	/* Are we wanting debugging messages? */
 	u32 verbose;
+
+	/*
+	 * Are we wanting to send a synthetic message for each Replier
+	 * bind/unbind? */
+	u32 report_replier_binds;
 };
 
 /*
diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index 48ba4b2..6372b40 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -1031,6 +1031,203 @@ static void kbus_push_synthetic_message(struct kbus_dev *dev,
 }
 
 /*
+ * Add the data part to a bind/unbind synthetic message.
+ *
+ * 'is_bind' is true if this was a "bind" event, false if it was an "unbind".
+ *
+ * 'name' is the message name (or wildcard) that was bound (or unbound) to.
+ *
+ * Returns 0 if all goes well, or a negative value if something goes wrong.
+ */
+static int kbus_add_bind_message_data(struct kbus_private_data *priv,
+				      struct kbus_msg *new_msg,
+				      u32 is_bind,
+				      u32 name_len, char *name)
+{
+	struct kbus_replier_bind_event_data *data;
+	struct kbus_data_ptr *wrapped_data;
+	u32 padded_name_len = KBUS_PADDED_NAME_LEN(name_len);
+	u32 data_len = sizeof(*data) + padded_name_len;
+	u32 rest_len = padded_name_len / 4;
+	char *name_p;
+
+	unsigned long *parts;
+	unsigned *lengths;
+
+	data = kmalloc(data_len, GFP_KERNEL);
+	if (!data) {
+		/* new_msg is freed by our caller, so leave it alone */
+		return -ENOMEM;
+	}
+
+	name_p = (char *)&data->rest[0];
+
+	data->is_bind = is_bind;
+	data->binder = priv->id;
+	data->name_len = name_len;
+
+	data->rest[rest_len - 1] = 0;	/* terminating with enough '\0' */
+	strncpy(name_p, name, name_len);
+
+	/*
+	 * And that data, unfortunately for our simple mindedness,
+	 * needs wrapping up in a reference count...
+	 *
+	 * Note/remember that we are happy for the reference counting
+	 * wrapper to "own" our data, and free it when it is done
+	 * with it.
+	 *
+	 * I'm going to assert that we have less than a PAGE length
+	 * of data, so we can simply do:
+	 */
+	parts = kmalloc(sizeof(*parts), GFP_KERNEL);
+	if (!parts) {
+		kfree(data);
+		return -ENOMEM;
+	}
+	lengths = kmalloc(sizeof(*lengths), GFP_KERNEL);
+	if (!lengths) {
+		kfree(parts);
+		kfree(data);
+		return -ENOMEM;
+	}
+	lengths[0] = data_len;
+	parts[0] = (unsigned long)data;
+	wrapped_data =
+	    kbus_wrap_data_in_ref(false, 1, parts, lengths, data_len);
+	if (!wrapped_data) {
+		kfree(lengths);
+		kfree(parts);
+		kfree(data);
+		return -ENOMEM;
+	}
+
+	new_msg->data_len = data_len;
+	new_msg->data_ref = wrapped_data;
+	return 0;
+}
+
+/*
+ * Create a new Replier Bind Event synthetic message.
+ *
+ * The initial design of things didn't really expect us to be
+ * generating messages with actual data inside the kernel module,
+ * so it's all a little bit more complicated than it might otherwise
+ * be, and there's some playing with things directly that might best
+ * be done otherwise (notably, sorting out the wrapping up of the
+ * data in a reference count). I think that's excusable given this
+ * should be the only sort of message we ever generate with actual
+ * data (or so I believe).
+ *
+ * Returns the new message, or NULL.
+ */
+static struct kbus_msg
+*kbus_new_synthetic_bind_message(struct kbus_private_data *priv,
+				 u32 is_bind,
+				 u32 name_len, char *name)
+{
+	ssize_t retval = 0;
+	struct kbus_msg *new_msg;
+	struct kbus_msg_id in_reply_to = { 0, 0 };	/* no-one */
+
+	kbus_maybe_dbg(priv->dev,
+		       "  Creating synthetic bind message for '%s'"
+		       " (%s)\n", name, is_bind ? "bind" : "unbind");
+
+	new_msg = kbus_build_kbus_message(priv->dev,
+					  KBUS_MSG_NAME_REPLIER_BIND_EVENT,
+					  0, 0, in_reply_to);
+	if (!new_msg)
+		return NULL;
+
+	/*
+	 * What happens if any one of the listeners can't receive the message
+	 * because they don't have room in their queues?
+	 *
+	 * If we flag the message as ALL_OR_FAIL, then if we can't deliver
+	 * to all of the listeners who care, we will get -EBUSY returned to
+	 * us, which we shall then return as -EAGAIN (people expect to check
+	 * for -EAGAIN to find out if they should, well, try again).
+	 *
+	 * In this scenario, the user needs to catch a "bind"/"unbind" return
+	 * of -EAGAIN and realise that it needs to try again.
+	 */
+	new_msg->flags |= KBUS_BIT_ALL_OR_FAIL;
+
+	/*
+	 * That gave us the basis of the message, but now we need to add in
+	 * its meaning.
+	 */
+	retval = kbus_add_bind_message_data(priv, new_msg, is_bind,
+					    name_len, name);
+	if (retval < 0) {
+		kbus_free_message(new_msg);
+		return NULL;
+	}
+
+	return new_msg;
+}
+
+/*
+ * Generate a bind/unbind synthetic message, and broadcast it.
+ *
+ * This is for use when we have been asked to announce when a Replier binds or
+ * unbinds.
+ *
+ * 'priv' is the sender - the entity that is doing the actual bind/unbind.
+ *
+ * 'is_bind' is true if this was a "bind" event, false if it was an "unbind".
+ *
+ * 'name' is the message name (or wildcard) that was bound (or unbound) to.
+ *
+ * Returns 0 if all goes well, or a negative value if something goes wrong,
+ * notably -EAGAIN if we couldn't send the message to ALL the Listeners who
+ * have bound to receive it.
+ */
+static int kbus_push_synthetic_bind_message(struct kbus_private_data *priv,
+					    u32 is_bind,
+					    u32 name_len, char *name)
+{
+
+	ssize_t retval = 0;
+	struct kbus_msg *new_msg;
+
+	kbus_maybe_dbg(priv->dev,
+		       "  Pushing synthetic bind message for '%s'"
+		       " (%s) onto queue\n", name,
+		       is_bind ? "bind" : "unbind");
+
+	new_msg =
+	    kbus_new_synthetic_bind_message(priv, is_bind, name_len, name);
+	if (new_msg == NULL)
+		return -ENOMEM;
+
+	kbus_maybe_report_message(priv->dev, new_msg);
+	kbus_maybe_dbg(priv->dev,
+		       "Writing synthetic message to "
+		       "recipients\n");
+
+	retval = kbus_write_to_recipients(priv, priv->dev, new_msg);
+	/*
+	 * kbus_push_message takes a copy of our message data, so we
+	 * must remember to free ours. Since we've made sure that it
+	 * looks just like a user-generated message (i.e., the name
+	 * can be freed as well as the data), this is fairly simple.
+	 */
+	kbus_free_message(new_msg);
+
+	/*
+	 * Because we used ALL_OR_FAIL, we will get -EBUSY back if we FAIL,
+	 * but we want to tell the user -EAGAIN, since they *should* try
+	 * again later.
+	 */
+	if (retval == -EBUSY)
+		retval = -EAGAIN;
+
+	return retval;
+}
+
+/*
  * Pop the next message off our queue.
  *
  * Returns a pointer to the message, or NULL if there is no next message.
@@ -1442,6 +1639,21 @@ static int kbus_remember_binding(struct kbus_dev *dev,
 	new->name_len = name_len;
 	new->name = name;
 
+	if (replier && dev->report_replier_binds) {
+		/*
+		 * We've been asked to announce when a Replier binds.
+		 * If we can't tell all the Listeners who care, we want
+		 * to give up, rather than tell some of them, and then
+		 * bind anyway.
+		 */
+		retval = kbus_push_synthetic_bind_message(priv, true,
+							  name_len, name);
+		if (retval != 0) {	/* Hopefully, just -EAGAIN */
+			kfree(new);
+			return retval;
+		}
+	}
+
 	list_add(&new->list, &dev->bound_message_list);
 	return 0;
 }
@@ -1561,6 +1773,27 @@ static int kbus_forget_binding(struct kbus_dev *dev,
 		return -EINVAL;
 	}
 
+	if (replier && dev->report_replier_binds) {
+
+		/*
+		 * We want to send a message indicating that we've unbound
+		 * the Replier for this message.
+		 *
+		 * If we can't tell all the Listeners who're listening for this
+		 * message, we want to give up, rather than tell some of them,
+		 * and then unbind anyway.
+		 */
+		int retval = kbus_push_synthetic_bind_message(priv, false,
+							      name_len, name);
+		if (retval != 0)	/* Hopefully, just -EAGAIN */
+			return retval;
+		/*
+		 * Note that if we were ourselves listening for replier bind
+		 * events, then we will ourselves get the message announcing
+		 * we're about to unbind.
+		 */
+	}
+
 	kbus_maybe_dbg(priv->dev, "  %u Unbound %u %c '%.*s'\n",
 		       priv->id, binding->bound_to_id,
 		       (binding->is_replier ? 'R' : 'L'),
@@ -1584,6 +1817,49 @@ static int kbus_forget_binding(struct kbus_dev *dev,
 	return 0;
 }
 
+
+/*
+ * Report a Replier Bind Event for unbinding from the given message name
+ */
+static void kbus_report_unbinding(struct kbus_private_data *priv,
+				  u32 name_len, char *name)
+{
+	struct kbus_msg *msg;
+	int retval;
+
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Safe report unbinding of '%.*s'\n",
+		       priv->id, name_len, name);
+
+	/* Generate the "X has unbound from Y" message */
+	msg = kbus_new_synthetic_bind_message(priv, false, name_len, name);
+	if (msg == NULL)
+		return;	/* There is nothing sensible to do here */
+
+	/* ...and send it */
+	retval = kbus_write_to_recipients(priv, priv->dev, msg);
+	if (retval != -EBUSY)
+		goto done_sending;
+
+	/*
+	 * If someone who had bound to it wasn't able to take the message,
+	 * then there's not a lot we can do at this stage.
+	 *
+	 * XXX This is, of course, unacceptable. Sorting it out will
+	 * XXX be done in the next tranch of code, though, since it
+	 * XXX is not terribly simple.
+	 */
+	kbus_maybe_dbg(priv->dev,
+		       "  %u Someone was unable to receive message '%*s'\n",
+		       priv->id, name_len, name);
+
+done_sending:
+	/* Don't forget to free our copy of the message */
+	if (msg)
+		kbus_free_message(msg);
+	/* We aren't returning any status code. Oh well. */
+}
+
 /*
  * Remove all bindings for a particular listener.
  *
@@ -1608,6 +1884,10 @@ static void kbus_forget_my_bindings(struct kbus_private_data *priv)
 			       ptr->bound_to_id, (ptr->is_replier ? 'R' : 'L'),
 			       ptr->name_len, ptr->name);
 
+		if (ptr->is_replier && dev->report_replier_binds)
+			kbus_report_unbinding(priv, ptr->name_len,
+					      ptr->name);
+
 		list_del(&ptr->list);
 		kfree(ptr->name);
 		kfree(ptr);
@@ -2624,6 +2904,14 @@ static int kbus_bind(struct kbus_private_data *priv,
 		goto done;
 	}
 
+	if (bind->is_replier && !strcmp(name,
+					KBUS_MSG_NAME_REPLIER_BIND_EVENT)) {
+		kbus_maybe_dbg(priv->dev, "cannot bind %s as a Replier\n",
+			       KBUS_MSG_NAME_REPLIER_BIND_EVENT);
+		retval = -EBADMSG;
+		goto done;
+	}
+
 	kbus_maybe_dbg(priv->dev, "%u BIND %c '%.*s'\n", priv->id,
 		       (bind->is_replier ? 'R' : 'L'), bind->name_len, name);
 
@@ -3353,6 +3641,83 @@ static int kbus_set_verbosity(struct kbus_private_data *priv,
 	return __put_user(old_value, (u32 __user *) arg);
 }
 
+/* Report all existing replier bindings to the requester */
+static int kbus_report_existing_binds(struct kbus_private_data *priv,
+				      struct kbus_dev *dev)
+{
+	struct kbus_message_binding *ptr;
+	struct kbus_message_binding *next;
+
+	list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+		struct kbus_msg *new_msg;
+		int retval;
+
+		kbus_maybe_dbg(priv->dev, "  %u Report %c '%.*s'\n",
+		       priv->id, (ptr->is_replier ? 'R' : 'L'),
+		       ptr->name_len, ptr->name);
+
+		if (!ptr->is_replier)
+			continue;
+
+		new_msg = kbus_new_synthetic_bind_message(ptr->bound_to, true,
+						  ptr->name_len, ptr->name);
+		if (new_msg == NULL)
+			return -ENOMEM;
+
+		/*
+		 * It is perhaps a bit inefficient to check this per binding,
+		 * but it saves us doing two passes through the list.
+		 */
+		if (kbus_queue_is_full(priv, "limpet", false)) {
+			/* Giving up is probably the best we can do */
+			kbus_free_message(new_msg);
+			return -EBUSY;
+		}
+
+		retval = kbus_push_message(priv, new_msg, NULL, false);
+
+		kbus_free_message(new_msg);
+		if (retval)
+			return retval;
+	}
+	return 0;
+}
+
+static int kbus_set_report_binds(struct kbus_private_data *priv,
+				 struct kbus_dev *dev, unsigned long arg)
+{
+	int retval = 0;
+	u32 report_replier_binds;
+	int old_value = priv->dev->report_replier_binds;
+
+	retval = __get_user(report_replier_binds, (u32 __user *) arg);
+	if (retval)
+		return retval;
+
+	kbus_maybe_dbg(priv->dev,
+		       "%u REPORTREPLIERBINDS requests %u (was %d)\n",
+		       priv->id, report_replier_binds, old_value);
+
+	switch (report_replier_binds) {
+	case 0:
+		priv->dev->report_replier_binds = false;
+		break;
+	case 1:
+		priv->dev->report_replier_binds = true;
+		/* And report the current state of bindings... */
+		retval = kbus_report_existing_binds(priv, dev);
+		if (retval)
+			return retval;
+		break;
+	case 0xFFFFFFFF:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return __put_user(old_value, (u32 __user *) arg);
+}
+
 static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	int err = 0;
@@ -3549,6 +3914,17 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		}
 		break;
 
+	case KBUS_IOC_REPORTREPLIERBINDS:
+		/*
+		 * Should we report Replier bind/unbind events?
+		 *
+		 * arg in: 0 (for no), 1 (for yes), 0xFFFFFFFF (for query)
+		 * arg out: the previous value, before we were called
+		 * return: 0 means OK, otherwise not OK
+		 */
+		retval = kbus_set_report_binds(priv, dev, arg);
+		break;
+
 	default:
 		/* *Should* be redundant, if we got our range checks right */
 		retval = -ENOTTY;
-- 
1.7.4.1
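
With reporting switched on, a bind or unbind as Replier can now fail
with EAGAIN when some Listener's queue is full, so callers need to be
prepared to try again. A minimal sketch of such a retry loop follows.
Everything in it is hypothetical (retry_on_eagain(), the callback types
and the busy_then_ok() stand-in); the real operation would wrap the
bind or unbind ioctl and return -errno on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* An operation that returns 0 on success or a negative errno value. */
typedef int (*kbus_op_fn)(void *ctx);

/* Optional hook called before each retry, e.g. to sleep or poll. */
typedef void (*kbus_wait_fn)(unsigned int attempt);

/*
 * Call op(ctx) up to max_tries times, stopping as soon as it returns
 * anything other than -EAGAIN. Returns the last result.
 */
static int retry_on_eagain(kbus_op_fn op, void *ctx,
			   unsigned int max_tries, kbus_wait_fn wait)
{
	int retval = -EAGAIN;
	unsigned int ii;

	assert(op);

	for (ii = 0; ii < max_tries && retval == -EAGAIN; ii++) {
		if (ii > 0 && wait)
			wait(ii);
		retval = op(ctx);
	}
	return retval;
}

/* Stand-in for a bind call: fails with -EAGAIN until *ctx reaches 0. */
static int busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

Bounding the number of tries matters: the Listener whose queue is full
may belong to a process that never reads it.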



* [PATCH 09/11] KBUS Replier Bind Event set-aside lists
  2011-03-18 17:21               ` [PATCH 08/11] KBUS add Replier Bind Events Tony Ibbs
@ 2011-03-18 17:21                 ` Tony Ibbs
  2011-03-18 17:21                   ` [PATCH 10/11] KBUS report state to userspace Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC (permalink / raw)
  To: lkml
  Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely, Tony Ibbs

In order to make sending Replier Bind Events reliable, we need
to introduce set-aside message lists, in case a client that
wants to receive such events has a full list when the event
occurs (making bind/unbind depend on the recipient of the
event is not acceptable).

Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
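The intended behaviour of the set-aside list, including the "gone
tragic" case, can be sketched as a small userspace model. This is an
illustration of the idea only, not the kernel code: the types, the
fixed-size array and the exact overflow test are assumptions, and
MAX_UNSENT is kept tiny so that the overflow is easy to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_UNSENT	4	/* stands in for the configured maximum */
#define LIST_CAPACITY	16	/* hypothetical backing store */

enum unsent_kind { UNSENT_UNBIND_EVENT, UNSENT_EVENTS_LOST };

struct unsent_item {
	uint32_t send_to_id;
	enum unsent_kind kind;
};

struct set_aside {
	struct unsent_item items[LIST_CAPACITY];
	unsigned int count;
	bool is_tragic;
};

/* Does this listener already have an "events lost" message queued? */
static bool already_got_lost_msg(const struct set_aside *list, uint32_t id)
{
	unsigned int ii;

	/* Walk back from the tail, stopping at the first real event. */
	for (ii = list->count; ii > 0; ii--) {
		const struct unsent_item *item = &list->items[ii - 1];

		if (item->kind == UNSENT_UNBIND_EVENT)
			break;
		if (item->send_to_id == id)
			return true;
	}
	return false;
}

static void append(struct set_aside *list, uint32_t id, enum unsent_kind kind)
{
	assert(list->count < LIST_CAPACITY);
	list->items[list->count].send_to_id = id;
	list->items[list->count].kind = kind;
	list->count++;
}

/* Set aside one unbind event for each of the given listeners. */
static void set_aside_unbind_event(struct set_aside *list,
				   const uint32_t *listener_ids,
				   unsigned int num_listeners)
{
	unsigned int ii;

	if (!list->is_tragic && list->count + num_listeners > MAX_UNSENT)
		list->is_tragic = true;

	for (ii = 0; ii < num_listeners; ii++) {
		if (!list->is_tragic)
			append(list, listener_ids[ii], UNSENT_UNBIND_EVENT);
		else if (!already_got_lost_msg(list, listener_ids[ii]))
			append(list, listener_ids[ii], UNSENT_EVENTS_LOST);
	}
}

/* Once every set-aside message has been read, normal service resumes. */
static void set_aside_drained(struct set_aside *list)
{
	list->count = 0;
	list->is_tragic = false;
}
```

The point of the single "events lost" message per listener is that the
list stops growing once things have gone wrong, while each listener
still learns that it has missed something.
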
 include/linux/kbus_defns.h |    7 +
 ipc/kbus_internal.h        |   62 ++++++
 ipc/kbus_main.c            |  464 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 518 insertions(+), 15 deletions(-)

diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
index c646a1d..82779a6 100644
--- a/include/linux/kbus_defns.h
+++ b/include/linux/kbus_defns.h
@@ -482,12 +482,19 @@ struct kbus_replier_bind_event_data {
  *   bound as Replier closed.
  * * ErrorSending - an unexpected error occurred when trying to send a Request
  *   to its Replier whilst polling.
+ *
+ * Synthetic Announcements with no data
+ * ------------------------------------
+ * * UnbindEventsLost - sent (instead of a Replier Bind Event) when the unbind
+ *   events "set aside" list has filled up, and thus unbind events have been
+ *   lost.
  */
 #define KBUS_MSG_NAME_REPLIER_GONEAWAY		"$.KBUS.Replier.GoneAway"
 #define KBUS_MSG_NAME_REPLIER_IGNORED		"$.KBUS.Replier.Ignored"
 #define KBUS_MSG_NAME_REPLIER_UNBOUND		"$.KBUS.Replier.Unbound"
 #define KBUS_MSG_NAME_REPLIER_DISAPPEARED	"$.KBUS.Replier.Disappeared"
 #define KBUS_MSG_NAME_ERROR_SENDING		"$.KBUS.ErrorSending"
+#define KBUS_MSG_NAME_UNBIND_EVENTS_LOST	"$.KBUS.UnbindEventsLost"
 
 /*
  * Replier Bind Event
diff --git a/ipc/kbus_internal.h b/ipc/kbus_internal.h
index 13dc896..a24fcaf 100644
--- a/ipc/kbus_internal.h
+++ b/ipc/kbus_internal.h
@@ -414,6 +414,20 @@ struct kbus_write_msg {
 };
 
 /*
+ * The data for an unsent Replier Bind Event (in the unsent_unbind_msg_list)
+ *
+ * Note that 'binding' may theoretically be NULL (although I don't think this
+ * should ever actually happen).
+ */
+struct kbus_unsent_message_item {
+	struct list_head list;
+	struct kbus_private_data *send_to;	/* who we want to send it to */
+	u32 send_to_id;	/* but the id is often useful */
+	struct kbus_msg *msg;	/* the message itself */
+	struct kbus_message_binding *binding;	/* and why we remembered it */
+};
+
+/*
  * This is the data for an individual Ksock
  *
  * Each time we open /dev/kbus<n>, we need to remember a unique id for
@@ -564,6 +578,13 @@ struct kbus_private_data {
 	 * message pushed onto the queue. Which is much simpler.
 	 */
 	struct kbus_msg_id msg_id_just_pushed;
+
+	/*
+	 * If this flag is set, then we may have outstanding Replier Unbind
+	 * Event messages (kept on a list on our device). These must be read
+	 * before any "normal" messages (on our message_queue) get read.
+	 */
+	int maybe_got_unsent_unbind_msgs;
 };
 
 /* What is a sensible number for the default maximum number of messages? */
@@ -571,6 +592,18 @@ struct kbus_private_data {
 #define CONFIG_KBUS_DEF_MAX_MESSAGES	100
 #endif
 
+/*
+ * What about the maximum number of unsent unbind event messages?
+ * This may want to be quite large, to allow for Limpets with momentary
+ * network outages.
+ *
+ * The default value is probably too small, but experimentation is
+ * needed to determine a more sensible value.
+ */
+#ifndef CONFIG_KBUS_MAX_UNSENT_UNBIND_MESSAGES
+#define CONFIG_KBUS_MAX_UNSENT_UNBIND_MESSAGES 1000
+#endif
+
 /* Information belonging to each /dev/kbus<N> device */
 struct kbus_dev {
 	struct cdev cdev;	/* Character device data */
@@ -623,6 +656,35 @@ struct kbus_dev {
 	 * Are we wanting to send a synthetic message for each Replier
 	 * bind/unbind? */
 	u32 report_replier_binds;
+
+	/*
+	 * If Replier (un)bind events have been requested, then when
+	 * kbus_release is called, a message must be sent for each Replier that
+	 * is (of necessity) unbound from the Ksock being released. For a
+	 * normal unbind, if any of the Listeners doesn't have room in its
+	 * message queue for such an event, then the unbind fails with -EAGAIN.
+	 * This isn't acceptable for kbus_release (apart from anything else,
+	 * the release might be due to the original program falling over).
+	 * It's not acceptable to fail to send the messages (that's a general
+	 * KBUS principle).
+	 *
+	 * The only sensible solution seems to be to put the messages we'd
+	 * like to have sent onto a set-aside list, and mark each recipient
+	 * as having messages thereon. Then, each time a Ksock looks for a
+	 * new message, it should first check to see if it might have one
+	 * on the set-aside list, and if it does, read that instead.
+	 *
+	 * Once we're doing this, though, we need some limit on how big that
+	 * set-aside list may grow (to allow for user processes that keep
+	 * binding and falling over!). When the list gets "too long", we set a
+	 * "gone tragically wrong" flag, and instead of adding more unbind
+	 * events, we instead add a single "gone tragically wrong" message for
+	 * each Ksock. We don't revert to remembering unbind events again until
+	 * the list has been emptied.
+	 */
+	struct list_head unsent_unbind_msg_list;
+	u32 unsent_unbind_msg_count;
+	int unsent_unbind_is_tragic;
 };
 
 /*
diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index 6372b40..296a1a1 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -88,6 +88,11 @@ static int kbus_write_to_recipients(struct kbus_private_data *priv,
 				    struct kbus_dev *dev,
 				    struct kbus_msg *msg);
 
+static void kbus_forget_unbound_unsent_unbind_msgs(struct kbus_private_data
+						   *priv,
+						   struct kbus_message_binding
+						   *binding);
+
 static int kbus_alloc_ref_data(struct kbus_private_data *priv,
 			       u32 data_len,
 			       struct kbus_data_ptr **ret_ref_data);
@@ -1803,6 +1808,13 @@ static int kbus_forget_binding(struct kbus_dev *dev,
 	kbus_forget_matching_messages(priv, binding);
 
 	/*
+	 * Maybe including any set-aside Replier Unbind Events...
+	 */
+	if (!strncmp(KBUS_MSG_NAME_REPLIER_BIND_EVENT, binding->name,
+		     binding->name_len))
+		kbus_forget_unbound_unsent_unbind_msgs(priv, binding);
+
+	/*
 	 * We carefully don't try to do anything about requests that
 	 * have already been read - the fact that the user has unbound
 	 * from receiving new messages with this name doesn't imply
@@ -1817,43 +1829,219 @@ static int kbus_forget_binding(struct kbus_dev *dev,
 	return 0;
 }
 
+/*
+ * Add a (copy of a) message to the "unsent Replier Unbind Event" list
+ *
+ * 'priv' is who we are trying to send to, 'msg' is the message we were
+ * trying to send.
+ *
+ * Returns 0 if all went well, a negative value if it did not.
+ */
+static int kbus_remember_unsent_unbind_event(struct kbus_dev *dev,
+					     struct kbus_private_data *priv,
+					     struct kbus_msg *msg,
+					     struct kbus_message_binding
+					     *binding)
+{
+	struct kbus_unsent_message_item *new;
+	struct kbus_msg *new_msg = NULL;
+
+	kbus_maybe_dbg(priv->dev,
+		       "  Remembering unsent unbind event "
+		       "%u '%.*s' to %u\n",
+		       dev->unsent_unbind_msg_count, msg->name_len,
+		       msg->name_ref->name, priv->id);
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	new_msg = kbus_copy_message(dev, msg);
+	if (!new_msg) {
+		kfree(new);
+		return -EFAULT;
+	}
+
+	new->send_to = priv;
+	new->send_to_id = priv->id;	/* Useful shorthand? */
+	new->msg = new_msg;
+	new->binding = binding;
+
+	/*
+	 * The order should be the same as a normal message queue,
+	 * so add to the end...
+	 */
+	list_add_tail(&new->list, &dev->unsent_unbind_msg_list);
+	dev->unsent_unbind_msg_count++;
+	return 0;
+}
+
+/*
+ * Return true if this listener already has a "gone tragic" message.
+ *
+ * Look at the end of the unsent Replier Unbind Event message list, to see
+ * if the given listener already has a "gone tragic" message (since if it
+ * does, we will not want to add another).
+ */
+static int kbus_listener_already_got_tragic_msg(struct kbus_dev *dev,
+						struct kbus_private_data
+						*listener)
+{
+	struct kbus_unsent_message_item *ptr;
+	struct kbus_unsent_message_item *next;
+
+	kbus_maybe_dbg(dev,
+		       "  Checking for 'gone tragic' event for %u\n",
+		       listener->id);
+
+	list_for_each_entry_safe_reverse(ptr, next,
+					 &dev->unsent_unbind_msg_list, list) {
+
+		if (kbus_message_name_matches(
+					ptr->msg->name_ref->name,
+					ptr->msg->name_len,
+					KBUS_MSG_NAME_REPLIER_BIND_EVENT))
+			/*
+			 * If we get a Replier Bind Event, then we're past all
+			 * the "tragic world" messages
+			 */
+			break;
+		if (ptr->send_to_id == listener->id) {
+			kbus_maybe_dbg(dev, "  Found\n");
+			return true;
+		}
+	}
+
+	kbus_maybe_dbg(dev, "  Not found\n");
+	return false;
+}
 
 /*
- * Report a Replier Bind Event for unbinding from the given message name
+ * Report a Replier Bind Event for unbinding from the given message name,
+ * in such a way that we do not lose the message even if we can't send it
+ * right away.
  */
-static void kbus_report_unbinding(struct kbus_private_data *priv,
-				  u32 name_len, char *name)
+static void kbus_safe_report_unbinding(struct kbus_private_data *priv,
+				       u32 name_len, char *name)
 {
+	/* 1. Generate a new unbinding event message
+	 * 2. Try sending it to everyone who cares
+	 * 3. If that failed, then find out who *does* care
+	 * 4. Is there room for that many messages on the set-aside list?
+	 * 5. If there is, add (a copy of) the message for each
+	 * 6. If there is not, set the "tragic" flag, and add (a copy of)
+	 *    the "world gone tragic" message for each
+	 * 7. If we've added something to the set-aside list, then set
+	 *    the "maybe got something on the set-aside list" flag for
+	 *    each recipient. */
+
 	struct kbus_msg *msg;
-	int retval;
+	struct kbus_message_binding **listeners = NULL;
+	struct kbus_message_binding *replier = NULL;
+	int retval = 0;
+	int num_listeners;
+	int ii;
 
 	kbus_maybe_dbg(priv->dev,
 		       "  %u Safe report unbinding of '%.*s'\n",
 		       priv->id, name_len, name);
 
-	/* Generate the "X has unbound from Y" message */
+	/* Generate the message we'd *like* to send */
 	msg = kbus_new_synthetic_bind_message(priv, false, name_len, name);
 	if (msg == NULL)
 		return;	/* There is nothing sensible to do here */
 
-	/* ...and send it */
+	/* If we're lucky, we can just send it */
 	retval = kbus_write_to_recipients(priv, priv->dev, msg);
 	if (retval != -EBUSY)
 		goto done_sending;
 
 	/*
-	 * If someone who had bound to it wasn't able to take the message,
-	 * then there's not a lot we can do at this stage.
+	 * So at least one of the people we were trying to send to was not able
+	 * to take the message, presumably because their message queue is full.
+	 * Thus we need to put aside one copy of the message for each
+	 * recipient, to be delivered when it *can* be received.
 	 *
-	 * XXX This is, of course, unacceptable. Sorting it out will
-	 * XXX be done in the next tranch of code, though, since it
-	 * XXX is not terribly simple.
+	 * So before we do anything else, we need to know who those recipients
+	 * are.
 	 */
+
 	kbus_maybe_dbg(priv->dev,
-		       "  %u Someone was unable to receive message '%*s'\n",
-		       priv->id, name_len, name);
+		       "  %u Need to add messages to set-aside list\n",
+		       priv->id);
+
+	/*
+	 * We're expecting some listeners, but no replier.
+	 * Since this is a duplicate of what we did in kbus_write_to_recipients,
+	 * and since our ksock is locked whilst we're working, we can assume
+	 * that we should get the same result. For the sake of completeness,
+	 * check the error return anyway, but I'm not going to worry about
+	 * whether we suddenly have a replier popping up unexpectedly...
+	 */
+	num_listeners = kbus_find_listeners(priv->dev, &listeners, &replier,
+					    msg->name_len, msg->name_ref->name);
+	if (num_listeners < 0) {
+		kbus_maybe_dbg(priv->dev,
+			       "  Error %d finding listeners\n",
+			       num_listeners);
+		retval = num_listeners;
+		goto done_sending;
+	}
+
+	if (priv->dev->unsent_unbind_is_tragic ||
+	    (num_listeners + priv->dev->unsent_unbind_msg_count >
+	     CONFIG_KBUS_MAX_UNSENT_UNBIND_MESSAGES)) {
+		struct kbus_msg_id in_reply_to = { 0, 0 };	/* no-one */
+		/*
+		 * Either the list had already gone tragic, or we've
+		 * filled it up with "normal" unbind event messages
+		 */
+		priv->dev->unsent_unbind_is_tragic = true;
+
+		/* In which case we need a different message */
+		kbus_free_message(msg);
+		msg = kbus_build_kbus_message(priv->dev,
+					      KBUS_MSG_NAME_UNBIND_EVENTS_LOST,
+					      0, 0, in_reply_to);
+		if (msg == NULL)
+			goto done_sending;
+
+		for (ii = 0; ii < num_listeners; ii++) {
+			/*
+			 * We only want to add a "gone tragic" message if the
+			 * recipient does not already have such a message
+			 * stacked...
+			 */
+			if (kbus_listener_already_got_tragic_msg(priv->dev,
+						 listeners[ii]->bound_to))
+				continue;
+			retval = kbus_remember_unsent_unbind_event(priv->dev,
+					   listeners[ii]->bound_to,
+					   msg, listeners[ii]);
+			/* And remember that we've got something on the
+			 * set-aside list */
+			listeners[ii]->bound_to->maybe_got_unsent_unbind_msgs =
+			    true;
+			if (retval)
+				break;	/* No good choice here */
+		}
+	} else {
+		/* There's room to add these messages as-is */
+		for (ii = 0; ii < num_listeners; ii++) {
+			retval = kbus_remember_unsent_unbind_event(priv->dev,
+					   listeners[ii]->bound_to,
+					   msg, listeners[ii]);
+			/* And remember that we've got something on the
+			 * set-aside list */
+			listeners[ii]->bound_to->maybe_got_unsent_unbind_msgs =
+			    true;
+			if (retval)
+				break;	/* No good choice here */
+		}
+	}
 
 done_sending:
+	kfree(listeners);
 	/* Don't forget to free our copy of the message */
 	if (msg)
 		kbus_free_message(msg);
@@ -1861,6 +2049,196 @@ done_sending:
 }
 
 /*
+ * Return how many messages we have in the unsent Replier Unbind Event list.
+ */
+static u32 kbus_count_unsent_unbind_msgs(struct kbus_private_data *priv)
+{
+	struct kbus_dev *dev = priv->dev;
+
+	struct kbus_unsent_message_item *ptr;
+	struct kbus_unsent_message_item *next;
+
+	u32 count = 0;
+
+	kbus_maybe_dbg(dev, "%u Counting unsent unbind messages\n", priv->id);
+
+	list_for_each_entry_safe(ptr, next, &dev->unsent_unbind_msg_list,
+				 list) {
+		if (ptr->send_to_id == priv->id)
+			count++;
+	}
+	return count;
+}
+
+/*
+ * Maybe move an unsent Replier Unbind Event message to the main message list.
+ *
+ * Check if we have an unsent event on the set-aside list. If we do, move the
+ * first one across to our normal message queue.
+ *
+ * Returns 0 if all goes well, or a negative value if something went wrong.
+ */
+static int kbus_maybe_move_unsent_unbind_msg(struct kbus_private_data *priv)
+{
+	struct kbus_dev *dev = priv->dev;
+
+	struct kbus_unsent_message_item *ptr;
+	struct kbus_unsent_message_item *next;
+
+	kbus_maybe_dbg(dev,
+		       "%u Looking for an unsent unbind message\n", priv->id);
+
+	list_for_each_entry_safe(ptr, next, &dev->unsent_unbind_msg_list,
+				 list) {
+		int retval;
+
+		if (ptr->send_to_id != priv->id)
+			continue;
+
+		kbus_maybe_report_message(priv->dev, ptr->msg);
+		/*
+		 * Move the message into our normal message queue.
+		 *
+		 * We *must* use kbus_push_message() to do this, as
+		 * we wish to keep our promise that this shall be the
+		 * only way of adding a message to the queue.
+		 */
+		retval = kbus_push_message(priv, ptr->msg, ptr->binding, false);
+		if (retval)
+			return retval;	/* What else can we do? */
+
+		list_del(&ptr->list);
+		/* Mustn't forget to free *our* copy of the message */
+		kbus_free_message(ptr->msg);
+		kfree(ptr);
+		dev->unsent_unbind_msg_count--;
+		goto check_tragic;
+	}
+
+	/*
+	 * Since we didn't find anything, we can safely unset the flag that
+	 * says there might be something to find...
+	 */
+	priv->maybe_got_unsent_unbind_msgs = false;
+
+check_tragic:
+	/*
+	 * And if we've succeeded in emptying the list, we can unset the
+	 * "gone tragic" flag for it, too, if it was set.
+	 */
+	if (list_empty(&dev->unsent_unbind_msg_list))
+		dev->unsent_unbind_is_tragic = false;
+	return 0;
+}
+
+/*
+ * Forget any outstanding unsent Replier Unbind Event messages for this binding.
+ *
+ * Called from kbus_release.
+ */
+static void kbus_forget_unbound_unsent_unbind_msgs(
+					struct kbus_private_data *priv,
+					struct kbus_message_binding *binding)
+{
+	struct kbus_dev *dev = priv->dev;
+
+	struct kbus_unsent_message_item *ptr;
+	struct kbus_unsent_message_item *next;
+
+	u32 count = 0;
+
+	kbus_maybe_dbg(dev,
+		       " %u Forgetting unsent unbind messages for "
+		       "this binding\n", priv->id);
+
+	list_for_each_entry_safe(ptr, next, &dev->unsent_unbind_msg_list,
+			list) {
+		if (ptr->binding == binding) {
+			list_del(&ptr->list);
+			kbus_free_message(ptr->msg);
+			kfree(ptr);
+			dev->unsent_unbind_msg_count--;
+			count++;
+		}
+	}
+	kbus_maybe_dbg(dev, "%u Forgot %u unsent unbind messages\n",
+		       priv->id, count);
+	/*
+	 * And if we've succeeded in emptying the list, we can unset the
+	 * "gone tragic" flag for it, too, if it was set.
+	 */
+	if (list_empty(&dev->unsent_unbind_msg_list))
+		dev->unsent_unbind_is_tragic = false;
+}
+
+/*
+ * Forget any outstanding unsent Replier Unbind Event messages for this Replier.
+ *
+ * Called from kbus_release.
+ */
+static void kbus_forget_my_unsent_unbind_msgs(struct kbus_private_data *priv)
+{
+	struct kbus_dev *dev = priv->dev;
+
+	struct kbus_unsent_message_item *ptr;
+	struct kbus_unsent_message_item *next;
+
+	u32 count = 0;
+
+	kbus_maybe_dbg(dev,
+		       "%u Forgetting my unsent unbind messages\n", priv->id);
+
+	list_for_each_entry_safe(ptr, next, &dev->unsent_unbind_msg_list,
+			list) {
+		if (ptr->send_to_id == priv->id) {
+			list_del(&ptr->list);
+			kbus_free_message(ptr->msg);
+			kfree(ptr);
+			dev->unsent_unbind_msg_count--;
+			count++;
+		}
+	}
+	kbus_maybe_dbg(dev, "%u Forgot %u unsent unbind messages\n",
+		       priv->id, count);
+	/*
+	 * And if we've succeeded in emptying the list, we can unset the
+	 * "gone tragic" flag for it, too, if it was set.
+	 */
+	if (list_empty(&dev->unsent_unbind_msg_list))
+		dev->unsent_unbind_is_tragic = false;
+}
+
+/*
+ * Forget any outstanding unsent Replier Unbind Event messages.
+ *
+ * Assumed to be called because the device is closing, and thus doesn't lock,
+ * or worry about lost messages.
+ */
+static void kbus_forget_unsent_unbind_msgs(struct kbus_dev *dev)
+{
+	struct kbus_unsent_message_item *ptr;
+	struct kbus_unsent_message_item *next;
+
+	kbus_maybe_dbg(dev,
+		       "  Forgetting unsent unbind event messages\n");
+
+	list_for_each_entry_safe(ptr, next, &dev->unsent_unbind_msg_list,
+			list) {
+
+		if (!kbus_message_name_matches(
+					    ptr->msg->name_ref->name,
+					    ptr->msg->name_len,
+					    KBUS_MSG_NAME_REPLIER_BIND_EVENT))
+			kbus_maybe_report_message(dev, ptr->msg);
+
+		list_del(&ptr->list);
+		kbus_free_message(ptr->msg);
+		kfree(ptr);
+		dev->unsent_unbind_msg_count--;
+	}
+}
+
+/*
  * Remove all bindings for a particular listener.
  *
  * Called from kbus_release, which will itself handle removing messages
@@ -1885,8 +2263,8 @@ static void kbus_forget_my_bindings(struct kbus_private_data *priv)
 			       ptr->name_len, ptr->name);
 
 		if (ptr->is_replier && dev->report_replier_binds)
-			kbus_report_unbinding(priv, ptr->name_len,
-					      ptr->name);
+			kbus_safe_report_unbinding(priv, ptr->name_len,
+							 ptr->name);
 
 		list_del(&ptr->list);
 		kfree(ptr->name);
@@ -2088,6 +2466,8 @@ static int kbus_release(struct inode *inode __always_unused, struct file *filp)
 
 	kbus_empty_message_queue(priv);
 	kbus_forget_my_bindings(priv);
+	if (priv->maybe_got_unsent_unbind_msgs)
+		kbus_forget_my_unsent_unbind_msgs(priv);
 	kbus_empty_replies_unsent(priv);
 	retval2 = kbus_forget_open_ksock(dev, priv->id);
 	kfree(priv);
@@ -2933,6 +3313,7 @@ static int kbus_unbind(struct kbus_private_data *priv,
 	int retval = 0;
 	struct kbus_bind_request *bind;
 	char *name = NULL;
+	u32 old_message_count = priv->message_count;
 
 	bind = kmalloc(sizeof(*bind), GFP_KERNEL);
 	if (!bind)
@@ -2975,6 +3356,36 @@ static int kbus_unbind(struct kbus_private_data *priv,
 	retval = kbus_forget_binding(dev, priv,
 				     bind->is_replier, bind->name_len, name);
 
+	/*
+	 * If we're unbinding from $.KBUS.ReplierBindEvent, and there
+	 * are (or may be) any such kept for us on the unread Replier
+	 * Unbind Event list, then we need to remove them as well...
+	 *
+	 * NOTE that the following only checks for exact matches to
+	 * $.KBUS.ReplierBindEvent, which should be sufficient...
+	 */
+	if (priv->maybe_got_unsent_unbind_msgs &&
+	    !strcmp(name, KBUS_MSG_NAME_REPLIER_BIND_EVENT))
+		kbus_forget_my_unsent_unbind_msgs(priv);
+
+	/*
+	 * If that removed any messages from the message queue, then we have
+	 * room to consider moving a message across from the unread Replier
+	 * Unbind Event list
+	 */
+	if (priv->message_count < old_message_count &&
+	    priv->maybe_got_unsent_unbind_msgs) {
+		int rv = kbus_maybe_move_unsent_unbind_msg(priv);
+		/* If this fails, we're probably stumped */
+		if (rv)
+			/* The best we can do is grumble gently. We still
+			 * want to return retval, not rv.
+			 */
+			dev_err(priv->dev->dev,
+			       "Failed to move unsent messages on "
+			       "unbind (error %d)\n", -rv);
+	}
+
 done:
 	kfree(name);
 	kfree(bind);
@@ -3140,6 +3551,21 @@ static int kbus_nextmsg(struct kbus_private_data *priv,
 			return retval;
 	}
 
+	/*
+	 * If we (maybe) have any unread Replier Unbind Event messages,
+	 * we now have room to copy one across to the message list
+	 */
+	kbus_maybe_dbg(priv->dev,
+		       "  ++ maybe_got_unsent_unbind_msgs %d\n",
+		       priv->maybe_got_unsent_unbind_msgs);
+
+	if (priv->maybe_got_unsent_unbind_msgs) {
+		retval = kbus_maybe_move_unsent_unbind_msg(priv);
+		/* If this fails, we're probably stumped */
+		if (retval)
+			return retval;
+	}
+
 	retval = __put_user(KBUS_ENTIRE_MSG_LEN(msg->name_len, msg->data_len),
 			    (u32 __user *) arg);
 	if (retval)
@@ -3567,6 +3993,12 @@ static int kbus_nummsgs(struct kbus_private_data *priv,
 {
 	u32 count = priv->message_count;
 
+	if (priv->maybe_got_unsent_unbind_msgs) {
+		kbus_maybe_dbg(dev, "%u NUMMSGS 'main' count %u\n",
+			       priv->id, count);
+		count += kbus_count_unsent_unbind_msgs(priv);
+	}
+
 	kbus_maybe_dbg(dev, "%u NUMMSGS %u\n", priv->id, count);
 
 	return __put_user(count, (u32 __user *) arg);
@@ -4059,6 +4491,7 @@ static void kbus_setup_cdev(struct kbus_dev *dev, int devno)
 	 */
 	INIT_LIST_HEAD(&dev->bound_message_list);
 	INIT_LIST_HEAD(&dev->open_ksock_list);
+	INIT_LIST_HEAD(&dev->unsent_unbind_msg_list);
 
 	init_waitqueue_head(&dev->write_wait);
 
@@ -4078,6 +4511,7 @@ static void kbus_teardown_cdev(struct kbus_dev *dev)
 {
 	kbus_forget_all_bindings(dev);
 	kbus_forget_all_open_ksocks(dev);
+	kbus_forget_unsent_unbind_msgs(dev);
 
 	cdev_del(&dev->cdev);
 }
-- 
1.7.4.1



* [PATCH 10/11] KBUS report state to userspace
  2011-03-18 17:21                 ` [PATCH 09/11] KBUS Replier Bind Event set-aside lists Tony Ibbs
@ 2011-03-18 17:21                   ` Tony Ibbs
  2011-03-18 17:21                     ` [PATCH 11/11] KBUS configuration and Makefile Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC (permalink / raw)
  To: lkml
  Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely, Tony Ibbs

This introduces two pseudo-files which expose some of the current
KBUS internal state. The bindings file, in particular, is used
in the KBUS test scripts (which are part of the userspace Python
binding).
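
As a rough illustration of how a userspace test might consume the
bindings file, the sketch below parses one line of /proc/kbus/bindings.
It assumes the line layout produced by the seq_printf() format in
kbus_binding_seq_show() in this patch ("%3u: %8u %8lu  %c  %.*s"), and
it also assumes that message names contain no whitespace. The struct
and function names are hypothetical and are not part of KBUS.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed line of /proc/kbus/bindings (illustrative type, not a KBUS one) */
struct kbus_binding_line {
	unsigned int dev;	/* KBUS device index */
	unsigned int ksock_id;	/* Ksock the message name is bound to */
	unsigned long pid;	/* process that owns that Ksock */
	char role;		/* 'R' for Replier, 'L' for Listener */
	char name[256];		/* message name (assumed to have no spaces) */
};

/*
 * Parse a single line. Returns 1 if it described a binding, or 0 if it
 * was a '#' header line or did not match the expected layout.
 */
static int parse_binding_line(const char *line, struct kbus_binding_line *out)
{
	assert(line != NULL && out != NULL);

	if (line[0] == '#')
		return 0;

	if (sscanf(line, " %u: %u %lu %c %255s", &out->dev, &out->ksock_id,
		   &out->pid, &out->role, out->name) != 5)
		return 0;

	return out->role == 'R' || out->role == 'L';
}
```

A caller would typically fgets() each line of the file and pass it to
parse_binding_line(), skipping any line for which it returns 0.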

Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
 ipc/kbus_report.c |  256 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 256 insertions(+), 0 deletions(-)
 create mode 100644 ipc/kbus_report.c

diff --git a/ipc/kbus_report.c b/ipc/kbus_report.c
new file mode 100644
index 0000000..ba07254
--- /dev/null
+++ b/ipc/kbus_report.c
@@ -0,0 +1,256 @@
+/* KBUS kernel module - general reporting to userspace
+ */
+
+/*
+ * ***** BEGIN LICENSE BLOCK *****
+ * Version: MPL 1.1
+ *
+ * The contents of this file are subject to the Mozilla Public License Version
+ * 1.1 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.mozilla.org/MPL/
+ *
+ * Software distributed under the License is distributed on an "AS IS" basis,
+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+ * for the specific language governing rights and limitations under the
+ * License.
+ *
+ * The Original Code is the KBUS Lightweight Linux-kernel mediated
+ * message system
+ *
+ * The Initial Developer of the Original Code is Kynesim, Cambridge UK.
+ * Portions created by the Initial Developer are Copyright (C) 2009
+ * the Initial Developer. All Rights Reserved.
+ *
+ * Contributor(s):
+ *   Kynesim, Cambridge UK
+ *   Tony Ibbs <tibs@tonyibbs.co.uk>
+ *
+ * Alternatively, the contents of this file may be used under the terms of the
+ * GNU Public License version 2 (the "GPL"), in which case the provisions of
+ * the GPL are applicable instead of the above.  If you wish to allow the use
+ * of your version of this file only under the terms of the GPL and not to
+ * allow others to use your version of this file under the MPL, indicate your
+ * decision by deleting the provisions above and replace them with the notice
+ * and other provisions required by the GPL.  If you do not delete the
+ * provisions above, a recipient may use your version of this file under either
+ * the MPL or the GPL.
+ *
+ * ***** END LICENSE BLOCK *****
+ */
+
+#include <linux/module.h>
+#include <linux/cdev.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+
+#include <linux/kbus_defns.h>
+#include "kbus_internal.h"
+
+/* /proc */
+static struct proc_dir_entry *kbus_proc_dir;
+static struct proc_dir_entry *kbus_proc_file_bindings;
+static struct proc_dir_entry *kbus_proc_file_stats;
+
+/* What's the symbolic name of a message part? */
+static const char *kbus_msg_part_name(enum kbus_msg_parts p)
+{
+	switch (p) {
+	case KBUS_PART_HDR:	return "HDR";
+	case KBUS_PART_NAME:	return "NAME";
+	case KBUS_PART_NPAD:	return "NPAD";
+	case KBUS_PART_DATA:	return "DATA";
+	case KBUS_PART_DPAD:	return "DPAD";
+	case KBUS_PART_FINAL_GUARD:	return "FINAL";
+	}
+
+	pr_err("kbus: unhandled enum lookup %d in "
+		   "kbus_msg_part_name - memory corruption?\n", p);
+	return "???";
+}
+
+/*
+ * Report on the current bindings, via /proc/kbus/bindings
+ */
+
+static int kbus_binding_seq_show(struct seq_file *s, void *v __always_unused)
+{
+	int ii;
+	int kbus_num_devices;
+	struct kbus_dev **kbus_devices;
+
+	kbus_get_device_data(&kbus_num_devices, &kbus_devices);
+
+	/* We report on all of the KBUS devices */
+	for (ii = 0; ii < kbus_num_devices; ii++) {
+		struct kbus_dev *dev = kbus_devices[ii];
+
+		struct kbus_message_binding *ptr;
+		struct kbus_message_binding *next;
+
+		if (mutex_lock_interruptible(&dev->mux))
+			return -ERESTARTSYS;
+
+		seq_printf(s,
+			   "# <device> is bound to <Ksock-ID> in <process-PID>"
+			   " as <Replier|Listener> for <message-name>\n");
+
+		list_for_each_entry_safe(ptr, next, &dev->bound_message_list,
+					 list) {
+			seq_printf(s, "%3u: %8u %8lu  %c  %.*s\n", dev->index,
+				   ptr->bound_to_id,
+				   (long unsigned)ptr->bound_to->pid,
+				   (ptr->is_replier ? 'R' : 'L'),
+				   ptr->name_len, ptr->name);
+		}
+
+		mutex_unlock(&dev->mux);
+	}
+	return 0;
+}
+
+static int kbus_proc_bindings_open(struct inode *inode __always_unused,
+				   struct file *file)
+{
+	return single_open(file, kbus_binding_seq_show, NULL);
+}
+
+static const struct file_operations kbus_proc_binding_file_ops = {
+	.owner = THIS_MODULE,
+	.open = kbus_proc_bindings_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release
+};
+
+static struct proc_dir_entry
+*kbus_create_proc_binding_file(struct proc_dir_entry *directory)
+{
+	struct proc_dir_entry *entry =
+	    create_proc_entry("bindings", 0, directory);
+	if (entry)
+		entry->proc_fops = &kbus_proc_binding_file_ops;
+	return entry;
+}
+
+/*
+ * Report on whatever statistics seem like they might be useful,
+ * via /proc/kbus/stats
+ */
+
+static int kbus_stats_seq_show(struct seq_file *s, void *v __always_unused)
+{
+	int ii;
+	int kbus_num_devices;
+	struct kbus_dev **kbus_devices;
+
+	kbus_get_device_data(&kbus_num_devices, &kbus_devices);
+
+	/* We report on all of the KBUS devices */
+	for (ii = 0; ii < kbus_num_devices; ii++) {
+		struct kbus_dev *dev = kbus_devices[ii];
+
+		struct kbus_private_data *ptr;
+		struct kbus_private_data *next;
+
+		if (mutex_lock_interruptible(&dev->mux))
+			return -ERESTARTSYS;
+
+		seq_printf(s,
+			 "dev %2u: next ksock %u next msg %u "
+			 "unsent unbindings %u%s\n",
+			 dev->index, dev->next_ksock_id,
+			 dev->next_msg_serial_num,
+			 dev->unsent_unbind_msg_count,
+			 (dev->unsent_unbind_is_tragic ? "(gone tragic)" : ""));
+
+		list_for_each_entry_safe(ptr, next,
+					 &dev->open_ksock_list, list) {
+
+			u32 left = kbus_lenleft(ptr);
+			u32 total;
+			if (ptr->read.msg)
+				total =
+				    KBUS_ENTIRE_MSG_LEN(ptr->read.msg->name_len,
+						ptr->read.msg->data_len);
+			else
+				total = 0;
+
+			seq_printf(s, "    ksock %u last msg %u:%u "
+					"queue %u of %u\n",
+				   ptr->id, ptr->last_msg_id_sent.network_id,
+				   ptr->last_msg_id_sent.serial_num,
+				   ptr->message_count, ptr->max_messages);
+
+			seq_printf(s, "      read byte %u of %u, "
+				"wrote byte %u of %s (%sfinished), "
+				"%ssending\n",
+				   (total - left), total, ptr->write.pos,
+				   kbus_msg_part_name(ptr->write.which),
+				   ptr->write.is_finished ? "" : "not ",
+				   ptr->sending ? "" : "not ");
+
+			seq_printf(s, "      outstanding requests %u "
+					"(size %u, max %u), "
+					"unsent replies %u (max %u)\n",
+					ptr->outstanding_requests.count,
+					ptr->outstanding_requests.size,
+					ptr->outstanding_requests.max_count,
+					ptr->num_replies_unsent,
+					ptr->max_replies_unsent);
+		}
+		mutex_unlock(&dev->mux);
+	}
+
+	return 0;
+}
+
+static int kbus_proc_stats_open(struct inode *inode __always_unused,
+				struct file *file)
+{
+	return single_open(file, kbus_stats_seq_show, NULL);
+}
+
+static const struct file_operations kbus_proc_stats_file_ops = {
+	.owner = THIS_MODULE,
+	.open = kbus_proc_stats_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release
+};
+
+static struct proc_dir_entry
+*kbus_create_proc_stats_file(struct proc_dir_entry *directory)
+{
+	struct proc_dir_entry *entry = create_proc_entry("stats", 0, directory);
+	if (entry)
+		entry->proc_fops = &kbus_proc_stats_file_ops;
+	return entry;
+}
+
+/* ========================================================================= */
+
+extern void kbus_setup_reporting(void)
+{
+	/* Within the /proc/kbus directory, we have: */
+	kbus_proc_dir = proc_mkdir("kbus", NULL);
+	if (kbus_proc_dir) {
+		/* /proc/kbus/bindings -- message name bindings */
+		kbus_proc_file_bindings =
+		    kbus_create_proc_binding_file(kbus_proc_dir);
+		/* /proc/kbus/stats -- miscellaneous statistics */
+		kbus_proc_file_stats =
+		    kbus_create_proc_stats_file(kbus_proc_dir);
+	}
+}
+
+extern void kbus_remove_reporting(void)
+{
+	if (kbus_proc_dir) {
+		if (kbus_proc_file_bindings)
+			remove_proc_entry("bindings", kbus_proc_dir);
+		if (kbus_proc_file_stats)
+			remove_proc_entry("stats", kbus_proc_dir);
+		remove_proc_entry("kbus", NULL);
+	}
+}
-- 
1.7.4.1



* [PATCH 11/11] KBUS configuration and Makefile
  2011-03-18 17:21                   ` [PATCH 10/11] KBUS report state to userspace Tony Ibbs
@ 2011-03-18 17:21                     ` Tony Ibbs
  0 siblings, 0 replies; 34+ messages in thread
From: Tony Ibbs @ 2011-03-18 17:21 UTC (permalink / raw)
  To: lkml
  Cc: Linux-embedded, Tibs at Kynesim, Richard Watts, Grant Likely, Tony Ibbs


Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
 init/Kconfig |    2 +
 ipc/Kconfig  |  117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 ipc/Makefile |    9 ++++
 3 files changed, 128 insertions(+), 0 deletions(-)
 create mode 100644 ipc/Kconfig

diff --git a/init/Kconfig b/init/Kconfig
index c972899..945c380 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -339,6 +339,8 @@ config AUDIT_TREE
 	depends on AUDITSYSCALL
 	select FSNOTIFY
 
+source "ipc/Kconfig"
+
 source "kernel/irq/Kconfig"
 
 menu "RCU Subsystem"
diff --git a/ipc/Kconfig b/ipc/Kconfig
new file mode 100644
index 0000000..808d742
--- /dev/null
+++ b/ipc/Kconfig
@@ -0,0 +1,117 @@
+config KBUS
+	tristate "KBUS messaging system"
+	default n
+	---help---
+	  KBUS is a lightweight messaging system, particularly aimed
+	  at embedded platforms. This option provides the kernel support
+	  for mediating messages between client processes.
+
+	  KBUS documentation may be found in Documentation/Kbus.txt
+
+	  If you want KBUS support, you should say Y or M here. If you
+	  choose M, the module will be called kbus.
+
+	  If unsure, say N.
+
+if KBUS
+
+config KBUS_DEBUG
+	bool "Make KBUS debugging messages available"
+	default y
+	---help---
+	  This is the master switch for KBUS debug kernel messages.
+
+	  If N is selected, all debug messages will be compiled out,
+	  and the KBUS_IOC_VERBOSE ioctl will have no effect.
+
+	  If Y is selected, then KBUS_DEBUG_DEFAULT_VERBOSE sets
+	  the default for whether debug messages are emitted or not,
+	  and the KBUS_IOC_VERBOSE ioctl can be used at runtime to
+	  set/unset the output of debugging messages per KBUS device.
+
+	  If unsure, say Y.
+
+config KBUS_DEBUG_DEFAULT_VERBOSE
+	bool "Output KBUS debug messages by default"
+	depends on KBUS_DEBUG
+	default n
+	---help---
+	  This sets the default state for the output of debugging messages.
+	  It only has effect if KBUS_DEBUG is already set.
+
+	  If Y is selected, then KBUS devices default to outputting debug
+	  messages. If N is selected, they do not.
+
+	  In either case, debug messages for a particular KBUS device can
+	  be turned on or off at runtime with the KBUS_IOC_VERBOSE ioctl.
+
+	  If unsure, say N.
+
+config KBUS_DEF_NUM_DEVICES
+	int "Number of KBUS devices to auto-create"
+	default 1
+	range 1 KBUS_MAX_NUM_DEVICES
+	---help---
+	  This specifies the number of KBUS devices automatically created
+	  when the KBUS subsystem initialises (when the module is loaded
+	  or the kernel booted, as appropriate).
+
+	  If KBUS is built as a module, this number may also be given as a
+	  parameter; for example: kbus_num_devices=5.
+
+	  Additional devices can be added at runtime using the
+	  KBUS_IOC_NEWDEVICE ioctl.
+
+config KBUS_MAX_NUM_DEVICES
+	int "Maximum number of KBUS devices"
+	default 256
+	range 1 2147483647
+	# We don't impose a limit on the max, so if you've got enough
+	# RAM the only practical limit will be the (int) minor count
+	# passed to __register_chrdev_region.
+	---help---
+	  The maximum number of KBUS devices to support. If unsure, use
+	  the default of 256.
+
+	  Note that this setting controls the size of an array of pointers
+	  to in-kernel KBUS structures; reducing it only saves a tiny amount
+	  of RAM. On the other hand, once a KBUS device is used, the various
+	  message lists and so on do take space.
+
+config KBUS_DEF_MAX_MESSAGES
+	int "Default KBUS message queue size limit"
+	default 100
+	range 1 2147483647
+	---help---
+	  Specify the default incoming message queue size limit. This default
+	  is applied to all clients whenever they open or reopen a KBUS device
+	  node.
+
+	  Clients sending messages may specify the desired behaviour if any
+	  of the recipients' message queues are full. If a sender's own queue
+	  is full, it may not send a message flagged as a Request. Refer to
+	  the KBUS documentation ("Queues filling up") for details.
+
+	  Clients may change their own queue size limits at any time with the
+	  KBUS_IOC_MAXMSGS ioctl.
+
+	  The default is believed to be a reasonably conservative choice.
+
+config KBUS_MAX_UNSENT_UNBIND_MESSAGES
+	int "Maximum number of unsent KBUS unbind event messages"
+	default 1000
+	range 1 2147483647
+	---help---
+	  KBUS devices may request (by ioctl) that they want to receive
+	  messages when clients bind or unbind as repliers. If such a
+	  message is sent when their incoming queue is full, it is instead
+	  saved onto a set-aside queue, and delivered later. This setting
+	  determines the size of the set-aside queue; if the limit is reached,
+	  a special "bind/unbind event messages were lost" message is queued
+	  instead, and any further bind/unbind messages will be lost, until
+	  such time as the special message can be delivered.
+
+	  If unsure, choose the default.
+
+endif # KBUS
+
diff --git a/ipc/Makefile b/ipc/Makefile
index 9075e17..db692ad 100644
--- a/ipc/Makefile
+++ b/ipc/Makefile
@@ -2,6 +2,11 @@
 # Makefile for the linux ipc.
 #
 
+ifeq ($(CONFIG_KBUS_DEBUG),y)
+	CFLAGS_kbus_main.o	= -DDEBUG
+	CFLAGS_kbus_report.o	= -DDEBUG
+endif
+
 obj-$(CONFIG_SYSVIPC_COMPAT) += compat.o
 obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o ipcns_notifier.o syscall.o
 obj-$(CONFIG_SYSVIPC_SYSCTL) += ipc_sysctl.o
@@ -9,4 +14,8 @@ obj_mq-$(CONFIG_COMPAT) += compat_mq.o
 obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
 obj-$(CONFIG_IPC_NS) += namespace.o
 obj-$(CONFIG_POSIX_MQUEUE_SYSCTL) += mq_sysctl.o
+obj-$(CONFIG_KBUS) += kbus.o
+
+kbus-y := kbus_main.o
+kbus-$(CONFIG_PROC_FS) += kbus_report.o
 
-- 
1.7.4.1



* Re: [PATCH 00/11] RFC: KBUS messaging subsystem
  2011-03-18 17:21 [PATCH 00/11] RFC: KBUS messaging subsystem Tony Ibbs
  2011-03-18 17:21 ` [PATCH 01/11] Documentation for KBUS Tony Ibbs
@ 2011-03-22 19:36 ` Jonathan Corbet
  2011-03-23 23:13   ` Tony Ibbs
  2011-05-17  8:50   ` [PATCH 00/11] RFC: KBUS messaging subsystem Florian Fainelli
  1 sibling, 2 replies; 34+ messages in thread
From: Jonathan Corbet @ 2011-03-22 19:36 UTC (permalink / raw)
  To: Tony Ibbs, Grant Likely
  Cc: lkml, Linux-embedded, Tibs at Kynesim, Richard Watts

On Fri, 18 Mar 2011 17:21:09 +0000
Tony Ibbs <tibs@tonyibbs.co.uk> wrote:

> KBUS is a lightweight, Linux kernel mediated messaging system,
> particularly intended for use in embedded environments.

I've spent a bit of time looking at this code...this isn't a detailed
review by any stretch, more like a few impressions.

- Why kbus over, say, a user-space daemon and unix-domain sockets?  I'm
  not sure I see the advantage that comes with putting this into kernel
  space.

- The interface is ... creative.  If you have to do this in kernel space,
  it would be nice to do away with the split write()/ioctl() API for
  reading or writing messages.  It seems like either a write(), OR an
  ioctl() with a message data pointer would suffice; that would cut the
  number of syscalls the applications need to make too.

  Even better might be to just use the socket API.

- Does anything bound the size of a message fed into the kernel with
  write()?  I couldn't find it.  It seems like an application could
  consume arbitrary amounts of kernel memory.

- It would be good to use the kernel's dynamic debugging and tracing
  facilities rather than rolling your own.

- There's lots of kmalloc()/memset() pairs that could be kzalloc().

That's as far as I could get for now.

Thanks,

jon


* Re: [PATCH 00/11] RFC: KBUS messaging subsystem
  2011-03-22 19:36 ` [PATCH 00/11] RFC: KBUS messaging subsystem Jonathan Corbet
@ 2011-03-23 23:13   ` Tony Ibbs
  2011-03-24 18:03     ` James Chapman
  2011-04-15 21:34     ` [PATCH] extra/1 Allow setting the maximum KBUS message size Tony Ibbs
  2011-05-17  8:50   ` [PATCH 00/11] RFC: KBUS messaging subsystem Florian Fainelli
  1 sibling, 2 replies; 34+ messages in thread
From: Tony Ibbs @ 2011-03-23 23:13 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Grant Likely, lkml, Linux-embedded, Tibs at Kynesim, Richard Watts


On 22 Mar 2011, at 19:36, Jonathan Corbet wrote:

> On Fri, 18 Mar 2011 17:21:09 +0000
> Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
>
> > KBUS is a lightweight, Linux kernel mediated messaging system,
> > particularly intended for use in embedded environments.

> - Why kbus over, say, a user-space daemon and unix-domain sockets?  I'm
>  not sure I see the advantage that comes with putting this into kernel
>  space.

Mostly, a kernel module gives us reliability.

In particular, a kernel module allows us to guarantee that a replier
that "goes away" (including crashing) will be detected by KBUS, and
cause a synthetic reply to be sent, so that the sender can know that it
will not get a real reply.

This same guarantee means that the sender end of a stateful dialogue can
be reliably told if the replier end disconnects and (some new version of
it) reconnects - in which case state presumably needs to be
reestablished.

Doing this in userspace would be difficult and unreliable.

There are other problems with userspace daemons, including setting up
many-to-many messaging, message atomicity, and so on. Our past
experience of other people's solutions (previous customers in
particular) is that it is perilously easy to get it wrong in userspace,
and especially to end up with race conditions.

> - The interface is ... creative.

That's very tactfully put.

>  If you have to do this in kernel space,
>  it would be nice to do away with the split write()/ioctl() API for
>  reading or writing messages.  It seems like either a write(), OR an
>  ioctl() with a message data pointer would suffice; that would cut the
>  number of syscalls the applications need to make too.

When the reader is reading a message, using 'read' seems very natural,
and is simple to explain. Because we always return an "entire" message
(i.e., one in which all the message data is in one chunk, rather than
a header pointing to message name and/or data), it also means that
memory handling on return to user space is much simplified. Doing an
ioctl first to find out the length of the message to come is also
simple to explain.

Also, in the case of reading a message, I can see clear advantage
in being able to "stream" the reading of the message data (for a
long and appropriately structured message).

Writing a message *could* be done with 'write' alone. I must admit that
having 'write' detect the end of the message by looking at it feels
wrong, somehow, but that's not a very compelling answer. It is,
however, definitely easier for the user to understand the error if
they try to <send> and get told they haven't written enough data
yet, rather than just waiting for the 'write' to magically complete.

There is also a certain symmetry to using <nextmsg>/'read' and
'write'/<send>, but as you said at the start, it's a bit unusual.

Using an ioctl instead of 'write' would involve a more complex ioctl
than we're otherwise commonly using, would lose the symmetry, and just
didn't feel right. It also means pointer handling for even the simplest
message.

>  Even better might be to just use the socket API.

Whilst the current API is a bit odd, trying to use the socket API looked
to us as if it would be a worse fit.

The socket API doesn't seem to match what we wanted KBUS to do
particularly well. It's not, for instance, obvious how to do a 'recv' of
a variable length message that might be quite short or several hundred
KB long - does one 'recv' the header first, and then the body (which
isn't very nice)? Doing a 'next message' ioctl as current KBUS does
would feel really alien in a socket environment.

Of course, we'd still have to invent our own addressing scheme, and our
own ``struct *addr``, and appropriate socket options, and also decide
how the common options should apply or not (for instance, SO_ACCEPTCONN,
SO_BROADCAST). And how to work with accept/listen/bind and all the other
common calls.

Also, lazily on my part, it's fairly obvious how to write a file
interface for the kernel, but the socket API (from the inside) appears
to be more complex, and to have fewer examples with training wheels.

We *could* reimplement in terms of sockets, but I think the code would
get a lot bigger, and I think using the system would be a lot harder to
explain (I don't think the current message name binding mechanisms would
get any clearer, for instance).

And some of the semantics of KBUS (the sending of a message to say that
the expected replier has been replaced by a new one, for instance) seem
to fit oddly with how people expect sockets to work. Or being told that
the far end has gone away, or is not who one expected it to be.

Also, I'm afraid my experience is that people find sockets hard to
understand (not necessarily justifiably), whereas explaining KBUS to its
intended users is fairly simple - one can assume they know about file
interfaces, and people fairly easily accept a few "odd" extra calls. But
that may not be a very compelling reason from the inside of the
kernel...

> - Does anything bound the size of a message fed into the kernel with
>  write()?  I couldn't find it.  It seems like an application could
>  consume arbitrary amounts of kernel memory.

That is indeed a misfeature. There should be a default limit, and some
way of changing it.

> - It would be good to use the kernel's dynamic debugging and tracing
>  facilities rather than rolling your own.

Mea culpa. KBUS's debug support grew rather erratically, and only
recently got converted to at least using dev_dbg and friends.
Also, I'm not at all sure what the current kernel mechanisms are
(pointers are welcomed, since this is a clear case where normal
kernel conventions should be followed, and I don't know what they are).

> - There's lots of kmalloc()/memset() pairs that could be kzalloc().

And I just missed that.

> That's as far as I could get for now.

Thanks, it's all appreciated, and all makes sense.

(and I should say thank you since I started out writing KBUS with a copy
of Linux Device Drivers beside me, and bookmarks for various LWN
articles. It would all be a lot worse without those).

Hope this all makes sense - it's late here but I shan't have a chance to
reply tomorrow.

Tibs



* Re: [PATCH 00/11] RFC: KBUS messaging subsystem
  2011-03-23 23:13   ` Tony Ibbs
@ 2011-03-24 18:03     ` James Chapman
  2011-03-27 19:07       ` Tony Ibbs
  2011-04-15 21:34     ` [PATCH] extra/1 Allow setting the maximum KBUS message size Tony Ibbs
  1 sibling, 1 reply; 34+ messages in thread
From: James Chapman @ 2011-03-24 18:03 UTC (permalink / raw)
  To: Tony Ibbs, Jonathan Corbet
  Cc: Grant Likely, lkml, Linux-embedded, Tibs at Kynesim, Richard Watts

On 23/03/2011 23:13, Tony Ibbs wrote:
> 
> On 22 Mar 2011, at 19:36, Jonathan Corbet wrote:
> 
>> On Fri, 18 Mar 2011 17:21:09 +0000
>> Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
>>
>>> KBUS is a lightweight, Linux kernel mediated messaging system,
>>> particularly intended for use in embedded environments.
> 
>> - Why kbus over, say, a user-space daemon and unix-domain sockets?  I'm
>>  not sure I see the advantage that comes with putting this into kernel
>>  space.
> 
> Mostly, a kernel module gives us reliability.
> 
> In particular, a kernel module allows us to guarantee that a replier
> that "goes away" (including crashing) will be detected by KBUS, and
> cause a synthetic reply to be sent, so that the sender can know that it
> will not get a real reply.
>
> This same guarantee means that the sender end of a stateful dialogue can
> be reliably told if the replier end disconnects and (some new version of
> it) reconnects - in which case state presumably needs to be
> reestablished.
> 
> Doing this in userspace would be difficult and unreliable.
> 
> There are other problems with userspace daemons, including setting up
> many-to-many messaging, message atomicity, and so on. Our past
> experience of other people's solutions (previous customers in
> particular) is that it is perilously easy to get it wrong in userspace,
> and especially to end up with race conditions.

I don't understand what Kbus really brings either.

With good sockets programming, it is possible to avoid most of the
issues mentioned above. Frameworks like Glib and DBus can also help.

Have you considered other kernel messaging subsystems such as netlink
sockets, connectors, POSIX message queues etc etc if you don't want DBus?

> 
>> - The interface is ... creative.
> 
> That's very tactfully put.
> 
>>  If you have to do this in kernel space,
>>  it would be nice to do away with the split write()/ioctl() API for
>>  reading or writing messages.  It seems like either a write(), OR an
>>  ioctl() with a message data pointer would suffice; that would cut the
>>  number of syscalls the applications need to make too.
> 
> When the reader is reading a message, using 'read' seems very natural,
> and is simple to explain. Because we always return an "entire" message
> (i.e., one in which all the message data is in one chunk, rather than
> a header pointing to message name and/or data), it also means that
> memory handling on return to user space is much simplified. Doing an
> ioctl first to find out the length of the message to come is also
> simple to explain.

Eh? Network protocols routinely do this sort of thing with regular
sockets. Read the message header then read the rest when you know how
big the rest is. Or where you know the max size of all possible
messages, do one read into a fixed size buffer.
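
A minimal sketch of that header-then-body pattern follows. The header
layout (struct msg_header) and the function names are invented for
illustration; they are not part of KBUS or of any socket API, and a real
protocol would also fix the byte order of the length field.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wire header for this sketch only: body length in bytes. */
struct msg_header {
	uint32_t data_len;
};

/* Read exactly len bytes, retrying on short reads and on EINTR. */
static int read_exact(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return -1;	/* peer closed mid-message */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Read one message: the header first, then the body it announces.
 * Returns the body length, or -1 on error or if the body would not
 * fit in buf.
 */
static ssize_t read_message(int fd, void *buf, size_t bufsize)
{
	struct msg_header hdr;

	if (read_exact(fd, &hdr, sizeof(hdr)) < 0)
		return -1;
	if (hdr.data_len > bufsize)
		return -1;
	if (read_exact(fd, buf, hdr.data_len) < 0)
		return -1;
	return (ssize_t)hdr.data_len;
}
```

The same two functions work on a pipe, a unix-domain stream socket or a
TCP socket, since all they rely on is read().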

> Also, in the case of reading a message, I can see clear advantage
> in being able to "stream" the reading of the message data (for a
> long and appropriately structured message).
> 
> Writing a message *could* be done with 'write' alone. I must admit that
> having 'write' detect the end of the message by looking at it feels
> wrong, somehow, but that's not a very compelling answer. It is,
> however, definitely easier for the user to understand the error if
> they try to <send> and get told they haven't written enough data
> yet, rather than just waiting for the 'write' to magically complete.

A write to a socket would do the same. I don't get the bit about
detecting the end of the message. I think this complexity is coming from
using char devices for message passing.

> There is also a certain symmetry to using <nextmsg>/'read' and
> 'write'/<send>, but as you said at the start, it's a bit unusual.
> 
> Using an ioctl instead of 'write' would involve a more complex ioctl
> than we're otherwise commonly using, would lose the symmetry, and just
> didn't feel right. It also means pointer handling for even the simplest
> message.
> 
>>  Even better might be to just use the socket API.

Agreed.

> Whilst the current API is a bit odd, trying to use the socket API looked
> to us as if it would be a worse fit.
> 
> The socket API doesn't seem to match what we wanted KBUS to do
> particularly well. It's not, for instance, obvious how to do a 'recv' of
> a variable length message that might be quite short or several hundred
> KB long - does one 'recv' the header first, and then the body (which
> isn't very nice)? Doing a 'next message' ioctl as current KBUS does
> would feel really alien in a socket environment.
> 
> Of course, we'd still have to invent our own addressing scheme, and our
> own ``struct *addr``, and appropriate socket options, and also decide
> how the common options should apply or not (for instance, SO_ACCEPTCONN,
> SO_BROADCAST). And how to work with accept/listen/bind and all the other
> common calls.
>
> Also, lazily on my part, it's fairly obvious how to write a file
> interface for the kernel, but the socket API (from the inside) appears
> to be more complex, and to have fewer examples with training wheels.
> 
> We *could* reimplement in terms of sockets, but I think the code would
> get a lot bigger, and I think using the system would be a lot harder to
> explain (I don't think the current message name binding mechanisms would
> get any clearer, for instance).

Why would you need a new socket family?

> And some of the semantics of KBUS (the sending of a message to say that
> the expected replier has been replaced by a new one, for instance) seem
> to fit oddly with how people expect sockets to work. Or being told that
> the far end has gone away, or is not who one expected it to be.

Not really. It is what DBus is all about.

Perhaps KBUS is intended for uses where DBus is too big? Or is it to
help port legacy RTOS apps to Linux? Shudder. :-)

Perhaps I misunderstand what KBUS might do for me. It might be useful to
present two simple apps implementing the same thing with, say, a unix
socket and KBUS, e.g. sending a message reliably to another process and
handling possible errors.

> Also, I'm afraid my experience is that people find sockets hard to
> understand (not necessarily justifiably), whereas explaining KBUS to its
> intended users is fairly simple - one can assume they know about file
> interfaces, and people fairly easily accept a few "odd" extra calls. But
> that may not be a very compelling reason from the inside of the
> kernel...
> 
>> - Does anything bound the size of a message fed into the kernel with
>>  write()?  I couldn't find it.  It seems like an application could
>>  consume arbitrary amounts of kernel memory.
> 
> That is indeed a misfeature. There should be a default limit, and some
> way of changing it.
> 
>> - It would be good to use the kernel's dynamic debugging and tracing
>>  facilities rather than rolling your own.
> 
> Mea culpa. KBUS's debug support grew rather erratically, and only
> recently got converted to at least using dev_dbg and friends.
> Also, I'm not at all sure what the current kernel mechanisms are
> (pointers are welcomed, since this is a clear case where normal
> kernel conventions should be followed, and I don't know what they are).
> 
>> - There's lots of kmalloc()/memset() pairs that could be kzalloc().
> 
> And I just missed that.
> 
>> That's as far as I could get for now.
> 
> Thanks, it's all appreciated, and all makes sense.
> 
> (and I should say thank you since I started out writing KBUS with a copy
> of Linux Device Drivers beside me, and bookmarks for various LWN
> articles. It would all be a lot worse without those).
> 
> Hope this all makes sense - it's late here but I shan't have a chance to
> reply tomorrow.
> 
> Tibs


-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development



* Re: [PATCH 00/11] RFC: KBUS messaging subsystem
  2011-03-24 18:03     ` James Chapman
@ 2011-03-27 19:07       ` Tony Ibbs
  0 siblings, 0 replies; 34+ messages in thread
From: Tony Ibbs @ 2011-03-27 19:07 UTC (permalink / raw)
  To: James Chapman
  Cc: Jonathan Corbet, Grant Likely, lkml, Linux-embedded,
	Tibs at Kynesim, Richard Watts

On 24 Mar 2011, at 18:03, James Chapman wrote:

> On 23/03/2011 23:13, Tony Ibbs wrote:
> ...stuff I've left out, because it's expanded on below...
>
> I don't understand what Kbus really brings either.

From one angle, KBUS aims to fit the particular needs of the embedded
user:

1. It's relatively small (so it will fit in small environments)
2. It's written in C (so no need for C++ libraries, etc., which are
   relatively large)
3. It's meant to be simple to learn (so it should be useful if you don't
   have time or resource to learn something arguably more
   complex/featureful)
4. It's meant to be simple to use without anything but the kernel
   module (i.e., you don't have to use the userspace libraries that
   the overall project also provides).

These are limitations of possible use spaces.

We also know, from practical experience, that a hard-pressed engineer
working on a project wants to be able to reach for a messaging solution
that doesn't take more than (say) an afternoon to get to grips with,
because in general it is not anywhere near the main thrust of what
they're doing, and they don't have the time or inclination to become
expert in yet another domain. So it needs to be simple to explain and
use.

(Personally, I see this as the main failing of Dbus - even the tutorial
states up front that it is an incomplete and unfinished document. It
would be good if someone produced documentation that gives the same
enthusiasm for Dbus as the 0mq people have done with their stuff.)

From another angle:

a. KBUS provides deterministic message ordering, so message order is
   the same for everyone receiving messages (if A sends a message,
   and B sends a message, then any of A, B, C, etc., who are listening
   to messages of that name will receive them in the same order.
   Consider messages for 'play', 'rewind' and 'pause', for instance.
   Or 'left by X', 'forward by Y' and 'up by Z' in a non-open space).
b. It guarantees that you'll get a Reply for a Request, even if that
   reply is "you aren't going to get one because the replier has gone
   away". This is done via message rather than error code, since it may
   be determined some time after the sender has sent the original
   message. Callback mechanisms can be used to similar effect, but are a
   pain if you don't want them.
c. It is not possible to send a request if the replier for that request
   does not have room in its incoming message queue. Nor is it possible
   to send a request if the sender does not have room for the (eventual)
   reply (ok, that last is perhaps a bit obvious).
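
As a rough model of (c), the check can be sketched in userspace C as
below. This is only an illustration under stated assumptions: the struct,
the function names and the choice of -EAGAIN / -EBUSY are hypothetical and
are not taken from the KBUS sources, and sender and replier are assumed to
be distinct endpoints.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-endpoint queue accounting, for illustration only. */
struct endpoint {
	size_t queued;		/* messages waiting to be read */
	size_t reserved;	/* slots promised to replies not yet arrived */
	size_t max_queue;	/* queue capacity */
};

/* Invariant: queued + reserved <= max_queue. */
static size_t free_slots(const struct endpoint *e)
{
	return e->max_queue - e->queued - e->reserved;
}

/*
 * Try to deliver a request from sender to replier (distinct endpoints).
 * It is accepted only if the replier can queue it now AND the sender
 * has room set aside for the eventual reply.
 * Returns 0 on success, -EAGAIN if the sender has no room for the
 * reply, -EBUSY if the replier's queue is full.
 */
static int send_request(struct endpoint *sender, struct endpoint *replier)
{
	if (free_slots(sender) == 0)
		return -EAGAIN;
	if (free_slots(replier) == 0)
		return -EBUSY;
	sender->reserved++;	/* room for the reply, real or synthetic */
	replier->queued++;
	return 0;
}

/* Deliver the reply (real or synthetic) into the slot reserved earlier. */
static void deliver_reply(struct endpoint *sender)
{
	assert(sender->reserved > 0);
	sender->reserved--;
	sender->queued++;
}
```

Reserving the reply slot at send time is what makes the guarantee in (b)
possible in this model: the reply, or the "replier has gone away" message,
always has somewhere to go.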

Clearly, if the incoming and outgoing message queues are going to be
managed in this way (i.e., according to (c)), there needs to be a
central daemon of some sort. Making sure that a daemon in userspace is
reliable is hard, and making sure that it knows when one of its clients
dies is also difficult (or perhaps "messy" would be a better word).
This last is particularly easy in the kernel, of course. Kernel modules
also come with a host of requirements and facilities that make them
easier to make reliable. Being in the kernel also brings well-debugged
handling for scalability issues, threading and multi-processing, all of
which are difficult to get absolutely right in userspace.

Some things which perhaps are not made as clear in the documentation as
they might be:

i.   KBUS is not client/server based. All senders/listeners (KBUS
     clients) are peers in this respect. Any Ksock can send messages
     (well, if it's opened for write), any Ksock can receive messages.
ii.  Anyone can choose to listen to (receive) a message of a particular 
     name. This is not dependent on who sends such a message.
iii. A single Ksock can choose to be a replier for a message of a
     particular name. This does not limit who may send request messages
     of that name, just who can reply.
iv.  From the above, it should be obvious that any request can be seen
     (received) by any listeners to that same message name, even though
     they can't reply to it. This makes debugging/logging the message
     traffic on a system particularly simple.
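
Points (ii) to (iv) can be modelled as a small binding table, sketched
here in plain C. The structure and function names are made up for this
example, and returning -EADDRINUSE for a second replier is an assumption
of the sketch rather than something stated in this thread.

```c
#include <errno.h>
#include <string.h>

#define MAX_BINDINGS 16

/* Hypothetical binding record: who bound which name, and in what role. */
struct binding {
	unsigned int ksock_id;
	int is_replier;
	const char *name;
};

struct binding_table {
	struct binding entry[MAX_BINDINGS];
	int count;
};

/*
 * Bind ksock_id to name. Any number of listeners may bind the same
 * name, but at most one replier may.
 */
static int bind_name(struct binding_table *t, unsigned int ksock_id,
		     int is_replier, const char *name)
{
	int i;

	if (t->count == MAX_BINDINGS)
		return -ENOSPC;
	if (is_replier) {
		for (i = 0; i < t->count; i++) {
			if (t->entry[i].is_replier &&
			    strcmp(t->entry[i].name, name) == 0)
				return -EADDRINUSE;
		}
	}
	t->entry[t->count].ksock_id = ksock_id;
	t->entry[t->count].is_replier = is_replier;
	t->entry[t->count].name = name;
	t->count++;
	return 0;
}

/* Count the bindings (replier or listener) that would see a message. */
static int count_receivers(const struct binding_table *t, const char *name)
{
	int i, n = 0;

	for (i = 0; i < t->count; i++)
		if (strcmp(t->entry[i].name, name) == 0)
			n++;
	return n;
}
```

With one replier and two listeners bound to 'play', a request named
'play' is seen by all three, which is the logging/debugging property
described in (iv).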

Also (this is definitely addressed in the documentation), there is
limited wildcarding on message names - basically on the end of the
message name. This can be used when binding to a message name for either
listening or replying.
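
To illustrate what wildcarding "on the end of the message name" could look
like, here is a small matcher. The syntax it accepts (a trailing ".*", and
dotted names such as "$.Fred.Jim") is an assumption made for this sketch;
the KBUS documentation is the authority on the real rules.

```c
#include <string.h>

/*
 * Return 1 if name matches pattern, 0 otherwise.
 * Assumed syntax (this sketch only): a pattern ending in ".*" matches
 * any name that starts with the text up to and including that final
 * '.', and has at least one more character after it. Any other pattern
 * must match the name exactly.
 */
static int name_matches(const char *pattern, const char *name)
{
	size_t plen = strlen(pattern);

	if (plen >= 2 && strcmp(pattern + plen - 2, ".*") == 0) {
		size_t prefix = plen - 1;	/* keep the trailing '.' */

		return strncmp(pattern, name, prefix) == 0 &&
		       name[prefix] != '\0';
	}
	return strcmp(pattern, name) == 0;
}
```

A listener bound with such a pattern would then receive every message
whose name falls under that prefix, without having to bind each name
individually.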

Finally, KBUS does not address message content. It is not intended to be
an RPC mechanism (which I think is what Dbus, for instance, is mostly
after?). It's up to the user to choose an appropriate way of describing
data (of which there are many good examples). I don't think that's a
particularly contentious point, though, but perhaps worth mentioning.

> With good sockets programming, it is possible to avoid most of the
> issues mentioned

If someone who is a very good socket programmer wrote a user-space
library to reproduce all of the things that KBUS does, then that would
be very good (I'm unhappy with the concept of only solving "most" of the
problems, though). I don't think I could do it (hiding client/server,
handling many-to-many transfers over AF_UNIX, etc., etc.), but then I'd
not claim to be a very good socket programmer. And I'd assume it would
take rather more code (since one is, to an extent, fighting against the
underlying paradigms).

I also don't see how you'd get away without some sort of identity
broker, to indicate which socket(s) belong to whom (since I assume each
Ksock-equivalent would need multiple sockets to handle the different
things that need doing). And that's another moderately fiddly bunch of
code to be tested.

> Frameworks like Glib and DBus can also help.

Glib is, surely, addressing different issues.

Dbus is simply too large for many of the places we want to do messaging.
It is also, as I suggested above, not terribly simple to learn, so it is
not going to be picked up by people as a potential solution when they
"just" want to send messages. (Clearly, if one is wanting to communicate
with systems that already use Dbus, this is a different matter, and I
assume one can use existing examples to leverage oneself in such cases,
and anyway one doesn't have a choice.) It's also not clear to me that
Dbus allows one to do all the things we want to do with KBUS - I stand
willing to be told that I just haven't found the right documentation for
that, though.

> Have you considered other kernel messaging subsystems such as netlink
> sockets, connectors, POSIX message queues etc etc if you don't want DBus?

As I understand it, netlink is explicitly lossy (from the connector
documentation: "messages can be lost due to memory pressure or process'
receiving queue overflowed"). So one would have the issue of handling
acknowledgements and re-requests, and suddenly life is more complicated.

If one were using netlink, I think connector looks like an excellent way
to do it (it seems to have decent documentation, for a start), but it
doesn't alleviate the lossy nature of netlink itself.

In both cases, it's a lot simpler just not to do that - use a non-lossy
approach in the first place.

By 'sockets' I assume you mean over AF_UNIX, and to some extent I hope
I've addressed that above. It didn't seem a very natural fit to what
we're trying to achieve, so we didn't start from there.

Posix message queues are interesting, but I don't see how it would be
possible to do N-to-many messaging using them, without having to write
almost all of KBUS-as-infrastructure anyway. Also, as I understand it,
managing the existence of the exposed filesystem paths corresponding to
message queues in a reliable manner is difficult.

I'm not sure what other technologies are meant by "etc., etc.".

> Why would you need a new socket family?

I was assuming the alternative of writing a kernel module that "spoke"
the sockets API, instead of the file API, but still had the KBUS code
inside it (i.e., the internal management of queues, etc.), just as we've
done with the normal file interface API in current KBUS. As I said
before, we'd then need a new socket family - perhaps AF_KBUS. And I'd
contend that if this were to preserve the intended uses of KBUS, it
would be as "creative" a use of the socket interfaces as our current
code is of the file API, and I think in context even more confusing.

Or we could have taken a POSIX message queue style API, which would map
directly to KBUS usage. That, however, would involve new system calls
(i.e., replacing 'mq_' with 'kbus_') and I assume that's an absolute
forbidden.

> Perhaps KBUS is intended for uses where DBus is too big?

Addressed above. It's one of the intentions.

> Or is it to help port legacy RTOS apps to Linux?

I don't understand this point - but assume it is not relevant.

> Perhaps I misunderstand what KBUS might do for me.

I hope the above helps a bit.

> It might be useful to present two simple apps implementing the same
> thing with, say, a unix socket and KBUS, e.g. sending a message
> reliably to another process and handling possible errors.

Indeed, this would be an interesting and valuable exercise. I'm afraid I
don't see that I'm going to have time to produce such a thing in the
near future, though.

The EuroPython 2010 talk on the KBUS website does present some (very)
simple examples of use, albeit in Python, and the kmsg application
presents some very primitive usage (but hardly representative). I'm
afraid the project is sparse on examples at the moment, as most of its
use has been for customers who do not wish their code made public.

All the best,
    Tibs



* [PATCH] extra/1 Allow setting the maximum KBUS message size
  2011-03-23 23:13   ` Tony Ibbs
  2011-03-24 18:03     ` James Chapman
@ 2011-04-15 21:34     ` Tony Ibbs
  2011-04-15 22:46       ` Jonathan Corbet
  1 sibling, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-04-15 21:34 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Grant Likely, lkml, Linux-embedded, Tibs at Kynesim, Richard Watts


On 22 Mar 2011, at 19:36, Jonathan Corbet wrote:

> - Does anything bound the size of a message fed into the kernel with
>  write()?  I couldn't find it.  It seems like an application could
>  consume arbitrary amounts of kernel memory.

This patch provides mechanisms for setting an absolute maximum message
size at compile time, and a per-device maximum at runtime.

The patch is relative to the results of the previous set of patches - I
assume this is better than resubmitting all of them for what is a
relatively small change.

> - It would be good to use the kernel's dynamic debugging and tracing
>  facilities rather than rolling your own.
>
> - There's lots of kmalloc()/memset() pairs that could be kzalloc().

I shall address these next, although I'm afraid it may be a few days.
Thanks, by the way, for the timely LWN article on the dynamic debugging
interface.

Signed-off-by: Tony Ibbs <tibs@tonyibbs.co.uk>
---
 Documentation/Kbus.txt     |   15 ++++++++++-
 include/linux/kbus_defns.h |   14 ++++++++++-
 ipc/Kconfig                |   32 ++++++++++++++++++++++++-
 ipc/kbus_internal.h        |   11 ++++++++
 ipc/kbus_main.c            |   57 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 126 insertions(+), 3 deletions(-)

diff --git a/Documentation/Kbus.txt b/Documentation/Kbus.txt
index 7cf723fd6..16828b9 100644
--- a/Documentation/Kbus.txt
+++ b/Documentation/Kbus.txt
@@ -1058,6 +1058,18 @@ header file (``kbus_defns.h``). They are:
                 Both Python and C bindings provide a useful function to
                 extract the ``is_bind``, ``binder`` and ``name`` values from
                 the data.
+:MAXMSGSIZE:    Set the maximum size of a KBUS message for this KBUS device,
+                and return the value that is set. This is the size of the
+                largest message that may be written to a KBUS Ksock. Trying
+                to write a longer message will result in an -EMSGSIZE error.
+                An attempt to set this value to 0 will just return the current
+                maximum size. Otherwise, the size requested may not be less
+                than 100, or more than the kernel configuration value
+                KBUS_ABS_MAX_MESSAGE_SIZE.  The default maximum size is set by
+                the kernel configuration value KBUS_DEF_MAX_MESSAGE_SIZE, and
+                is typically 1024.  The size being tested is that returned by
+                the KBUS_ENTIRE_MSG_LEN macro - i.e., the size of an
+                equivalent "entire" message.
 
 /proc/kbus/bindings
 -------------------
@@ -1158,7 +1170,8 @@ as values inside the IOError exception.
 :EINVAL:        Something went wrong (generic error).
 :EMSGSIZE:      On attempting to write a message: Data was written after
                 the end of the message (i.e., after the final end guard
-                of the message).
+                of the message), or an attempt was made to write a message
+                that is too long (see the MAXMSGSIZE ioctl).
 :ENAMETOOLONG:  On attempting to bind, unbind or send a message: The message
                 name is too long.
 :ENOENT:        On attempting to open a Ksock: There is no such device
diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
index 82779a6..29f6f99 100644
--- a/include/linux/kbus_defns.h
+++ b/include/linux/kbus_defns.h
@@ -655,9 +655,21 @@ struct kbus_replier_bind_event_data {
  * of the specified values)
  */
 #define KBUS_IOC_REPORTREPLIERBINDS  _IOWR(KBUS_IOC_MAGIC, 17, char *)
+/*
+ * MAXMSGSIZE - set the maximum size of a KBUS message for this KBUS device.
+ * This may not be set to less than 100, or more than
+ * CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE.
+ * arg (in): __u32, the requested maximum message size, or 0 to just
+ *           request what the current limit is, 1 to request the absolute
+ *           maximum size.
+ * arg (out): __u32, the maximum message size after this call has
+ *            succeeded
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_MAXMSGSIZE _IOWR(KBUS_IOC_MAGIC, 18, char *)
 
 /* If adding another IOCTL, remember to increment the next number! */
-#define KBUS_IOC_MAXNR	17
+#define KBUS_IOC_MAXNR	18
 
 #if !__KERNEL__ && defined(__cplusplus)
 }
diff --git a/ipc/Kconfig b/ipc/Kconfig
index 808d742..603b2f6 100644
--- a/ipc/Kconfig
+++ b/ipc/Kconfig
@@ -113,5 +113,35 @@ config KBUS_MAX_UNSENT_UNBIND_MESSAGES
 
 	  If unsure, choose the default.
 
-endif # KBUS
+config KBUS_ABS_MAX_MESSAGE_SIZE
+	int "Absolute maximum KBUS message size"
+	default 1024
+	range 100 2147483647
+	---help---
+	  This sets the absolute maximum size of an individual KBUS message,
+	  that is, the size of the largest KBUS message that may be written
+	  to a KBUS device node.
+
+	  It is not possible to set the maximum message size greater than
+	  this value using the KBUS_IOC_MAXMSGSIZE ioctl.
 
+	  The size is measured as by the KBUS_ENTIRE_MSG_LEN macro, and
+	  includes the message header (80 bytes on a 32-bit system).
+
+config KBUS_DEF_MAX_MESSAGE_SIZE
+	int "Default maximum KBUS mesage size"
+	int "Default maximum KBUS message size"
+	range 100 KBUS_ABS_MAX_MESSAGE_SIZE
+	---help---
+	  This sets the default maximum size of an individual KBUS message,
+	  that is, the size of the largest KBUS message that may be written
+	  to a KBUS device node.
+
+	  It may be altered at runtime, for a particular KBUS device, with
+	  the KBUS_IOC_MAXMSGSIZE ioctl, up to a limit of
+	  KBUS_ABS_MAX_MESSAGE_SIZE.
+
+	  The size is measured as by the KBUS_ENTIRE_MSG_LEN macro, and
+	  includes the message header (80 bytes on a 32-bit system).
+
+endif # KBUS
diff --git a/ipc/kbus_internal.h b/ipc/kbus_internal.h
index a24fcaf..51d512c 100644
--- a/ipc/kbus_internal.h
+++ b/ipc/kbus_internal.h
@@ -86,6 +86,14 @@
 #define CONFIG_KBUS_DEF_NUM_DEVICES	1
 #endif
 
+#ifndef CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE
+#define CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE 1024
+#endif
+
+#ifndef CONFIG_KBUS_DEF_MAX_MESSAGE_SIZE
+#define CONFIG_KBUS_DEF_MAX_MESSAGE_SIZE 1024
+#endif
+
 /*
  * Our initial array sizes could arguably be made configurable
  * for tuning, if we discover this is useful
@@ -685,6 +693,9 @@ struct kbus_dev {
 	struct list_head unsent_unbind_msg_list;
 	u32 unsent_unbind_msg_count;
 	int unsent_unbind_is_tragic;
+
+	/* The maximum message size that may be written to this device */
+	u32 max_message_size;
 };
 
 /*
diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index e99bfca..64f863a 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -615,6 +615,8 @@ static int kbus_check_message_written(struct kbus_dev *dev,
 	struct kbus_message_header *user_msg =
 	    (struct kbus_message_header *)&this->user_msg;
 
+	int	msg_size;
+
 	if (this == NULL) {
 		dev_err(dev->dev, "pid %u [%s]"
 		       " Tried to check NULL message\n",
@@ -683,6 +685,15 @@ static int kbus_check_message_written(struct kbus_dev *dev,
 		       current->pid, current->comm);
 		return -EINVAL;
 	}
+
+	msg_size = KBUS_ENTIRE_MSG_LEN(user_msg->name_len, user_msg->data_len);
+	if (msg_size > dev->max_message_size) {
+		dev_err(dev->dev, "pid %u [%s]"
+			" Message size is %d, more than the maximum %u\n",
+			current->pid, current->comm,
+			msg_size, dev->max_message_size);
+		return -EMSGSIZE;
+	}
 	return 0;
 }
 
@@ -4150,6 +4161,39 @@ static int kbus_set_report_binds(struct kbus_private_data *priv,
 	return __put_user(old_value, (u32 __user *) arg);
 }
 
+static int kbus_maxmsgsize(struct kbus_private_data *priv,
+			   unsigned long arg)
+{
+	int retval = 0;
+	u32 requested_max;
+
+	retval = __get_user(requested_max, (u32 __user *) arg);
+	if (retval)
+		return retval;
+
+	kbus_maybe_dbg(priv->dev, "%u MAXMSGSIZE requests %u (was %u)\n",
+		       priv->id, requested_max, priv->dev->max_message_size);
+
+	dev_dbg(priv->dev->dev, "    abs max %d, def max %d\n",
+		CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE,
+		CONFIG_KBUS_DEF_MAX_MESSAGE_SIZE);
+
+	/* A value of 0 is a query for the current length */
+	/* A value of 1 is a query for the absolute maximum */
+	if (requested_max == 0)
+		return __put_user(priv->dev->max_message_size,
+				  (u32 __user *) arg);
+	else if (requested_max == 1)
+		return __put_user(CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE,
+				  (u32 __user *) arg);
+	else if (requested_max < 100 ||
+		 requested_max > CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE)
+		return -EINVAL;
+
+	priv->dev->max_message_size = requested_max;
+	return __put_user(priv->dev->max_message_size, (u32 __user *) arg);
+}
+
 static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	int err = 0;
@@ -4357,6 +4401,18 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		retval = kbus_set_report_binds(priv, dev, arg);
 		break;
 
+	case KBUS_IOC_MAXMSGSIZE:
+		/*
+		 * Set (and/or query) maximum message size
+		 *
+		 * arg in: 0 or 1 (for query of current maximum or absolute
+		 * maximum) or maximum size wanted
+		 * arg out: maximum size allowed
+		 * return: 0 means OK, otherwise not OK
+		 */
+		retval = kbus_maxmsgsize(priv, arg);
+		break;
+
 	default:
 		/* *Should* be redundant, if we got our range checks right */
 		retval = -ENOTTY;
@@ -4545,6 +4601,7 @@ static int kbus_setup_new_device(int which)
 	new->index = which;
 
 	new->verbose = KBUS_DEFAULT_VERBOSE_SETTING;
+	new->max_message_size = CONFIG_KBUS_DEF_MAX_MESSAGE_SIZE;
 
 	new->dev = device_create(kbus_class_p, NULL,
 				 this_devno, NULL, "kbus%d", which);
-- 
1.7.4.1
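
The in/out convention of the MAXMSGSIZE ioctl in the patch above (0 queries the current limit, 1 queries the absolute maximum, anything else is a request to set) can be sketched as a small userspace model. This is only an illustration of the decision logic in kbus_maxmsgsize(), not the kernel code itself: the MODEL_* names and the two size values are assumptions standing in for the Kconfig options, and a plain pointer replaces the user-space argument.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values for illustration; the patch takes these from Kconfig. */
#define MODEL_ABS_MAX_MESSAGE_SIZE (1024u * 1024u)
#define MODEL_DEF_MAX_MESSAGE_SIZE 1024u
/* The patch rejects requests below 100 bytes. */
#define MODEL_MIN_SETTABLE_SIZE 100u

/* Hypothetical stand-in for the per-device state. */
struct model_dev {
	uint32_t max_message_size;
};

/*
 * Model of the ioctl: *arg is the requested value on entry and the
 * resulting (or queried) maximum on successful return.
 */
int model_maxmsgsize(struct model_dev *dev, uint32_t *arg)
{
	uint32_t requested = *arg;

	if (requested == 0) {		/* query the current maximum */
		*arg = dev->max_message_size;
		return 0;
	}
	if (requested == 1) {		/* query the absolute maximum */
		*arg = MODEL_ABS_MAX_MESSAGE_SIZE;
		return 0;
	}
	if (requested < MODEL_MIN_SETTABLE_SIZE ||
	    requested > MODEL_ABS_MAX_MESSAGE_SIZE)
		return -EINVAL;		/* out of range: nothing changes */

	dev->max_message_size = requested;
	*arg = dev->max_message_size;
	return 0;
}
```

One consequence of this convention is that the values 0 and 1 can never be set as limits, which is harmless here because anything below 100 is rejected anyway.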



* Re: [PATCH] extra/1 Allow setting the maximum KBUS message size
  2011-04-15 21:34     ` [PATCH] extra/1 Allow setting the maximum KBUS message size Tony Ibbs
@ 2011-04-15 22:46       ` Jonathan Corbet
  2011-04-18 14:01         ` Mark Brown
  0 siblings, 1 reply; 34+ messages in thread
From: Jonathan Corbet @ 2011-04-15 22:46 UTC (permalink / raw)
  To: Tony Ibbs
  Cc: Grant Likely, lkml, Linux-embedded, Tibs at Kynesim, Richard Watts

On Fri, 15 Apr 2011 22:34:53 +0100
Tony Ibbs <tibs@tonyibbs.co.uk> wrote:

> This patch provides mechanisms for setting an absolute maximum message
> size at compile time, and a per-device maximum at runtime.

It seems like a good step in the right direction.  Do you really need to
add a bunch more configuration options, though?  It seems like a
reasonable default and a way to change it (sysfs file, maybe) might be
better.  Is there a way to cap the total memory used by the kbus subsystem?

The kzalloc() fixes seem like a good idea too, BUT:  I honestly think that
the item at the top of your list, if you want to merge this code, must be
to get the user-space API more widely reviewed and accepted.  It could, I
think, be a big sticking point, and it's something you want to try to
address sooner rather than later.

That means getting more people to look at the patch, which could be hard.
The problem is that, if you wait, they'll only squeal when the code is
close to going in, and you could find yourself set back a long way.  A
good first step might be to CC Andrew Morton on your next posting.

Thanks,

jon

* Re: [PATCH] extra/1 Allow setting the maximum KBUS message size
  2011-04-15 22:46       ` Jonathan Corbet
@ 2011-04-18 14:01         ` Mark Brown
  2011-04-19 19:33           ` Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Mark Brown @ 2011-04-18 14:01 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Tony Ibbs, Grant Likely, lkml, Linux-embedded, Tibs at Kynesim,
	Richard Watts

On Fri, Apr 15, 2011 at 04:46:08PM -0600, Jonathan Corbet wrote:

> That means getting more people to look at the patch, which could be hard.
> The problem is that, if you wait, they'll only squeal when the code is
> close to going in, and you could find yourself set back a long way.  A
> good first step might be to CC Andrew Morton on your next posting.

One other thing that it'd be good to see is a contrast and compare with
other similar things like the Android binder that are floating around.

* Re: [PATCH] extra/1 Allow setting the maximum KBUS message size
  2011-04-18 14:01         ` Mark Brown
@ 2011-04-19 19:33           ` Tony Ibbs
  0 siblings, 0 replies; 34+ messages in thread
From: Tony Ibbs @ 2011-04-19 19:33 UTC (permalink / raw)
  To: Mark Brown
  Cc: Jonathan Corbet, Grant Likely, lkml, Linux-embedded,
	Tibs at Kynesim, Richard Watts, James Chapman


On 18 Apr 2011, at 15:01, Mark Brown wrote:

> On Fri, Apr 15, 2011 at 04:46:08PM -0600, Jonathan Corbet wrote:
> 
>> That means getting more people to look at the patch, which could be hard.
>> The problem is that, if you wait, they'll only squeal when the code is
>> close to going in, and you could find yourself set back a long way.  A
>> good first step might be to CC Andrew Morton on your next posting.
> 
> One other thing that it'd be good to see is a contrast and compare with
> other similar things like the Android binder that are floating around.

I'm aiming to write a smallish example of using KBUS for something not
entirely boring, and that will hopefully work as a starting point for
comparisons. Unfortunately (for this purpose, anyway), the next couple
of weekends (including four days of public holiday!) are taken up with
other things, so it will be a little while.

That will probably be an appropriate thing to CC Andrew Morton.

Comparative examples of how to do something similar with other means may
take longer, of course.

Thanks,
	Tibs

* Re: [PATCH 00/11] RFC: KBUS messaging subsystem
  2011-03-22 19:36 ` [PATCH 00/11] RFC: KBUS messaging subsystem Jonathan Corbet
  2011-03-23 23:13   ` Tony Ibbs
@ 2011-05-17  8:50   ` Florian Fainelli
  2011-05-22 19:58     ` Tony Ibbs
  2011-08-03 20:23     ` [PATCH 00/11] RFC: " Tony Ibbs
  1 sibling, 2 replies; 34+ messages in thread
From: Florian Fainelli @ 2011-05-17  8:50 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Tony Ibbs, Grant Likely, lkml, Linux-embedded, Tibs at Kynesim,
	Richard Watts

Hello,

Sorry for this late answer.

On Tuesday 22 March 2011 20:36:40 Jonathan Corbet wrote:
> On Fri, 18 Mar 2011 17:21:09 +0000
> 
> Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
> > KBUS is a lightweight, Linux kernel mediated messaging system,
> > particularly intended for use in embedded environments.
> 
> I've spent a bit of time looking at this code...this isn't a detailed
> review by any stretch, more like a few impressions.
> 
> - Why kbus over, say, a user-space daemon and unix-domain sockets?  I'm
>   not sure I see the advantage that comes with putting this into kernel
>   space.

I also fail to see why this would be required. In my opinion you are trading 
the reliability over complexity by putting this in the kernel.

Most implementations (if not all) involving system-wide message delivery for 
other daemons are running in user-space. Having a daemon for message delivery 
to other kbus clients/servers does not seem less reliable to me. If you 
had in mind that this daemon might be killed under OOM conditions, then maybe 
your whole system has an issue, issue which could be circumvented by making 
sure the messaging process gets respawned when possible (upstart like 
mechanism or such).

> 
> - The interface is ... creative.  If you have to do this in kernel space,
>   it would be nice to do away with the split write()/ioctl() API for
>   reading or writing messages.  It seems like either a write(), OR an
>   ioctl() with a message data pointer would suffice; that would cut the
>   number of syscalls the applications need to make too.
> 
>   Even better might be to just use the socket API.

Indeed, I would also suggest having a look at what generic netlink already 
provides like messages per application PID, multicasting and marshaling. If 
you intend to keep a part of it in the kernel, you should have a look at this, 
because from my experience with generic netlink, most of the hard job you are 
re-doing here, has already been done in a generic manner.

> 
> - Does anything bound the size of a message fed into the kernel with
>   write()?  I couldn't find it.  It seems like an application could
>   consume arbitrary amounts of kernel memory.
> 
> - It would be good to use the kernel's dynamic debugging and tracing
>   facilities rather than rolling your own.
> 
> - There's lots of kmalloc()/memset() pairs that could be kzalloc().
> 
> That's as far as I could get for now.
> 
> Thanks,
> 
> jon

* Re: [PATCH 00/11] RFC: KBUS messaging subsystem
  2011-05-17  8:50   ` [PATCH 00/11] RFC: KBUS messaging subsystem Florian Fainelli
@ 2011-05-22 19:58     ` Tony Ibbs
  2011-07-06 16:15       ` Florian Fainelli
  2011-08-03 20:23     ` [PATCH 00/11] RFC: " Tony Ibbs
  1 sibling, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-05-22 19:58 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jonathan Corbet, Grant Likely, lkml, Linux-embedded,
	Tibs at Kynesim, Richard Watts

On 17 May 2011, at 09:50, Florian Fainelli wrote:
 
> Hello,
> 
> Sorry for this late answer.

Not a problem from here, all responses are helpful. In turn, apologies
for taking so long to reply.

> Most implementations (if not all) involving system-wide message
> delivery for other daemons are running in user-space.

OK. Although I certainly wouldn't claim to have anywhere near a complete
list of such (an annotated list of all the messaging systems on Linux
would be rather interesting, though!).

> If you had in mind that this daemon might be killed under OOM
> conditions, then maybe your whole system has an issue, which
> could be circumvented by making sure the messaging process gets
> respawned when possible (upstart like mechanism or such).

OOM isn't particularly an issue I'd worried about for any part of the
system. Other things tend to cause user processes to crash - using
ffmpeg on random video data, for instance. Of course, that is clearly
not a problem for KBUS itself.

Respawning itself isn't directly a problem, but getting everyone talking
to everyone else again is typically a nasty pain (and one users don't
want to think about), so one tends to want one's messaging handler to be
*very* robust. I think the discipline of working in-kernel helps with
that, although I'd be surprised if that were considered enough reason to
add a new kernel module!

> From: Jonathan Corbet <corbet@lwn.net>
> Date: 22 March 2011 19:36:40 GMT
>
> >   Even better might be to just use the socket API.
> 
> Indeed, I would also suggest having a look at what generic netlink already 
> provides like messages per application PID, multicasting and marshaling.

As I said in an earlier message, I'd ignored netlink because it sounded
as if it were intrinsically lossy (no way of not losing messages if a
queue got full) which is a problem for KBUS requests/replies.

On the other hand, understanding netlink from scratch is somewhat
difficult (I've just spent some hours doing more research, and don't
feel like I've begun to get a good idea of its boundaries yet).

I have also been reading the libnl documentation, which seems to make
the userspace end somewhat less complex, and looks like a good thing.

> If you intend to keep a part of it in the kernel, you should have a
> look at this, because from my experience with generic netlink, most of
> the hard job you are re-doing here, has already been done in a generic
> manner.

It looks interesting, but the worrying part of statements like this is
always the "most of".

Is your suggestion that netlink would be a better API than the current
"creative" use of a file API for communicating from user space to the
KBUS kernel module, and then back?

The LWN article http://lwn.net/Articles/131802/ makes that sound
plausible (assuming one can still detect "release" events for netlink
sockets - I assume one can). At first glance I'm not sure how much
harder it is to program such a netlink interface "bare" (without a
userspace library such as libnl) than it is to use the current KBUS
interface in such a manner.

(Aside: a quick look at my current KBUS build shows kbus.ko as 60KB,
libkbus.so (the C userspace library on top of the "raw" usage) as 54KB,
and libnl.so as 277KB - although I don't know how Ubuntu build the
latter, and it obviously also includes all sorts of data description
handling which KBUS deliberately does not. So netlink is smaller if "bare",
and bigger, but not a huge amount, if used with its library.)

I'm not entirely sure what happens if either end of the netlink API
doesn't respond in a timely manner - is netlink allowed to throw things
away?

Or did you mean that netlink is appropriate to replace some/much of the
KBUS kernel module as well? In that case I'd have to think about it a
lot more to have an informed opinion.

Anyway.

What I'm working on at the moment is an email in which I try to restate
what we are/were trying to do with KBUS, with simple examples of the
sorts of call we're talking about, and ask if that is a sensible thing
to have in the kernel, emphasising that we are more worried about the
functionality than the API.

If the concept is a good thing but our implementation of it is
objectionable (e.g., we need to rewrite to a less "creative" interface,
be more sockety, or whatever), then so be it, we'll need to rewrite.

If you'd be willing to look at that email when it is posted, I hope it
will be easier to point at specific things and say "yes, that would be
better done with netlink" or, perhaps, "netlink would not address this,
but one might attack it in this way".

Thanks,
    Tibs


* Re: [PATCH 00/11] RFC: KBUS messaging subsystem
  2011-05-22 19:58     ` Tony Ibbs
@ 2011-07-06 16:15       ` Florian Fainelli
  2011-07-28 21:48         ` RFC: [Restatement] " Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Florian Fainelli @ 2011-07-06 16:15 UTC (permalink / raw)
  To: Tony Ibbs
  Cc: Jonathan Corbet, Grant Likely, lkml, Linux-embedded,
	Tibs at Kynesim, Richard Watts

Hello Tony,

On Sunday 22 May 2011 21:58:13 Tony Ibbs wrote:
> On 17 May 2011, at 09:50, Florian Fainelli wrote:
> > Hello,
> > 
> > Sorry for this late answer.
> 
> Not a problem from here, all responses are helpful. In turn, apologies
> for taking so long to reply.

My answer is also pretty late, sorry about that.

> 
> > Most implementations (if not all) involving system-wide message
> > delivery for other daemons are running in user-space.
> 
> OK. Although I certainly wouldn't claim to have anywhere near a complete
> list of such (an annotated list of all the messaging systems on Linux
> would be rather interesting, though!).

Here is a non-exhaustive one:
- ubus: http://nbd.name/gitweb.cgi?p=luci2/ubus.git;a=summary
- dbus

and at least another one I am using at work.

One thing that I missed while mentioning that I prefer a userland 
implementation, is that you allow several interesting features to be available 
such as:

- peer to peer between two daemons
- shared memory support between two daemons

The latter specifically is interesting if you need to transfer large amounts of 
data like images.

> 
> > If you had in mind that this daemon might be killed under OOM
> > conditions, then maybe your whole system has an issue, which
> > could be circumvented by making sure the messaging process gets
> > respawned when possible (upstart like mechanism or such).
> 
> OOM isn't particularly an issue I'd worried about for any part of the
> system. Other things tend to cause user processes to crash - using
> ffmpeg on random video data, for instance. Of course, that is clearly
> not a problem for KBUS itself.
> 
> Respawning itself isn't directly a problem, but getting everyone talking
> to everyone else again is typically a nasty pain (and one users don't
> want to think about), so one tends to want one's messaging handler to be
> *very* robust. I think the discipline of working in-kernel helps with
> that, although I'd be surprised if that were considered enough reason to
> add a new kernel module!

Even if a program which is implementing some KBUS methods is crashing, and 
then restarting, I see two methods to deal with this:
- the caller of the remote kbus method should be made aware that its endpoint 
is not connected and deal with the error
- the respawning program, once registering back on the bus "server" should 
cause the bus server to emit a signal "program-xyz" is back online

It seems to me that we achieve the reliable feature that you want, without 
still making the kernel responsible for this.

Needless to say, there should be some respawning mechanism (ala upstart).

> 
> > From: Jonathan Corbet <corbet@lwn.net>
> > Date: 22 March 2011 19:36:40 GMT
> > 
> > >   Even better might be to just use the socket API.
> > 
> > Indeed, I would also suggest having a look at what generic netlink
> > already provides like messages per application PID, multicasting and
> > marshaling.
> 
> As I said in an earlier message, I'd ignored netlink because it sounded
> as if it were intrinsically lossy (no way of not losing messages if a
> queue got full) which is a problem for KBUS requests/replies.

Changing netlink to report "queue full" errors sounds good for both ends, so 
it is not really a big problem. Same goes for all errors actually, it will 
just benefit the existing netlink users.

> 
> On the other hand, understanding netlink from scratch is somewhat
> difficult (I've just spent some hours doing more research, and don't
> feel like I've begun to get a good idea of its boundaries yet).
> 
> I have also been reading the libnl documentation, which seems to make
> the userspace end somewhat less complex, and looks like a good thing.

Yes, libnl hides a lot of complexity of netlink but still adds some, like 
caches and such. But in the end there is no more than 5 to 6 libnl 
handles/objects to use, and there you go. Then you usually use another library 
like libevent to get called on socket writes/reads.

> 
> > If you intend to keep a part of it in the kernel, you should have a
> > look at this, because from my experience with generic netlink, most of
> > the hard job you are re-doing here, has already been done in a generic
> > manner.
> 
> It looks interesting, but the worrying part of statements like this is
> always the "most of".
> 
> Is your suggestion that netlink would be a better API than the current
> "creative" use of a file API for communicating from user space to the
> KBUS kernel module, and then back?

What I like with netlink, which I do not with your implementation, is that 
netlink uses sockets and not traditional devices. But what is exposed to 
KBUS-implementors is good.

netlink in my mind is just a transport layer, while you see the transport 
layer as a /dev/kbus<N> device and its kernel module.

> 
> The LWN article http://lwn.net/Articles/131802/ makes that sound
> plausible (assuming one can still detect "release" events for netlink
> sockets - I assume one can). At first glance I'm not sure how much
> harder it is to program such a netlink interface "bare" (without a
> userspace library such as libnl) than it is to use the current KBUS
> interface in such a manner.

It is probably more work to use netlink rather than KBUS with a bare library.

> 
> (Aside: a quick look at my current KBUS build shows kbus.ko as 60KB,
> libkbus.so (the C userspace library on top of the "raw" usage) as 54KB,
> and libnl.so as 277KB - although I don't know how Ubuntu build the
> latter, and it obviously also includes all sorts of data description
> handling which KBUS deliberately does not. So netlink is smaller if "bare",
> and bigger, but not a huge amount, if used with its library.)

libnl by default is pretty big, this is why we maintain a stripped down 
version, called libnl-tiny in OpenWrt: 
https://dev.openwrt.org/browser/trunk/package/libnl-tiny (based on libnl-1.1)

this is an orthogonal problem though. Some people might even go for static 
linking to automatically strip down their daemons linking with such a library.

> 
> I'm not entirely sure what happens if either end of the netlink API
> doesn't respond in a timely manner - is netlink allowed to throw things
> away?

As usual with sockets, if you do not read from it, data is thrown away, or you 
end up looping until all is read if doing epoll.

> 
> Or did you mean that netlink is appropriate to replace some/much of the
> KBUS kernel module as well? In that case I'd have to think about it a
> lot more to have an informed opinion.
> 
> Anyway.
> 
> What I'm working on at the moment is an email in which I try to restate
> what we are/were trying to do with KBUS, with simple examples of the
> sorts of call we're talking about, and ask if that is a sensible thing
> to have in the kernel, emphasising that we are more worried about the
> functionality than the API.
> 
> If the concept is a good thing but our implementation of it is
> objectionable (e.g., we need to rewrite to a less "creative" interface,
> be more sockety, or whatever), then so be it, we'll need to rewrite.

I personally prefer your interface to be more "sockety" to use your wording. 
Having a /dev/kbus<N> device does not seem very natural to me to read/write 
from, using a socket would be more appropriate. This is why I suggest generic 
netlink, because you will most likely end up doing the same thing, if you go 
the kernel way, which is:

- identifying your process uniquely
- marshalling/unmarshalling data sent to/from the daemon
- allowing several recipients to receive a message from a sender
...

My suggestion is that, if you really want to go the kernel way for transport 
(which I discourage, you certainly got that), let's just use generic netlink 
which has been proven to be reliable and easy to use for this.

> 
> If you'd be willing to look at that email when it is posted, I hope it
> will be easier to point at specific things and say "yes, that would be
> better done with netlink" or, perhaps, "netlink would not address this,
> but one might attack it in this way".

Let's say that if I was to integrate something like KBUS in the kernel, I 
would do it that way:

- rework the KBUS module to implement a generic netlink family and multicast 
group
- keep the same data format/marshalling
- create a user-space library which abstracts the use of this generic 
netlink socket and the KBUS specific messaging

> 
> Thanks,
>     Tibs

This is coming pretty late, but I hope that you are still willing to work on 
this subject despite my comments.
--
Florian

* RFC: [Restatement] KBUS messaging subsystem
  2011-07-06 16:15       ` Florian Fainelli
@ 2011-07-28 21:48         ` Tony Ibbs
  2011-07-28 23:58           ` Colin Walters
  2011-08-03 20:48           ` Pekka Enberg
  0 siblings, 2 replies; 34+ messages in thread
From: Tony Ibbs @ 2011-07-28 21:48 UTC (permalink / raw)
  To: lkml
  Cc: Andrew Morton, Jonathan Corbet, Florian Fainelli, Grant Likely,
	Linux-embedded, Tibs at Kynesim, Richard Watts

The company I work for does a fair amount of stuff on embedded
systems, and we often need a simple way to send interprocess
messages. DBUS is too large and complex for this, and the ad-hoc
solutions we've seen people develop normally have significant
problems. So we wrote our own. Since it's a kernel module, it seemed
appropriate to submit it for possible inclusion in Linux.

The original submission to the LKML is at
http://marc.info/?l=linux-kernel&m=130047040716848&w=2

Jonathan Corbet (22 Mar 2011, at 19:36) pointed out, though, that

> The interface is ... creative.

and on 15 Apr 2011, at 23:46, in response to a patch I wrote for a
particular issue, he wrote:

> I honestly think that the item at the top of your list, if you want
> to merge this code, must be to get the user-space API more widely
> reviewed and accepted.  It could, I think, be a big sticking point,
> and it's something you want to try to address sooner rather than later.
>
> That means getting more people to look at the patch, which could be hard.
> The problem is that, if you wait, they'll only squeal when the code is
> close to going in, and you could find yourself set back a long way.  A
> good first step might be to CC Andrew Morton on your next posting.

Other matters have caused this response to be a bit delayed, but he's
clearly right that this should be addressed. And whilst we ourselves
obviously don't mind the API, it's the functionality that we *really*
care about. So there are two questions:

- is KBUS something that should be in the kernel, and if it is,
- should it have a different API?

What we want of the system
==========================
We want KBUS to be very quick to learn, and very simple to use, not
least because the target users are often already experts in several
domains, and they don't have time or effort available to spend on
learning a complex system just to send messages.

We provide user-space libraries in various languages (including C and
C++), but we also want KBUS to be simple to use "bare" - target
systems may be small, and may not have much user-space. This also
means that KBUS itself needs to be written in C.

Technically, the most important requirements are:

* We require deterministic message ordering. If A, B and C all send
  messages, anyone receiving those messages must see them in the same
  order, which must be the order they were sent in.

* Messages are identified by name. Names are formed of dot-separated
  words.
* Listeners choose which messages they wish to receive, on the basis
  of the message name. Some simple filtering is available by
  wildcarding on the last word or words of the name.
* All messages are implicitly broadcast (to anyone who wishes to
  receive them).
* Request/reply is handled by a single listener binding as a replier
  to messages with a particular name. When a request message with that
  name is sent, the replier is responsible for sending a reply message.
  (The request and reply are both still broadcast to any "normal"
  listeners.)
* When a request message is sent, but a reply from the original
  replier is not possible - e.g., if there is no one bound as replier,
  or the original replier has unbound, or even crashed - then the
  system will send a synthetic reply to indicate this. In informal
  terms, the system guarantees a request will get a reply.
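
As a rough illustration of name-based filtering with wildcarding on the last word or words, here is a minimal matcher. The wildcard syntax is an assumption made for this sketch (a final "*" word standing for one or more remaining words, a final "%" word standing for exactly one), and name_matches() and the message names in the usage note are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the dot-separated message name matches the pattern.
 * A wildcard is only recognised as the final word of the pattern:
 *   "A.B.*"  matches "A.B.x" and "A.B.x.y", but not "A.B"
 *   "A.B.%"  matches "A.B.x", but not "A.B.x.y"
 * Any other pattern must match the name exactly.
 */
bool name_matches(const char *pattern, const char *name)
{
	size_t plen = strlen(pattern);
	size_t nlen = strlen(name);

	if (plen >= 2 && pattern[plen - 2] == '.' &&
	    (pattern[plen - 1] == '*' || pattern[plen - 1] == '%')) {
		size_t prefix = plen - 1;	/* includes the trailing '.' */
		const char *rest;

		/* The name needs the same prefix plus at least one character. */
		if (nlen <= prefix || strncmp(pattern, name, prefix) != 0)
			return false;

		rest = name + prefix;
		if (pattern[plen - 1] == '%')
			return strchr(rest, '.') == NULL;	/* one word only */
		return true;
	}

	return strcmp(pattern, name) == 0;
}
```

With this sketch a listener bound to "Sensor.Temp.*" would see both "Sensor.Temp.Kitchen" and "Sensor.Temp.Kitchen.Max", while one bound to "Sensor.Temp.%" would see only the former.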

The system deliberately avoids a client/server model.

* Any client can be a sender and a listener on the same connection.
* KBUS makes no restriction on who sends messages with what names,
  and it does not restrict who may send request messages.

Whilst requests are guaranteed to be sent to the appropriate recipient
(if they're still listening), "general" listening is by default lossy.
That is, if a listener's incoming message queue is full, then new
messages will by default be dropped (but not if they're a request to
that listener, of course). However, the sender of a message can
choose that all recipients must be able to receive a particular
message (sending with either all-or-fail or all-or-wait).
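
The difference between the default lossy broadcast and an all-or-fail send can be sketched with a toy model. Everything here is hypothetical (the struct, the function name and the queue capacity of 2), and all-or-wait is not modelled because it would need blocking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed per-listener queue capacity, chosen small for illustration. */
#define MODEL_QUEUE_CAPACITY 2

/* Hypothetical listener state: only the counts matter for this sketch. */
struct model_listener {
	size_t queued;	/* messages waiting to be read */
	size_t dropped;	/* messages lost because the queue was full */
};

/*
 * Broadcast one message to n listeners.
 * Default mode: listeners with a full queue silently miss the message.
 * all_or_fail: if any listener has no room, nobody receives the message
 * and the send is reported as failed.
 */
bool model_broadcast(struct model_listener *ls, size_t n, bool all_or_fail)
{
	size_t i;

	if (all_or_fail) {
		for (i = 0; i < n; i++)
			if (ls[i].queued >= MODEL_QUEUE_CAPACITY)
				return false;
	}

	for (i = 0; i < n; i++) {
		if (ls[i].queued < MODEL_QUEUE_CAPACITY)
			ls[i].queued++;
		else
			ls[i].dropped++;
	}
	return true;
}
```

The point of checking every queue before delivering to any of them is that an all-or-fail send never ends up partially delivered.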

* KBUS does not say anything about the data (if any) in a message.
  Different users have different requirements, so this gets
  complicated very quickly, and there are many excellent independent
  solutions for this.
* We do want support for multiple buses, which must be walled off from
  each other. Naming them by anything complicated isn't needed.

So why did we write it as a kernel module?
==========================================
As implementors, a kernel module makes a lot of sense. Not least
because:

* It gives us a lot of things for free, including list handling,
  reference counting, thread safety and (on larger systems)
  multi-processor support, which we would otherwise have to write and
  debug ourselves. This also keeps our codebase smaller.
* It helps give us reliability, partly because of the code we're
  relying on, partly because of the strictures of working in the
  kernel, partly by shielding us from userspace.
* It reduces message copying (we have userspace to kernel back to
  userspace, as opposed to a userspace daemon communicating with
  clients via sockets)
* It makes it simple for us to tell when a message recipient has "gone
  away", as the kernel will call our "release" callback for us.
* It allows us to provide the functionality on systems without
  requiring anything much beyond /dev and maybe /proc in userspace.

Since our potential users are generally all building Linux anyway,
adding an extra kernel module is not an issue.

The API
=======
KBUS as written uses the file API. KBUS bus <n> is represented as
device file /dev/kbus<n>. Bus 0 is always present.

So a client connects to KBUS by calling "open" on the appropriate
device file, and disconnects with "close". Messages are sent by
writing the appropriate data to the device using "write", and using a
SEND ioctl to indicate that the message is to be sent [1]. Messages
are read by calling an ioctl to find the length of the next message
(0 if there is none), and then reading it using "read". Polling may be
used to wait for a message read/write in the normal manner.

[1] It has already been suggested that the SEND is not needed given
    that the end of a message is deterministic, but it mirrors the
    "is there a next message" ioctl, and felt tidier to us than
    producing a "can't send" error when the write(s) for the message
    data are finished.

I believe that this use of the file API is what Jonathan Corbet calls
"creative" (we can't disagree with him).

We chose this interface for several reasons:

* All C programmers are likely to have used open/close/read/write at
  some time, and even if they've not used ioctls, our target audience
  will have called functions with arbitrary datastructures.
* We didn't want to (try to) add a new specific API to the kernel
  (I assume that is a luxury whose day has long gone).
* The socket model didn't seem like a good fit to us, not least
  because we did not want a client/server model.
* There is a lot of information on how to write a file-based kernel
  module, including many examples.
* It is obvious that the "release" callback for a "file" will be
  called when the file is closed or the opening process crashes.

I must admit that, this being my first kernel module, the "lots of
documentation and examples" influenced me.

Several people on-list have wondered why we didn't use some socket
interface. This seems partly to be based on the assumption that
"everyone knows how to do sockets" (I may be being unfair, here),
which is not in fact true - many C programmers have not worked with
sockets.

I'm not aware of any good explanations of how to do a socket based
interface to a kernel module, and particularly not of documentation of
how one *should* do it. Mea culpa if I've just looked in the wrong
places.

The interactions we'd *like* our API to have go more-or-less thus:

* Send: either manufacture an entire message, pass it to KBUS,
  and then send it, or pass parts of a message to KBUS, and when all
  of the parts have been given, send it. This latter allows (for
  instance) a common header to be written repeatedly, followed by
  varying data.

* Read: Ask for (the size of) the next message. A size of 0 can
  conveniently be taken as meaning there is no next message. Then, as
  a separate call, read that many bytes.

* Discard a message that is waiting to be read. Current KBUS does this
  by asking for (the size of) the next message again, without an
  intermediate read.

* Poll for the next message to be read. Obviously useful.
 
* Poll for being allowed to send a new message. This is useful if a
  send was rejected, perhaps because it was a request message, and
  the replier does not have room in their queue for it.
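
The read side of the interactions listed above (ask for the next message, read it, or ask again to discard it) can be modelled in a few lines. This is a userspace toy, not the KBUS implementation: messages are plain C strings, and model_reader, model_nextmsg() and model_read() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_MSGS 8

/* Hypothetical per-connection state. */
struct model_reader {
	const char *queue[MODEL_MAX_MSGS];	/* pending messages, oldest first */
	size_t head, tail;			/* queue[head..tail) is pending */
	const char *current;			/* message selected for reading */
	size_t offset;				/* bytes of it already read */
};

/*
 * "Next message": select the next pending message and return its length,
 * or 0 if there is none. Whatever was left unread of the previously
 * selected message is discarded.
 */
size_t model_nextmsg(struct model_reader *r)
{
	r->offset = 0;
	if (r->head == r->tail) {
		r->current = NULL;
		return 0;
	}
	r->current = r->queue[r->head++];
	return strlen(r->current);
}

/* Read up to count bytes of the selected message; partial reads are fine. */
size_t model_read(struct model_reader *r, char *buf, size_t count)
{
	size_t left;

	if (r->current == NULL)
		return 0;
	left = strlen(r->current) - r->offset;
	if (count > left)
		count = left;
	memcpy(buf, r->current + r->offset, count);
	r->offset += count;
	return count;
}
```

In this model, calling model_nextmsg() twice in a row skips a message, which is how discarding falls out of the same call rather than needing a separate one.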

As I said earlier, I'm not sure how one would write a kernel module
with a socket-style API that does what KBUS does, although clearly one
could imagine such a thing. I must admit that our feeling is that it
would be "stretching" the normal assumptions of how sockets work in
much the same way that the current API is doing for the file-like
interface.

What already exists
===================
There is a general KBUS homepage at http://kbus-messaging.org/.

The version of KBUS on Google Code, at http://code.google.com/p/kbus/,
is used by several clients of ours, and includes a "higher level" C
library, plus bindings for Python, C++ and Java.

There is a working repository with these patches applied to
Linux 2.6.37, available via:

git pull git://github.com/crazyscot/linux-2.6-kbus.git kbus-2.6.37

The original patches were applied in branch apply-patchset-20110318
-- 
Tibs


* Re: RFC: [Restatement] KBUS messaging subsystem
  2011-07-28 21:48         ` RFC: [Restatement] " Tony Ibbs
@ 2011-07-28 23:58           ` Colin Walters
  2011-08-03 20:14             ` Tony Ibbs
  2011-08-03 20:48           ` Pekka Enberg
  1 sibling, 1 reply; 34+ messages in thread
From: Colin Walters @ 2011-07-28 23:58 UTC (permalink / raw)
  To: Tony Ibbs
  Cc: lkml, Andrew Morton, Jonathan Corbet, Florian Fainelli,
	Grant Likely, Linux-embedded, Tibs at Kynesim, Richard Watts

Hi,

Quick background: I work on GNOME mainly, and have contributed various
bits of code to DBUS and made some releases historically (though that
is now in more capable hands).  I should also note I have no patches
in the Linux kernel (but I'm trying to follow it more and participate,
so here I am).

So, what I think your description of this project lacks is high-level
technical requirements and goals; you mention basically just
"deterministic ordering" and "easy API" (the latter being fairly
subjective).  You don't mention what your performance requirements are
(if any) for example.

If you had a high-level requirements list it might be easier to
compare with other things.  For example, you might ask "Why not SYSV
IPC"?  Or  "why not drop files in a temporary directory and use
inotify" etc.

My feeling as a DBus developer is that by far its most important
feature is providing a dynamic naming service (the RequestName call).
I don't know how one could sanely do a general-purpose operating
system without a way for loosely-coupled components to find each other
at runtime.  Things like Debian or Fedora where one can assemble
arbitrary sets of packages basically demand the ability to do this -
think things like X being able to talk to HAL, or Firefox being able
to find NetworkManager.

KBUS seems not to provide this (or at least, I'm not seeing it).

The other key feature is signals, where userspace can multicast to
userspace.  KBUS does provide this.

DBus *does* have deterministic ordering too - I'm not sure why you say we don't.

DBus does have some flaws - for example, the resource controls were
poorly thought out and basically useless.  If we were designing them
today, we'd probably have DBus connections tied to Linux cgroups
somehow.  Currently the system bus is limited to a configured amount
of RAM in the XML config (and we don't even attempt to look at system
memory size to limit it); if you have less than 1G (if I remember
correctly) of phys+swap, you can make the bus OOM.

KBUS from what I can see shares this flaw, except that users can
allocate arbitrary amounts of kernel memory, rather than userspace
memory in the system bus:

static ssize_t kbus_write() {
...
		this->msg = kmalloc(sizeof(*(this->msg)), GFP_KERNEL);
		if (!this->msg) {
			retval = -ENOMEM;
			goto done;
		}
...
}

A somewhat weaker but still useful part of DBus is that it has
mandatory security controls; the policy can restrict which uids can
talk to a given service on the bus, and also allows userspace to check
the credentials of messages they receive (think SO_PEERCRED).  By
having KBUS based on files you seem to lose this which you'd get from
a socket API.
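
For readers who have not met SO_PEERCRED, a minimal sketch of the
credential check a socket API makes available is below. It uses the
Linux getsockopt() option of that name on a Unix-domain socket; the
helper name peer_credentials() is hypothetical, and nothing here is
taken from DBus or KBUS.

```c
#define _GNU_SOURCE             /* struct ucred is only declared with this on glibc */
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Fetch the pid/uid/gid of the peer on a connected Unix-domain socket.
 * Returns 0 on success, -1 on failure (errno is set by getsockopt).
 * peer_credentials() is a hypothetical helper name for this sketch. */
static int peer_credentials(int fd, struct ucred *cred)
{
	socklen_t len = sizeof(*cred);

	assert(cred != NULL);
	if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, cred, &len) < 0)
		return -1;
	return len == sizeof(*cred) ? 0 : -1;
}
```

A service would call this on each accepted connection and compare
cred->uid against whatever policy it wants to enforce. For a pair made
with socketpair(), the credentials are those of the process that
created the pair.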

On a different topic, I find myself really unconvinced by the length
to which you go to claim the API is simple and easy to use.  I mean, I
really can't imagine it'd be hard to write a userspace C library
implementing the semantics you have here, and have it be as easy or
easier.    Oh you actually do have a "libkbus" here:
http://code.google.com/p/kbus/source/browse/libkbus/kbus.h

You never mention what tradeoffs you might see from having that in
userspace or whether you tried it for that matter.


* Re: RFC: [Restatement] KBUS messaging subsystem
  2011-07-28 23:58           ` Colin Walters
@ 2011-08-03 20:14             ` Tony Ibbs
  2011-08-07 16:47               ` Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-08-03 20:14 UTC (permalink / raw)
  To: Colin Walters
  Cc: lkml, Andrew Morton, Jonathan Corbet, Florian Fainelli,
	Grant Likely, Linux-embedded, Tibs at Kynesim, Richard Watts

Sorry for the delay in replying - the last several days have had no
free time at all (otherwise this would be shorter as well!).

On 29 Jul 2011, at 00:58, Colin Walters wrote:

> So, what I think your description of this project lacks is high-level
> technical requirements and goals;

Well. I thought that was what I was doing. Mea culpa.

> you mention basically just "deterministic ordering"

I'd count the following as our driving requirements/goals, although
this is really just a rephrasing of stuff from last message, so
apologies if I'm missing the point:

* deterministic message ordering (as defined last email)
* messages identified by name (i.e., something human readable and
  potentially mnemonic, not just numbers)
* all messages visible to (can be received by) any interested party
  (this can be particularly important for logging/diagnosis)
* recipient chooses which messages it is interested in
* only (at most!) one replier allowed for a particular message name
* no restriction on who can send messages
* guaranteed reply to a request (a legitimate reply being "the
  designated replier has gone away")
* multiple buses allowed, messages do not move between buses
* a client can send/receive messages on the same connection

Those are the issues that give KBUS its particular flavour.
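
As a rough illustration of two of these goals ("at most one replier per
message name" and "any number of listeners"), here is a small userspace
model in C. It is a sketch under stated assumptions, not how KBUS is
implemented: the names (struct model_bus, model_bind,
model_find_replier, model_release), the fixed table size, the exact
name matching and the error codes are all this sketch's own choices.

```c
/* Illustrative model only; names, sizes and error codes are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_MAX_BINDINGS 16
#define MODEL_NAME_LEN     64

struct model_binding {
	int          used;
	int          is_replier;
	unsigned int conn_id;              /* non-zero id of the bound connection */
	char         name[MODEL_NAME_LEN];
};

struct model_bus {
	struct model_binding bindings[MODEL_MAX_BINDINGS];
};

/* Bind conn_id to a message name, as a listener or as the replier.
 * A second replier for the same name is refused; listeners are not. */
static int model_bind(struct model_bus *bus, unsigned int conn_id,
		      const char *name, int is_replier)
{
	struct model_binding *b;
	int i, free_slot = -1;

	assert(bus != NULL && name != NULL && conn_id != 0);
	if (strlen(name) >= MODEL_NAME_LEN)
		return -ENAMETOOLONG;

	for (i = 0; i < MODEL_MAX_BINDINGS; i++) {
		b = &bus->bindings[i];
		if (!b->used) {
			if (free_slot < 0)
				free_slot = i;
		} else if (is_replier && b->is_replier &&
			   strcmp(b->name, name) == 0) {
			return -EADDRINUSE;    /* name already has a replier */
		}
	}
	if (free_slot < 0)
		return -ENOSPC;

	b = &bus->bindings[free_slot];
	b->used = 1;
	b->is_replier = is_replier;
	b->conn_id = conn_id;
	strcpy(b->name, name);             /* length was checked above */
	return 0;
}

/* Return the replier's connection id for a name, or 0 if there is none. */
static unsigned int model_find_replier(const struct model_bus *bus,
				       const char *name)
{
	int i;

	for (i = 0; i < MODEL_MAX_BINDINGS; i++) {
		const struct model_binding *b = &bus->bindings[i];

		if (b->used && b->is_replier && strcmp(b->name, name) == 0)
			return b->conn_id;
	}
	return 0;
}

/* Drop every binding held by a connection, as would happen when the
 * connection is closed or its process dies. */
static void model_release(struct model_bus *bus, unsigned int conn_id)
{
	int i;

	for (i = 0; i < MODEL_MAX_BINDINGS; i++)
		if (bus->bindings[i].used && bus->bindings[i].conn_id == conn_id)
			bus->bindings[i].used = 0;
}
```

In this model a sender of a request would call model_find_replier()
first; a result of 0 is the point at which a "the designated replier
has gone away" reply would be generated for it.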

Implied by some of that is:

* no client/server model
* no timeouts (OK, I didn't say that last email, and it is fairly
  unobviously implicit)

and not quite API:

* must be written in C
* must be a relatively small amount of code (yes, that's subjective)
* must be documented (although I'd take that as a given, and it can
  always be done better)
* must have a relatively small API (again, subjective)

but I thought that was all in the last email. Sorry if it was unclear.

> and "easy API" (the latter being fairly subjective).

Yes, of course it is. But a simple system should be simpler to
explain/use than a complex one. Although I take your point further
down.

> You don't mention what your performance requirements are
> (if any) for example.

We don't particularly have any - we're nearer to user-reaction timing
than real-time issues. Some rough tests a while back indicated that
KBUS performed at about either half or double the speed of inotify
(sorry, I'm not at work so don't have the notes to hand) - so not
particularly fast. It was written to provide the functionality we
needed, with the intention to optimise as needed, and since it's been
fast enough, we've not tried to speed it up. Although I've strong
suspicions (*not* validated by testing, though, so not reliable) of
what is probably slow.

> If you had a high-level requirements list it might be easier to
> compare with other things.

> For example, you might ask "Why not SYSV IPC"?

I know this is probably a rhetorical question, but well, it's poorly
specified, too low level and doesn't do all the things we want (for
instance, how does one handle many-to-many transactions?).

> Or  "why not drop files in a temporary directory and use
> inotify" etc.

Can we all say "ick" (although back in the day on IBM mainframes that
wasn't a bad way to do data passing, as the infrastructure existed to
allow one to know when directory contents changed in a fairly
efficient manner - I had a client using just such a mechanism for
passing around documents containing data reports).

> My feeling as a DBus developer is that by far its most important
> feature is providing a dynamic naming service (the RequestName call).
> I don't know how one could sanely do a general-purpose operating
> system without a way for loosely-coupled components to find each other
> at runtime.  Things like Debian or Fedora where one can assemble
> arbitrary sets of packages basically demand the ability to do this -
> think things like X being able to talk to HAL, or Firefox being able
> to find NetworkManager.

That sounds like a good characterisation of the world DBus is trying
to work in.

> KBUS seems not to provide this (or at least, I'm not seeing it).

Exactly.

I'm not sure why people keep wanting to compare KBUS and DBus (maybe
my fault for the name), as their scopes are so different it feels
rather like comparing a bicycle with a bus company.

Anyway, from what I understand, DBus is an "enterprise" style solution
to messaging for major-level infrastructure - large systems such as
Gnome and its associated programs. As such its aim is to provide a
one-stop solution for all one's messaging needs, including:

* client name brokering (and the very existence of client naming)
* data schemas
* IPC (clearly a specialisation of the above)
* allowing/forbidding message receipt/transmission according to policy
* security policies (see below)

and so on.

These all sound like necessary things in the arena in which DBus is
used. They are not things that KBUS intends to provide. If such things
are needed, DBus already exists and is in common use.

So I would say that DBus is, of necessity, trying to find the maximal
solution for the problem space, so that users do not need to
learn/deploy more than one thing. This makes sense for what it is
trying to do.

    (One could argue about whether it is better to have one large
    system providing everything, or a small core and many extensions,
    but experience shows that the one-large-system tends to win, as
    for instance with EULisp versus Common Lisp, presumably for good
    psychological reasons.)

KBUS, however, is trying to find the minimum that is useful for our
problem space, which is small systems oriented.

So it eschews data description. After all, not all messages *have*
data, and there are many good ways of describing data that already
exist (from ASN.1 to JSON to the Google protocol descriptions).

KBUS provides a mechanism for choosing which messages to receive, and
who might reply to a particular message name. But it itself doesn't
provide a registry of allowed names. For many systems that would be
overkill, and if it is needed, then it can be done separately (often,
in fact, in paper specifications).

> DBus *does* have deterministic ordering too - I'm not sure why you
> say we don't.

That's good. I *thought* I'd said that I couldn't tell from the
documentation, but that there was one place that I'd seen that seemed
(possibly) to imply it couldn't.

> DBus does have some flaws - for example, the resource controls were
> poorly thought out and basically useless.  If we were designing them
> today, we'd probably have DBus connections tied to Linux cgroups
> somehow.

Of course, cgroups have only recently begun to be adopted widely, and
presumably weren't around at all when DBus was started.

> KBUS from what I can see shares this flaw

Indeed. Since we've been using it on systems where we control the
system-as-a-whole, it was something we mostly didn't remember to worry
about, and that is a significant flaw, which has already been picked
up on lkml. Regardless of what happens in terms of adoption into the
kernel, it's something we have to address.

> A somewhat weaker but still useful part of DBus is that it has
> mandatory security controls; the policy can restrict which uids can
> talk to a given service on the bus, and also allows userspace to check
> the credentials of messages they receive (think SO_PEERCRED).  By
> having KBUS based on files you seem to lose this which you'd get from
> a socket API.

Again, this is the sort of support one would want in a realistic
enterprise/large scale solution. It's also deliberately not part of
the KBUS design - we want any recipient to be able to receive any
messages (on a given bus). In that sense, KBUS is (by design)
inherently insecure.

> On a different topic, I find myself really unconvinced by the length
> to which you go to claim the API is simple and easy to use.

Point taken.

> I mean, I really can't imagine it'd be hard to write a userspace C
> library implementing the semantics you have here, and have it be as
> easy or easier.    Oh you actually do have a "libkbus" here:
> http://code.google.com/p/kbus/source/browse/libkbus/kbus.h

I'm not sure what you mean by this. libkbus is a layer around the
"bare" usage of KBUS, allowing people to do the more common actions
without (for instance) fettling "errno" all over the place. It's
polite to provide that, as we do for Python (which was the first such
wrapper, so I could write unit tests easily) and C++ - but all of
those *are* just wrappers around the "bare" usage (the Java library is
different, since it's a wrapper around the C library, but that's Java
for you).

I am willing, however, to assert that it would have been harder to
write a C library that did this job in userspace, if only (and
trivially) because I would have had to write all the support code that
the kernel already supplies for me. And then debug it.

> You never mention what tradeoffs you might see from having that in
> userspace or whether you tried it for that matter.

Obviously we haven't tried rewriting KBUS in userspace for Linux. If
we ever ported the functionality to Windows (or BSD or Mac), then a
userspace solution would perhaps be necessary (I don't think it's
quite comparable), but on Linux, where we currently need it, the
solution we've got works well for us.

My belief is that a userspace solution would be less reliable (for one
thing, it's harder to reliably detect that a replier has crashed as
opposed to just being very busy for a moment). It would definitely be
larger.

That doesn't, of course, mean that *other people* will find KBUS
valuable in the kernel (as I think Grant Likely said, uplist, every
extra piece of code added to the kernel imposes a heavy maintenance
load, and so must be very carefully justified).

We've done the work for ourselves, and feel that KBUS provides
something the kernel might want (a simple, low-level messaging system
that is just enough higher level than what is already provided to be
useful), so it's worth our submitting it.

Anyway, thanks for taking the trouble to comment, and I hope this
all makes sense,
Tibs



* Re: [PATCH 00/11] RFC: KBUS messaging subsystem
  2011-05-17  8:50   ` [PATCH 00/11] RFC: KBUS messaging subsystem Florian Fainelli
  2011-05-22 19:58     ` Tony Ibbs
@ 2011-08-03 20:23     ` Tony Ibbs
  1 sibling, 0 replies; 34+ messages in thread
From: Tony Ibbs @ 2011-08-03 20:23 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jonathan Corbet, Grant Likely, lkml, Linux-embedded,
	Tibs at Kynesim, Richard Watts


On 17 May 2011, at 09:50, Florian Fainelli wrote:

> Sorry for this late answer.

And apologies for my own late response. I'll try to keep this short as
I hope the "Restatement" side-thread will address some of it.

> On Tuesday 22 March 2011 20:36:40 Jonathan Corbet wrote:
>> 
>> - Why kbus over, say, a user-space daemon and unix-domain sockets?  I'm
>>  not sure I see the advantage that comes with putting this into kernel
>>  space.
> 
> I also fail to see why this would be required. In my opinion you are trading
> reliability for complexity by putting this in the kernel.

I hope that's addressed in the "So why did we write it as a kernel
module?" section of the "Restatement" message thread. Basically,
I believe a kernel module is smaller and "steals" reliability from
code written and tested by others. That doesn't mean it's a good
solution "in the wild", of course (privately we can add whatever we
want to the kernel, but in public it is and must be controlled).

> Indeed, I would also suggest having a look at what generic netlink already 
> provides like messages per application PID, multicasting and marshaling. If 
> you intend to keep a part of it in the kernel, you should have a look at this, 
> because from my experience with generic netlink, most of the hard job you are 
> re-doing here, has already been done in a generic manner.

If we do end up heading that way, I hope you won't mind if I ask
you for advice!

All the best,
	Tibs



* Re: RFC: [Restatement] KBUS messaging subsystem
  2011-07-28 21:48         ` RFC: [Restatement] " Tony Ibbs
  2011-07-28 23:58           ` Colin Walters
@ 2011-08-03 20:48           ` Pekka Enberg
  2011-08-07 20:24             ` Tony Ibbs
  1 sibling, 1 reply; 34+ messages in thread
From: Pekka Enberg @ 2011-08-03 20:48 UTC (permalink / raw)
  To: Tony Ibbs
  Cc: lkml, Andrew Morton, Jonathan Corbet, Florian Fainelli,
	Grant Likely, Linux-embedded, Tibs at Kynesim, Richard Watts

Hi Tony,

Your description doesn't really explain what you want to use this
thing exactly for in userspace.

On Fri, Jul 29, 2011 at 12:48 AM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
> So why did we write it as a kernel module?
> ==========================================
> As implementors, a kernel module makes a lot of sense. Not least
> because:
>
> * It gives us a lot of things for free, including list handling,
>  reference counting, thread safety and (on larger systems)
>  multi-processor support, which we would otherwise have to write and
>  debug ourselves. This also keeps our codebase smaller.

That's not a reason to put this into the kernel, really.

> * It helps give us reliability, partly because of the code we're
>  relying on, partly because of the strictures of working in the
>  kernel, partly by shielding us from userspace.

So now instead of crashing in userspace, we crash the kernel? This
seems like a bogus argument as well.

> * It reduces message copying (we have userspace to kernel back to
>  userspace, as opposed to a userspace daemon communicating with
>  clients via sockets)

Now this sounds like a real reason but you'd have to explain why you
can't reuse existing zero-copy mechanisms like splice() and tee().

> * It makes it simple for us to tell when a message recipient has "gone
>  away", as the kernel will call our "release" callback for us.

Again, sounds like a reasonable technical requirement but doesn't
really justify putting all this code into the kernel.

> * It allows us to provide the functionality on systems without
>  requiring anything much beyond /dev and maybe /proc in userspace.

Why is this important?

                                Pekka


* Re: RFC: [Restatement] KBUS messaging subsystem
  2011-08-03 20:14             ` Tony Ibbs
@ 2011-08-07 16:47               ` Tony Ibbs
  0 siblings, 0 replies; 34+ messages in thread
From: Tony Ibbs @ 2011-08-07 16:47 UTC (permalink / raw)
  To: Tony Ibbs
  Cc: Colin Walters, lkml, Andrew Morton, Jonathan Corbet,
	Florian Fainelli, Grant Likely, Linux-embedded, Tibs at Kynesim,
	Richard Watts


On 3 Aug 2011, at 21:14, Tony Ibbs wrote:

> On 29 Jul 2011, at 00:58, Colin Walters wrote:
> 
>> You don't mention what your performance requirements are
>> (if any) for example.
> 
> We don't particularly have any...

My colleague says that he has previously done measurements that show
KBUS to be about 3 times slower than using shared memory and futexes,
but (of course) much simpler to use.

His implication is, I believe, that this is a respectable speed.

Tibs


* Re: RFC: [Restatement] KBUS messaging subsystem
  2011-08-03 20:48           ` Pekka Enberg
@ 2011-08-07 20:24             ` Tony Ibbs
  2011-08-15 11:46               ` Pekka Enberg
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-08-07 20:24 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: lkml, Andrew Morton, Jonathan Corbet, Florian Fainelli,
	Grant Likely, Linux-embedded, Tibs at Kynesim, Richard Watts


On 3 Aug 2011, at 21:48, Pekka Enberg wrote:
> Your description doesn't really explain what you want to use this
> thing exactly for in userspace.

A typical use might be communicating between components in a
set-top-box (STB). This might involve:

* Some sort of GUI user interface (e.g., a browser). This will
  send control messages and receive state messages.
* Some sort of IR input, reading keypresses from a remote control. The
  program reading the keypresses will decide to send control messages
  for some of them.
* Possibly input from a mobile phone (over bluetooth or whatever),
  acting as another source of control. It's possible messages may also
  be received that require sending information back to the phone.
* A process reading data streams from the network and passing the
  appropriate parts therefrom to audio and video decoders. This will
  receive messages to tell it which programs to play, and send
  messages indicating what it is doing.
* Another process recording programs to disk, as directed by the user
  inputs. It may need to send messages to the process reading data
  streams. It will also send messages of interest to the GUI.
* A process playing programs back from disk, including "trick play" -
  that is, fast forward, skip and reverse. Obviously it receives
  messages telling it which program to play, and what trick play
  operations to perform. It in turn will send messages to the UI to
  say what it is doing.

Having the listener choose what it wants to listen to is a clear win
in these circumstances - it means that the sender of a message does
not need to know if a new piece of infrastructure is added that also
wants to receive it.

Similarly, allowing any sender to send a particular request also makes
sense, as several processes might want to ask the current location of
play in the displayed video stream, or to request some sort of trick
play action.

(I'm sure all of this could be done perfectly well with, for instance,
DBus as well, but I hope I've adequately explained elsewhere why
that's not an applicable solution.)

A small example might be several programs waiting for particular
conditions to be satisfied, and sending messages to a central program
which lights up LEDs according to the messages it receives.

Real examples of usage that aren't the STB are a bit difficult to give
because they belong to customer projects that we're not allowed to
talk about.

> On Fri, Jul 29, 2011 at 12:48 AM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
> > So why did we write it as a kernel module?
> > ==========================================
> > As implementors, a kernel module makes a lot of sense. Not least
> > because:
> > 
> > * It gives us a lot of things for free, including list handling,
> >  reference counting, thread safety and (on larger systems)
> >  multi-processor support, which we would otherwise have to write and
> >  debug ourselves. This also keeps our codebase smaller.
> 
> That's not a reason to put this into the kernel, really.

It's part of the reason why we wrote KBUS as a kernel module, which is
what this section was about. Agreed, it's not a reason that one can
readily use to argue that "X" (whatever that may be) should go in the
kernel-as-distributed, or we'd have all of user space there, which
would no longer be Linux (not sure what it *would* be).

> > * It helps give us reliability, partly because of the code we're
> >  relying on, partly because of the strictures of working in the
> >  kernel, partly by shielding us from userspace.
> 
> So now instead of crashing in userspace, we crash the kernel? This
> seems like a bogus argument as well.

Well, ignoring the tone of that comment, the same argument as above
applies. Although I would point out that what I was saying was that it
would be intrinsically much less likely to crash anywhere because it
is a kernel module.

> > * It reduces message copying (we have userspace to kernel back to
> >  userspace, as opposed to a userspace daemon communicating with
> >  clients via sockets)
> 
> Now this sounds like a real reason but you'd have to explain why you
> can't reuse existing zero-copy mechanisms like splice() and tee().

Hmm. vmsplice() too, presumably. I'll freely admit I don't know
anything beyond what I've just read about these functions. If one was
writing KBUS from scratch as a userspace library, with associated
daemon, then they might well be useful, but one would need to think
their use through very carefully, and I don't believe the code would
be simple (the image I have in mind is managing message structures
with two-metre long tongs, through an air-water boundary).

> > * It makes it simple for us to tell when a message recipient has "gone
> >  away", as the kernel will call our "release" callback for us.
> 
> Again, sounds like a reasonable technical requirement but doesn't
> really justify putting all this code into the kernel.

I'll get back to that below.

> > * It allows us to provide the functionality on systems without
> >  requiring anything much beyond /dev and maybe /proc in userspace.
> 
> Why is this important?

Because we sometimes want to target systems that do not need a
userspace filesystem, either because they are very simple (so their
needs can be satisfied by starting the necessary programs up in init),
or because they're trying to save space, or because they don't have
any physical storage associated with them, etc.

I assume the real point of your post is that I wrote about the reasons
why we made KBUS a kernel module, but did not really address the
reasons why KBUS might want to be a kernel module in general usage.

Obviously, there's one overriding reason, which is key:

* Inter-process messaging is hard to get right, and very easy to get
  wrong. The kernel provides low-level mechanisms one can use to write
  a userspace inter-process messaging system, but not an actual
  solution.

  Our contention is that a simple inter-process messaging module is a
  worthwhile addition to the toolkit supplied by the kernel. The trick
  is not to get over-ambitious (clearly enterprise solutions like DBus
  belong in userspace), but to provide a sensible minimum. KBUS is our
  attempt at this, based on our experience of what one actually needs
  in a relatively simple system.

  Clearly, as the needs of a system grow, there is likely to be a
  point at which larger, more powerful solutions may be necessary
  (inevitably if you need things KBUS doesn't provide), but that
  shouldn't preclude providing the simpler solution.

Otherwise, I'll try to give some subsidiary reasons below, but I'm
bound to have forgotten something. The points aren't in any particular
order.

* I already said that it is important that the kernel has a single
  point where it knows that a process has gone away. Knowing this is a
  fundamental requirement of KBUS, and it would be difficult and
  unreliable to do in userspace. I actually think this is a very
  important point, as it is at the core of how KBUS works.

* All the queues are in one place.
 
  If KBUS were a userspace daemon, it would have to maintain the same
  queues as it does now (in order to get the same effect), plus some
  fraction of N message copies in transit through the kernel, where N
  is the number of clients sending/receiving messages at a particular
  time.

  With KBUS in the kernel, that "fraction of N" is not needed, and
  thus KBUS can account much more accurately for the memory it is
  using. This in turn means that it can be less conservative about the
  amount of memory available for its queues, meaning it can have more
  messages in transit.

  (Note that KBUS at the moment is nowhere near as good at this as it
  should be, but resource management is acknowledged to be a problem
  that we need to address, and it would be very simple to have a
  memory limit per bus.)

  Again, it's not that one can't do something similar in userspace,
  but that doing it in userspace is both more complicated and more
  wasteful.

* On embedded systems with not much memory, the OOM killer can be
  quite active in userspace. If the message system is crucial, then it
  is a big advantage having it in the kernel, where it cannot be
  killed (that's not to claim that KBUS as it stands is well suited to
  this use case, but it is more suitable than if it were a userspace
  daemon).

  (I do realise that there are ways of overriding the OOM killer per
  process, but being removed from the problem seems more sensible.)

* KBUS works in each client's priority, and thus avoids priority
  inversion problems, compared to userspace daemons.

  A userspace daemon must run at its own particular priority. If it is
  high, then a low priority program sending messages can starve a
  higher priority program, and if it is low, the low priority processes
  can preempt higher priority processes.

* Userspace peer-to-peer messaging via sockets (for instance) needs a
  persistent store of client identities ("names"). Writing this so
  that race conditions are minimised is not simple, and doing so makes
  the whole messaging infrastructure more complex. I hope the example
  at the beginning of this email makes it clearer why we'd rather not
  have such.

* It was mentioned before that KBUS being a kernel module makes it
  significantly smaller, as it can leverage code that is already
  present in the kernel. This can be important on embedded systems,
  since NAND flash is slow, and loading an extra few MB of library can
  slow the boot process down unacceptably.

  This matters to us quite a lot; it may matter less to the general
  kernel community...

* Despite having said that we weren't aiming for the sort of security
  handling that DBus provides, some security considerations are of
  interest. In particular, being a kernel module means that KBUS
  definitively knows the identity of the sender and recipient(s) of
  each message. This makes it possible, for instance, for a sender of
  a request to assert that it should only succeed (at "write" time) if
  the intended recipient is that expected (so if the original recipient
  unbinds and a new recipient rebinds, this can be trivially
  detected - we use this so a sender can realise that the replier has
  changed and will not have any required state).

* Coming back to the "being in the kernel means more code reuse"
  issue, this is not insignificant. If your message manager crashes,
  for whatever reason, you will typically have lost all the in-transit
  messages. This is a fairly serious issue. Reusing lots of well
  tested code, and having to adhere to a moderately rigorous coding
  style and set of practices helps a lot. It's not enough by itself to
  justify being in the kernel, but it should not be ignored as a
  contributory factor once one is balancing issues.

* Being in the kernel means that it should be a lot easier to scale to
  multiple processors. And other forms of scaling that the kernel does
  for you (more or less).

* I've recently received a specific request for support of messaging
  between kernel and userspace (and vice-versa). I've yet to look at
  the feasibility of this (it's my next job after this email), but I
  think it's a fairly simple and non-obscure set of changes to KBUS. I
  don't believe this would be as true of a userspace system.

  This would allow us to replace writing to a user process that exists
  merely to write to a (locally written) driver for a piece of
  hardware with direct communication with that driver.

Tibs




* Re: RFC: [Restatement] KBUS messaging subsystem
  2011-08-07 20:24             ` Tony Ibbs
@ 2011-08-15 11:46               ` Pekka Enberg
  2011-08-21 13:28                 ` Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Pekka Enberg @ 2011-08-15 11:46 UTC (permalink / raw)
  To: Tony Ibbs
  Cc: lkml, Andrew Morton, Jonathan Corbet, Florian Fainelli,
	Grant Likely, Linux-embedded, Tibs at Kynesim, Richard Watts

Hi Tony,

On Sun, Aug 7, 2011 at 11:24 PM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
> Real examples of usage that aren't the STB are a bit difficult to give
> because they belong to customer projects that we're not allowed to
> talk about.

That's part of the problem, I suppose. We usually don't merge new
kernel facilities unless we're able to understand (and see) real
applications that need them.

On Sun, Aug 7, 2011 at 11:24 PM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
> I assume the real point of your post is that I wrote about the reasons
> why we made KBUS a kernel module, but did not really address the
> reasons why KBUS might want to be a kernel module in general usage.

I simply don't see a convincing argument why existing IPC and other
kernel mechanisms are not sufficient to implement what you need. I'm
sure there is one but it's not apparent from your emails.

The whole thing feels more like "let's put a message broker into the
kernel" rather than a set of kernel APIs that make sense. I suppose the
rather extensive ioctl() ABI is partly to blame here.

I'm not saying you need to implement everything in userspace but I'm
also not convinced we want _all of this_ in the kernel.

                          Pekka


* Re: RFC: [Restatement] KBUS messaging subsystem
  2011-08-15 11:46               ` Pekka Enberg
@ 2011-08-21 13:28                 ` Tony Ibbs
  2011-08-22  1:15                   ` Bryan Donlan
  0 siblings, 1 reply; 34+ messages in thread
From: Tony Ibbs @ 2011-08-21 13:28 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: lkml, Andrew Morton, Jonathan Corbet, Florian Fainelli,
	Grant Likely, Linux-embedded, Tibs at Kynesim, Richard Watts


On 15 Aug 2011, at 12:46, Pekka Enberg wrote:

> Hi Tony,

Hi. Thanks for your reply.

> On Sun, Aug 7, 2011 at 11:24 PM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
>> Real examples of usage that aren't the STB are a bit difficult to give
>> because they belong to customer projects that we're not allowed to
>> talk about.
> 
> That's part of the problem, I suppose. We usually don't merge new
> kernel facilities unless we're able to understand (and see) real
> applications that need them.

I understand. It is a bit of a chicken-and-egg problem.

> On Sun, Aug 7, 2011 at 11:24 PM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
>> I assume the real point of your post is that I wrote about the reasons
>> why we made KBUS a kernel module, but did not really address the
>> reasons why KBUS might want to be a kernel module in general usage.
> 
> I simply don't see a convincing argument why existing IPC and other
> kernel mechanisms are not sufficient to implement what you need. I'm
> sure there is one but it's not apparent from your emails.

Our major concern, strongly based on experience, is that given the
existing kernel mechanisms, users do not build robust (or even
sometimes working!) solutions for inter-process communication.

This is in large part because they do not realise (at the start) how
difficult this is to do. Especially if they want to keep it small.

The only *sure* way of solving this is to provide a mechanism that is
"always there", and that really means a solution provided by the
kernel. This needs to be at a higher level than what is currently
available, but obviously what exactly is provided is then a matter for
discussion. We'd obviously argue that KBUS hits a "sweet spot" for the
needs we perceive, given our application areas.

> The whole thing feels more like "let's put a message broker into the
> kernel" rather than a set of kernel APIs that make sense. I suppose the
> rather extensive ioctl() ABI is partly to blame here.

I'm not sure what you mean by "message broker", except that it's
plainly meant to be a bad thing - the wikipedia meaning doesn't seem
terribly applicable to KBUS, as it covers an awful lot more territory
(mind, the discussion page is amusing).

I'll freely admit we started with the idea of what functionality we
wanted and then chose a simple-to-implement API to make it happen.

*If* KBUS were in the kernel, with its current functionality, what API
would you expect? (not just "a sockety one", but what actual API?) If
one recasts as a sockety API, how are many new socket options better
than a set of ioctls? (or is that just one of those questions to which
the answer is "well, it is"?)

> I'm not saying you need to implement everything in userspace but I'm
> also not convinced we want _all of this_ in the kernel.

That's quite understandable. So, given the functionality we want, what
would you put in the kernel, and what in userspace? (I'm personally
sceptical about how much can be split this way, but still)

(I realise that both this and the previous question are asking you to
do work, but I'm honestly hoping that even if KBUS-as-is is not
applicable, we might figure out an irreducible set of higher level
constructs than those which are currently present which will attack
the problem we wrote KBUS to attack.)

Tibs



* Re: RFC: [Restatement] KBUS messaging subsystem
  2011-08-21 13:28                 ` Tony Ibbs
@ 2011-08-22  1:15                   ` Bryan Donlan
  2011-08-29  8:55                     ` Tony Ibbs
  0 siblings, 1 reply; 34+ messages in thread
From: Bryan Donlan @ 2011-08-22  1:15 UTC (permalink / raw)
  To: Tony Ibbs
  Cc: Pekka Enberg, lkml, Andrew Morton, Jonathan Corbet,
	Florian Fainelli, Grant Likely, Linux-embedded, Tibs at Kynesim,
	Richard Watts

On Sun, Aug 21, 2011 at 09:28, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
>
> On 15 Aug 2011, at 12:46, Pekka Enberg wrote:
>
>> I simply don't see a convincing argument why existing IPC and other
>> kernel mechanisms are not sufficient to implement what you need. I'm
>> sure there is one but it's not apparent from your emails.
>
> Our major concern, strongly based on experience, is that given the
> existing kernel mechanisms, users do not build robust (or even
> sometimes working!) solutions for inter-process communication.
>
> This is in large part because they do not realise (at the start) how
> difficult this is to do. Especially if they want to keep it small.
>
> The only *sure* way of solving this is to provide a mechanism that is
> "always there", and that really means a solution provided by the
> kernel. This needs to be at a higher level than what is currently
> available, but obviously what exactly is provided is then a matter for
> discussion. We'd obviously argue that KBUS hits a "sweet spot" for the
> needs we perceive, given our application areas.
>
>> The whole thing feels more like "let's put a message broker into the
>> kernel" rather than a set of kernel APIs that make sense. I suppose the
>> rather extensive ioctl() ABI is partly to blame here.
>
> I'm not sure what you mean by "message broker", except that it's
> plainly meant to be a bad thing - the wikipedia meaning doesn't seem
> terribly applicable to KBUS, as it covers an awful lot more territory
> (mind, the discussion page is amusing).
>
> I'll freely admit we started with the idea of what functionality we
> wanted and then chose a simple-to-implement API to make it happen.
>
> *If* KBUS were in the kernel, with its current functionality, what API
> would you expect? (not just "a sockety one", but what actual API?) If
> one recasts as a sockety API, how are many new socket options better
> than a set of ioctls? (or is that just one of those questions to which
> the answer is "well, it is"?)

I think this may well be the core problem here - is KBUS, as proposed,
a general API lots of people will find useful, or is it something that
will fit _your_ usecase well, but other usecases poorly?
Designing a good API, of course, is quite difficult, but it _must_ be
done before integrating anything with upstream Linux, as once
something is merged it has to be supported for decades, even if it
turns out to be useless for 99% of usecases.

Some good questions to ask might be:
* Does this system play nice with namespaces?
* What limits are in place to prevent resource exhaustion attacks?
* Can libdbus or other such existing message brokers swap out their
existing central-routing-process based communications with this new
system without applications being aware?

Keep in mind also that the kernel API need not match the
application-visible API, if you can add a userspace library to
translate to the API you want. So, for example, instead of numbering
kbuses, you could define them as a new AF_UNIX protocol, and place
them in the abstract socket namespace (ie, they'd have names like
"\0kbus-0"). Doing something like this avoids creating a new
namespace, and non-embedded devices could place these new primitives
in a tmpfs or other more visible location. It also makes it very cheap
(and a non-privileged operation!) to create kbuses.

So, let's look at your requirements:

* Message broadcast API with prefix filtering
* Deterministic ordering
* Possible to snoop on all messages being passed through
* Must not require any kind of central userspace daemon
* Needs a race-less way of 1) Advertising (and locking) as a replier
for a particular message type and 2) Detecting when the replier dies
(and synthesizing error replies in this event)

Now, to minimize this definition, why not remove prefix filtering from
the kernel? For low-volume buses, it doesn't hurt to do the filtering
in userspace (right?). If you want to reduce the volume of messages
received, do it on a per-bus granularity (and set up lots of buses
instead). After all, you can always connect to multiple buses if you
need to listen for multiple message types. For replier registration,
then, it would be done on a per-bus granularity, not a per-message
granularity.

So we now have an API that might (as an example) look like this:

* Creation of buses - socket(AF_UNIX, SOCK_DGRAM, PROTO_KBUS),
followed by bind() either to a file or in the abstract namespace
* Advertising as a replier on a socket - setsockopt(SOL_KBUS,
KBUS_REPLIER, &one); - returns -EEXIST if a replier is already present
* Sending/receiving messages - ordinary sendto/recvfrom. If a reply is
desired, use sendmsg with an ancillary data item indicating a reply is
desired
* Notification on replier death (or replier buffer overflow etc):
empty message with ancillary data attached informing of the error
condition
* 64-bit global counter on all messages (or messages where requested
by the client) to give a deterministic order between messages sent on
multiple buses (reported via ancillary data)
* Resource limitation based on memory cgroup or something? Not sure
what AF_UNIX uses already, but you could probably use the same system.
* Perhaps support SCM_RIGHTS/SCM_CREDENTIALS transfers as well?
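
As a rough illustration of the bus-creation bullet above: the
abstract-namespace part can already be tried with ordinary AF_UNIX
datagram sockets. This is only a sketch under stated assumptions, not
proposed kernel code. PROTO_KBUS and SOL_KBUS do not exist, so protocol 0
is used as a stand-in, and the helper name kbus_like_bind() is made up
for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a datagram endpoint named "\0<name>" in the
 * Linux abstract socket namespace. Protocol 0 stands in for the proposed
 * (non-existent) PROTO_KBUS. Returns the fd, or -1 with errno set. */
int kbus_like_bind(const char *name)
{
    struct sockaddr_un addr;
    size_t len = strlen(name);
    int fd;

    /* One byte of sun_path is needed for the leading NUL. */
    if (len + 1 > sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* sun_path[0] stays '\0': that marks an abstract address. */
    memcpy(addr.sun_path + 1, name, len);

    /* The address length covers the leading NUL plus the name only. */
    if (bind(fd, (struct sockaddr *)&addr,
             (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len)) < 0) {
        int saved = errno;

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A second kbus_like_bind("kbus-0") while the first descriptor is still
open fails with EADDRINUSE, and the name goes away when the socket is
closed, with no filesystem cleanup needed.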

This is a much simpler kernel API, don't you think? It's also easy to
see how dbus could use it as well - just add a method to filter
unicast messages from being seen by other uninterested clients, create
a kbus socket for each dbus connection (with appropriate symlinks for
any registered aliases), and have the owner of a connection socket
register itself as a replier. Now you can send dbus broadcast messages
across the KBUS socket as usual, and perhaps send replies to unicast
messages over a socket passed in over a SCM_CREDENTIALS transfer.
Alternately, you could assign connection IDs, and have a control
message to route unicast replies to their sender - in any case, these
details are something dbus people would need to comment on, if they're
interested, but you can see that it's a use case that shows promise
(I'm not familiar with the dbus security model, however, and so I'm
not sure if this'll play well with it).

In short, API minimalism is key to acceptance in the upstream kernel.
Try to pare down the core API to the bare minimum to get what you
need, rather than implementing your final use case directly into the
kernel using ioctls or whatnot.

Thanks,

Bryan


* Re: RFC: [Restatement] KBUS messaging subsystem
  2011-08-22  1:15                   ` Bryan Donlan
@ 2011-08-29  8:55                     ` Tony Ibbs
  0 siblings, 0 replies; 34+ messages in thread
From: Tony Ibbs @ 2011-08-29  8:55 UTC (permalink / raw)
  To: Bryan Donlan
  Cc: Pekka Enberg, lkml, Andrew Morton, Jonathan Corbet,
	Florian Fainelli, Grant Likely, Linux-embedded, Tibs at Kynesim,
	Richard Watts

On 22 Aug 2011, at 02:15, Bryan Donlan wrote:

> I think this may well be the core problem here - is KBUS, as proposed,
> a general API lots of people will find useful, or is it something that
> will fit _your_ use case well, but other use cases poorly?

Indeed.

And, by the way, thanks a lot for this email, which gives me lots of
specific items to reply to.

> Designing a good API, of course, is quite difficult, but it _must_ be
> done before integrating anything with upstream Linux, as once
> something is merged it has to be supported for decades, even if it
> turns out to be useless for 99% of use cases.

Indeed.

It's only anecdotal evidence, of course, but we have put quite a lot
of thought and testing into KBUS - many things that look as if they
are going to be simple or easy either aren't, or lead to unfortunate
consequences.

> Some good questions to ask might be:
> * Does this system play nice with namespaces?
> * What limits are in place to prevent resource exhaustion attacks?
> * Can libdbus or other such existing message brokers swap out their
> existing central-routing-process based communications with this new
> system without applications being aware?

I'll punt on namespaces, since I don't know the terminology being used
in the kernel (there seem to be several sorts of namespace, which are
essentially independent?).

KBUS at the moment is definitely not doing enough to manage its
resources, as I've said elsewhere. However, having all of the queues
in the same place, under the same management, means that it is a
relatively simple job to enforce an overall limit on the memory usage
of a particular bus, as well as per-queue and per-message limits. This
will (eventually) get mended whether KBUS ends up in the kernel or
not.

As to higher-level messaging systems using KBUS, I think that's a red
herring. For a start, I can't see why they'd necessarily be interested
- presumably if they felt the need for such a kernel module they'd
already have moved to introduce one (as in binder, for instance). If
they haven't, it's presumably because their design works well enough
(for their own aims) without. And of course they'd lose a certain
amount of control if part of their system were kernel-maintained,
which might also be important. But also, it's a significant
development project in itself to try to produce a system suitable to
act as the underpinnings for another system. One can't just say "it
looks as if it might work", one needs to implement it and test it,
because of all the edge cases one is bound not to have thought of.
That's a lot of work, and almost entirely unrelated to producing a
simple, minimal system.

  (for what it's worth, and despite that, my gut feeling is that
  any useful minimal messaging system could be used as the
  bottom-level for a libdbus or equivalent, but I'm still not
  convinced it would be worth it, and it would not necessarily give a
  better version of the higher level system)

So I'd say if libdbus or whoever *could* use such a system, that would
be nice, but it should not be the primary aim.

> Keep in mind also that the kernel API need not match the
> application-visible API, if you can add a userspace library to
> translate to the API you want.

OK, although that's basically true of all APIs (for instance, the way
I think of KBUS in the privacy of my own head is with the API I use in
the Python library, or how the message queues actually work within
KBUS itself).

If you look at the existing C and C++ APIs, they provide two somewhat
different abstractions. The Javascript APIs we've used in the past
were even further away from the actual kernel APIs (they never
mentioned a particular bus, since that could be inferred from the
message name in that application).

On the other hand, it is incumbent on us to remember that people
programming to the kernel API are users as well. So your implicit
point (as I take it) in this message that one should use familiar
interfaces in a familiar way is a good one. On the other hand, if we
can present the user with a simpler interface (as in simpler to
program with) by a relatively small amount of underlying work, then
that is a net gain - the user is less likely to make mistakes, and the
overall amount of code written will be smaller.

I think there is a general principle at work, in that one should
solve difficult problems once, in one place, if at all possible.

> So, for example, instead of numbering
> kbuses, you could define them as a new AF_UNIX protocol, and place
> them in the abstract socket namespace (ie, they'd have names like
> "\0kbus-0").

Indeed. I'd regard that as cosmetic detail - each KBUS is still
identified by a number, but instead of that number being used in a
device name, it's being used in a socket name.

> Doing something like this avoids creating a new
> namespace, and non-embedded devices could place these new primitives
> in a tmpfs or other more visible location. It also makes it very cheap
> (and a non-privileged operation!) to create kbuses.

Hmm.

The current mechanism for creating new KBUS buses as an unprivileged
user is admittedly via an ioctl, but clearly that should be replaced
by something more modern (the received wisdom on how to use things
like debugfs, for instance, seems to have changed greatly even in
KBUS's short life). It's not something that the current KBUS model
requires one to do often, so cheapness is not a great issue.

But unprivileged is good.

> So, let's look at your requirements:
> 
> * Message broadcast API with prefix filtering
> * Deterministic ordering
> * Possible to snoop on all messages being passed through
> * Must not require any kind of central userspace daemon
> * Needs a race-less way of 1) Advertising (and locking) as a replier
> for a particular message type and 2) Detecting when the replier dies
> (and synthesizing error replies in this event)
> 
> Now, to minimize this definition, why not remove prefix filtering from
> the kernel? For low-volume buses, it doesn't hurt to do the filtering
> in userspace (right?). If you want to reduce the volume of messages
> received, do it on a per-bus granularity (and set up lots of buses
> instead). After all, you can always connect to multiple buses if you
> need to listen for multiple message types. For replier registration,
> then, it would be done on a per-bus granularity, not a per-message
> granularity.
> 
> So we now have an API that might (as an example) look like this:
> 
> * Creation of buses - socket(AF_UNIX, SOCK_DGRAM, PROTO_KBUS),
> followed by bind() either to a file or in the abstract namespace
> * Advertising as a replier on a socket - setsockopt(SOL_KBUS,
> KBUS_REPLIER, &one); - returns -EEXIST if a replier is already present
> * Sending/receiving messages - ordinary sendto/recvfrom. If a reply is
> desired, use sendmsg with an ancillary data item indicating a reply is
> desired
> * Notification on replier death (or replier buffer overflow etc):
> empty message with ancillary data attached informing of the error
> condition
> * 64-bit global counter on all messages (or messages where requested
> by the client) to give a deterministic order between messages sent on
> multiple buses (reported via ancillary data)
> * Resource limitation based on memory cgroup or something? Not sure
> what AF_UNIX uses already, but you could probably use the same system.
> * Perhaps support SCM_RIGHTS/SCM_CREDENTIALS transfers as well?

Thanks a lot for concrete call examples - that makes it a lot easier
for me to think things through. I'll try to separate out my comments
into some sort of sensible sequence. Forgive me if I miss something.

Current scheme
--------------
In the current system, message sending looks something like the
following:

1. Sender opens bus 0
2. Sender creates a message with name "Fred"
3. Sender may mark the message as needing a reply.
4. Sender writes the message and sends it (these are currently two
   operations, but as was mentioned upstream, could be combined - we
   just liked them better apart).
5. If the message needs a reply, KBUS checks if the sender has enough
   space in its queue to receive a reply, and if not, rejects it
6. KBUS assigns the next message id for bus 0 to the message
7. KBUS determines who should receive the message. If the message
   needs a reply, and no-one has bound as replier, then the send
   fails. Similarly, if the replier does not have room in their queue,
   the send fails.
8. Otherwise, KBUS copies the message to the queues of everyone who
   should receive it. If the message needs a reply, then the
   header of the particular message that needs a reply will be altered
   to indicate this.
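
The send-side checks listed above can be sketched as a small userspace
model. This is a toy under stated assumptions, not KBUS source: the
types, the names (struct listener, bus_send() and so on), exact-name
matching, and the choice to silently skip an ordinary listener whose
queue is full are all invented for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LISTENERS 8

struct listener {
    const char *bound_name; /* message name bound to, or NULL */
    bool replier;           /* bound as the replier for that name */
    unsigned queued;        /* messages currently in the queue */
    unsigned capacity;      /* maximum queue length */
};

struct bus {
    uint32_t last_id;       /* last message id handed out on this bus */
    struct listener *listeners[MAX_LISTENERS];
    unsigned nlisteners;
};

enum send_result {
    SEND_OK, SEND_SENDER_FULL, SEND_NO_REPLIER, SEND_REPLIER_FULL
};

static bool bound_to(const struct listener *l, const char *name)
{
    return l->bound_name && strcmp(l->bound_name, name) == 0;
}

enum send_result bus_send(struct bus *bus, struct listener *sender,
                          const char *name, bool wants_reply,
                          uint32_t *id_out)
{
    struct listener *replier = NULL;
    unsigned i;

    /* A request is rejected if the sender could not queue the reply. */
    if (wants_reply && sender->queued >= sender->capacity)
        return SEND_SENDER_FULL;

    /* The id is assigned before recipients are worked out. */
    *id_out = ++bus->last_id;

    if (wants_reply) {
        for (i = 0; i < bus->nlisteners; i++) {
            struct listener *l = bus->listeners[i];

            if (l->replier && bound_to(l, name))
                replier = l;
        }
        if (!replier)
            return SEND_NO_REPLIER;
        if (replier->queued >= replier->capacity)
            return SEND_REPLIER_FULL;
    }

    /* "Copy" the message to everyone bound to this name. */
    for (i = 0; i < bus->nlisteners; i++) {
        struct listener *l = bus->listeners[i];

        if (bound_to(l, name) && l->queued < l->capacity)
            l->queued++;
    }
    return SEND_OK;
}
```

In this model a request for "Fred" succeeds only while some listener is
bound to "Fred" as replier and both ends have queue space, which is the
property the list above describes.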

At the recipient end, the sequence is something like:

1. Listener opens bus 0.
2. Listener chooses to receive messages with a particular name,
   possibly as a replier.
3. Listener determines if there is a next message, and if so, its
   length.
4. Listener allocates a buffer to receive the message.
5. Listener reads the message into the buffer.

The recipient is guaranteed to read messages in the order they were
sent in, and to only get the messages they asked for.

Sockety scheme
--------------
In the scheme where we're just replacing the calls with appropriate
"sockety" calls, and not altering message name filtering, this
presumably proceeds in a very similar manner, except that we are using
the equivalent sockety calls.

My first question would be how the recipient is meant to tell the
length of the next message before doing their recvfrom/recvmsg.

I realise (now) that "message data may vary in length" wasn't
mentioned up front as a requirement (although I'd aver that it is
pretty evident from the documentation, and from the API proposed, that
this is meant, else why do we have NEXTMSG returning the length of the
next message?). But then I'd never imagined that someone wouldn't
assume this as a property of a general messaging system (after all,
it is clearly simple to build a fixed length message system on top of
a variable length message system, and rather harder to do the
reverse).

Can one use MSG_PEEK to retrieve just the ancillary data? It's not
clear to me from the recvmsg man page. If message data length was sent
in the ancillary data, then this could work. If one can't do that,
perhaps one could use MSG_PEEK to look at the start of the message
proper (although that feels like a horrible hack). Is there precedent
for this?
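
One possible answer, offered as a sketch rather than as anything KBUS
does: on Linux, recv() with MSG_PEEK | MSG_TRUNC returns the full length
of the next datagram without consuming it. The recv(2) man page documents
this for AF_UNIX datagram sockets from Linux 3.4 onwards, so it would not
be available on the 2.6.37 tree these patches target. The function names
below are hypothetical.

```c
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Return the length of the next datagram queued on fd without consuming
 * it, or -1 on error. Blocks if nothing is queued. Relies on the Linux
 * MSG_TRUNC behaviour of reporting the real datagram length even when
 * the supplied buffer is shorter. */
ssize_t next_msg_len(int fd)
{
    char probe;

    return recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
}

/* Read the next datagram into a freshly allocated buffer of exactly the
 * right size. The caller frees the result. Assumes a single reader on
 * fd, so the peeked datagram is the one that is then read. */
void *read_next_msg(int fd, size_t *len_out)
{
    ssize_t len = next_msg_len(fd);
    void *buf;

    if (len < 0)
        return NULL;

    buf = malloc(len ? (size_t)len : 1);
    if (!buf)
        return NULL;

    len = recv(fd, buf, (size_t)len, 0);
    if (len < 0) {
        free(buf);
        return NULL;
    }
    *len_out = (size_t)len;
    return buf;
}
```

This mirrors the "find the length, allocate, read" sequence of the
file-oriented scheme, at the cost of two system calls per message.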

Otherwise, we're reduced to a special call of getsockopt, or, worse,
separating all messages into a standard sized header message followed
by the message data - but that way lies insanity.

There's also a decision to be made of what does get put into ancillary
data. At one extreme, all of the message header data would be treated
as ancillary data, which means that the user would need to use
sendmsg/recvmsg all of the time. That's a lot more code complexity,
and a lot more allocations. At the other extreme, we don't use
ancillary data at all, in which case we keep the header more-or-less
as is. There is the whole issue of whether message name and data are
referred to as pointers from the header, or are part of the same
buffer (there's some discussion of this in the KBUS documentation,
where it talks about "pointy" and "entire" messages). But that's a
level of detail for later, if necessary.

Also, if we're using ancillary data, can we use a socket specific
method to identify the message sender, something one can feed straight
back into sendmsg (hmm, maybe not - a quick scan around suggests that
AF_UNIX only has SCM_CREDENTIALS and SCM_RIGHTS - maybe I've not
looked hard enough).

  (KBUS's current sender id is nice and simple, but I'd assumed there
  must be some sockety equivalent we should be using...)
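
For what it is worth, one existing sockety candidate is the source
address itself: for AF_UNIX datagram sockets, recvfrom() reports the
address the sender is bound to, and that address can be handed straight
to sendto(). The sketch below assumes the sender has bound (or
autobound) its socket, and the function name is made up. Note that this
identifies an endpoint, not a process, so it is not a drop-in
replacement for a KBUS sender id.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>

/* Receive one datagram on fd and send `reply` back to whoever sent it.
 * Returns the length of the received datagram, or -1 with errno set. */
ssize_t recv_and_reply(int fd, void *buf, size_t buflen,
                       const void *reply, size_t replylen)
{
    struct sockaddr_un from;
    socklen_t fromlen = sizeof(from);
    ssize_t n;

    memset(&from, 0, sizeof(from));
    n = recvfrom(fd, buf, buflen, 0, (struct sockaddr *)&from, &fromlen);
    if (n < 0)
        return -1;

    /* An unbound sender has no address that a reply could be sent to. */
    if (fromlen <= sizeof(sa_family_t)) {
        errno = EDESTADDRREQ;
        return -1;
    }

    if (sendto(fd, reply, replylen, 0,
               (struct sockaddr *)&from, fromlen) < 0)
        return -1;
    return n;
}
```

On Linux a sender that does not care about its own name can call bind()
with an address length of sizeof(sa_family_t), which asks the kernel to
pick a unique abstract name for it.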

Regardless, we have an API comparison something like:

========================            =============================
File-oriented                       Socket-oriented
========================            =============================
open                                socket
close                               close
write [and <send> ioctl]            sendmsg or sendto
<nextmsg> ioctl                     not clear - getsockopt? peek?
read                                recvmsg or recvfrom
<bind> ioctl                        setsockopt
<unbind> ioctl                      setsockopt
poll                                poll
========================            =============================

There are also various ioctls on the file-oriented side that would
clearly be replaced by get/setsockopt, and more that should be direct
instructions to KBUS via debugfs or something (i.e., they should never
have been ioctls in the first place, if I'd known what to do instead).

> This is a much simpler kernel API, don't you think?

I think we mean different things by that.

Replacing read/write (which are, let's face it, quite simple to use,
and just about every C programmer can get them mostly right) with
sendmsg/recvmsg (which are some of the most complicated calls to use
in the socket world, and for which most easy to find examples are
about moving file descriptors between processes) does not seem to me
to be simplifying anything.

I must admit I'm also not entirely sure why get/setsockopt calls are
*that* much better than ioctls (they do at least specify a length, and
the number of existing options is smaller, but neither of those seems
an obvious win to me).

Regardless, though, assuming the message length problem can be sorted
out (and that's obviously possible by *some* means), it is clearly
feasible to replace one API with another, and I assume one could move
the innards of the current KBUS to talk to the new interface.

Filtering in userspace
----------------------
You suggest performing message filtering in userspace, by reading all
messages and "throwing away" those which are not of interest. This is
predicated on the idea that the data is low volume. Apart from the
fact that I'm not sure what low volume means (are we contrasting
with network traffic on an STB handling audio/video?), we've tried not
to assume anything much about the amount of traffic over KBUS, or the
number of senders/listeners. Granted I personally wouldn't recommend
sending very large messages (I'm doubtful of the sanity of anything
over a few MB, myself, although KBUS will cope with multiple page
messages - albeit rather slowly), or expecting KBUS to be fast
(whatever "fast" means), but I'm reluctant to put those assumptions
into the design.

The inside of KBUS would indeed be slightly simpler if it did not
perform the message filtering (and this is substantially unoptimised
at the moment). However, if the client receives all of the messages,
that's an awful lot more copying being done. Within the kernel module,
message content is reference counted, but as data goes across the
kernel-to-userspace boundary, all of it gets copied. In the current
system, one can happily send a message knowing that it will not get
sent to recipients who do not care, and thus not worry much about the
cost in CPU, memory and so on. In the non-filtering system, such
concerns would need to be a major issue (spamming many clients with a
single large message that they are going to ignore could be a very big
deal, and would be relatively hard to defend against).
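
To make the copying cost concrete, here is a minimal sketch of what a
userspace-filtering client would look like. It assumes, purely for the
illustration, that each datagram starts with the message name and that
"interest" means a prefix match. The names drain_with_prefix() and
struct filter_stats are hypothetical.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

struct filter_stats {
    size_t kept_bytes;      /* bytes in messages the client wanted */
    size_t discarded_bytes; /* bytes copied to userspace, then dropped */
};

/* Drain every datagram currently queued on fd, keeping only those whose
 * leading bytes match `prefix`. Returns 0 once the queue is empty, or -1
 * on a real error. */
int drain_with_prefix(int fd, const char *prefix, struct filter_stats *st)
{
    char buf[4096];
    size_t plen = strlen(prefix);
    ssize_t n;

    while ((n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT)) >= 0) {
        /* The whole datagram has already crossed the kernel boundary
         * by the time the name can be inspected. */
        if ((size_t)n >= plen && memcmp(buf, prefix, plen) == 0)
            st->kept_bytes += (size_t)n;
        else
            st->discarded_bytes += (size_t)n;
    }
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
}
```

Everything counted in discarded_bytes is work that kernel-side filtering
would have avoided, and it is paid once per uninterested client.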

You also suggest splitting buses up into a finer granularity, in the
hope that this would cause less userspace filtering to be necessary.
I'm uncomfortable with that suggestion because it is just that, a
suggestion as to how the user might do things. It doesn't address the
problem in a technical manner at all.

I'd also note that there is virtue in having the unsplit buses. In the
current system, it is sensible to say that all messages for one task
will be on bus 0, and messages for another task on bus 1, and the two
cannot interact. One has, if you will, multiple message namespaces. It
makes sense to say that a program will only send messages on bus 0,
without needing to list the messages. In the proposed new system, a
single task will typically need to span multiple buses, and we've lost
a useful distinction.  Thus the original approach is a win for
documentation and pedagogy, if nothing else.

Replier buses
-------------
Putting replier registration on a bus basis. Hmm. So a recipient would
"bind" to a bus as replier *for that bus*. Would all messages on that
bus be seen as requests by the replier? I think it would have to be
so, if only because marking messages as requests *as well* leads to
all sorts of possible confusions.

What if the recipient were monitoring messages as well, so it would
also want to "just receive" the requests? Presumably it would have to
open another connection to the bus to receive requests as ordinary
messages. OK.

Meanwhile, the sender presumably has to indicate that this bus is a
replier bus, with an appropriate setsockopt call. Note that this means
that everyone needs to know beforehand the id of that bus, and we are
getting perilously closer to needing some sort of manager of bus
ids/names (this makes me uncomfortable), or having a formalism about
how buses are named (ditto).

So sending now looks more like:

1. Sender opens bus 0

2. If this is to be a replier bus, sender marks it as such, via
   setsockopt. Definitely not via an ioctl.

   Note that we can't check for someone registered as a replier at
   this point, as they might reasonably not have connected to the bus
   yet.

3. Sender creates a message
4. Sender sends the message.

   As before, in the sockety manner.

5. If this is a replier bus, then KBUS checks to see if the sender has
   enough room to receive a reply on it.

6. KBUS assigns the next message id to the message

   In current KBUS, the message id is unique on each bus, and buses
   are isolated from each other. That doesn't work now, because we
   need the recipient to be able to reconstruct message ordering
   across buses. So the new id generation mechanism needs to be
   bus-independent. Using a 64-bit id should probably give us at least
   the id "granularity" that the current 32-bit id does.

7. If this is a replier bus, then KBUS checks to see if there is a
   replier bound to it (thus, leaving it as late as possible, and
   giving the most chance the replier will be there). If no-one has
   bound as replier, then the send fails. Similarly, if the replier
   does not have room in their queue, the send fails.

8. KBUS copies the message to the queues of everyone who has bound to
   receive it.
  
At the recipient end, the sequence is then presumably something like:

1. Listener opens bus 0.
2. Listener possibly chooses to be a replier for that bus, using
   setsockopt. It is an error if there is already a replier for the
   bus.

   Is it an error to bind as a replier on a bus that is not marked as
   such? I can't see that it can be, because otherwise the listener's
   bind would have to fail in the race condition where:

   a. Listener opens bus
   b. Sender opens bus
   c. Listener binds to bus as replier
   d. Sender tells bus it is a replier bus

   So I think we have to allow a replier bound on a non-replier bus -
   they'd just never get any messages on it. Which means KBUS has to
   make sure not to send ordinary messages to a listener on a bus
   they've bound to as replier.

     (For a world of pain, invent a getsockopt option to check if a
     bus is marked as a replier bus, and wait for it to get set, and
     *then* bind as replier. But I wouldn't want to advise it.)

   That all feels rather messy, and is just the sort of reason we
   went with marking messages and leaving buses as message-agnostic
   transports.

3. Listener determines if there is a next message, and if so, its
   length.

   Again, as said before, I'm not sure how this would be done.

4. Listener allocates the appropriate number of buffers to receive the
   message.

5. Listener reads the message into the buffers.

6. If the message was read via a socket with the "replier" socket
   option set (one assumes the recipient remembers this), then the
   listener needs to send a reply over that same socket.
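
To make the binding race in step 2 concrete, here is a small model of
the bus state. It is a sketch under my own assumptions (BusState and
its method names are hypothetical, and the two calls stand in for the
two setsockopt operations): binding as replier never looks at whether
the bus is marked, so steps c and d can happen in either order and
end in the same state.

```python
class BusState:
    """Hypothetical bus state; the names are illustrative, not KBUS API."""

    def __init__(self):
        self.is_replier_bus = False
        self.replier = None

    def mark_replier_bus(self):
        # Sender side (step d).
        self.is_replier_bus = True

    def bind_replier(self, who):
        # Listener side (step c). Only a second replier is an error;
        # deliberately no check of is_replier_bus.
        if self.replier is not None:
            raise RuntimeError("bus already has a replier")
        self.replier = who

    def delivers_requests_to(self, who):
        # A replier on an unmarked bus simply never gets anything.
        return self.is_replier_bus and self.replier == who


def run(order):
    """Apply "bind" and "mark" in the given order to a fresh bus."""
    bus = BusState()
    for step in order:
        if step == "bind":
            bus.bind_replier("listener")
        else:
            bus.mark_replier_bus()
    return bus


listener_first = run(["bind", "mark"])
sender_first = run(["mark", "bind"])
```

Both orderings leave the listener as the replier of a marked bus. The
cost is the in-between state after "bind" alone, where the listener
is bound but receives nothing.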

Conceptually, this does look like it would work (subject to the
binding nastiness mentioned), and it's clearly more-or-less a dual of
the approach we've already taken. If we were splitting current KBUS
buses into finer granularities, it would be a reasonable approach to
consider.

The problem with splitting related messages over many buses
-----------------------------------------------------------
The trouble is that, whilst the recipient is guaranteed to receive
messages *on a given bus* in the correct order, this is no longer
sufficient, as the order we care about is now split over multiple
buses.

You propose that the recipient should reassemble the message order.
This is clearly possible if they are receiving all messages, but at
the cost of having to keep a list (potentially a very long list) of
message ids received and outstanding, and only "releasing" a message
when all preceding message ids have been encountered. A colleague's
comment on this was that we should not be reimplementing TCP/IP in
userspace. I'd just say that this is a non-trivial problem and, if
every recipient has to solve it, a potential burden on the performance
of the whole system (especially if we're talking about many buses). So
it should be done in the place that causes the fewest
reimplementations, i.e., in the kernel module. Which puts us back
where we were.

(Obviously, if the recipient is not getting *all* messages, then this
problem is unsolvable, since it cannot know which message ids are
missing - i.e., if it receives messages 5, 9 and 7, it has no way of
knowing whether it should also have received message 8.)
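
For illustration, this is roughly what each recipient would have to
carry around. It is a minimal sketch, assuming ids are consecutive
integers and that the recipient knows the first id to expect;
ReorderBuffer is a hypothetical name, not anything in KBUS.

```python
import heapq


class ReorderBuffer:
    """Release messages in id order, holding back any that arrive early."""

    def __init__(self, first_id):
        self.expected = first_id
        self.held = []  # min-heap of (msg_id, payload)

    def receive(self, msg_id, payload):
        heapq.heappush(self.held, (msg_id, payload))
        released = []
        # Only let a message go once every earlier id has been seen.
        while self.held and self.held[0][0] == self.expected:
            released.append(heapq.heappop(self.held))
            self.expected += 1
        return released


# A recipient that sees every message: arrival order 5, 7, 6.
complete = ReorderBuffer(first_id=5)
in_order = []
for msg_id in (5, 7, 6):
    in_order += complete.receive(msg_id, None)

# A recipient that only sees some messages: arrival order 5, 9, 7.
partial = ReorderBuffer(first_id=5)
delivered = []
for msg_id in (5, 9, 7):
    delivered += partial.receive(msg_id, None)
```

In the first case the messages come out as 5, 6, 7. In the second
only message 5 is ever released: 7 and 9 sit in the held list waiting
for a 6 and an 8 that may never have been addressed to this recipient
at all, and nothing in the ids themselves can tell it so.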

> In short, API minimalism is key to acceptance in the upstream kernel.
> Try to pare down the core API to the bare minimum to get what you
> need, rather than implementing your final use case directly into the
> kernel using ioctls or whatnot.

Hmm. As I recall, when starting KBUS development we said "what's the
simplest API we can present to the user to do the job", at the same
time as asking "what's the simplest set of functionalities that we
need to provide". So, in a very real sense, we did start by trying to
pare down the core API.

Of course, that same aim led us to reject trying to force sockets to
do the job just because "sockets are used for messaging". Not that
they always are, of course - one doesn't classically communicate with
DSPs over sockets, for instance.

> Thanks,
> Bryan

Thanks again,
Tibs

Thread overview: 34+ messages
2011-03-18 17:21 [PATCH 00/11] RFC: KBUS messaging subsystem Tony Ibbs
2011-03-18 17:21 ` [PATCH 01/11] Documentation for KBUS Tony Ibbs
2011-03-18 17:21   ` [PATCH 02/11] KBUS external header file Tony Ibbs
2011-03-18 17:21     ` [PATCH 03/11] KBUS internal " Tony Ibbs
2011-03-18 17:21       ` [PATCH 04/11] KBUS main source file, basic device support only Tony Ibbs
2011-03-18 17:21         ` [PATCH 05/11] KBUS add support for messages Tony Ibbs
2011-03-18 17:21           ` [PATCH 06/11] KBUS add ability to receive messages only once Tony Ibbs
2011-03-18 17:21             ` [PATCH 07/11] KBUS add ability to add devices at runtime Tony Ibbs
2011-03-18 17:21               ` [PATCH 08/11] KBUS add Replier Bind Events Tony Ibbs
2011-03-18 17:21                 ` [PATCH 09/11] KBUS Replier Bind Event set-aside lists Tony Ibbs
2011-03-18 17:21                   ` [PATCH 10/11] KBUS report state to userspace Tony Ibbs
2011-03-18 17:21                     ` [PATCH 11/11] KBUS configuration and Makefile Tony Ibbs
2011-03-22 19:36 ` [PATCH 00/11] RFC: KBUS messaging subsystem Jonathan Corbet
2011-03-23 23:13   ` Tony Ibbs
2011-03-24 18:03     ` James Chapman
2011-03-27 19:07       ` Tony Ibbs
2011-04-15 21:34     ` [PATCH] extra/1 Allow setting the maximum KBUS message size Tony Ibbs
2011-04-15 22:46       ` Jonathan Corbet
2011-04-18 14:01         ` Mark Brown
2011-04-19 19:33           ` Tony Ibbs
2011-05-17  8:50   ` [PATCH 00/11] RFC: KBUS messaging subsystem Florian Fainelli
2011-05-22 19:58     ` Tony Ibbs
2011-07-06 16:15       ` Florian Fainelli
2011-07-28 21:48         ` RFC: [Restatement] " Tony Ibbs
2011-07-28 23:58           ` Colin Walters
2011-08-03 20:14             ` Tony Ibbs
2011-08-07 16:47               ` Tony Ibbs
2011-08-03 20:48           ` Pekka Enberg
2011-08-07 20:24             ` Tony Ibbs
2011-08-15 11:46               ` Pekka Enberg
2011-08-21 13:28                 ` Tony Ibbs
2011-08-22  1:15                   ` Bryan Donlan
2011-08-29  8:55                     ` Tony Ibbs
2011-08-03 20:23     ` [PATCH 00/11] RFC: " Tony Ibbs