All of lore.kernel.org
 help / color / mirror / Atom feed
* AF_BUS socket address family
@ 2012-06-29 16:45 Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 01/15] net: bus: Add " Vincent Sanders
                   ` (19 more replies)
  0 siblings, 20 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller


This series adds the bus address family (AF_BUS) it is against
net-next as of yesterday.
 
AF_BUS is a message oriented inter process communication system. 

The principle features are:

 - Reliable datagram based communication (all sockets are of type
   SOCK_SEQPACKET)

 - Multicast message delivery (one to many, unicast as a subset)

 - Strict ordering (messages are delivered to every client in the same order)

 - Ability to pass file descriptors

 - Ability to pass credentials

The basic concept is to provide a virtual bus on which multiple
processes can communicate and policy is imposed by a "bus master".

Introduction
------------

AF_BUS is based upon AF_UNIX but extended for multicast operation and
removes stream operation, responding to extensive feedback on previous
approaches we have made the implementation as isolated as
possible. There are opportunities in the future to integrate the
socket garbage collector with that of the unix socket implementation.

The impetus for creating this IPC mechanism is to replace the
underlying transport for D-Bus. The D-Bus system currently emulates this
IPC mechanism using AF_UNIX sockets in userspace and has numerous
undesirable behaviours. D-Bus is now widely deployed in many areas and
has become a de-facto IPC standard. Using this IPC mechanism as a
transport gives a significant (100% or more) improvement to throughput
with comparable improvement to latency.

This work was undertaken by Collabora for the GENIVI Alliance and we
are committed to responding to feedback promptly and intend to continue
to support this feature into the future.

Operation
---------

A bus is created by processes connecting on an AF_BUS socket. The
"bus master" binds itself instead of connecting to the NULL address.

The socket address is made up of a path component and a numeric
component. The path component is either a pathname or an abstract
socket similar to a unix socket. The numeric component is used to
uniquely identify each connection to the bus. Thus the path identifies
a specific bus and the numeric component the attachment to that bus.

The numeric component of the address is divided into two fixed parts a
prefix to identify multicast groups and a suffix which identifies the
attachment. The kernel allocates a single address in prefix 0 to each
socket upon connection.

Connections are initially limited to communicating with address the
bus master (address 0) . The bus master is responsible for making all
policy decisions around manipulating other attachments including
building multicast groups. 

It is expected that connecting clients use protocol specific messages
to communicate with the bus master to negotiate differing
configurations although a bus master might implement a fixed
behaviour.

AF_BUS itself is protocol agnostic and implements the configured
policy between attachments which allows for a bus master to leave a
bus and communication between clients to continue.

Some test code has been written [1] which demonstrates the usage of
AF_BUS.

Use with BUS_PROTO_DBUS
-----------------------

The initial aim of AF_BUS is to provide a IPC mechanism suitable for
use to provide the underlying transport for D-Bus. 

A socket created using BUS_PROTO_DBUS indicates that the messages
passed will be in the D-Bus format. The userspace libraries have been
updated to use this transport with an updated D-Bus daemon [2] as a bus
master.

The D-Bus protocol allows for multicast groups to be filtered depending
on message contents. These filters are configured by the bus master
but need to be enforced on message delivery. 

We have simply used the standard kernel netfilter mechanism to achieve
this. This is used to filter delivery to clients that may be part of a
multicast group where they are not receiving all messages according to
policy. If a client wishes to further filter its input provision has
been made to allow them to use BPF.

The kernel based IPC has several benefits for D-Bus over the userspace
emulation:

 - Context switching between userspace processes is reduced.
 - Message data copying is reduced.
 - System call overheads are reduced.
 - The userspace D-Bus daemon was subject to resource starvation,
   client contention and priority inversion.
 - Latency is reduced
 - Throughput is increased.

The tools for testing these assertions are available [3] and
consistently show a doubling in throughput and better than halving of
latency.

[1] http://cgit.collabora.com/git/user/javier/check-unix-multicast.git/log/?h=af-bus
[2] http://cgit.collabora.com/git/user/rodrigo/dbus.git/

[3] git://github.com/kanchev/dbus-ping.git
    https://github.com/kanchev/dbus-ping/blob/master/dbus-genivi-benchmarking.sh

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH net-next 01/15] net: bus: Add AF_BUS socket address family
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 02/15] net: bus: Add documentation for AF_BUS Vincent Sanders
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Vincent Sanders

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

This adds AF_BUS to the socket headers and net core.

AF_BUS is a message oriented inter process communication system
implemented as asocket address family. The principle features are:

 - Reliable datagram based communication (all sockets are of type
   SOCK_SEQPACKET)
 - Multicast message delivery (one to many, unicast as a subset)
 - Strict ordering (messages are delivered to every client in the same order)
 - Ability to pass file descriptors
 - Ability to pass credentials

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
---
 include/linux/socket.h |    5 ++++-
 net/core/sock.c        |    6 +++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 25d6322..d244e69 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -195,7 +195,8 @@ struct ucred {
 #define AF_CAIF		37	/* CAIF sockets			*/
 #define AF_ALG		38	/* Algorithm sockets		*/
 #define AF_NFC		39	/* NFC sockets			*/
-#define AF_MAX		40	/* For now.. */
+#define AF_BUS		40	/* BUS sockets			*/
+#define AF_MAX		41	/* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC	AF_UNSPEC
@@ -238,6 +239,7 @@ struct ucred {
 #define PF_CAIF		AF_CAIF
 #define PF_ALG		AF_ALG
 #define PF_NFC		AF_NFC
+#define PF_BUS		AF_BUS
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
@@ -312,6 +314,7 @@ struct ucred {
 #define SOL_IUCV	277
 #define SOL_CAIF	278
 #define SOL_ALG		279
+#define SOL_BUS		280
 
 /* IPX options */
 #define IPX_TYPE	1
diff --git a/net/core/sock.c b/net/core/sock.c
index 929bdcc..b9c5fc8 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -208,7 +208,7 @@ static const char *const af_family_key_strings[AF_MAX+1] = {
   "sk_lock-AF_TIPC"  , "sk_lock-AF_BLUETOOTH", "sk_lock-IUCV"        ,
   "sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN"     , "sk_lock-AF_PHONET"   ,
   "sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG"      ,
-  "sk_lock-AF_NFC"   , "sk_lock-AF_MAX"
+  "sk_lock-AF_NFC"   , "sk_lock-AF_BUS"     , "sk_lock-AF_MAX"
 };
 static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_UNSPEC", "slock-AF_UNIX"     , "slock-AF_INET"     ,
@@ -224,7 +224,7 @@ static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_TIPC"  , "slock-AF_BLUETOOTH", "slock-AF_IUCV"     ,
   "slock-AF_RXRPC" , "slock-AF_ISDN"     , "slock-AF_PHONET"   ,
   "slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG"      ,
-  "slock-AF_NFC"   , "slock-AF_MAX"
+  "slock-AF_NFC"   , "slock-AF_BUS"     , "slock-AF_MAX"
 };
 static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_UNSPEC", "clock-AF_UNIX"     , "clock-AF_INET"     ,
@@ -240,7 +240,7 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_TIPC"  , "clock-AF_BLUETOOTH", "clock-AF_IUCV"     ,
   "clock-AF_RXRPC" , "clock-AF_ISDN"     , "clock-AF_PHONET"   ,
   "clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG"      ,
-  "clock-AF_NFC"   , "clock-AF_MAX"
+  "clock-AF_NFC"   , "clock-AF_BUS"     , "clock-AF_MAX"
 };
 
 /*
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 02/15] net: bus: Add documentation for AF_BUS
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 01/15] net: bus: Add " Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 03/15] net: bus: Add AF_BUS socket and address definitions Vincent Sanders
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Vincent Sanders

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

Docuemnt the AF_BUS design, API and usage semantics.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
---
 Documentation/networking/af_bus.txt |  558 +++++++++++++++++++++++++++++++++++
 1 file changed, 558 insertions(+)
 create mode 100644 Documentation/networking/af_bus.txt

diff --git a/Documentation/networking/af_bus.txt b/Documentation/networking/af_bus.txt
new file mode 100644
index 0000000..a0b078f
--- /dev/null
+++ b/Documentation/networking/af_bus.txt
@@ -0,0 +1,558 @@
+			The AF_BUS socket address family
+			================================
+
+Introduction
+------------
+
+AF_BUS is a message oriented inter process communication system.
+
+The principle features are:
+
+ - Reliable datagram based communication (all sockets are of type
+   SOCK_SEQPACKET)
+
+ - Multicast message delivery (one to many, unicast as a subset)
+
+ - Strict ordering (messages are delivered to every client in the same order)
+
+ - Ability to pass file descriptors
+
+ - Ability to pass credentials
+
+The basic concept is to provide a virtual bus on which multiple
+processes can communicate and policy is imposed by a "bus master".
+
+A process can create buses to which other processes can connect and
+communicate with each other by sending messages. Processes' addresses
+are automatically assigned by the bus on connect and are
+unique. Messages can be sent either to a process' unique address or to
+a bus multicast addresses.
+
+Netfilter rules or Berkeley Packet Filter can be used to restrict the
+messages that each peer is allowed to receive. This is especially
+important when sending to multicast addresses.
+
+Besides messages, process can send and receive ancillary data (i.e.,
+SCM_RIGHTS for passing file descriptors or SCM_CREDENTIALS for passing
+Unix credentials). In the case of a multicast message all recipients
+of a message may obtain a copy a file descriptor or credentials.
+
+A bus is created by processes connecting on an AF_BUS socket. The
+"bus master" binds itself instead of connecting to the NULL address.
+
+The socket address is made up of a path component and a numeric
+component. The path component is either a pathname or an abstract
+socket similar to a unix socket. The numeric component is used to
+uniquely identify each connection to the bus. Thus the path identifies
+a specific bus and the numeric component the attachment to that bus.
+
+The process that calls bind(2) on the socket is the owner of the bus
+and is called the bus master. The master is a special client of the
+bus and has some responsibility for the bus' operation. The master is
+assigned a fixed address with all the bits zero (0x0000000000000000).
+
+Each process connected to an AF_BUS socket has one or more addresses
+within that bus. These addresses are 64-bit unsigned integers,
+interpreted by splitting the address into two parts: the most
+significant 16 bits are a prefix identifying the type of address, and
+the remaining 48 bits are the actual client address within that
+prefix, as shown in this figure:
+
+Bit:  0             15 16                                            63
+     +----------------+------------------------------------------------+
+     |  Type prefix   |                Client address                  |
+     +----------------+------------------------------------------------+
+
+The prefix with all bits zero is reserved for use by the kernel, which
+automatically assigns one address from this prefix to each client on
+connection.  The address in this prefix with all bits zero is always
+assigned to the bus master. Addresses on the prefix 0x0000 are unique
+and will never repeat for the lifetime of the bus master.
+
+A client may have multiple addresses. When data is sent to other
+clients, those clients will always see the sender address that is in
+the prefix 0x0000 address space when calling recvmsg(2) or
+recvfrom(2). Similarly, the prefix 0x0000 address is returned by calls
+to getsockname(2) and getpeername(2).
+
+For each prefix, the address where the least significant 48 bits are
+all 1 (i.e., 0xffffffffffff) is also reserved, and can be used to send
+multicast messages to all the peers on a prefix.
+
+The non-reserved addresses in each of the remaining prefixes are
+managed by the bus master, which may assign additional addresses to
+any other connected socket.
+
+Having different name-spaces has two advantages:
+
+  - Clients can have addresses on different mutually-exclusive
+    scopes. This permits sending multicast packets to only clients
+    that have addresses on a given prefix.
+
+  - The addressing scheme can be more flexible. The kernel will only
+    assign unique addresses on the all-bits-zero prefix (0x0000) and
+    allows the bus master process to assign additional addresses to
+    clients on other prefixes.  By having different prefixes, the
+    kernel and bus master assignments will not collide.
+
+AF_BUS transport can support two network topologies. When a process
+first connects to the bus master, it can only communicate with the bus
+master. The process can't send and receive packets from other peers on
+the bus. So, from the client process point of view the network
+topology is point-to-point.
+
+The bus master can allow the connected peer to be part of the bus and
+start to communicate with other peers by setting a socket option with
+the setsockopt(2) system call using the accepted socket descriptor. At
+this point, the topology becomes a bus to the client process.
+
+Packets whose destination address is not assigned to any client are
+routed by default to the bus master (the client accepted socket
+descriptor).
+
+
+Semantics
+---------
+
+Bus features:
+
+ - Unicast and multicast addressing scheme.
+ - Ability to assign addresses from user-space with different prefixes.
+ - Automatic address assignment.
+ - Ordered packets delivery (FIFO, total ordering).
+ - File descriptor and credentials passing.
+ - Support for both point-to-point and bus network topologies.
+ - Bus control access managed from user-space.
+ - Netfilter hooks for packet sending, routing and receiving.
+
+A process (the "bus master") can create an AF_BUS bus with socket(2)
+and use bind(2) to assign an address to the bus. Then it can listen(2)
+on the created socket to start accepting incoming connections with
+accept(2).
+
+Processes can connect to the bus by creating a socket with socket(2)
+and using connect(2). The kernel will assign a unique address to each
+connection and messages can be sent and received by using BSD socket
+primitives.
+
+This uses the connect(2) semantic in a non-traditional way, with
+AF_BUS sockets, it's not possible to connect "my" socket to a specific
+peer socket whereas the traditional BSD sockets API usage, connect(2)
+either connects to stream sockets, or assigns a peer address to a
+datagram socket (so that send(2) can be used instead of sendto()).
+
+An AF_BUS socket address is represented as a combination of a bus
+address and a bus path name. Address are unique within a path. The
+unique bus address is further subdivided into a prefix and a client
+address. Thus the path identifies a specific bus and the numeric
+component the attachment to that bus.
+
+#define BUS_PATH_MAX    108
+
+/* Bus address */
+struct bus_addr {
+	uint64_t    s_addr; 	/* 16-bit prefix + 48-bit client address */
+};
+
+/* Structure describing an AF_BUS socket address. */
+struct sockaddr_bus {
+	sa_family_t     sbus_family; 	   	  /* AF_BUS */
+	struct bus_addr sbus_addr;                /* bus address */
+	char 		sbus_path[BUS_PATH_MAX];  /* pathname */
+};
+
+A process becomes a bus master for a given struct sockaddr_bus by
+calling bind(2) on an AF_BUS addresses. The argument must be { AF_BUS,
+0, path }. 
+
+AF_BUS supports both abstract and non-abstract path names. Abstract
+names are distinguished by the fact that sbus_path[0] == '\0' and they
+don't represent file system paths while non-abstract paths are bound
+to a file system path name. (See the unix(7) man page for a discussion
+of abstract socket addresses in the AF_UNIX address family.)
+
+Then the process calls listen(2) to accept incoming connections. If
+that process calls getsockname(2), the returned address will be {
+AF_BUS, 0, path }.
+
+The conventional string form of the full address is path + ":" +
+prefix + "/" + client address. Prefix and client address are
+represented in hex.
+
+For example the address:
+
+struct sockaddr_bus addr;
+addr.sbus_family = AF_BUS;
+strcpy(addr.sbus_path, "/tmp/test");
+addr.sbus_addr.s_addr   = 0x0002f00ddeadbeef;
+
+would be represented using the string /tmp/test:0002/f00ddeadbeef.
+
+If the bus_addr is 0, then both the prefix and client address may be
+omitted from the string form.  To connect to a bus as a client it is
+sufficient to specify the path, since the listening address always has
+bus_addr == 0. it is not meanigful to specify 'bus_addr' as other than
+0 on connect()
+
+The AF_BUS implementation will automatically assign a unique address
+to each client but the bus master can assign additional addresses on a
+different prefix by means of the setsockopt(2) system call. For
+example:
+
+struct bus_addr addr;
+addr.s_addr = 0x0001deadfee1dead;
+ret = setsockopt(afd, SOL_BUS, BUS_ADD_ADDR, &addr, sizeof(addr));
+
+where afd is the accepted socket descriptor in the daemon. To show graphically:
+
+	  L          The AF_BUS listening socket  }
+       /  |  \                                    }-- listener process
+     A1  A2  A3      The AF_BUS accepted sockets  }
+      |   |   |
+     C1  C2  C3      The AF_BUS connected sockets }-- client processes
+
+So if setsockopt(A1, SOL_BUS, BUS_ADD_ADDR, &addr, sizeof(addr)) is
+called, C1 will get the new address.
+
+The inverse operation is BUS_DEL_ADDR, which the bus master can use to
+remove a client socket AF_BUS address:
+
+ret = setsockopt(afd, SOL_BUS, BUS_DEL_ADDR, &addr, sizeof(addr));
+
+Besides assigning additional addresses, the bus master has to allow a
+client process to communicate with other peers on the bus using a
+setsockopt(2):
+
+ret = setsockopt(afd, SOL_BUS, BUS_JOIN_BUS, NULL, 0);
+
+Clients are not meant to send messages to each other until the master
+tells them (in a protocol-specific way) that the BUS_JOIN_BUS
+setsockopt(2) call was made.
+
+If a client sends a message to a destination other than the bus
+master's all-zero address before joining the bus, a EHOSTUNREACH (No
+route to host) error is returned since the only host that exists in
+the point-to-point network before the client joins the bus are the
+client and the bus master.  
+
+A EHOSTUNREACH is returned if a client that joined a bus tries to send
+a packet to a client from another bus. Cross-bus communication is not
+permited.
+
+When a process wants to send a unicast message to a peer, it fills a
+sockaddr structure and performs a socket operation (i.e., sendto(2))
+
+struct sockaddr_bus addr;
+char *msg = "Hello world";
+
+addr.sbus_family 	   = AF_BUS;
+strcpy(addr.sbus_path, "/tmp/test");
+addr.sbus_addr.s_addr   = 0x0001f00ddeadbeef;
+
+ret = sendto(sockfd, "Hello world", strlen("Hello world"), 0,
+	    (struct sockaddr*)&addr, sizeof(addr));
+
+The current implementation requires that the addr.sbus_path component
+match the one used to conenct() to the bus but in future this
+requirement will be removed.
+
+The kernel will first check that the socket is connected and that the
+bus path of the socket correspond with the destination, then it will
+extract the prefix and client address from the bus address using a
+fixed 16 -bit bitmask.
+
+prefix 		= bus address >> 48 & 0xffff
+client address 	= bus address & 0xffff
+
+If the client address is not all bits one, then the message is unicast
+and is delivered to the socket with that assigned address
+(0x0001f00ddeadbeef).  Otherwise the message is multicast and is
+delivered to all the peers with this address prefix (0x0001 in this
+case).
+
+So, when a process wants to send a multicast message, it just has to
+fill the address structure with the address prefix + 0xffffffffffff:
+
+struct sockaddr_bus addr;
+char *msg = "Hello world";
+
+addr.bus_family = AF_BUS;
+strcpy(addr.sbus_path, "/tmp/test");
+addr.bus_addr   = 0x0001ffffffffffff;
+
+ret = sendto(sockfd, "Hello world", strlen("Hello world"), 0,
+	    (struct sockaddr*)&addr, sizeof(addr));
+
+The kernel, will apply the binary and operation, learn that the
+address is 0xffffffffffff and send the message to all the peers on
+this prefix (0x0001).
+
+Socket transmit queued bytes are limited by a maximum send buffer size
+(sysctl_wmem_max) defined in the kernel and can be modified at runtime
+using the sysctl interface on /proc/sys/net/core/wmem_default. This
+parameter is global for all the sockets families in a Linux system.
+
+AF_BUS permits the definition of a per-bus maximum send buffer size
+using the BUS_SET_SENDBUF socket option. The bus master can call the
+setsockopt(2) system call using as a parameter the listening socket.
+The command sets a maximum write buffer that will be imposed on each
+new socket that connects to the bus:
+
+ret = setsockopt(serverfd, SOL_BUS, BUS_SET_SENDBUF, &sndbuf,
+sizeof(int));
+
+In the transmission path both Berkeley Packet Filters and Netfilter
+hooks are available, so they can be used to filter sending packets.
+
+
+Using this addressing scheme with D-Bus
+---------------------------------------
+
+As an example of a use case for AF_BUS, let's analyze how the D-Bus
+IPC system can be implemented on top of it.
+
+We define a new D-Bus address type "afbus".
+
+A D-Bus client may connect to an address of the form "afbus:path=X"
+where X is a string. This means that it connect()s to { AF_BUS, 0, X }.
+
+For example: afbus:path=/tmp/test connects to { AF_BUS, 0, /tmp/test }.
+
+A D-Bus daemon may listen on the address "afbus:", which means that it
+binds to { AF_BUS, 0, /tmp/test }. It will advertise an address of the
+form "afbus:path=/tmp/test" to clients, for instance via the
+--print-address option, or via dbus-launch setting the
+DBUS_SESSION_BUS_ADDRESS environment variable.  For instance, "afbus:"
+is an appropriate default listening address for the session bus,
+resulting in dbus-launch setting the DBUS_SESSION_BUS_ADDRESS
+environment variable to something like
+"afbus:path=/tmp/test,guid=...".
+
+A D-Bus daemon may listen on the address "afbus:file=/some/file",
+which means that it will do as above, then write its path into the
+given well-known file.  For instance,
+"afbus:file=/run/dbus/system_bus.afbus" is an appropriate listening
+address for the system bus. Only processes with suitable privileges to
+write to that file can impersonate the system bus.
+
+D-Bus clients wishing to connect to the well-known system bus should
+attempt to connect to afbus:file=/run/dbus/system_bus.afbus, falling
+back to unix:path=/var/run/dbus/system_bus_socket if that fails. On
+Linux systems, the well-known system bus daemon should attempt to
+listen on both of those addresses.
+
+The D-Bus daemon will serve as bus master as well since it will be the
+process that creates and listens on the AF_BUS socket.
+
+D-Bus clients will use the fixed bus master address (all zero bits) to
+send messages to the D-Bus daemon and the client's unique address to
+send messages to other D-Bus clients using the bus.
+
+When initially connected, D-Bus clients will only be able to
+communicate with the D-Bus daemon and will send authentication
+information (AUTH message and SCM_CREDENTIALS ancillary
+messages). Since the D-Bus daemon is also the bus master, it can allow
+D-Bus clients to join the bus and be able to send and receive D-Bus
+messages from other peers.
+
+On connection, the kernel will assign to each client an address in the
+prefix 0x0000. If a client attempts to send messages to clients other
+than the bus master, this is considered to be an error, and is
+prevented by the kernel.
+
+When the D-Bus daemon has authenticated a client and determined that
+it is authorized to be on this bus, it uses a setsockopt(2) call to
+tell the kernel that this client has permission to send messages. The
+D-Bus daemon then tells the client by sending the Hello() reply that
+it has made the setsockopt(2) call and that now is able to send
+messages to other peers on the bus.
+
+Well-known names are represented by addresses in the 0x0001, ... prefixes.
+
+Addresses in prefix 0x0000 must be mapped to D-Bus unique names in a
+way that can't collide with unique names allocated by the dbus-daemon
+for legacy clients.
+
+In order to be consistent with current D-Bus unique naming, the AF_BUS
+addresses can be mapped directly to D-Bus unique names, for example
+(0000/0000deadbeef to ":0.deadbeef"). Leading zeroes can be suppressed
+since the common case should be relatively small numbers (the kernel
+allocates client addresses sequentially, and machines could be
+rebooted occasionally).
+
+By having both AF_BUS and legacy D-Bus clients use the same address
+space, the D-Bus daemon can act as a proxy between clients and can be
+sure that D-Bus unique names will be unique for both AF_BUS and legacy
+clients.
+
+To act as a proxy between AF_BUS and legacy clients, each time the
+D-Bus daemon accepts a legacy connection (i.e., AF_UNIX), it will
+create an AF_BUS socket and establish a connection with itself. It
+will then associate this newly created connection with the legacy one.
+
+To explain it graphically:
+
+	  L          The AF_BUS listening socket  }
+       /  |  \                                    }-- listener process
+     A1  A2  A3      The AF_BUS accepted sockets  }
+      |   |   |
+     C1  C2  C3      The AF_BUS connected sockets, where:
+      |                    * C1 belongs to the listener process
+      |                    * C2 and C3 belongs to the client processes
+      |
+ L2--A4       The AF_UNIX listening and accepted sockets \
+      |                            in the listener process
+     C4       The AF_UNIX connected socket in the legacy client process
+
+
+where C2 and C3 are normal AF_BUS clients and C4 is a legacy
+client. The D-Bus daemon after accepting the connection using the
+legacy transport (A4), will create an AF_BUS socket pair (C1, A1)
+associated with the legacy client.
+
+Legacy clients will send messages to the D-Bus daemon using their
+legacy socket and the D-Bus daemon will extract the destination
+address, resolve to the corresponding AF_BUS address and use this to
+send the message to the right peer.  
+
+Conversely, when an AF_BUS client sends a D-Bus message to a legacy
+client, it will use the AF_BUS address of the connection associated
+with that client. The D-Bus daemon will receive the message, modify
+the message's content to set SENDER headers based on the AF_BUS source
+address and use the legacy transport to send the D-Bus message to the
+legacy client.
+
+As a special case, the bus daemon's all-zeroes address maps to
+"org.freedesktop.DBus" and vice versa.
+
+When a D-Bus client receives an AF_BUS message from the bus master
+(0/0), it must use the SENDER header field in the D-Bus message, as
+for any other D-Bus transport, to determine whether the message is
+actually from the D-Bus daemon (the SENDER is "org.freedesktop.DBus"
+or missing), or from another client (the SENDER starts with ":"). It
+is valid for messages from another AF_BUS client to be received via
+the D-Bus daemon; if they are, the SENDER header field will always be
+set.
+
+Besides its unique name, D-Bus services can have well-known names such
+as org.gnome.Keyring or org.freedesktop.Telepathy. These well-known
+names can also be used as a D-Bus message destination
+address. Well-known names are not numeric and AF_BUS is not able to
+parse D-Bus messages.
+
+To solve this, the D-Bus daemon will assign an additional AF_BUS
+address to each D-Bus client that owns a well-known name. The mapping
+between well-known names and AF_BUS address is maintained by the D-Bus
+daemon on a persistent data structure.
+
+D-Bus client libraries will maintain a cache of these mappings so they
+can send messages to services with well-known names using their mapped
+AF_BUS address.
+
+If a client intending to send a D-Bus message to a given well-known
+name does not have that well-known name in its cache, it must send the
+AF_BUS message to the listener (0000/000000000000) instead. 
+
+The listener must forward the D-Bus message to the owner of that
+well-known name, setting the SENDER header field if necessary. It may
+also send this AF_BUS-specific D-Bus signal to the sender, so that the
+sender can update its cache:
+
+     org.freedesktop.DBus.AF_BUS.Forwarded (STRING well_known_name,
+	 UINT64 af_bus_client)
+
+	 Emitted by the D-Bus daemon with sender "org.freedesktop.DBus"
+	 and object path "/org/freedesktop/DBus" to indicate that
+	 the well-known name well_known_name is represented by the
+	 AF_BUS address { AF_BUS, af_bus_client, path } where
+	 path is the path name used by this bus.
+
+	 For instance, if the well-known name "org.gnome.Keyring"
+	 is represented by AF_BUS address 0001/0000deadbeef,
+	 the signal would have arguments ("org.gnome.Keyring",
+	 0x00010000deadbeef), corresponding to the AF_BUS
+	 address { AF_BUS, 0x00010000deadbeef, /tmp/test }.
+
+If the D-Bus service for that well-known name is not active, then the
+D-Bus daemon will first do the service activation, assign an
+additional address to the recently activated service, store the
+well-known service to numeric address mapping on its persistent cache,
+and then send the AF_BUS.Forwarded signal back to the client.
+
+Once the mapping has been made, the AF_BUS address associated with a
+well-known name cannot be reused for the lifetime of the D-Bus daemon
+(which is the same as the lifetime of the socket). 
+
+Nevertheless the AF_BUS address associated with a well-known name can
+change, for example if a service goes away and a new instance gets
+activated. This new instance can have a different AF_BUS address.  The
+D-Bus daemon will maintain a list of the mappings that are currently
+valid so it can send the AF_BUS.
+
+Forwarded signal with the mapping information to the clients. Client
+libraries will maintain a fixed-size Last Recently Used (LRU) cache
+with previous mappings sent by the D-Bus daemon.
+
+If the clients overwrite a mapping due to the LRU replace policy and
+later want to send a D-Bus message to the overwritten well-known name,
+they will send the D-Bus message back to the D-Bus daemon and this
+will send the signal with the mapping information. 
+
+If a service goes away or if the service AF_BUS address changed and
+the client still has the old AF_BUS address in its cache, it will send
+the D-Bus message to the old destination. 
+
+Since packets whose destination AF_BUS addresses are not assigned to
+any process are routed by default to the bus master, the D-Bus daemon
+will receive these D-bus messages and send an AF_BUS.
+
+Forwarded signal back to the client with the new AF_BUS address so it
+can update its cache with the new mapping.
+
+For well-known names, the D-Bus daemon will use a different address
+prefix (0x0001) so it doesn't conflict with the D-Bus unique names
+address prefix (0x0000).
+
+Besides D-Bus method call messages which are unicast, D-Bus allows
+clients to send multicast messages (D-Bus signals). Clients can send
+signals messages using the bus unique name prefix multicast address
+(0x0001ffffffffffff).
+
+A netfilter hook is used to filter these multicast messages and only
+deliver to the correct peers based on match rules.
+
+
+D-Bus aware netfilter module
+----------------------------
+
+AF_BUS is designed to be a generic bus transport supporting both
+unicast and multicast communications.
+
+In order for D-Bus to operate efficiently, the transport method has to
+know the D-Bus message wire-protocol and D-Bus message structure. But
+adding this D-Bus specific knowledge to AF_BUS will break one of the
+fundamental design principles of any network protocol stack, namely
+layer-independence: layer n must not make any assumptions about the
+payload in layer n + 1.
+
+So, in order to have a clean protocol design but be able to allow the
+transport to analyze the D-Bus messages, netfilter hooks are used to
+do the filtering based on match rules.
+
+The kernel module has to maintain the match rules and the D-Bus daemon
+is responsible for managing this information. Every time an add match
+rule message is processed by the D-Bus daemon, this will update the
+netfilter module match rules set so the netfilter hook function can
+use that information to do the match rules based filtering.
+
+The D-Bus daemon and the netfilter module will use the generic netlink
+subsystem to do the kernel-to-user-space communication. Netlink is
+already used by most of the networking subsystem in Linux
+(iptables/netfilter, ip/routing, etc).
+
+We enforce a security scheme so only the bus master's user ID can
+update the netfilter module match rules set.
+
+The advantage of using the netfilter subsystem is that we decouple the
+mechanism from the policy. AF_BUS will only add a set of hook points
+and external modules will be used to enforce a given policy.
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 03/15] net: bus: Add AF_BUS socket and address definitions
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 01/15] net: bus: Add " Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 02/15] net: bus: Add documentation for AF_BUS Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 04/15] security: Add Linux Security Modules hook for AF_BUS sockets Vincent Sanders
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Vincent Sanders

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

An AF_BUS socket address is made up of a path component and a numeric
component. The path component is either a pathname or an abstract
socket similar to a unix socket. The numeric component is used to
uniquely identify each connection to the bus. Thus the path identifies
a specific bus and the numeric component the attachment to that bus.

The numeric component of the address is a 64-bit unsigned integer,
interpreted by splitting the into two parts: the most significant 16
bits are a prefix identifying the type of address, and the remaining
48 bits are the actual client address within that prefix, as shown in
this figure:

Bit:  0             15 16                                            63
     +----------------+------------------------------------------------+
     |  Type prefix   |                Client address                  |
     +----------------+------------------------------------------------+

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
---
 include/linux/bus.h  |   34 +++++++
 include/net/af_bus.h |  272 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 306 insertions(+)
 create mode 100644 include/linux/bus.h
 create mode 100644 include/net/af_bus.h

diff --git a/include/linux/bus.h b/include/linux/bus.h
new file mode 100644
index 0000000..19cac36
--- /dev/null
+++ b/include/linux/bus.h
@@ -0,0 +1,34 @@
+#ifndef _LINUX_BUS_H
+#define _LINUX_BUS_H
+
+#include <linux/socket.h>
+
+/* 'protocol' to use in socket(AF_BUS, SOCK_SEQPACKET, protocol) */
+#define BUS_PROTO_NONE	0
+#define BUS_PROTO_DBUS	1
+#define BUS_PROTO_MAX	1
+
+#define BUS_PATH_MAX	108
+
+/**
+ * struct bus_addr - af_bus address
+ * @s_addr: an af_bus address (16-bit prefix + 48-bit client address)
+ */
+struct bus_addr {
+	u64 s_addr;
+};
+
+
+/**
+ * struct sockaddr_bus - af_bus socket address
+ * @sbus_family: the socket address family
+ * @sbus_addr: an af_bus address
+ * @sbus_path: a path name
+ */
+struct sockaddr_bus {
+	__kernel_sa_family_t sbus_family;
+	struct bus_addr      sbus_addr;
+	char sbus_path[BUS_PATH_MAX];
+};
+
+#endif /* _LINUX_BUS_H */
diff --git a/include/net/af_bus.h b/include/net/af_bus.h
new file mode 100644
index 0000000..19bd7ac
--- /dev/null
+++ b/include/net/af_bus.h
@@ -0,0 +1,272 @@
+/*
+ * Copyright (c) 2012, GENIVI Alliance
+ *
+ * Authors:	Javier Martinez Canillas, <javier.martinez@collabora.co.uk>
+ *              Alban Crequy, <alban.crequy@collabora.co.uk>
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Based on BSD Unix domain sockets (net/unix).
+ */
+
+#ifndef __LINUX_NET_AFBUS_H
+#define __LINUX_NET_AFBUS_H
+
+#include <linux/socket.h>
+#include <linux/bus.h>
+#include <linux/mutex.h>
+#include <net/sock.h>
+#include <net/tcp_states.h>
+
+extern void bus_inflight(struct file *fp);
+extern void bus_notinflight(struct file *fp);
+extern void bus_gc(void);
+extern void wait_for_bus_gc(void);
+extern struct sock *bus_get_socket(struct file *filp);
+extern struct sock *bus_peer_get(struct sock *);
+
+#define BUS_HASH_SIZE	256
+
+extern spinlock_t bus_address_lock;
+extern struct hlist_head bus_address_table[BUS_HASH_SIZE];
+
+#define BUS_MAX_QLEN    10
+#define BUS_MASTER_ADDR 0x0
+#define BUS_PREFIX_BITS 16
+#define BUS_CLIENT_BITS 48
+#define BUS_PREFIX_MASK 0xffff000000000000
+#define BUS_CLIENT_MASK 0x0000ffffffffffff
+
+/* AF_BUS socket options */
+#define BUS_ADD_ADDR 1
+#define BUS_JOIN_BUS 2
+#define BUS_DEL_ADDR 3
+#define BUS_SET_EAVESDROP 4
+#define BUS_UNSET_EAVESDROP 5
+#define BUS_SET_SENDBUF 6
+#define BUS_SET_MAXQLEN 7
+
+/* Connection and socket states */
+enum {
+	BUS_ESTABLISHED = TCP_ESTABLISHED,
+	BUS_CLOSE = TCP_CLOSE,
+	BUS_LISTEN = TCP_LISTEN,
+	BUS_MAX_STATES
+};
+
+#define NF_BUS_SENDING 1
+
+extern unsigned int bus_tot_inflight;
+extern spinlock_t bus_table_lock;
+extern struct hlist_head bus_socket_table[BUS_HASH_SIZE + 1];
+
+/**
+ * struct bus_address - an af_bus address associated with an af_bus sock
+ * @refcnt: address reference counter
+ * @len: address length
+ * @hash: address hash value
+ * @addr_node: member of struct bus_sock.addr_list
+ * @table_node: member of struct hlist_head bus_address_table[hash]
+ * @sock: the af_bus sock that owns this address
+ * @name: the socket address for this address
+ */
+struct bus_address {
+	atomic_t	refcnt;
+	int		len;
+	unsigned	hash;
+	struct hlist_node addr_node;
+	struct hlist_node table_node;
+	struct sock  *sock;
+	struct sockaddr_bus name[0];
+};
+
+/**
+ * struct bus_send_context - sending context for an socket buffer
+ * @sender_socket: the sender socket associated with this sk_buff
+ * @siocb: used to send ancillary data
+ * @timeo: sending timeout
+ * @max_level: file descriptor passing maximum recursion level
+ * @namelen: length of socket address name
+ * @hash: socket name hash value
+ * @other: destination sock
+ * @sender: sender socket address name
+ * @recipient: recipient socket address name
+ * @authenticated: flag whether the sock already joined the bus
+ * @bus_master_side: flag whether the sock is an accepted socket
+ * @to_master: flag whether the destination is the bus master
+ * @multicast: flag whether the destination is a multicast address
+ * @deliver: flag whether the skb has to be delivered
+ * @eavesdropper: flag whether the sock is allowed to eavesdrop
+ * @main_recipient: flag whether the sock is the main recipient
+ */
+struct bus_send_context {
+	struct socket *sender_socket;
+	struct sock_iocb *siocb;
+	long timeo;
+	int max_level;
+	int namelen;
+	unsigned hash;
+	struct sock *other;
+	struct sockaddr_bus	*sender;
+	struct sockaddr_bus	*recipient;
+	unsigned int		authenticated:1;
+	unsigned int		bus_master_side:1;
+	unsigned int		to_master:1;
+	unsigned int		multicast:1;
+	unsigned int            deliver:1;
+	unsigned int            eavesdropper:1;
+	unsigned int            main_recipient:1;
+};
+
+/**
+ * struct bus_skb_parms - socket buffer parameters
+ * @pid: process id
+ * @cred: skb credentials
+ * @fp: passed file descriptors
+ * @secid: security id
+ * @sendctx: skb sending context
+ */
+struct bus_skb_parms {
+	struct pid		*pid;
+	const struct cred	*cred;
+	struct scm_fp_list	*fp;
+#ifdef CONFIG_SECURITY_NETWORK
+	u32			secid;
+#endif
+	struct bus_send_context	*sendctx;
+};
+
+#define BUSCB(skb)      (*(struct bus_skb_parms *)&((skb)->cb))
+#define BUSSID(skb)     (&BUSCB((skb)).secid)
+
+#define bus_state_lock(s)	spin_lock(&bus_sk(s)->lock)
+#define bus_state_unlock(s)	spin_unlock(&bus_sk(s)->lock)
+#define bus_state_lock_nested(s) \
+				spin_lock_nested(&bus_sk(s)->lock, \
+				SINGLE_DEPTH_NESTING)
+
+/**
+ * struct bus - a communication bus
+ * @master: the bus master sock
+ * @peers: list of struct bus_sock.bus_node allowed to join the bus
+ * @lock: protect peers concurrent access
+ * @send_lock: enforce atomic multicast delivery
+ * @kref: bus reference counter
+ * @addr_cnt: address number counter to assign prefix 0x0000 addresses
+ * @eavesdropper_cnt: eavesdroppers counter
+ */
+struct bus {
+	struct sock		*master;
+	struct hlist_head       peers;
+	spinlock_t		lock;
+	spinlock_t		send_lock;
+	struct kref             kref;
+	atomic64_t              addr_cnt;
+	atomic64_t              eavesdropper_cnt;
+};
+
+/**
+ * struct bus_sock - an af_bus socket
+ * @sk: associated sock
+ * @addr: sock principal address
+ * @addr_list: list of struct bus_address.addr_node
+ * @path: sock path name
+ * @readlock: protect from concurrent reading
+ * @peer: peer sock
+ * @other: the listening sock
+ * @link: list of candidates for garbage collection
+ * @inflight: number of times the file descriptor is in flight
+ * @lock: protect the sock from concurrent access
+ * @gc_candidate: flag whether the is a candidate for gc
+ * @gc_maybe_cycle: flag whether could be a cyclic reference
+ * @recursion_level: file passing current recursion level
+ * @peer_wq: peer sock wait queue
+ * @bus: bus that this sock belongs to
+ * @bus_master: flag whether the sock is the bus master
+ * @bus_master_side: flag whether is an accepted socket
+ * @authenticated: flag whether the sock joined the bus
+ * @eavesdropper: flag whether the sock is allowed to eavesdrop
+ * @bus_node: member of struct bus.peers list of joined socks
+ */
+struct bus_sock {
+	/* WARNING: sk has to be the first member */
+	struct sock		sk;
+	struct bus_address     *addr;
+	struct hlist_head       addr_list;
+	struct path		path;
+	struct mutex		readlock;
+	struct sock		*peer;
+	struct sock		*other;
+	struct list_head	link;
+	atomic_long_t		inflight;
+	spinlock_t		lock;
+	unsigned int		gc_candidate:1;
+	unsigned int		gc_maybe_cycle:1;
+	unsigned char		recursion_level;
+	struct socket_wq	peer_wq;
+	struct bus              *bus;
+	bool                    bus_master;
+	bool                    bus_master_side;
+	bool                    authenticated;
+	bool                    eavesdropper;
+	struct hlist_node	bus_node;
+};
+#define bus_sk(__sk) ((struct bus_sock *)__sk)
+
+#define peer_wait peer_wq.wait
+
+/**
+ * bus_same_bus - Test if two socket address belongs to the same bus
+ * @sbusaddr1: socket address name
+ * @sbusaddr2: socket address name
+ */
+static inline bool bus_same_bus(struct sockaddr_bus *sbusaddr1,
+				struct sockaddr_bus *sbusaddr2)
+{
+	int offset;
+
+	if (sbusaddr1->sbus_path[0] != sbusaddr2->sbus_path[0])
+		return false;
+
+	/*
+	 * abstract path names start with a null byte character,
+	 * so they have to be compared starting at the second char.
+	 */
+	offset = (sbusaddr1->sbus_path[0] == '\0');
+
+	return !strncmp(sbusaddr1->sbus_path + offset,
+		       sbusaddr2->sbus_path + offset,
+		       BUS_PATH_MAX);
+}
+
+static inline unsigned int bus_hash_fold(__wsum n)
+{
+	unsigned int hash = (__force unsigned int)n;
+	hash ^= hash>>16;
+	hash ^= hash>>8;
+	return hash&(BUS_HASH_SIZE-1);
+}
+
+static inline unsigned int bus_compute_hash(struct bus_addr addr)
+{
+	return bus_hash_fold(csum_partial((void *)&addr, sizeof(addr), 0));
+}
+
+long bus_inq_len(struct sock *sk);
+long bus_outq_len(struct sock *sk);
+
+#ifdef CONFIG_SYSCTL
+extern int bus_sysctl_register(struct net *net);
+extern void bus_sysctl_unregister(struct net *net);
+#else
+static inline int bus_sysctl_register(struct net *net) { return 0; }
+static inline void bus_sysctl_unregister(struct net *net) {}
+#endif
+
+bool bus_can_write(struct net *net, struct sockaddr_bus *addr, int len,
+		   int protocol);
+
+#endif /* __LINUX_NET_AFBUS_H */
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 04/15] security: Add Linux Security Modules hook for AF_BUS sockets
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (2 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 03/15] net: bus: Add AF_BUS socket and address definitions Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-07-09  3:32   ` James Morris
  2012-07-09 18:02   ` Paul Moore
  2012-06-29 16:45 ` [PATCH net-next 05/15] security: selinux: Add AF_BUS socket SELinux hooks Vincent Sanders
                   ` (15 subsequent siblings)
  19 siblings, 2 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Vincent Sanders

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

AF_BUS implements a security hook bus_connect() to be used by LSM to
enforce connectivity security policies.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
---
 include/linux/security.h |   11 +++++++++++
 security/capability.c    |    7 +++++++
 security/security.c      |    7 +++++++
 3 files changed, 25 insertions(+)

diff --git a/include/linux/security.h b/include/linux/security.h
index 4e5a73c..d30dc4a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1578,6 +1578,8 @@ struct security_operations {
 
 #ifdef CONFIG_SECURITY_NETWORK
 	int (*unix_stream_connect) (struct sock *sock, struct sock *other, struct sock *newsk);
+	int (*bus_connect) (struct sock *sock, struct sock *other,
+			    struct sock *newsk);
 	int (*unix_may_send) (struct socket *sock, struct socket *other);
 
 	int (*socket_create) (int family, int type, int protocol, int kern);
@@ -2519,6 +2521,8 @@ static inline int security_inode_getsecctx(struct inode *inode, void **ctx, u32
 #ifdef CONFIG_SECURITY_NETWORK
 
 int security_unix_stream_connect(struct sock *sock, struct sock *other, struct sock *newsk);
+int security_bus_connect(struct sock *sock, struct sock *other,
+			 struct sock *newsk);
 int security_unix_may_send(struct socket *sock,  struct socket *other);
 int security_socket_create(int family, int type, int protocol, int kern);
 int security_socket_post_create(struct socket *sock, int family,
@@ -2566,6 +2570,13 @@ static inline int security_unix_stream_connect(struct sock *sock,
 	return 0;
 }
 
+static inline int security_bus_connect(struct socket *sock,
+				       struct sock *other,
+				       struct sock *newsk)
+{
+	return 0;
+}
+
 static inline int security_unix_may_send(struct socket *sock,
 					 struct socket *other)
 {
diff --git a/security/capability.c b/security/capability.c
index 61095df..ea57f2b 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -563,6 +563,12 @@ static int cap_unix_may_send(struct socket *sock, struct socket *other)
 	return 0;
 }
 
+static int cap_bus_connect(struct sock *sock, struct sock *other,
+			   struct sock *newsk)
+{
+	return 0;
+}
+
 static int cap_socket_create(int family, int type, int protocol, int kern)
 {
 	return 0;
@@ -1016,6 +1022,7 @@ void __init security_fixup_ops(struct security_operations *ops)
 #ifdef CONFIG_SECURITY_NETWORK
 	set_to_cap_if_null(ops, unix_stream_connect);
 	set_to_cap_if_null(ops, unix_may_send);
+	set_to_cap_if_null(ops, bus_connect);
 	set_to_cap_if_null(ops, socket_create);
 	set_to_cap_if_null(ops, socket_post_create);
 	set_to_cap_if_null(ops, socket_bind);
diff --git a/security/security.c b/security/security.c
index 3efc9b1..00ab7df 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1059,6 +1059,13 @@ int security_unix_may_send(struct socket *sock,  struct socket *other)
 }
 EXPORT_SYMBOL(security_unix_may_send);
 
+int security_bus_connect(struct sock *sock, struct sock *other,
+				struct sock *newsk)
+{
+	return security_ops->bus_connect(sock, other, newsk);
+}
+EXPORT_SYMBOL(security_bus_connect);
+
 int security_socket_create(int family, int type, int protocol, int kern)
 {
 	return security_ops->socket_create(family, type, protocol, kern);
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 05/15] security: selinux: Add AF_BUS socket SELinux hooks
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (3 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 04/15] security: Add Linux Security Modules hook for AF_BUS sockets Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-07-09 18:38   ` Paul Moore
  2012-06-29 16:45 ` [PATCH net-next 06/15] netfilter: Add NFPROTO_BUS hook constant for AF_BUS socket family Vincent Sanders
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Vincent Sanders

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

Add Security-Enhanced Linux (SELinux) hook for AF_BUS socket address family.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
---
 security/selinux/hooks.c |   35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 4ee6f23..5bacbe2 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -67,6 +67,7 @@
 #include <linux/quota.h>
 #include <linux/un.h>		/* for Unix socket types */
 #include <net/af_unix.h>	/* for Unix socket types */
+#include <net/af_bus.h>	/* for Bus socket types */
 #include <linux/parser.h>
 #include <linux/nfs_mount.h>
 #include <net/ipv6.h>
@@ -4101,6 +4102,39 @@ static int selinux_socket_unix_may_send(struct socket *sock,
 			    &ad);
 }
 
+static int selinux_socket_bus_connect(struct sock *sock, struct sock *other,
+				      struct sock *newsk)
+{
+	struct sk_security_struct *sksec_sock = sock->sk_security;
+	struct sk_security_struct *sksec_other = other->sk_security;
+	struct sk_security_struct *sksec_new = newsk->sk_security;
+	struct common_audit_data ad;
+	struct lsm_network_audit net = {0,};
+	int err;
+
+	ad.type = LSM_AUDIT_DATA_NET;
+	ad.u.net = &net;
+	ad.u.net->sk = other;
+
+	err = avc_has_perm(sksec_sock->sid, sksec_other->sid,
+			   sksec_other->sclass,
+			   UNIX_STREAM_SOCKET__CONNECTTO, &ad);
+	if (err)
+		return err;
+
+	/* server child socket */
+	sksec_new->peer_sid = sksec_sock->sid;
+	err = security_sid_mls_copy(sksec_other->sid, sksec_sock->sid,
+				    &sksec_new->sid);
+	if (err)
+		return err;
+
+	/* connecting socket */
+	sksec_sock->peer_sid = sksec_new->sid;
+
+	return 0;
+}
+
 static int selinux_inet_sys_rcv_skb(int ifindex, char *addrp, u16 family,
 				    u32 peer_sid,
 				    struct common_audit_data *ad)
@@ -5643,6 +5677,7 @@ static struct security_operations selinux_ops = {
 
 	.unix_stream_connect =		selinux_socket_unix_stream_connect,
 	.unix_may_send =		selinux_socket_unix_may_send,
+	.bus_connect =		        selinux_socket_bus_connect,
 
 	.socket_create =		selinux_socket_create,
 	.socket_post_create =		selinux_socket_post_create,
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 06/15] netfilter: Add NFPROTO_BUS hook constant for AF_BUS socket family
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (4 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 05/15] security: selinux: Add AF_BUS socket SELinux hooks Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-07-01  2:15   ` Jan Engelhardt
  2012-06-29 16:45 ` [PATCH net-next 07/15] scm: allow AF_BUS sockets to send ancillary data Vincent Sanders
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Vincent Sanders

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

AF_BUS sockets add a netfilter NF_HOOK() on the packet sending path.
This allows packet to be mangled by registered netfilter hooks.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
---
 include/linux/netfilter.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index c613cf0..0698924 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -67,6 +67,7 @@ enum {
 	NFPROTO_BRIDGE =  7,
 	NFPROTO_IPV6   = 10,
 	NFPROTO_DECNET = 12,
+	NFPROTO_BUS,
 	NFPROTO_NUMPROTO,
 };
 
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 07/15] scm: allow AF_BUS sockets to send ancillary data
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (5 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 06/15] netfilter: Add NFPROTO_BUS hook constant for AF_BUS socket family Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 08/15] net: bus: Add implementation of Bus domain sockets Vincent Sanders
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Vincent Sanders

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

Similar to UNIX domain sockets AF_BUS sockets support passing file
descriptors and process credentials which requires supporting passing
control messages.

The core socket level control messages processing requires extending
to allow sockets other than PF_UNIX to send SCM_RIGHTS type messages.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
---
 net/core/scm.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/scm.c b/net/core/scm.c
index 611c5ef..87e3152 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -158,7 +158,8 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
 		switch (cmsg->cmsg_type)
 		{
 		case SCM_RIGHTS:
-			if (!sock->ops || sock->ops->family != PF_UNIX)
+			if (!sock->ops || (sock->ops->family != PF_UNIX &&
+					   sock->ops->family != PF_BUS))
 				goto error;
 			err=scm_fp_copy(cmsg, &p->fp);
 			if (err<0)
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 08/15] net: bus: Add implementation of Bus domain sockets
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (6 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 07/15] scm: allow AF_BUS sockets to send ancillary data Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 09/15] net: bus: Add garbage collector for AF_BUS sockets Vincent Sanders
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Vincent Sanders

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

This is the core impolementation of the AF_BUS socket family its
design and operation are fully covered in
Documentation/networking/af_bus.txt

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
---
 net/bus/af_bus.c | 2629 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2629 insertions(+)
 create mode 100644 net/bus/af_bus.c

diff --git a/net/bus/af_bus.c b/net/bus/af_bus.c
new file mode 100644
index 0000000..0b79754
--- /dev/null
+++ b/net/bus/af_bus.c
@@ -0,0 +1,2629 @@
+/*
+ * Implementation of Bus domain sockets.
+ *
+ * Copyright (c) 2012, GENIVI Alliance
+ *
+ * Authors:	Javier Martinez Canillas <javier.martinez@collabora.co.uk>
+ *              Alban Crequy <alban.crequy@collabora.co.uk>
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Based on BSD Unix domain sockets (net/unix).
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/signal.h>
+#include <linux/sched.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/stat.h>
+#include <linux/dcache.h>
+#include <linux/namei.h>
+#include <linux/socket.h>
+#include <linux/bus.h>
+#include <linux/fcntl.h>
+#include <linux/termios.h>
+#include <linux/sockios.h>
+#include <linux/net.h>
+#include <linux/in.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <net/net_namespace.h>
+#include <net/sock.h>
+#include <net/af_bus.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <net/scm.h>
+#include <linux/init.h>
+#include <linux/poll.h>
+#include <linux/rtnetlink.h>
+#include <linux/mount.h>
+#include <net/checksum.h>
+#include <linux/security.h>
+
+struct hlist_head bus_socket_table[BUS_HASH_SIZE + 1];
+EXPORT_SYMBOL_GPL(bus_socket_table);
+struct hlist_head bus_address_table[BUS_HASH_SIZE];
+EXPORT_SYMBOL_GPL(bus_address_table);
+DEFINE_SPINLOCK(bus_table_lock);
+DEFINE_SPINLOCK(bus_address_lock);
+EXPORT_SYMBOL_GPL(bus_address_lock);
+static atomic_long_t bus_nr_socks;
+
+#define bus_sockets_unbound	(&bus_socket_table[BUS_HASH_SIZE])
+
+#define BUS_ABSTRACT(sk)	(bus_sk(sk)->addr->hash != BUS_HASH_SIZE)
+
+#ifdef CONFIG_SECURITY_NETWORK
+static void bus_get_secdata(struct scm_cookie *scm, struct sk_buff *skb)
+{
+	memcpy(BUSSID(skb), &scm->secid, sizeof(u32));
+}
+
+static inline void bus_set_secdata(struct scm_cookie *scm, struct sk_buff *skb)
+{
+	scm->secid = *BUSSID(skb);
+}
+#else
+static inline void bus_get_secdata(struct scm_cookie *scm, struct sk_buff *skb)
+{ }
+
+static inline void bus_set_secdata(struct scm_cookie *scm, struct sk_buff *skb)
+{ }
+#endif /* CONFIG_SECURITY_NETWORK */
+
+/*
+ *  SMP locking strategy:
+ *    bus_socket_table hash table is protected with spinlock bus_table_lock
+ *    bus_address_table hash table is protected with spinlock bus_address_lock
+ *    each bus is protected by a separate spin lock.
+ *    multicast atomic sending is protected by a separate spin lock.
+ *    each socket state is protected by a separate spin lock.
+ *    each socket address is protected by a separate spin lock.
+ *
+ *  When holding more than one lock, use the following hierarchy:
+ *  - bus_table_lock.
+ *  - bus_address_lock.
+ *  - socket lock.
+ *  - bus lock.
+ *  - bus send_lock.
+ *  - sock address lock.
+ */
+
+#define bus_peer(sk) (bus_sk(sk)->peer)
+
+static inline int bus_our_peer(struct sock *sk, struct sock *osk)
+{
+	return bus_peer(osk) == sk;
+}
+
+static inline int bus_recvq_full(struct sock const *sk)
+{
+	return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
+}
+
+static inline u16 bus_addr_prefix(struct sockaddr_bus *busaddr)
+{
+	return (busaddr->sbus_addr.s_addr & BUS_PREFIX_MASK) >> BUS_CLIENT_BITS;
+}
+
+static inline u64 bus_addr_client(struct sockaddr_bus *sbusaddr)
+{
+	return sbusaddr->sbus_addr.s_addr & BUS_CLIENT_MASK;
+}
+
+static inline bool bus_mc_addr(struct sockaddr_bus *sbusaddr)
+{
+	return bus_addr_client(sbusaddr) == BUS_CLIENT_MASK;
+}
+
+struct sock *bus_peer_get(struct sock *s)
+{
+	struct sock *peer;
+
+	bus_state_lock(s);
+	peer = bus_peer(s);
+	if (peer)
+		sock_hold(peer);
+	bus_state_unlock(s);
+	return peer;
+}
+EXPORT_SYMBOL_GPL(bus_peer_get);
+
+static inline void bus_release_addr(struct bus_address *addr)
+{
+	if (atomic_dec_and_test(&addr->refcnt))
+		kfree(addr);
+}
+
+/*
+ *	Check bus socket name:
+ *		- should be not zero length.
+ *	        - if started by not zero, should be NULL terminated (FS object)
+ *		- if started by zero, it is abstract name.
+ */
+
+static int bus_mkname(struct sockaddr_bus *sbusaddr, int len,
+		      unsigned int *hashp)
+{
+	int offset = (sbusaddr->sbus_path[0] == '\0');
+
+	if (len <= sizeof(short) || len > sizeof(*sbusaddr))
+		return -EINVAL;
+	if (!sbusaddr || sbusaddr->sbus_family != AF_BUS)
+		return -EINVAL;
+
+	len = strnlen(sbusaddr->sbus_path + offset, BUS_PATH_MAX) + 1 +
+		sizeof(__kernel_sa_family_t) +
+		sizeof(struct bus_addr);
+
+	*hashp = bus_compute_hash(sbusaddr->sbus_addr);
+	return len;
+}
+
+static void __bus_remove_address(struct bus_address *addr)
+{
+	hlist_del(&addr->table_node);
+}
+
+static void __bus_insert_address(struct hlist_head *list,
+				 struct bus_address *addr)
+{
+	hlist_add_head(&addr->table_node, list);
+}
+
+static inline void bus_remove_address(struct bus_address *addr)
+{
+	spin_lock(&bus_address_lock);
+	__bus_remove_address(addr);
+	spin_unlock(&bus_address_lock);
+}
+
+static inline void bus_insert_address(struct hlist_head *list,
+				      struct bus_address *addr)
+{
+	spin_lock(&bus_address_lock);
+	__bus_insert_address(list, addr);
+	spin_unlock(&bus_address_lock);
+}
+
+static void __bus_remove_socket(struct sock *sk)
+{
+	sk_del_node_init(sk);
+}
+
+static void __bus_insert_socket(struct hlist_head *list, struct sock *sk)
+{
+	WARN_ON(!sk_unhashed(sk));
+	sk_add_node(sk, list);
+}
+
+static inline void bus_remove_socket(struct sock *sk)
+{
+	spin_lock(&bus_table_lock);
+	__bus_remove_socket(sk);
+	spin_unlock(&bus_table_lock);
+}
+
+static inline void bus_insert_socket(struct hlist_head *list, struct sock *sk)
+{
+	spin_lock(&bus_table_lock);
+	__bus_insert_socket(list, sk);
+	spin_unlock(&bus_table_lock);
+}
+
+static inline bool __bus_has_prefix(struct sock *sk, u16 prefix)
+{
+	struct bus_sock *u = bus_sk(sk);
+	struct bus_address *addr;
+	struct hlist_node *node;
+	bool ret = false;
+
+	hlist_for_each_entry(addr, node, &u->addr_list, addr_node) {
+		if (bus_addr_prefix(addr->name) == prefix)
+			ret = true;
+	}
+
+	return ret;
+}
+
+static inline bool bus_has_prefix(struct sock *sk, u16 prefix)
+{
+	bool ret;
+
+	bus_state_lock(sk);
+	ret = __bus_has_prefix(sk, prefix);
+	bus_state_unlock(sk);
+
+	return ret;
+}
+
+static inline bool __bus_eavesdropper(struct sock *sk, u16 condition)
+{
+	struct bus_sock *u = bus_sk(sk);
+
+	return u->eavesdropper;
+}
+
+static inline bool bus_eavesdropper(struct sock *sk, u16 condition)
+{
+	bool ret;
+
+	bus_state_lock(sk);
+	ret = __bus_eavesdropper(sk, condition);
+	bus_state_unlock(sk);
+
+	return ret;
+}
+
+static inline bool bus_has_prefix_eavesdropper(struct sock *sk, u16 prefix)
+{
+	bool ret;
+
+	bus_state_lock(sk);
+	ret = __bus_has_prefix(sk, prefix) || __bus_eavesdropper(sk, 0);
+	bus_state_unlock(sk);
+
+	return ret;
+}
+
+static inline struct bus_address *__bus_get_address(struct sock *sk,
+						    struct bus_addr *sbus_addr)
+{
+	struct bus_sock *u = bus_sk(sk);
+	struct bus_address *addr = NULL;
+	struct hlist_node *node;
+
+	hlist_for_each_entry(addr, node, &u->addr_list, addr_node) {
+		if (addr->name->sbus_addr.s_addr == sbus_addr->s_addr)
+			return addr;
+	}
+
+	return NULL;
+}
+
+static inline struct bus_address *bus_get_address(struct sock *sk,
+						  struct bus_addr *sbus_addr)
+{
+	struct bus_address *addr;
+
+	bus_state_lock(sk);
+	addr = __bus_get_address(sk, sbus_addr);
+	bus_state_unlock(sk);
+
+	return addr;
+}
+
+static struct sock *__bus_find_socket_byname(struct net *net,
+					     struct sockaddr_bus *sbusname,
+					     int len, unsigned int hash)
+{
+	struct sock *s;
+	struct hlist_node *node;
+
+	sk_for_each(s, node, &bus_socket_table[hash]) {
+		struct bus_sock *u = bus_sk(s);
+
+		if (!net_eq(sock_net(s), net))
+			continue;
+
+		if (u->addr->len == len &&
+		    !memcmp(u->addr->name, sbusname, len))
+			return s;
+	}
+
+	return NULL;
+}
+
+static inline struct sock *bus_find_socket_byname(struct net *net,
+						  struct sockaddr_bus *sbusname,
+						  int len, unsigned int hash)
+{
+	struct sock *s;
+
+	spin_lock(&bus_table_lock);
+	s = __bus_find_socket_byname(net, sbusname, len, hash);
+	if (s)
+		sock_hold(s);
+	spin_unlock(&bus_table_lock);
+	return s;
+}
+
+static struct sock *__bus_find_socket_byaddress(struct net *net,
+						struct sockaddr_bus *sbusname,
+						int len, int protocol,
+						unsigned int hash)
+{
+	struct sock *s;
+	struct bus_address *addr;
+	struct hlist_node *node;
+	struct bus_sock *u;
+	int offset = (sbusname->sbus_path[0] == '\0');
+	int path_len = strnlen(sbusname->sbus_path + offset, BUS_PATH_MAX);
+
+	len = path_len + 1 + sizeof(__kernel_sa_family_t) +
+	      sizeof(struct bus_addr);
+
+	hlist_for_each_entry(addr, node, &bus_address_table[hash],
+			     table_node) {
+		s = addr->sock;
+		u = bus_sk(s);
+
+		if (s->sk_protocol != protocol)
+			continue;
+
+		if (!net_eq(sock_net(s), net))
+			continue;
+
+		if (addr->len == len &&
+		    addr->name->sbus_family == sbusname->sbus_family &&
+		    addr->name->sbus_addr.s_addr == sbusname->sbus_addr.s_addr
+		    && bus_same_bus(addr->name, sbusname))
+			goto found;
+	}
+	s = NULL;
+found:
+	return s;
+}
+
+static inline struct sock *bus_find_socket_byaddress(struct net *net,
+						     struct sockaddr_bus *name,
+						     int len, int protocol,
+						     unsigned int hash)
+{
+	struct sock *s;
+
+	spin_lock(&bus_address_lock);
+	s = __bus_find_socket_byaddress(net, name, len, protocol, hash);
+	if (s)
+		sock_hold(s);
+	spin_unlock(&bus_address_lock);
+	return s;
+}
+
+static inline int bus_writable(struct sock *sk)
+{
+	return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
+}
+
+static void bus_write_space(struct sock *sk)
+{
+	struct bus_sock *u = bus_sk(sk);
+	struct bus_sock *p;
+	struct hlist_node *node;
+	struct socket_wq *wq;
+
+	if (bus_writable(sk)) {
+		rcu_read_lock();
+		wq = rcu_dereference(sk->sk_wq);
+		if (wq_has_sleeper(wq))
+			wake_up_interruptible_sync_poll(&wq->wait,
+				POLLOUT | POLLWRNORM | POLLWRBAND);
+		sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
+		rcu_read_unlock();
+
+		if (u && u->bus) {
+			spin_lock(&u->bus->lock);
+			hlist_for_each_entry(p, node, &u->bus->peers,
+					     bus_node) {
+				wake_up_interruptible_sync_poll(sk_sleep(&p->sk),
+								POLLOUT |
+								POLLWRNORM |
+								POLLWRBAND);
+				sk_wake_async(&p->sk, SOCK_WAKE_SPACE,
+					      POLL_OUT);
+			}
+			spin_unlock(&u->bus->lock);
+		}
+	}
+}
+
+static void bus_bus_release(struct kref *kref)
+{
+	struct bus *bus;
+
+	bus = container_of(kref, struct bus, kref);
+
+	kfree(bus);
+}
+
+static void bus_sock_destructor(struct sock *sk)
+{
+	struct bus_sock *u = bus_sk(sk);
+
+	skb_queue_purge(&sk->sk_receive_queue);
+
+	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(!sk_unhashed(sk));
+	WARN_ON(sk->sk_socket);
+	if (!sock_flag(sk, SOCK_DEAD)) {
+		pr_info("Attempt to release alive bus socket: %p\n", sk);
+		return;
+	}
+
+	if (u->bus) {
+		kref_put(&u->bus->kref, bus_bus_release);
+		u->bus = NULL;
+	}
+
+	atomic_long_dec(&bus_nr_socks);
+	local_bh_disable();
+	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
+	local_bh_enable();
+#ifdef BUS_REFCNT_DEBUG
+	pr_debug("BUS %p is destroyed, %ld are still alive.\n", sk,
+		 atomic_long_read(&bus_nr_socks));
+#endif
+}
+
+static int bus_release_sock(struct sock *sk, int embrion)
+{
+	struct bus_sock *u = bus_sk(sk);
+	struct path path;
+	struct sock *skpair;
+	struct sk_buff *skb;
+	int state;
+	struct bus_address *addr;
+	struct hlist_node *node, *tmp;
+
+	bus_remove_socket(sk);
+
+	if (u->bus && u->authenticated &&
+	    !u->bus_master && !u->bus_master_side) {
+		spin_lock(&u->bus->lock);
+		hlist_del(&u->bus_node);
+		if (u->eavesdropper)
+			atomic64_dec(&u->bus->eavesdropper_cnt);
+		spin_unlock(&u->bus->lock);
+	}
+
+	/* Clear state */
+	bus_state_lock(sk);
+	sock_orphan(sk);
+	sk->sk_shutdown = SHUTDOWN_MASK;
+	path	     = u->path;
+	u->path.dentry = NULL;
+	u->path.mnt = NULL;
+	state = sk->sk_state;
+	sk->sk_state = BUS_CLOSE;
+
+	if (u->bus_master)
+			u->bus->master = NULL;
+
+	if (u->bus_master_side) {
+		bus_release_addr(u->addr);
+		u->addr = NULL;
+	} else {
+		u->addr = NULL;
+
+		spin_lock(&bus_address_lock);
+		hlist_for_each_entry_safe(addr, node, tmp, &u->addr_list,
+					  addr_node) {
+			hlist_del(&addr->addr_node);
+			__bus_remove_address(addr);
+			bus_release_addr(addr);
+		}
+		spin_unlock(&bus_address_lock);
+	}
+
+	bus_state_unlock(sk);
+
+	wake_up_interruptible_all(&u->peer_wait);
+
+	skpair = bus_peer(sk);
+
+	if (skpair != NULL) {
+		bus_state_lock(skpair);
+		/* No more writes */
+		skpair->sk_shutdown = SHUTDOWN_MASK;
+		if (!skb_queue_empty(&sk->sk_receive_queue) || embrion)
+			skpair->sk_err = ECONNRESET;
+		bus_state_unlock(skpair);
+		skpair->sk_state_change(skpair);
+		sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
+		sock_put(skpair); /* It may now die */
+		bus_peer(sk) = NULL;
+	}
+
+	/* Try to flush out this socket. Throw out buffers at least */
+
+	while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) {
+		if (state == BUS_LISTEN)
+			bus_release_sock(skb->sk, 1);
+		/* passed fds are erased in the kfree_skb hook	      */
+		kfree_skb(skb);
+	}
+
+	if (path.dentry)
+		path_put(&path);
+
+	sock_put(sk);
+
+	/* ---- Socket is dead now and most probably destroyed ---- */
+
+	if (bus_tot_inflight)
+		bus_gc();		/* Garbage collect fds */
+
+	return 0;
+}
+
+static void init_peercred(struct sock *sk)
+{
+	put_pid(sk->sk_peer_pid);
+	if (sk->sk_peer_cred)
+		put_cred(sk->sk_peer_cred);
+	sk->sk_peer_pid  = get_pid(task_tgid(current));
+	sk->sk_peer_cred = get_current_cred();
+}
+
+static void copy_peercred(struct sock *sk, struct sock *peersk)
+{
+	put_pid(sk->sk_peer_pid);
+	if (sk->sk_peer_cred)
+		put_cred(sk->sk_peer_cred);
+	sk->sk_peer_pid  = get_pid(peersk->sk_peer_pid);
+	sk->sk_peer_cred = get_cred(peersk->sk_peer_cred);
+}
+
+static int bus_listen(struct socket *sock, int backlog)
+{
+	int err;
+	struct sock *sk = sock->sk;
+	struct bus_sock *u = bus_sk(sk);
+	struct pid *old_pid = NULL;
+	const struct cred *old_cred = NULL;
+
+	err = -EINVAL;
+	if (!u->addr || !u->bus_master)
+		goto out;	/* Only listens on an bound an master socket */
+	bus_state_lock(sk);
+	if (sk->sk_state != BUS_CLOSE && sk->sk_state != BUS_LISTEN)
+		goto out_unlock;
+	if (backlog > sk->sk_max_ack_backlog)
+		wake_up_interruptible_all(&u->peer_wait);
+	sk->sk_max_ack_backlog	= backlog;
+	sk->sk_state		= BUS_LISTEN;
+	/* set credentials so connect can copy them */
+	init_peercred(sk);
+	err = 0;
+
+out_unlock:
+	bus_state_unlock(sk);
+	put_pid(old_pid);
+	if (old_cred)
+		put_cred(old_cred);
+out:
+	return err;
+}
+
+static int bus_release(struct socket *);
+static int bus_bind(struct socket *, struct sockaddr *, int);
+static int bus_connect(struct socket *, struct sockaddr *,
+			       int addr_len, int flags);
+static int bus_accept(struct socket *, struct socket *, int);
+static int bus_getname(struct socket *, struct sockaddr *, int *, int);
+static unsigned int bus_poll(struct file *, struct socket *,
+				    poll_table *);
+static int bus_ioctl(struct socket *, unsigned int, unsigned long);
+static int bus_shutdown(struct socket *, int);
+static int bus_setsockopt(struct socket *, int, int, char __user *,
+			   unsigned int);
+static int bus_sendmsg(struct kiocb *, struct socket *,
+		       struct msghdr *, size_t);
+static int bus_recvmsg(struct kiocb *, struct socket *,
+		       struct msghdr *, size_t, int);
+
+static void bus_set_peek_off(struct sock *sk, int val)
+{
+	struct bus_sock *u = bus_sk(sk);
+
+	mutex_lock(&u->readlock);
+	sk->sk_peek_off = val;
+	mutex_unlock(&u->readlock);
+}
+
+static const struct proto_ops bus_seqpacket_ops = {
+	.family =	PF_BUS,
+	.owner =	THIS_MODULE,
+	.release =	bus_release,
+	.bind =		bus_bind,
+	.connect =	bus_connect,
+	.socketpair =	sock_no_socketpair,
+	.accept =	bus_accept,
+	.getname =	bus_getname,
+	.poll =		bus_poll,
+	.ioctl =	bus_ioctl,
+	.listen =	bus_listen,
+	.shutdown =	bus_shutdown,
+	.setsockopt =	bus_setsockopt,
+	.getsockopt =	sock_no_getsockopt,
+	.sendmsg =	bus_sendmsg,
+	.recvmsg =	bus_recvmsg,
+	.mmap =		sock_no_mmap,
+	.sendpage =	sock_no_sendpage,
+	.set_peek_off =	bus_set_peek_off,
+};
+
+static struct proto bus_proto = {
+	.name			= "BUS",
+	.owner			= THIS_MODULE,
+	.obj_size		= sizeof(struct bus_sock),
+};
+
+/*
+ * AF_BUS sockets do not interact with hardware, hence they
+ * dont trigger interrupts - so it's safe for them to have
+ * bh-unsafe locking for their sk_receive_queue.lock. Split off
+ * this special lock-class by reinitializing the spinlock key:
+ */
+static struct lock_class_key af_bus_sk_receive_queue_lock_key;
+
+static struct sock *bus_create1(struct net *net, struct socket *sock)
+{
+	struct sock *sk = NULL;
+	struct bus_sock *u;
+
+	atomic_long_inc(&bus_nr_socks);
+	if (atomic_long_read(&bus_nr_socks) > 2 * get_max_files())
+		goto out;
+
+	sk = sk_alloc(net, PF_BUS, GFP_KERNEL, &bus_proto);
+	if (!sk)
+		goto out;
+
+	sock_init_data(sock, sk);
+	lockdep_set_class(&sk->sk_receive_queue.lock,
+				&af_bus_sk_receive_queue_lock_key);
+
+	sk->sk_write_space	= bus_write_space;
+	sk->sk_max_ack_backlog	= BUS_MAX_QLEN;
+	sk->sk_destruct		= bus_sock_destructor;
+	u	  = bus_sk(sk);
+	u->path.dentry = NULL;
+	u->path.mnt = NULL;
+	u->bus = NULL;
+	u->bus_master = false;
+	u->authenticated = false;
+	u->eavesdropper = false;
+	spin_lock_init(&u->lock);
+	atomic_long_set(&u->inflight, 0);
+	INIT_LIST_HEAD(&u->link);
+	INIT_HLIST_HEAD(&u->addr_list);
+	INIT_HLIST_NODE(&u->bus_node);
+	mutex_init(&u->readlock); /* single task reading lock */
+	init_waitqueue_head(&u->peer_wait);
+	bus_insert_socket(bus_sockets_unbound, sk);
+out:
+	if (sk == NULL)
+		atomic_long_dec(&bus_nr_socks);
+	else {
+		local_bh_disable();
+		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
+		local_bh_enable();
+	}
+	return sk;
+}
+
+static int bus_create(struct net *net, struct socket *sock, int protocol,
+		       int kern)
+{
+	struct sock *sk;
+
+	if (protocol < BUS_PROTO_NONE || protocol > BUS_PROTO_DBUS)
+		return -EPROTONOSUPPORT;
+
+	if (protocol != BUS_PROTO_NONE)
+		request_module("net-pf-%d-proto-%d", PF_BUS, protocol);
+
+	sock->state = SS_UNCONNECTED;
+
+	if (sock->type == SOCK_SEQPACKET)
+		sock->ops = &bus_seqpacket_ops;
+	else
+		return -ESOCKTNOSUPPORT;
+
+	sk = bus_create1(net, sock);
+	if (!sk)
+		return -ENOMEM;
+
+	sk->sk_protocol = protocol;
+
+	return 0;
+}
+
+static int bus_release(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+
+	if (!sk)
+		return 0;
+
+	sock->sk = NULL;
+
+	return bus_release_sock(sk, 0);
+}
+
+static struct sock *bus_find_other(struct net *net,
+				   struct sockaddr_bus *sbusname, int len,
+				   int protocol, unsigned int hash, int *error)
+{
+	struct sock *u;
+	struct path path;
+	int err = 0;
+
+	if (sbusname->sbus_path[0]) {
+		struct inode *inode;
+		err = kern_path(sbusname->sbus_path, LOOKUP_FOLLOW, &path);
+		if (err)
+			goto fail;
+		inode = path.dentry->d_inode;
+		err = inode_permission(inode, MAY_WRITE);
+		if (err)
+			goto put_fail;
+
+		err = -ECONNREFUSED;
+		if (!S_ISSOCK(inode->i_mode))
+			goto put_fail;
+		u = bus_find_socket_byaddress(net, sbusname, len, protocol,
+					      hash);
+		if (!u)
+			goto put_fail;
+
+		touch_atime(&path);
+		path_put(&path);
+
+	} else {
+		err = -ECONNREFUSED;
+		u = bus_find_socket_byaddress(net, sbusname, len, protocol, hash);
+		if (u) {
+			struct dentry *dentry;
+			dentry = bus_sk(u)->path.dentry;
+			if (dentry)
+				touch_atime(&bus_sk(u)->path);
+		} else
+			goto fail;
+	}
+
+	return u;
+
+put_fail:
+	path_put(&path);
+fail:
+	*error = err;
+	return NULL;
+}
+
+
+static int bus_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
+{
+	struct sock *sk = sock->sk;
+	struct net *net = sock_net(sk);
+	struct bus_sock *u = bus_sk(sk);
+	struct sockaddr_bus *sbusaddr = (struct sockaddr_bus *)uaddr;
+	char *sbus_path = sbusaddr->sbus_path;
+	struct dentry *dentry = NULL;
+	struct path path;
+	int err;
+	unsigned int hash;
+	struct bus_address *addr;
+	struct hlist_head *list;
+	struct bus *bus;
+
+	err = -EINVAL;
+	if (sbusaddr->sbus_family != AF_BUS)
+		goto out;
+
+	/* If the address is available, the socket is the bus master */
+	sbusaddr->sbus_addr.s_addr = BUS_MASTER_ADDR;
+
+	err = bus_mkname(sbusaddr, addr_len, &hash);
+	if (err < 0)
+		goto out;
+	addr_len = err;
+
+	mutex_lock(&u->readlock);
+
+	err = -EINVAL;
+	if (u->addr)
+		goto out_up;
+
+	err = -ENOMEM;
+	addr = kzalloc(sizeof(*addr) + sizeof(struct sockaddr_bus), GFP_KERNEL);
+	if (!addr)
+		goto out_up;
+
+	memcpy(addr->name, sbusaddr, sizeof(struct sockaddr_bus));
+	addr->len = addr_len;
+	addr->hash = hash;
+	atomic_set(&addr->refcnt, 1);
+	addr->sock = sk;
+	INIT_HLIST_NODE(&addr->addr_node);
+	INIT_HLIST_NODE(&addr->table_node);
+
+	if (sbus_path[0]) {
+		umode_t mode;
+		err = 0;
+		/*
+		 * Get the parent directory, calculate the hash for last
+		 * component.
+		 */
+		dentry = kern_path_create(AT_FDCWD, sbus_path, &path, 0);
+		err = PTR_ERR(dentry);
+		if (IS_ERR(dentry))
+			goto out_mknod_parent;
+
+		/*
+		 * All right, let's create it.
+		 */
+		mode = S_IFSOCK |
+		       (SOCK_INODE(sock)->i_mode & ~current_umask());
+		err = mnt_want_write(path.mnt);
+		if (err)
+			goto out_mknod_dput;
+		err = security_path_mknod(&path, dentry, mode, 0);
+		if (err)
+			goto out_mknod_drop_write;
+		err = vfs_mknod(path.dentry->d_inode, dentry, mode, 0);
+out_mknod_drop_write:
+		mnt_drop_write(path.mnt);
+		if (err)
+			goto out_mknod_dput;
+		mutex_unlock(&path.dentry->d_inode->i_mutex);
+		dput(path.dentry);
+		path.dentry = dentry;
+	}
+
+	err = -ENOMEM;
+	bus = kzalloc(sizeof(*bus), GFP_KERNEL);
+	if (!bus)
+		goto out_unlock;
+
+	spin_lock(&bus_table_lock);
+
+	if (!sbus_path[0]) {
+		err = -EADDRINUSE;
+		if (__bus_find_socket_byname(net, sbusaddr, addr_len, hash)) {
+			bus_release_addr(addr);
+			kfree(bus);
+			goto out_unlock;
+		}
+
+		list = &bus_socket_table[addr->hash];
+	} else {
+		list = &bus_socket_table[dentry->d_inode->i_ino &
+					 (BUS_HASH_SIZE-1)];
+		u->path = path;
+	}
+
+	kref_init(&bus->kref);
+	bus->master = sk;
+	INIT_HLIST_HEAD(&bus->peers);
+	spin_lock_init(&bus->lock);
+	spin_lock_init(&bus->send_lock);
+	atomic64_set(&bus->addr_cnt, 0);
+	atomic64_set(&bus->eavesdropper_cnt, 0);
+
+	hlist_add_head(&addr->addr_node, &u->addr_list);
+
+	err = 0;
+	__bus_remove_socket(sk);
+	u->addr = addr;
+	u->bus_master = true;
+	u->bus = bus;
+	__bus_insert_socket(list, sk);
+	bus_insert_address(&bus_address_table[addr->hash], addr);
+
+out_unlock:
+	spin_unlock(&bus_table_lock);
+out_up:
+	mutex_unlock(&u->readlock);
+out:
+	return err;
+
+out_mknod_dput:
+	dput(dentry);
+	mutex_unlock(&path.dentry->d_inode->i_mutex);
+	path_put(&path);
+out_mknod_parent:
+	if (err == -EEXIST)
+		err = -EADDRINUSE;
+	bus_release_addr(addr);
+	goto out_up;
+}
+
+static long bus_wait_for_peer(struct sock *other, long timeo)
+{
+	struct bus_sock *u = bus_sk(other);
+	int sched;
+	DEFINE_WAIT(wait);
+
+	prepare_to_wait_exclusive(&u->peer_wait, &wait, TASK_INTERRUPTIBLE);
+
+	sched = !sock_flag(other, SOCK_DEAD) &&
+		!(other->sk_shutdown & RCV_SHUTDOWN) &&
+		bus_recvq_full(other);
+
+	bus_state_unlock(other);
+
+	if (sched)
+		timeo = schedule_timeout(timeo);
+
+	finish_wait(&u->peer_wait, &wait);
+	return timeo;
+}
+
+static int bus_connect(struct socket *sock, struct sockaddr *uaddr,
+			       int addr_len, int flags)
+{
+	struct sockaddr_bus *sbusaddr = (struct sockaddr_bus *)uaddr;
+	struct sock *sk = sock->sk;
+	struct net *net = sock_net(sk);
+	struct bus_sock *u = bus_sk(sk), *newu, *otheru;
+	struct sock *newsk = NULL;
+	struct sock *other = NULL;
+	struct sk_buff *skb = NULL;
+	struct bus_address *addr = NULL;
+	unsigned int hash;
+	int st;
+	int err;
+	long timeo;
+
+	/* Only connections to the bus master is allowed */
+	sbusaddr->sbus_addr.s_addr = BUS_MASTER_ADDR;
+
+	err = bus_mkname(sbusaddr, addr_len, &hash);
+	if (err < 0)
+		goto out;
+	addr_len = err;
+
+	err = -ENOMEM;
+	addr = kzalloc(sizeof(*addr) + sizeof(struct sockaddr_bus), GFP_KERNEL);
+	if (!addr)
+		goto out;
+
+	atomic_set(&addr->refcnt, 1);
+	INIT_HLIST_NODE(&addr->addr_node);
+	INIT_HLIST_NODE(&addr->table_node);
+
+	timeo = sock_sndtimeo(sk, flags & O_NONBLOCK);
+
+	/* First of all allocate resources.
+	   If we will make it after state is locked,
+	   we will have to recheck all again in any case.
+	 */
+
+	err = -ENOMEM;
+
+	/* create new sock for complete connection */
+	newsk = bus_create1(sock_net(sk), NULL);
+	if (newsk == NULL)
+		goto out;
+
+	/* Allocate skb for sending to listening sock */
+	skb = sock_wmalloc(newsk, 1, 0, GFP_KERNEL);
+	if (skb == NULL)
+		goto out;
+
+restart:
+	/*  Find listening sock. */
+	other = bus_find_other(net, sbusaddr, addr_len, sk->sk_protocol, hash,
+			       &err);
+	if (!other)
+		goto out;
+
+	/* Latch state of peer */
+	bus_state_lock(other);
+
+	/* Apparently VFS overslept socket death. Retry. */
+	if (sock_flag(other, SOCK_DEAD)) {
+		bus_state_unlock(other);
+		sock_put(other);
+		goto restart;
+	}
+
+	err = -ECONNREFUSED;
+	if (other->sk_state != BUS_LISTEN)
+		goto out_unlock;
+	if (other->sk_shutdown & RCV_SHUTDOWN)
+		goto out_unlock;
+
+	if (bus_recvq_full(other)) {
+		err = -EAGAIN;
+		if (!timeo)
+			goto out_unlock;
+
+		timeo = bus_wait_for_peer(other, timeo);
+
+		err = sock_intr_errno(timeo);
+		if (signal_pending(current))
+			goto out;
+		sock_put(other);
+		goto restart;
+	}
+
+	/* Latch our state.
+
+	   It is tricky place. We need to grab our state lock and cannot
+	   drop lock on peer. It is dangerous because deadlock is
+	   possible. Connect to self case and simultaneous
+	   attempt to connect are eliminated by checking socket
+	   state. other is BUS_LISTEN, if sk is BUS_LISTEN we
+	   check this before attempt to grab lock.
+
+	   Well, and we have to recheck the state after socket locked.
+	 */
+	st = sk->sk_state;
+
+	switch (st) {
+	case BUS_CLOSE:
+		/* This is ok... continue with connect */
+		break;
+	case BUS_ESTABLISHED:
+		/* Socket is already connected */
+		err = -EISCONN;
+		goto out_unlock;
+	default:
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
+	bus_state_lock_nested(sk);
+
+	if (sk->sk_state != st) {
+		bus_state_unlock(sk);
+		bus_state_unlock(other);
+		sock_put(other);
+		goto restart;
+	}
+
+	err = security_bus_connect(sk, other, newsk);
+	if (err) {
+		bus_state_unlock(sk);
+		goto out_unlock;
+	}
+
+	/* The way is open! Fastly set all the necessary fields... */
+
+	sock_hold(sk);
+	bus_peer(newsk)	= sk;
+	newsk->sk_state		= BUS_ESTABLISHED;
+	newsk->sk_type		= sk->sk_type;
+	newsk->sk_protocol	= sk->sk_protocol;
+	init_peercred(newsk);
+	newu = bus_sk(newsk);
+	RCU_INIT_POINTER(newsk->sk_wq, &newu->peer_wq);
+	otheru = bus_sk(other);
+
+	/* copy address information from listening to new sock*/
+	if (otheru->addr && otheru->bus_master) {
+		atomic_inc(&otheru->addr->refcnt);
+		newu->addr = otheru->addr;
+		memcpy(addr->name, otheru->addr->name,
+		       sizeof(struct sockaddr_bus));
+		addr->len = otheru->addr->len;
+		addr->name->sbus_addr.s_addr =
+			(atomic64_inc_return(&otheru->bus->addr_cnt) &
+			 BUS_CLIENT_MASK);
+		addr->hash = bus_compute_hash(addr->name->sbus_addr);
+		addr->sock = sk;
+		u->addr = addr;
+		kref_get(&otheru->bus->kref);
+		u->bus = otheru->bus;
+		u->bus_master_side = false;
+		kref_get(&otheru->bus->kref);
+		newu->bus = otheru->bus;
+		newu->bus_master_side = true;
+		hlist_add_head(&addr->addr_node, &u->addr_list);
+
+		bus_insert_address(&bus_address_table[addr->hash], addr);
+	}
+	if (otheru->path.dentry) {
+		path_get(&otheru->path);
+		newu->path = otheru->path;
+	}
+
+	/* Set credentials */
+	copy_peercred(sk, other);
+	sk->sk_sndbuf = other->sk_sndbuf;
+	sk->sk_max_ack_backlog	= other->sk_max_ack_backlog;
+	newsk->sk_sndbuf = other->sk_sndbuf;
+
+	sock->state	= SS_CONNECTED;
+	sk->sk_state	= BUS_ESTABLISHED;
+	sock_hold(newsk);
+
+	smp_mb__after_atomic_inc();	/* sock_hold() does an atomic_inc() */
+	bus_peer(sk)	= newsk;
+
+	bus_state_unlock(sk);
+
+	/* take ten and and send info to listening sock */
+	spin_lock(&other->sk_receive_queue.lock);
+	__skb_queue_tail(&other->sk_receive_queue, skb);
+	spin_unlock(&other->sk_receive_queue.lock);
+	bus_state_unlock(other);
+	other->sk_data_ready(other, 0);
+	sock_put(other);
+	return 0;
+
+out_unlock:
+	if (other)
+		bus_state_unlock(other);
+
+out:
+	kfree_skb(skb);
+	if (addr)
+		bus_release_addr(addr);
+	if (newsk)
+		bus_release_sock(newsk, 0);
+	if (other)
+		sock_put(other);
+	return err;
+}
+
+static int bus_accept(struct socket *sock, struct socket *newsock, int flags)
+{
+	struct sock *sk = sock->sk;
+	struct sock *tsk;
+	struct sk_buff *skb;
+	int err;
+
+	err = -EINVAL;
+	if (sk->sk_state != BUS_LISTEN)
+		goto out;
+
+	/* If socket state is BUS_LISTEN it cannot change (for now...),
+	 * so that no locks are necessary.
+	 */
+
+	skb = skb_recv_datagram(sk, 0, flags&O_NONBLOCK, &err);
+	if (!skb) {
+		/* This means receive shutdown. */
+		if (err == 0)
+			err = -EINVAL;
+		goto out;
+	}
+
+	tsk = skb->sk;
+	skb_free_datagram(sk, skb);
+	wake_up_interruptible(&bus_sk(sk)->peer_wait);
+
+	/* attach accepted sock to socket */
+	bus_state_lock(tsk);
+	newsock->state = SS_CONNECTED;
+	sock_graft(tsk, newsock);
+	bus_state_unlock(tsk);
+	return 0;
+
+out:
+	return err;
+}
+
+
+static int bus_getname(struct socket *sock, struct sockaddr *uaddr,
+		       int *uaddr_len, int peer)
+{
+	struct sock *sk = sock->sk;
+	struct bus_sock *u;
+	DECLARE_SOCKADDR(struct sockaddr_bus *, sbusaddr, uaddr);
+	int err = 0;
+
+	if (peer) {
+		sk = bus_peer_get(sk);
+
+		err = -ENOTCONN;
+		if (!sk)
+			goto out;
+		err = 0;
+	} else {
+		sock_hold(sk);
+	}
+
+	u = bus_sk(sk);
+
+	bus_state_lock(sk);
+	if (!u->addr) {
+		sbusaddr->sbus_family = AF_BUS;
+		sbusaddr->sbus_path[0] = 0;
+		*uaddr_len = sizeof(short);
+	} else {
+		struct bus_address *addr = u->addr;
+
+		*uaddr_len = sizeof(struct sockaddr_bus);
+		memcpy(sbusaddr, addr->name, *uaddr_len);
+	}
+	bus_state_unlock(sk);
+	sock_put(sk);
+out:
+	return err;
+}
+
+static void bus_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
+{
+	int i;
+
+	scm->fp = BUSCB(skb).fp;
+	BUSCB(skb).fp = NULL;
+
+	for (i = scm->fp->count-1; i >= 0; i--)
+		bus_notinflight(scm->fp->fp[i]);
+}
+
+static void bus_destruct_scm(struct sk_buff *skb)
+{
+	struct scm_cookie scm;
+	memset(&scm, 0, sizeof(scm));
+	scm.pid  = BUSCB(skb).pid;
+	scm.cred = BUSCB(skb).cred;
+	if (BUSCB(skb).fp)
+		bus_detach_fds(&scm, skb);
+
+	scm_destroy(&scm);
+	if (skb->sk)
+		sock_wfree(skb);
+}
+
+#define MAX_RECURSION_LEVEL 4
+
+static int bus_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
+{
+	int i;
+	unsigned char max_level = 0;
+	int bus_sock_count = 0;
+
+	for (i = scm->fp->count - 1; i >= 0; i--) {
+		struct sock *sk = bus_get_socket(scm->fp->fp[i]);
+
+		if (sk) {
+			bus_sock_count++;
+			max_level = max(max_level,
+					bus_sk(sk)->recursion_level);
+		}
+	}
+	if (unlikely(max_level > MAX_RECURSION_LEVEL))
+		return -ETOOMANYREFS;
+
+	/*
+	 * Need to duplicate file references for the sake of garbage
+	 * collection.  Otherwise a socket in the fps might become a
+	 * candidate for GC while the skb is not yet queued.
+	 */
+	BUSCB(skb).fp = scm_fp_dup(scm->fp);
+	if (!BUSCB(skb).fp)
+		return -ENOMEM;
+
+	if (bus_sock_count) {
+		for (i = scm->fp->count - 1; i >= 0; i--)
+			bus_inflight(scm->fp->fp[i]);
+	}
+	return max_level;
+}
+
+static int bus_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb,
+			  bool send_fds)
+{
+	int err = 0;
+
+	BUSCB(skb).pid  = get_pid(scm->pid);
+	if (scm->cred)
+		BUSCB(skb).cred = get_cred(scm->cred);
+	BUSCB(skb).fp = NULL;
+	if (scm->fp && send_fds)
+		err = bus_attach_fds(scm, skb);
+
+	skb->destructor = bus_destruct_scm;
+	return err;
+}
+
+/*
+ * Some apps rely on write() giving SCM_CREDENTIALS
+ * We include credentials if source or destination socket
+ * asserted SOCK_PASSCRED.
+ */
+static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock,
+			    const struct sock *other)
+{
+	if (BUSCB(skb).cred)
+		return;
+	if (test_bit(SOCK_PASSCRED, &sock->flags) ||
+	    !other->sk_socket ||
+	    test_bit(SOCK_PASSCRED, &other->sk_socket->flags)) {
+		BUSCB(skb).pid  = get_pid(task_tgid(current));
+		BUSCB(skb).cred = get_current_cred();
+	}
+}
+
+/*
+ *	Send AF_BUS data.
+ */
+
+static void bus_deliver_skb(struct sk_buff *skb)
+{
+	struct bus_send_context *sendctx = BUSCB(skb).sendctx;
+	struct socket *sock = sendctx->sender_socket;
+
+	if (sock_flag(sendctx->other, SOCK_RCVTSTAMP))
+		__net_timestamp(skb);
+	maybe_add_creds(skb, sock, sendctx->other);
+	skb_queue_tail(&sendctx->other->sk_receive_queue, skb);
+	if (sendctx->max_level > bus_sk(sendctx->other)->recursion_level)
+		bus_sk(sendctx->other)->recursion_level = sendctx->max_level;
+}
+
+/**
+ * bus_sendmsg_finish - delivery an skb to a destination
+ * @skb: sk_buff to deliver
+ *
+ * Delivers a packet to a destination. The skb control buffer has
+ * all the information about the destination contained on sending
+ * context. If the sending is unicast, then the skb is delivered
+ * and the receiver notified but if the sending is multicast, the
+ * skb is just marked as delivered and the actual delivery is made
+ * outside the function with the bus->send_lock held to ensure that
+ * the multicast sending is atomic.
+ */
+static int bus_sendmsg_finish(struct sk_buff *skb)
+{
+	int err;
+	struct bus_send_context *sendctx;
+	struct socket *sock;
+	struct sock *sk;
+	struct net *net;
+	size_t len = skb->len;
+
+	sendctx = BUSCB(skb).sendctx;
+	sock = sendctx->sender_socket;
+	sk = sock->sk;
+	net = sock_net(sk);
+
+restart:
+	if (!sendctx->other) {
+		err = -ECONNRESET;
+		if (sendctx->recipient == NULL)
+			goto out_free;
+
+		sendctx->other = bus_find_other(net, sendctx->recipient,
+						sendctx->namelen,
+						sk->sk_protocol,
+						sendctx->hash, &err);
+
+		if (sendctx->other == NULL ||
+		    !bus_sk(sendctx->other)->authenticated) {
+
+			if (sendctx->other)
+				sock_put(sendctx->other);
+
+			if (!bus_sk(sk)->bus_master_side) {
+				err = -ENOTCONN;
+				sendctx->other = bus_peer_get(sk);
+				if (!sendctx->other)
+					goto out_free;
+			} else {
+				sendctx->other = sk;
+				sock_hold(sendctx->other);
+			}
+		}
+	}
+
+	if (sk_filter(sendctx->other, skb) < 0) {
+		/* Toss the packet but do not return any error to the sender */
+		err = len;
+		goto out_free;
+	}
+
+	bus_state_lock(sendctx->other);
+
+	if (sock_flag(sendctx->other, SOCK_DEAD)) {
+		/*
+		 *	Check with 1003.1g - what should
+		 *	datagram error
+		 */
+		bus_state_unlock(sendctx->other);
+		sock_put(sendctx->other);
+
+		err = 0;
+		bus_state_lock(sk);
+		if (bus_peer(sk) == sendctx->other) {
+			bus_peer(sk) = NULL;
+			bus_state_unlock(sk);
+			sock_put(sendctx->other);
+			err = -ECONNREFUSED;
+		} else {
+			bus_state_unlock(sk);
+		}
+
+		sendctx->other = NULL;
+		if (err)
+			goto out_free;
+		goto restart;
+	}
+
+	err = -EPIPE;
+	if (sendctx->other->sk_shutdown & RCV_SHUTDOWN)
+		goto out_unlock;
+
+	if (bus_recvq_full(sendctx->other)) {
+		if (!sendctx->timeo) {
+			err = -EAGAIN;
+			goto out_unlock;
+		}
+
+		sendctx->timeo = bus_wait_for_peer(sendctx->other,
+						   sendctx->timeo);
+
+		err = sock_intr_errno(sendctx->timeo);
+		if (signal_pending(current))
+			goto out_free;
+
+		goto restart;
+	}
+
+	if (!sendctx->multicast && !sendctx->eavesdropper) {
+		bus_deliver_skb(skb);
+		bus_state_unlock(sendctx->other);
+		sendctx->other->sk_data_ready(sendctx->other, 0);
+		sock_put(sendctx->other);
+	} else {
+		sendctx->deliver = 1;
+		bus_state_unlock(sendctx->other);
+	}
+
+	return len;
+
+out_unlock:
+	bus_state_unlock(sendctx->other);
+out_free:
+	kfree_skb(skb);
+	if (sendctx->other)
+		sock_put(sendctx->other);
+
+	return err;
+}
+
+/**
+ * bus_sendmsg_mcast - do a multicast sending
+ * @skb: sk_buff to deliver
+ *
+ * Send a packet to a multicast destination.
+ * The function is also called for unicast sending when eavesdropping
+ * is enabled. Since the unicast destination and the eavesdroppers
+ * have to receive the packet atomically.
+ */
+static int bus_sendmsg_mcast(struct sk_buff *skb)
+{
+	struct bus_send_context *sendctx;
+	struct bus_send_context *tmpctx;
+	struct socket *sock;
+	struct sock *sk;
+	struct net *net;
+	struct bus_sock *u, *s;
+	struct hlist_node *node;
+	u16 prefix = 0;
+	struct sk_buff **skb_set = NULL;
+	struct bus_send_context **sendctx_set = NULL;
+	int  rcp_cnt, send_cnt;
+	int i;
+	int err;
+	int len = skb->len;
+	bool (*is_receiver) (struct sock *, u16);
+	bool main_rcp_found = false;
+
+	sendctx = BUSCB(skb).sendctx;
+	sendctx->deliver = 0;
+	sock = sendctx->sender_socket;
+	sk = sock->sk;
+	u = bus_sk(sk);
+	net = sock_net(sk);
+
+	if (sendctx->multicast) {
+		prefix = bus_addr_prefix(sendctx->recipient);
+		if (sendctx->eavesdropper)
+			is_receiver = &bus_has_prefix_eavesdropper;
+		else
+			is_receiver = &bus_has_prefix;
+	} else {
+		is_receiver = &bus_eavesdropper;
+
+		/*
+		 * If the destination is not the peer accepted socket
+		 * we have to get the correct destination.
+		 */
+		if (!sendctx->to_master && sendctx->recipient) {
+			sendctx->other = bus_find_other(net, sendctx->recipient,
+							sendctx->namelen,
+							sk->sk_protocol,
+							sendctx->hash, &err);
+
+
+			if (sendctx->other == NULL ||
+			    !bus_sk(sendctx->other)->authenticated) {
+
+				if (sendctx->other)
+					sock_put(sendctx->other);
+
+				if (sendctx->other == NULL) {
+					if (!bus_sk(sk)->bus_master_side) {
+						err = -ENOTCONN;
+						sendctx->other = bus_peer_get(sk);
+						if (!sendctx->other)
+							goto out;
+					} else {
+						sendctx->other = sk;
+						sock_hold(sendctx->other);
+					}
+				}
+				sendctx->to_master = 1;
+			}
+		}
+	}
+
+
+try_again:
+	rcp_cnt = 0;
+	main_rcp_found = false;
+
+	spin_lock(&u->bus->lock);
+
+	hlist_for_each_entry(s, node, &u->bus->peers, bus_node) {
+
+		if (!net_eq(sock_net(&s->sk), net))
+			continue;
+
+		if (is_receiver(&s->sk, prefix) ||
+		    (!sendctx->multicast &&
+		     !sendctx->to_master &&
+		     &s->sk == sendctx->other))
+			rcp_cnt++;
+	}
+
+	spin_unlock(&u->bus->lock);
+
+	/*
+	 * Memory can't be allocated while holding a spinlock so
+	 * we have to release the lock, do the allocation for the
+	 * array to store each destination peer sk_buff and grab
+	 * the bus peer lock again. Peers could have joined the
+	 * bus while we relesed the lock so we allocate 5 more
+	 * recipients hoping that this will be enough to not having
+	 * to try again in case only a few peers joined the bus.
+	 */
+	rcp_cnt += 5;
+	skb_set = kzalloc(sizeof(struct sk_buff *) * rcp_cnt, GFP_KERNEL);
+
+	if (!skb_set) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	sendctx_set = kzalloc(sizeof(struct bus_send_context *) * rcp_cnt,
+			      GFP_KERNEL);
+	if (!sendctx_set) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < rcp_cnt; i++) {
+		skb_set[i] = skb_clone(skb, GFP_KERNEL);
+		if (!skb_set[i]) {
+			err = -ENOMEM;
+			goto out_free;
+		}
+		sendctx_set[i] = BUSCB(skb_set[i]).sendctx
+			= kmalloc(sizeof(*sendctx) * rcp_cnt, GFP_KERNEL);
+		if (!sendctx_set[i]) {
+			err = -ENOMEM;
+			goto out_free;
+		}
+		memcpy(sendctx_set[i], sendctx, sizeof(*sendctx));
+		err = bus_scm_to_skb(sendctx_set[i]->siocb->scm,
+				     skb_set[i], true);
+		if (err < 0)
+			goto out_free;
+		bus_get_secdata(sendctx_set[i]->siocb->scm,
+				skb_set[i]);
+
+		sendctx_set[i]->other = NULL;
+	}
+
+	send_cnt = 0;
+
+	spin_lock(&u->bus->lock);
+
+	hlist_for_each_entry(s, node, &u->bus->peers, bus_node) {
+
+		if (!net_eq(sock_net(&s->sk), net))
+			continue;
+
+		if (send_cnt >= rcp_cnt) {
+			spin_unlock(&u->bus->lock);
+
+			for (i = 0; i < rcp_cnt; i++) {
+				sock_put(sendctx_set[i]->other);
+				kfree_skb(skb_set[i]);
+				kfree(sendctx_set[i]);
+			}
+			kfree(skb_set);
+			kfree(sendctx_set);
+			sendctx_set = NULL;
+			skb_set = NULL;
+			goto try_again;
+		}
+
+		if (is_receiver(&s->sk, prefix) ||
+		    (!sendctx->multicast &&
+		     !sendctx->to_master &&
+		     &s->sk == sendctx->other)) {
+			skb_set_owner_w(skb_set[send_cnt], &s->sk);
+			tmpctx = BUSCB(skb_set[send_cnt]).sendctx;
+			sock_hold(&s->sk);
+			if (&s->sk == sendctx->other) {
+				tmpctx->main_recipient = 1;
+				main_rcp_found = true;
+			}
+			tmpctx->other = &s->sk;
+			tmpctx->recipient = s->addr->name;
+			tmpctx->eavesdropper = bus_eavesdropper(&s->sk, 0);
+
+			send_cnt++;
+		}
+	}
+
+	spin_unlock(&u->bus->lock);
+
+	/*
+	 * Peers have left the bus so we have to free
+	 * their pre-allocated bus_send_context and
+	 * socket buffers.
+	 */
+	if (send_cnt < rcp_cnt) {
+		for (i = send_cnt; i < rcp_cnt; i++) {
+			kfree_skb(skb_set[i]);
+			kfree(sendctx_set[i]);
+		}
+		rcp_cnt = send_cnt;
+	}
+
+	for (i = 0; i < send_cnt; i++) {
+		tmpctx = BUSCB(skb_set[i]).sendctx;
+		tmpctx->deliver = 0;
+		err = NF_HOOK(NFPROTO_BUS, NF_BUS_SENDING, skb_set[i],
+			      NULL, NULL, bus_sendmsg_finish);
+		if (err == -EPERM)
+			sock_put(tmpctx->other);
+	}
+
+	/*
+	 * If the send context is not multicast, the destination
+	 * coud be either the peer accepted socket descriptor or
+	 * a peer that is not an eavesdropper. If the peer is not
+	 * the accepted socket descriptor and has been authenticated,
+	 * it is a member of the bus peer list so it has already been
+	 * marked for delivery.
+	 * But if the destination is the accepted socket descriptor
+	 * or is a non-authenticated peer it is not a member of the
+	 * bus peer list so the packet has to be explicitly deliver
+	 * to it.
+	 */
+
+	if (!sendctx->multicast &&
+	    (sendctx->to_master ||
+	     (sendctx->bus_master_side && !main_rcp_found))) {
+		sendctx->main_recipient = 1;
+		err = NF_HOOK(NFPROTO_BUS, NF_BUS_SENDING, skb, NULL, NULL,
+			bus_sendmsg_finish);
+		if (err == -EPERM)
+			sock_put(sendctx->other);
+	}
+
+	spin_lock(&u->bus->send_lock);
+
+	for (i = 0; i < send_cnt; i++) {
+		tmpctx = sendctx_set[i];
+		if (tmpctx->deliver != 1)
+			continue;
+
+		bus_state_lock(tmpctx->other);
+		bus_deliver_skb(skb_set[i]);
+		bus_state_unlock(tmpctx->other);
+	}
+
+	if (!sendctx->multicast &&
+	    sendctx->deliver == 1 &&
+	    !bus_sk(sendctx->other)->eavesdropper) {
+		bus_state_lock(sendctx->other);
+		bus_deliver_skb(skb);
+		bus_state_unlock(sendctx->other);
+	}
+
+	spin_unlock(&u->bus->send_lock);
+
+	for (i = 0; i < send_cnt; i++) {
+		tmpctx = sendctx_set[i];
+		if (tmpctx->deliver != 1)
+			continue;
+
+		tmpctx->other->sk_data_ready(tmpctx->other, 0);
+		sock_put(tmpctx->other);
+	}
+
+	if (!sendctx->multicast &&
+	    sendctx->deliver == 1 &&
+	    !bus_sk(sendctx->other)->eavesdropper) {
+		sendctx->other->sk_data_ready(sendctx->other, 0);
+		sock_put(sendctx->other);
+	}
+
+	err = len;
+	goto out;
+
+out_free:
+	for (i = 0; i < rcp_cnt; i++) {
+		if (skb_set[i])
+			kfree_skb(skb_set[i]);
+	}
+
+out:
+	kfree(skb_set);
+	if (sendctx_set) {
+		for (i = 0; i < rcp_cnt; i++)
+			kfree(sendctx_set[i]);
+		kfree(sendctx_set);
+	}
+
+	if (sendctx->deliver == 0) {
+		if (!sendctx->to_master &&
+		    !(sendctx->bus_master_side && !main_rcp_found))
+			kfree_skb(skb);
+		if (!sendctx->to_master &&
+		    !(sendctx->bus_master_side && !main_rcp_found))
+			if (sendctx->other)
+				sock_put(sendctx->other);
+	}
+	scm_destroy(sendctx->siocb->scm);
+
+	return err;
+}
+
+/**
+ * bus_sendmsg - send an skb to a destination
+ * @kiocb: I/O control block info
+ * @sock: sender socket
+ * @msg: message header
+ * @len: message length
+ *
+ * Send an socket buffer to a destination. The destination could be
+ * either an unicast or a multicast address. In any case, a copy of
+ * the packet has to be send to all the sockets that are allowed to
+ * eavesdrop the communication bus.
+ *
+ * If the destination address is not associated with any socket, the
+ * packet is default routed to the bus master (the sender accepted
+ * socket).
+ *
+ * The af_bus sending path is hooked to the netfilter subsystem so
+ * netfilter hooks can filter or modify the packet before delivery.
+ */
+static int bus_sendmsg(struct kiocb *kiocb, struct socket *sock,
+				struct msghdr *msg, size_t len)
+{
+	struct sock *sk = sock->sk;
+	struct bus_sock *u = bus_sk(sk);
+	struct sockaddr_bus *sbusaddr = msg->msg_name;
+	int err;
+	struct sk_buff *skb;
+	struct scm_cookie tmp_scm;
+	bool to_master = false;
+	bool multicast = false;
+	struct bus_send_context sendctx;
+
+	err = sock_error(sk);
+	if (err)
+		return err;
+
+	if (sk->sk_state != BUS_ESTABLISHED)
+		return -ENOTCONN;
+
+	if (!msg->msg_namelen)
+		sbusaddr = NULL;
+
+	if (sbusaddr && !bus_same_bus(sbusaddr, u->addr->name))
+		return -EHOSTUNREACH;
+
+	if ((!sbusaddr && !u->bus_master_side) ||
+	    (sbusaddr && sbusaddr->sbus_addr.s_addr == BUS_MASTER_ADDR))
+		to_master = true;
+	else if (sbusaddr && !u->bus_master_side && !u->authenticated)
+		return -EHOSTUNREACH;
+
+	sendctx.namelen = 0; /* fake GCC */
+	sendctx.siocb = kiocb_to_siocb(kiocb);
+	sendctx.other = NULL;
+
+	if (NULL == sendctx.siocb->scm)
+		sendctx.siocb->scm = &tmp_scm;
+	wait_for_bus_gc();
+	err = scm_send(sock, msg, sendctx.siocb->scm);
+	if (err < 0)
+		return err;
+
+	err = -EOPNOTSUPP;
+	if (msg->msg_flags&MSG_OOB)
+		goto out;
+
+	if (sbusaddr && !to_master) {
+		err = bus_mkname(sbusaddr, msg->msg_namelen, &sendctx.hash);
+		if (err < 0)
+			goto out;
+		sendctx.namelen = err;
+		multicast = bus_mc_addr(sbusaddr);
+	} else {
+		err = -ENOTCONN;
+		sendctx.other = bus_peer_get(sk);
+		if (!sendctx.other)
+			goto out;
+	}
+
+	err = -EMSGSIZE;
+	if (len > sk->sk_sndbuf - 32)
+		goto out;
+
+	sendctx.timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
+
+restart:
+	bus_state_lock(sk);
+	if (bus_recvq_full(sk)) {
+		err = -EAGAIN;
+		if (!sendctx.timeo) {
+			bus_state_unlock(sk);
+			goto out;
+		}
+
+		sendctx.timeo = bus_wait_for_peer(sk, sendctx.timeo);
+
+		err = sock_intr_errno(sendctx.timeo);
+		if (signal_pending(current))
+			goto out;
+
+		goto restart;
+	} else {
+		bus_state_unlock(sk);
+	}
+
+	skb = sock_alloc_send_skb(sk, len, msg->msg_flags&MSG_DONTWAIT, &err);
+	if (skb == NULL)
+		goto out;
+
+	err = bus_scm_to_skb(sendctx.siocb->scm, skb, true);
+	if (err < 0)
+		goto out_free;
+	sendctx.max_level = err + 1;
+	bus_get_secdata(sendctx.siocb->scm, skb);
+
+	skb_reset_transport_header(skb);
+	err = memcpy_fromiovec(skb_put(skb, len), msg->msg_iov, len);
+	if (err)
+		goto out_free;
+
+	sendctx.sender_socket = sock;
+	if (u->bus_master_side && sendctx.other) {
+		/* if the bus master sent an unicast message to a peer, we
+		 * need the address of that peer
+		 */
+		sendctx.sender = bus_sk(sendctx.other)->addr->name;
+	} else {
+		sendctx.sender = u->addr->name;
+	}
+	sendctx.recipient = sbusaddr;
+	sendctx.authenticated = u->authenticated;
+	sendctx.bus_master_side = u->bus_master_side;
+	sendctx.to_master = to_master;
+	sendctx.multicast = multicast;
+	sendctx.eavesdropper = atomic64_read(&u->bus->eavesdropper_cnt) ? 1 : 0;
+	BUSCB(skb).sendctx = &sendctx;
+
+	if (sendctx.multicast || sendctx.eavesdropper) {
+		sendctx.main_recipient = 0;
+		err = bus_sendmsg_mcast(skb);
+		return sendctx.multicast ? len : err;
+	} else {
+		sendctx.main_recipient = 1;
+		len = NF_HOOK(NFPROTO_BUS, NF_BUS_SENDING, skb, NULL, NULL,
+			      bus_sendmsg_finish);
+
+		if (len == -EPERM) {
+			err = len;
+			goto out;
+		} else {
+			scm_destroy(sendctx.siocb->scm);
+			return len;
+		}
+	}
+
+out_free:
+	kfree_skb(skb);
+out:
+	if (sendctx.other)
+		sock_put(sendctx.other);
+	scm_destroy(sendctx.siocb->scm);
+	return err;
+}
+
+static void bus_copy_addr(struct msghdr *msg, struct sock *sk)
+{
+	struct bus_sock *u = bus_sk(sk);
+
+	msg->msg_namelen = 0;
+	if (u->addr) {
+		msg->msg_namelen = u->addr->len;
+		memcpy(msg->msg_name, u->addr->name,
+		       sizeof(struct sockaddr_bus));
+	}
+}
+
+static int bus_recvmsg(struct kiocb *iocb, struct socket *sock,
+			  struct msghdr *msg, size_t size, int flags)
+{
+	struct sock_iocb *siocb = kiocb_to_siocb(iocb);
+	struct scm_cookie tmp_scm;
+	struct sock *sk = sock->sk;
+	struct bus_sock *u = bus_sk(sk);
+	int noblock = flags & MSG_DONTWAIT;
+	struct sk_buff *skb;
+	int err;
+	int peeked, skip;
+
+	if (sk->sk_state != BUS_ESTABLISHED)
+		return -ENOTCONN;
+
+	err = -EOPNOTSUPP;
+	if (flags&MSG_OOB)
+		goto out;
+
+	msg->msg_namelen = 0;
+
+	err = mutex_lock_interruptible(&u->readlock);
+	if (err) {
+		err = sock_intr_errno(sock_rcvtimeo(sk, noblock));
+		goto out;
+	}
+
+	skip = sk_peek_offset(sk, flags);
+
+	skb = __skb_recv_datagram(sk, flags, &peeked, &skip, &err);
+	if (!skb) {
+		bus_state_lock(sk);
+		/* Signal EOF on disconnected non-blocking SEQPACKET socket. */
+		if (err == -EAGAIN && (sk->sk_shutdown & RCV_SHUTDOWN))
+			err = 0;
+		bus_state_unlock(sk);
+		goto out_unlock;
+	}
+
+	wake_up_interruptible_sync_poll(&u->peer_wait,
+					POLLOUT | POLLWRNORM | POLLWRBAND);
+
+	if (msg->msg_name)
+		bus_copy_addr(msg, skb->sk);
+
+	if (size > skb->len - skip)
+		size = skb->len - skip;
+	else if (size < skb->len - skip)
+		msg->msg_flags |= MSG_TRUNC;
+
+	err = skb_copy_datagram_iovec(skb, skip, msg->msg_iov, size);
+	if (err)
+		goto out_free;
+
+	if (sock_flag(sk, SOCK_RCVTSTAMP))
+		__sock_recv_timestamp(msg, sk, skb);
+
+	if (!siocb->scm) {
+		siocb->scm = &tmp_scm;
+		memset(&tmp_scm, 0, sizeof(tmp_scm));
+	}
+	scm_set_cred(siocb->scm, BUSCB(skb).pid, BUSCB(skb).cred);
+	bus_set_secdata(siocb->scm, skb);
+
+	if (!(flags & MSG_PEEK)) {
+		if (BUSCB(skb).fp)
+			bus_detach_fds(siocb->scm, skb);
+
+		sk_peek_offset_bwd(sk, skb->len);
+	} else {
+		/* It is questionable: on PEEK we could:
+		   - do not return fds - good, but too simple 8)
+		   - return fds, and do not return them on read (old strategy,
+		     apparently wrong)
+		   - clone fds (I chose it for now, it is the most universal
+		     solution)
+
+		   POSIX 1003.1g does not actually define this clearly
+		   at all. POSIX 1003.1g doesn't define a lot of things
+		   clearly however!
+
+		*/
+
+		sk_peek_offset_fwd(sk, size);
+
+		if (BUSCB(skb).fp)
+			siocb->scm->fp = scm_fp_dup(BUSCB(skb).fp);
+	}
+	err = (flags & MSG_TRUNC) ? skb->len - skip : size;
+
+	scm_recv(sock, msg, siocb->scm, flags);
+
+out_free:
+	skb_free_datagram(sk, skb);
+out_unlock:
+	mutex_unlock(&u->readlock);
+out:
+	return err;
+}
+
+static int bus_shutdown(struct socket *sock, int mode)
+{
+	struct sock *sk = sock->sk;
+	struct sock *other;
+
+	mode = (mode+1)&(RCV_SHUTDOWN|SEND_SHUTDOWN);
+
+	if (!mode)
+		return 0;
+
+	bus_state_lock(sk);
+	sk->sk_shutdown |= mode;
+	other = bus_peer(sk);
+	if (other)
+		sock_hold(other);
+	bus_state_unlock(sk);
+	sk->sk_state_change(sk);
+
+	if (other) {
+
+		int peer_mode = 0;
+
+		if (mode&RCV_SHUTDOWN)
+			peer_mode |= SEND_SHUTDOWN;
+		if (mode&SEND_SHUTDOWN)
+			peer_mode |= RCV_SHUTDOWN;
+		bus_state_lock(other);
+		other->sk_shutdown |= peer_mode;
+		bus_state_unlock(other);
+		other->sk_state_change(other);
+		if (peer_mode == SHUTDOWN_MASK)
+			sk_wake_async(other, SOCK_WAKE_WAITD, POLL_HUP);
+		else if (peer_mode & RCV_SHUTDOWN)
+			sk_wake_async(other, SOCK_WAKE_WAITD, POLL_IN);
+		sock_put(other);
+	}
+
+	return 0;
+}
+
+static int bus_add_addr(struct sock *sk, struct bus_addr *sbus_addr)
+{
+	struct bus_address *addr;
+	struct sock *other;
+	struct bus_sock *u = bus_sk(sk);
+	struct net *net = sock_net(sk);
+	int ret = 0;
+
+	addr = kzalloc(sizeof(*addr) + sizeof(struct sockaddr_bus), GFP_KERNEL);
+	if (!addr) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	memcpy(addr->name, u->addr->name, sizeof(struct sockaddr_bus));
+	addr->len = u->addr->len;
+
+	addr->name->sbus_addr.s_addr = sbus_addr->s_addr;
+	addr->hash = bus_compute_hash(addr->name->sbus_addr);
+	other = bus_find_socket_byaddress(net, addr->name, addr->len,
+					  sk->sk_protocol, addr->hash);
+
+	if (other) {
+		sock_put(other);
+		kfree(addr);
+		ret = -EADDRINUSE;
+		goto out;
+	}
+
+	atomic_set(&addr->refcnt, 1);
+	INIT_HLIST_NODE(&addr->addr_node);
+	INIT_HLIST_NODE(&addr->table_node);
+
+	addr->sock = sk;
+
+	hlist_add_head(&addr->addr_node, &u->addr_list);
+	bus_insert_address(&bus_address_table[addr->hash], addr);
+
+out:
+	sock_put(sk);
+
+	return ret;
+}
+
+static int bus_del_addr(struct sock *sk, struct bus_addr *sbus_addr)
+{
+	struct bus_address *addr;
+	int ret = 0;
+
+	bus_state_lock(sk);
+	addr = __bus_get_address(sk, sbus_addr);
+	if (!addr) {
+		ret = -EINVAL;
+		bus_state_unlock(sk);
+		goto out;
+	}
+	hlist_del(&addr->addr_node);
+	bus_state_unlock(sk);
+
+	bus_remove_address(addr);
+	bus_release_addr(addr);
+out:
+	sock_put(sk);
+
+	return ret;
+}
+
+static int bus_join_bus(struct sock *sk)
+{
+	struct sock *peer;
+	struct bus_sock *u = bus_sk(sk), *peeru;
+	int err = 0;
+
+	peer = bus_peer_get(sk);
+	if (!peer)
+		return -ENOTCONN;
+	peeru = bus_sk(peer);
+
+	if (!u->bus_master_side || peeru->authenticated) {
+		err = -EINVAL;
+		goto sock_put_out;
+	}
+
+	if (sk->sk_state != BUS_ESTABLISHED) {
+		err = -ENOTCONN;
+		goto sock_put_out;
+	}
+
+	if (peer->sk_shutdown != 0) {
+		err = -ENOTCONN;
+		goto sock_put_out;
+	}
+
+	bus_state_lock(peer);
+	peeru->authenticated = true;
+	bus_state_unlock(peer);
+
+	spin_lock(&u->bus->lock);
+	hlist_add_head(&peeru->bus_node, &u->bus->peers);
+	spin_unlock(&u->bus->lock);
+
+sock_put_out:
+	sock_put(peer);
+	return err;
+}
+
+static int __bus_set_eavesdrop(struct sock *sk, bool eavesdrop)
+{
+	struct sock *peer = bus_peer_get(sk);
+	struct bus_sock *u = bus_sk(sk), *peeru;
+	int err = 0;
+
+	if (!peer)
+		return -ENOTCONN;
+
+	if (sk->sk_state != BUS_ESTABLISHED) {
+		err = -ENOTCONN;
+		goto sock_put_out;
+	}
+
+	peeru = bus_sk(peer);
+
+	if (!u->bus_master_side || !peeru->authenticated) {
+		err = -EINVAL;
+		goto sock_put_out;
+	}
+
+	if (peer->sk_shutdown != 0) {
+		err = -ENOTCONN;
+		goto sock_put_out;
+	}
+
+	bus_state_lock(peeru);
+	if (peeru->eavesdropper != eavesdrop) {
+		peeru->eavesdropper = eavesdrop;
+		if (eavesdrop)
+			atomic64_inc(&u->bus->eavesdropper_cnt);
+		else
+			atomic64_dec(&u->bus->eavesdropper_cnt);
+	}
+	bus_state_unlock(peeru);
+
+sock_put_out:
+	sock_put(peer);
+	return err;
+}
+
+static int bus_set_eavesdrop(struct sock *sk)
+{
+	return __bus_set_eavesdrop(sk, true);
+}
+
+static int bus_unset_eavesdrop(struct sock *sk)
+{
+	return __bus_set_eavesdrop(sk, false);
+}
+
+static inline void sk_sendbuf_set(struct sock *sk, int sndbuf)
+{
+	bus_state_lock(sk);
+	sk->sk_sndbuf = sndbuf;
+	bus_state_unlock(sk);
+}
+
+static inline void sk_maxqlen_set(struct sock *sk, int qlen)
+{
+	bus_state_lock(sk);
+	sk->sk_max_ack_backlog = qlen;
+	bus_state_unlock(sk);
+}
+
+static int bus_setsockopt(struct socket *sock, int level, int optname,
+			   char __user *optval, unsigned int optlen)
+{
+	struct bus_addr addr;
+	int res;
+	int val;
+
+	if (level != SOL_BUS)
+		return -ENOPROTOOPT;
+
+	switch (optname) {
+	case BUS_ADD_ADDR:
+	case BUS_DEL_ADDR:
+		if (optlen < sizeof(struct bus_addr))
+			return -EINVAL;
+
+		if (!bus_sk(sock->sk)->bus_master_side)
+			return -EINVAL;
+
+		if (copy_from_user(&addr, optval, sizeof(struct bus_addr)))
+			return -EFAULT;
+
+		if (optname == BUS_ADD_ADDR)
+			res = bus_add_addr(bus_peer_get(sock->sk), &addr);
+		else
+			res = bus_del_addr(bus_peer_get(sock->sk), &addr);
+		break;
+	case BUS_JOIN_BUS:
+		res = bus_join_bus(sock->sk);
+		break;
+	case BUS_SET_EAVESDROP:
+		res = bus_set_eavesdrop(sock->sk);
+		break;
+	case BUS_UNSET_EAVESDROP:
+		res = bus_unset_eavesdrop(sock->sk);
+		break;
+	case BUS_SET_SENDBUF:
+	case BUS_SET_MAXQLEN:
+		if (sock->sk->sk_state != BUS_LISTEN) {
+			res = -EINVAL;
+		} else {
+			res = -EFAULT;
+
+			if (copy_from_user(&val, optval, optlen))
+				break;
+
+			res = 0;
+
+			if (optname == BUS_SET_SENDBUF)
+				sk_sendbuf_set(sock->sk, val);
+			else
+				sk_maxqlen_set(sock->sk, val);
+		}
+		break;
+	default:
+		res = -EINVAL;
+		break;
+	}
+
+	return res;
+}
+
+long bus_inq_len(struct sock *sk)
+{
+	struct sk_buff *skb;
+	long amount = 0;
+
+	if (sk->sk_state == BUS_LISTEN)
+		return -EINVAL;
+
+	spin_lock(&sk->sk_receive_queue.lock);
+	skb_queue_walk(&sk->sk_receive_queue, skb)
+		amount += skb->len;
+	spin_unlock(&sk->sk_receive_queue.lock);
+
+	return amount;
+}
+EXPORT_SYMBOL_GPL(bus_inq_len);
+
+long bus_outq_len(struct sock *sk)
+{
+	return sk_wmem_alloc_get(sk);
+}
+EXPORT_SYMBOL_GPL(bus_outq_len);
+
+static int bus_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
+{
+	struct sock *sk = sock->sk;
+	long amount = 0;
+	int err;
+
+	switch (cmd) {
+	case SIOCOUTQ:
+		amount = bus_outq_len(sk);
+		err = put_user(amount, (int __user *)arg);
+		break;
+	case SIOCINQ:
+		amount = bus_inq_len(sk);
+		if (amount < 0)
+			err = amount;
+		else
+			err = put_user(amount, (int __user *)arg);
+		break;
+	default:
+		err = -ENOIOCTLCMD;
+		break;
+	}
+	return err;
+}
+
+static unsigned int bus_poll(struct file *file, struct socket *sock,
+				    poll_table *wait)
+{
+	struct sock *sk = sock->sk, *other;
+	unsigned int mask, writable;
+	struct bus_sock *u = bus_sk(sk), *p;
+	struct hlist_node *node;
+
+	sock_poll_wait(file, sk_sleep(sk), wait);
+	mask = 0;
+
+	/* exceptional events? */
+	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
+		mask |= POLLERR;
+	if (sk->sk_shutdown & RCV_SHUTDOWN)
+		mask |= POLLRDHUP | POLLIN | POLLRDNORM;
+	if (sk->sk_shutdown == SHUTDOWN_MASK)
+		mask |= POLLHUP;
+
+	/* readable? */
+	if (!skb_queue_empty(&sk->sk_receive_queue))
+		mask |= POLLIN | POLLRDNORM;
+
+	/* Connection-based need to check for termination and startup */
+	if (sk->sk_state == BUS_CLOSE)
+		mask |= POLLHUP;
+
+	/* No write status requested, avoid expensive OUT tests. */
+	if (!(poll_requested_events(wait) & (POLLWRBAND|POLLWRNORM|POLLOUT)))
+		return mask;
+
+	writable = bus_writable(sk);
+	other = bus_peer_get(sk);
+	if (other) {
+		if (bus_recvq_full(other))
+			writable = 0;
+		sock_put(other);
+	}
+
+	/*
+	 * If the socket has already joined the bus we have to check
+	 * that each peer receiver queue on the bus is not full.
+	 */
+	if (!u->bus_master_side && u->authenticated) {
+		spin_lock(&u->bus->lock);
+		hlist_for_each_entry(p, node, &u->bus->peers, bus_node) {
+			if (bus_recvq_full(&p->sk)) {
+				writable = 0;
+				break;
+			}
+		}
+		spin_unlock(&u->bus->lock);
+	}
+
+	if (writable)
+		mask |= POLLOUT | POLLWRNORM | POLLWRBAND;
+	else
+		set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags);
+
+	return mask;
+}
+
+#ifdef CONFIG_PROC_FS
+static struct sock *first_bus_socket(int *i)
+{
+	for (*i = 0; *i <= BUS_HASH_SIZE; (*i)++) {
+		if (!hlist_empty(&bus_socket_table[*i]))
+			return __sk_head(&bus_socket_table[*i]);
+	}
+	return NULL;
+}
+
+static struct sock *next_bus_socket(int *i, struct sock *s)
+{
+	struct sock *next = sk_next(s);
+	/* More in this chain? */
+	if (next)
+		return next;
+	/* Look for next non-empty chain. */
+	for ((*i)++; *i <= BUS_HASH_SIZE; (*i)++) {
+		if (!hlist_empty(&bus_socket_table[*i]))
+			return __sk_head(&bus_socket_table[*i]);
+	}
+	return NULL;
+}
+
+struct bus_iter_state {
+	struct seq_net_private p;
+	int i;
+};
+
+static struct sock *bus_seq_idx(struct seq_file *seq, loff_t pos)
+{
+	struct bus_iter_state *iter = seq->private;
+	loff_t off = 0;
+	struct sock *s;
+
+	for (s = first_bus_socket(&iter->i); s;
+	     s = next_bus_socket(&iter->i, s)) {
+		if (sock_net(s) != seq_file_net(seq))
+			continue;
+		if (off == pos)
+			return s;
+		++off;
+	}
+	return NULL;
+}
+
+static void *bus_seq_start(struct seq_file *seq, loff_t *pos)
+	__acquires(bus_table_lock)
+{
+	spin_lock(&bus_table_lock);
+	return *pos ? bus_seq_idx(seq, *pos - 1) : SEQ_START_TOKEN;
+}
+
+static void *bus_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct bus_iter_state *iter = seq->private;
+	struct sock *sk = v;
+	++*pos;
+
+	if (v == SEQ_START_TOKEN)
+		sk = first_bus_socket(&iter->i);
+	else
+		sk = next_bus_socket(&iter->i, sk);
+	while (sk && (sock_net(sk) != seq_file_net(seq)))
+		sk = next_bus_socket(&iter->i, sk);
+	return sk;
+}
+
+static void bus_seq_stop(struct seq_file *seq, void *v)
+	__releases(bus_table_lock)
+{
+	spin_unlock(&bus_table_lock);
+}
+
+static int bus_seq_show(struct seq_file *seq, void *v)
+{
+
+	if (v == SEQ_START_TOKEN)
+		seq_puts(seq, "Num       RefCount Protocol Flags    Type St " \
+			 "Inode Path\n");
+	else {
+		struct sock *s = v;
+		struct bus_sock *u = bus_sk(s);
+		bus_state_lock(s);
+
+		seq_printf(seq, "%pK: %08X %08X %08X %04X %02X %5lu",
+			s,
+			atomic_read(&s->sk_refcnt),
+			0,
+			s->sk_state == BUS_LISTEN ? __SO_ACCEPTCON : 0,
+			s->sk_type,
+			s->sk_socket ?
+			(s->sk_state == BUS_ESTABLISHED ? SS_CONNECTED : SS_UNCONNECTED) :
+			(s->sk_state == BUS_ESTABLISHED ? SS_CONNECTING : SS_DISCONNECTING),
+			sock_i_ino(s));
+
+		if (u->addr) {
+			int i, len;
+			seq_putc(seq, ' ');
+
+			i = 0;
+			len = u->addr->len - sizeof(short);
+			if (!BUS_ABSTRACT(s))
+				len--;
+			else {
+				seq_putc(seq, '@');
+				i++;
+			}
+			for ( ; i < len; i++)
+				seq_putc(seq, u->addr->name->sbus_path[i]);
+		}
+		bus_state_unlock(s);
+		seq_putc(seq, '\n');
+	}
+
+	return 0;
+}
+
+static const struct seq_operations bus_seq_ops = {
+	.start  = bus_seq_start,
+	.next   = bus_seq_next,
+	.stop   = bus_seq_stop,
+	.show   = bus_seq_show,
+};
+
+static int bus_seq_open(struct inode *inode, struct file *file)
+{
+	return seq_open_net(inode, file, &bus_seq_ops,
+			    sizeof(struct bus_iter_state));
+}
+
+static const struct file_operations bus_seq_fops = {
+	.owner		= THIS_MODULE,
+	.open		= bus_seq_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release_net,
+};
+
+#endif
+
+static const struct net_proto_family bus_family_ops = {
+	.family = PF_BUS,
+	.create = bus_create,
+	.owner	= THIS_MODULE,
+};
+
+static int __init af_bus_init(void)
+{
+	int rc = -1;
+	struct sk_buff *dummy_skb;
+
+	BUILD_BUG_ON(sizeof(struct bus_skb_parms) > sizeof(dummy_skb->cb));
+
+	rc = proto_register(&bus_proto, 1);
+	if (rc != 0) {
+		pr_crit("%s: Cannot create bus_sock SLAB cache!\n", __func__);
+		return rc;
+	}
+
+	sock_register(&bus_family_ops);
+	return rc;
+}
+
+static void __exit af_bus_exit(void)
+{
+	sock_unregister(PF_BUS);
+	proto_unregister(&bus_proto);
+}
+
+module_init(af_bus_init);
+module_exit(af_bus_exit);
+
+MODULE_AUTHOR("Alban Crequy, Javier Martinez Canillas");
+MODULE_DESCRIPTION("Linux Bus domain sockets");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_NETPROTO(PF_BUS);
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 09/15] net: bus: Add garbage collector for AF_BUS sockets.
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (7 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 08/15] net: bus: Add implementation of Bus domain sockets Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-07-02 17:44   ` Ben Hutchings
  2012-06-29 16:45 ` [PATCH net-next 10/15] net: bus: Add the AF_BUS socket address family to KBuild Vincent Sanders
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Vincent Sanders

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

This patch adds a garbage collector for AF_BUS sockets.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
---
 net/bus/garbage.c |  322 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 322 insertions(+)
 create mode 100644 net/bus/garbage.c

diff --git a/net/bus/garbage.c b/net/bus/garbage.c
new file mode 100644
index 0000000..2435f38
--- /dev/null
+++ b/net/bus/garbage.c
@@ -0,0 +1,322 @@
+/*
+ * Garbage Collector For AF_BUS sockets
+ *
+ * Based on Garbage Collector For AF_UNIX sockets (net/unix/garbage.c).
+ */
+
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/socket.h>
+#include <linux/un.h>
+#include <linux/net.h>
+#include <linux/fs.h>
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/file.h>
+#include <linux/proc_fs.h>
+#include <linux/mutex.h>
+#include <linux/wait.h>
+
+#include <net/sock.h>
+#include <net/af_bus.h>
+#include <net/scm.h>
+#include <net/tcp_states.h>
+
+/* Internal data structures and random procedures: */
+
+static LIST_HEAD(gc_inflight_list);
+static LIST_HEAD(gc_candidates);
+static DEFINE_SPINLOCK(bus_gc_lock);
+static DECLARE_WAIT_QUEUE_HEAD(bus_gc_wait);
+
+unsigned int bus_tot_inflight;
+
+
+struct sock *bus_get_socket(struct file *filp)
+{
+	struct sock *u_sock = NULL;
+	struct inode *inode = filp->f_path.dentry->d_inode;
+
+	/*
+	 *	Socket ?
+	 */
+	if (S_ISSOCK(inode->i_mode) && !(filp->f_mode & FMODE_PATH)) {
+		struct socket *sock = SOCKET_I(inode);
+		struct sock *s = sock->sk;
+
+		/*
+		 *	PF_BUS ?
+		 */
+		if (s && sock->ops && sock->ops->family == PF_BUS)
+			u_sock = s;
+	}
+	return u_sock;
+}
+
+/*
+ *	Keep the number of times in flight count for the file
+ *	descriptor if it is for an AF_BUS socket.
+ */
+
+void bus_inflight(struct file *fp)
+{
+	struct sock *s = bus_get_socket(fp);
+	if (s) {
+		struct bus_sock *u = bus_sk(s);
+		spin_lock(&bus_gc_lock);
+		if (atomic_long_inc_return(&u->inflight) == 1) {
+			BUG_ON(!list_empty(&u->link));
+			list_add_tail(&u->link, &gc_inflight_list);
+		} else {
+			BUG_ON(list_empty(&u->link));
+		}
+		bus_tot_inflight++;
+		spin_unlock(&bus_gc_lock);
+	}
+}
+
+void bus_notinflight(struct file *fp)
+{
+	struct sock *s = bus_get_socket(fp);
+	if (s) {
+		struct bus_sock *u = bus_sk(s);
+		spin_lock(&bus_gc_lock);
+		BUG_ON(list_empty(&u->link));
+		if (atomic_long_dec_and_test(&u->inflight))
+			list_del_init(&u->link);
+		bus_tot_inflight--;
+		spin_unlock(&bus_gc_lock);
+	}
+}
+
+static void scan_inflight(struct sock *x, void (*func)(struct bus_sock *),
+			  struct sk_buff_head *hitlist)
+{
+	struct sk_buff *skb;
+	struct sk_buff *next;
+
+	spin_lock(&x->sk_receive_queue.lock);
+	skb_queue_walk_safe(&x->sk_receive_queue, skb, next) {
+		/*
+		 *	Do we have file descriptors ?
+		 */
+		if (BUSCB(skb).fp) {
+			bool hit = false;
+			/*
+			 *	Process the descriptors of this socket
+			 */
+			int nfd = BUSCB(skb).fp->count;
+			struct file **fp = BUSCB(skb).fp->fp;
+			while (nfd--) {
+				/*
+				 *	Get the socket the fd matches
+				 *	if it indeed does so
+				 */
+				struct sock *sk = bus_get_socket(*fp++);
+				if (sk) {
+					struct bus_sock *u = bus_sk(sk);
+
+					/*
+					 * Ignore non-candidates, they could
+					 * have been added to the queues after
+					 * starting the garbage collection
+					 */
+					if (u->gc_candidate) {
+						hit = true;
+						func(u);
+					}
+				}
+			}
+			if (hit && hitlist != NULL) {
+				__skb_unlink(skb, &x->sk_receive_queue);
+				__skb_queue_tail(hitlist, skb);
+			}
+		}
+	}
+	spin_unlock(&x->sk_receive_queue.lock);
+}
+
+static void scan_children(struct sock *x, void (*func)(struct bus_sock *),
+			  struct sk_buff_head *hitlist)
+{
+	if (x->sk_state != TCP_LISTEN)
+		scan_inflight(x, func, hitlist);
+	else {
+		struct sk_buff *skb;
+		struct sk_buff *next;
+		struct bus_sock *u;
+		LIST_HEAD(embryos);
+
+		/*
+		 * For a listening socket collect the queued embryos
+		 * and perform a scan on them as well.
+		 */
+		spin_lock(&x->sk_receive_queue.lock);
+		skb_queue_walk_safe(&x->sk_receive_queue, skb, next) {
+			u = bus_sk(skb->sk);
+
+			/*
+			 * An embryo cannot be in-flight, so it's safe
+			 * to use the list link.
+			 */
+			BUG_ON(!list_empty(&u->link));
+			list_add_tail(&u->link, &embryos);
+		}
+		spin_unlock(&x->sk_receive_queue.lock);
+
+		while (!list_empty(&embryos)) {
+			u = list_entry(embryos.next, struct bus_sock, link);
+			scan_inflight(&u->sk, func, hitlist);
+			list_del_init(&u->link);
+		}
+	}
+}
+
+static void dec_inflight(struct bus_sock *usk)
+{
+	atomic_long_dec(&usk->inflight);
+}
+
+static void inc_inflight(struct bus_sock *usk)
+{
+	atomic_long_inc(&usk->inflight);
+}
+
+static void inc_inflight_move_tail(struct bus_sock *u)
+{
+	atomic_long_inc(&u->inflight);
+	/*
+	 * If this still might be part of a cycle, move it to the end
+	 * of the list, so that it's checked even if it was already
+	 * passed over
+	 */
+	if (u->gc_maybe_cycle)
+		list_move_tail(&u->link, &gc_candidates);
+}
+
+static bool gc_in_progress = false;
+#define BUS_INFLIGHT_TRIGGER_GC 16000
+
+void wait_for_bus_gc(void)
+{
+	/*
+	 * If number of inflight sockets is insane,
+	 * force a garbage collect right now.
+	 */
+	if (bus_tot_inflight > BUS_INFLIGHT_TRIGGER_GC && !gc_in_progress)
+		bus_gc();
+	wait_event(bus_gc_wait, gc_in_progress == false);
+}
+
+/* The external entry point: bus_gc() */
+void bus_gc(void)
+{
+	struct bus_sock *u;
+	struct bus_sock *next;
+	struct sk_buff_head hitlist;
+	struct list_head cursor;
+	LIST_HEAD(not_cycle_list);
+
+	spin_lock(&bus_gc_lock);
+
+	/* Avoid a recursive GC. */
+	if (gc_in_progress)
+		goto out;
+
+	gc_in_progress = true;
+	/*
+	 * First, select candidates for garbage collection.  Only
+	 * in-flight sockets are considered, and from those only ones
+	 * which don't have any external reference.
+	 *
+	 * Holding bus_gc_lock will protect these candidates from
+	 * being detached, and hence from gaining an external
+	 * reference.  Since there are no possible receivers, all
+	 * buffers currently on the candidates' queues stay there
+	 * during the garbage collection.
+	 *
+	 * We also know that no new candidate can be added onto the
+	 * receive queues.  Other, non candidate sockets _can_ be
+	 * added to queue, so we must make sure only to touch
+	 * candidates.
+	 */
+	list_for_each_entry_safe(u, next, &gc_inflight_list, link) {
+		long total_refs;
+		long inflight_refs;
+
+		total_refs = file_count(u->sk.sk_socket->file);
+		inflight_refs = atomic_long_read(&u->inflight);
+
+		BUG_ON(inflight_refs < 1);
+		BUG_ON(total_refs < inflight_refs);
+		if (total_refs == inflight_refs) {
+			list_move_tail(&u->link, &gc_candidates);
+			u->gc_candidate = 1;
+			u->gc_maybe_cycle = 1;
+		}
+	}
+
+	/*
+	 * Now remove all internal in-flight reference to children of
+	 * the candidates.
+	 */
+	list_for_each_entry(u, &gc_candidates, link)
+		scan_children(&u->sk, dec_inflight, NULL);
+
+	/*
+	 * Restore the references for children of all candidates,
+	 * which have remaining references.  Do this recursively, so
+	 * only those remain, which form cyclic references.
+	 *
+	 * Use a "cursor" link, to make the list traversal safe, even
+	 * though elements might be moved about.
+	 */
+	list_add(&cursor, &gc_candidates);
+	while (cursor.next != &gc_candidates) {
+		u = list_entry(cursor.next, struct bus_sock, link);
+
+		/* Move cursor to after the current position. */
+		list_move(&cursor, &u->link);
+
+		if (atomic_long_read(&u->inflight) > 0) {
+			list_move_tail(&u->link, &not_cycle_list);
+			u->gc_maybe_cycle = 0;
+			scan_children(&u->sk, inc_inflight_move_tail, NULL);
+		}
+	}
+	list_del(&cursor);
+
+	/*
+	 * not_cycle_list contains those sockets which do not make up a
+	 * cycle.  Restore these to the inflight list.
+	 */
+	while (!list_empty(&not_cycle_list)) {
+		u = list_entry(not_cycle_list.next, struct bus_sock, link);
+		u->gc_candidate = 0;
+		list_move_tail(&u->link, &gc_inflight_list);
+	}
+
+	/*
+	 * Now gc_candidates contains only garbage.  Restore original
+	 * inflight counters for these as well, and remove the skbuffs
+	 * which are creating the cycle(s).
+	 */
+	skb_queue_head_init(&hitlist);
+	list_for_each_entry(u, &gc_candidates, link)
+	scan_children(&u->sk, inc_inflight, &hitlist);
+
+	spin_unlock(&bus_gc_lock);
+
+	/* Here we are. Hitlist is filled. Die. */
+	__skb_queue_purge(&hitlist);
+
+	spin_lock(&bus_gc_lock);
+
+	/* All candidates should have been detached by now. */
+	BUG_ON(!list_empty(&gc_candidates));
+	gc_in_progress = false;
+	wake_up(&bus_gc_wait);
+
+ out:
+	spin_unlock(&bus_gc_lock);
+}
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 10/15] net: bus: Add the AF_BUS socket address family to KBuild
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (8 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 09/15] net: bus: Add garbage collector for AF_BUS sockets Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 11/15] netlink: connector: implement cn_netlink_reply Vincent Sanders
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Vincent Sanders

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

This patch adds the AF_BUS code to the Linux Kernel build system.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
---
 net/Kconfig      |    1 +
 net/Makefile     |    1 +
 net/bus/Kconfig  |   15 +++++++++++++++
 net/bus/Makefile |    7 +++++++
 4 files changed, 24 insertions(+)
 create mode 100644 net/bus/Kconfig
 create mode 100644 net/bus/Makefile

diff --git a/net/Kconfig b/net/Kconfig
index 245831b..339a630 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -47,6 +47,7 @@ menu "Networking options"
 
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
+source "net/bus/Kconfig"
 source "net/xfrm/Kconfig"
 source "net/iucv/Kconfig"
 
diff --git a/net/Makefile b/net/Makefile
index 4f4ee08..ad0e900 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_NETFILTER)		+= netfilter/
 obj-$(CONFIG_INET)		+= ipv4/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX)		+= unix/
+obj-$(CONFIG_AF_BUS)		+= bus/
 obj-$(CONFIG_NET)		+= ipv6/
 obj-$(CONFIG_PACKET)		+= packet/
 obj-$(CONFIG_NET_KEY)		+= key/
diff --git a/net/bus/Kconfig b/net/bus/Kconfig
new file mode 100644
index 0000000..5f01410
--- /dev/null
+++ b/net/bus/Kconfig
@@ -0,0 +1,15 @@
+#
+# Bus Domain Sockets
+#
+
+config AF_BUS
+	tristate "Bus domain sockets (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	---help---
+	  If you say Y here, you will include support for Bus domain sockets.
+	  These sockets are used to create communication buses for IPC.
+
+	  To compile this driver as a module, choose M here: the module will be
+	  called bus.
+
+	  Say N unless you know what you are doing.
diff --git a/net/bus/Makefile b/net/bus/Makefile
new file mode 100644
index 0000000..8c1fea2
--- /dev/null
+++ b/net/bus/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the Linux bus domain socket layer.
+#
+
+obj-$(CONFIG_AF_BUS)	+= af-bus.o
+
+af-bus-y		:= af_bus.o garbage.o
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 11/15] netlink: connector: implement cn_netlink_reply
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (9 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 10/15] net: bus: Add the AF_BUS socket address family to KBuild Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 12/15] netlink: connector: Add idx and val identifiers for netfilter D-Bus Vincent Sanders
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller; +Cc: Alban Crequy

From: Alban Crequy <alban.crequy@collabora.co.uk>

In a connector callback, it was not possible to reply to a message only to a
sender. This patch implements cn_netlink_reply(). It uses the connector socket
to send an unicast netlink message back to the sender.

The following pseudo-code can be used from a connector callback:

        struct cn_msg *cn_reply;
        cn_reply = kzalloc(sizeof(struct cn_msg)
                + sizeof(struct ..._nl_cfg_reply), GFP_KERNEL);

        cn_reply->id = msg->id;
        cn_reply->seq = msg->seq;
        cn_reply->ack = msg->ack  + 1;
        cn_reply->len = sizeof(struct ..._nl_cfg_reply);
        cn_reply->flags = 0;

        rr = cn_netlink_reply(cn_reply, nsp->pid, GFP_KERNEL);

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
---
 drivers/connector/connector.c |   32 ++++++++++++++++++++++++++++++++
 include/linux/connector.h     |    1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 34e0e9e..a728d33 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -118,6 +118,38 @@ int cn_netlink_send(struct cn_msg *msg, u32 __group, gfp_t gfp_mask)
 EXPORT_SYMBOL_GPL(cn_netlink_send);
 
 /*
+ * Send an unicast reply from a connector callback
+ *
+ */
+int cn_netlink_reply(struct cn_msg *msg, u32 pid, gfp_t gfp_mask)
+{
+	unsigned int size;
+	struct sk_buff *skb;
+	struct nlmsghdr *nlh;
+	struct cn_msg *data;
+	struct cn_dev *dev = &cdev;
+
+	size = NLMSG_SPACE(sizeof(*msg) + msg->len);
+
+	skb = alloc_skb(size, gfp_mask);
+	if (!skb)
+		return -ENOMEM;
+
+	nlh = nlmsg_put(skb, 0, msg->seq, NLMSG_DONE, size - sizeof(*nlh), 0);
+	if (nlh == NULL) {
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	data = nlmsg_data(nlh);
+
+	memcpy(data, msg, sizeof(*data) + msg->len);
+
+	return netlink_unicast(dev->nls, skb, pid, 1);
+}
+EXPORT_SYMBOL_GPL(cn_netlink_reply);
+
+/*
  * Callback helper - queues work and setup destructor for given data.
  */
 static int cn_call_callback(struct sk_buff *skb)
diff --git a/include/linux/connector.h b/include/linux/connector.h
index 7638407..c27be60 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -125,6 +125,7 @@ int cn_add_callback(struct cb_id *id, const char *name,
 		    void (*callback)(struct cn_msg *, struct netlink_skb_parms *));
 void cn_del_callback(struct cb_id *);
 int cn_netlink_send(struct cn_msg *, u32, gfp_t);
+int cn_netlink_reply(struct cn_msg *, u32, gfp_t);
 
 int cn_queue_add_callback(struct cn_queue_dev *dev, const char *name,
 			  struct cb_id *id,
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 12/15] netlink: connector: Add idx and val identifiers for netfilter D-Bus
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (10 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 11/15] netlink: connector: implement cn_netlink_reply Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing Vincent Sanders
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Alban Crequy

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

The D-bus IPC system implements a transport that uses AF_BUS sockets to
send D-Bus messages to the peers. This allows decouple the routing logic
from the daemon and move it to the kernel which has the advantage of
reducing the number of context switches and the messages copied to
user-space.

A D-Bus protocol aware netfilter module decide which peer can recive a
given message based on a set of D-Bus match rules. These match rules
are set from user-space using the netlink connector API.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
---
 include/linux/connector.h |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/connector.h b/include/linux/connector.h
index c27be60..519d010 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -44,8 +44,10 @@
 #define CN_VAL_DRBD			0x1
 #define CN_KVP_IDX			0x9	/* HyperV KVP */
 #define CN_KVP_VAL			0x1	/* queries from the kernel */
+#define CN_IDX_NFDBUS                   0xA     /* netfilter D-Bus */
+#define CN_VAL_NFDBUS                   0x1
 
-#define CN_NETLINK_USERS		10	/* Highest index + 1 */
+#define CN_NETLINK_USERS		11	/* Highest index + 1 */
 
 /*
  * Maximum connector's message size.
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (11 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 12/15] netlink: connector: Add idx and val identifiers for netfilter D-Bus Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-06-29 17:11   ` Pablo Neira Ayuso
  2012-06-29 16:45 ` [PATCH net-next 14/15] netfilter: nfdbus: Add D-bus match rule implementation Vincent Sanders
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Alban Crequy

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

The netfilter D-Bus module needs to parse D-bus messages sent by
applications to decide whether a peer can receive or not a D-Bus
message. Add D-bus message parsing logic to be able to analyze.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
---
 net/netfilter/nfdbus/message.c |  194 ++++++++++++++++++++++++++++++++++++++++
 net/netfilter/nfdbus/message.h |   71 +++++++++++++++
 2 files changed, 265 insertions(+)
 create mode 100644 net/netfilter/nfdbus/message.c
 create mode 100644 net/netfilter/nfdbus/message.h

diff --git a/net/netfilter/nfdbus/message.c b/net/netfilter/nfdbus/message.c
new file mode 100644
index 0000000..93c409c
--- /dev/null
+++ b/net/netfilter/nfdbus/message.c
@@ -0,0 +1,194 @@
+/*
+ * message.c  Basic D-Bus message parsing
+ *
+ * Copyright (C) 2010-2012  Collabora Ltd
+ * Authors:	Alban Crequy <alban.crequy@collabora.co.uk>
+ * Copyright (C) 2002, 2003, 2004, 2005  Red Hat Inc.
+ * Copyright (C) 2002, 2003  CodeFactory AB
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#include <linux/slab.h>
+
+#include "message.h"
+
+int dbus_message_type_from_string(const char *type_str)
+{
+	if (strcmp(type_str, "method_call") == 0)
+		return DBUS_MESSAGE_TYPE_METHOD_CALL;
+	if (strcmp(type_str, "method_return") == 0)
+		return DBUS_MESSAGE_TYPE_METHOD_RETURN;
+	else if (strcmp(type_str, "signal") == 0)
+		return DBUS_MESSAGE_TYPE_SIGNAL;
+	else if (strcmp(type_str, "error") == 0)
+		return DBUS_MESSAGE_TYPE_ERROR;
+	else
+		return DBUS_MESSAGE_TYPE_INVALID;
+}
+
+int dbus_message_parse(unsigned char *message, size_t len,
+		       struct dbus_message *dbus_message)
+{
+	unsigned char *cur;
+	int array_header_len;
+
+	dbus_message->message = message;
+
+	if (len < 4 + 4 + 4 + 4 || message[1] == 0 || message[1] > 4)
+		return -EINVAL;
+
+	dbus_message->type = message[1];
+	dbus_message->body_length = *((u32 *)(message + 4));
+	cur = message + 12;
+	array_header_len = *(u32 *)cur;
+	dbus_message->len_offset = 12;
+	cur += 4;
+	while (cur < message + len
+	       && cur < message + 12 + 4 + array_header_len) {
+		int header_code;
+		int signature_len;
+		unsigned char *signature;
+		int str_len;
+		unsigned char *str;
+
+		/* D-Bus alignment craziness */
+		if ((cur - message) % 8 != 0)
+			cur += 8 - (cur - message) % 8;
+
+		header_code = *(char *)cur;
+		cur++;
+		signature_len = *(char *)cur;
+		/* All header fields of the current D-Bus spec have a simple
+		 * type, either o, s, g, or u */
+		if (signature_len != 1)
+			return -EINVAL;
+		cur++;
+		signature = cur;
+		cur += signature_len + 1;
+		if (signature[0] != 'o' &&
+		    signature[0] != 's' &&
+		    signature[0] != 'g' &&
+		    signature[0] != 'u')
+			return -EINVAL;
+
+		if (signature[0] == 'u') {
+			cur += 4;
+			continue;
+		}
+
+		if (signature[0] != 'g') {
+			str_len = *(u32 *)cur;
+			cur += 4;
+		} else {
+			str_len = *(char *)cur;
+			cur += 1;
+		}
+
+		str = cur;
+		switch (header_code) {
+		case 1:
+			dbus_message->path = str;
+			break;
+		case 2:
+			dbus_message->interface = str;
+			break;
+		case 3:
+			dbus_message->member = str;
+			break;
+		case 6:
+			dbus_message->destination = str;
+			break;
+		case 7:
+			dbus_message->sender = str;
+			break;
+		case 8:
+			dbus_message->body_signature = str;
+			break;
+		}
+		cur += str_len + 1;
+	}
+
+	dbus_message->padding_end = (8 - (cur - message) % 8) % 8;
+
+	/* Jump to body D-Bus alignment craziness */
+	if ((cur - message) % 8 != 0)
+		cur += 8 - (cur - message) % 8;
+	dbus_message->new_header_offset = cur - message;
+
+	if (dbus_message->new_header_offset
+	    + dbus_message->body_length != len) {
+		pr_warn("Message truncated? " \
+			"Header %d + Body %d != Length %zd\n",
+			dbus_message->new_header_offset,
+			dbus_message->body_length, len);
+		return -EINVAL;
+	}
+
+	if (dbus_message->body_signature &&
+	    dbus_message->body_signature[0] == 's') {
+		int str_len;
+		str_len = *(u32 *)cur;
+		cur += 4;
+		dbus_message->arg0 = cur;
+		cur += str_len + 1;
+	}
+
+	if ((cur - message) % 4 != 0)
+		cur += 4 - (cur - message) % 4;
+
+	if (dbus_message->body_signature &&
+	    dbus_message->body_signature[0] == 's' &&
+	    dbus_message->body_signature[1] == 's') {
+		int str_len;
+		str_len = *(u32 *)cur;
+		cur += 4;
+		dbus_message->arg1 = cur;
+		cur += str_len + 1;
+	}
+
+	if ((cur - message) % 4 != 0)
+		cur += 4 - (cur - message) % 4;
+
+	if (dbus_message->body_signature &&
+	    dbus_message->body_signature[0] == 's' &&
+	    dbus_message->body_signature[1] == 's' &&
+	    dbus_message->body_signature[2] == 's') {
+		int str_len;
+		str_len = *(u32 *)cur;
+		cur += 4;
+		dbus_message->arg2 = cur;
+		cur += str_len + 1;
+	}
+
+	if ((cur - message) % 4 != 0)
+		cur += 4 - (cur - message) % 4;
+
+	if (dbus_message->type == DBUS_MESSAGE_TYPE_SIGNAL &&
+	    dbus_message->sender && dbus_message->path &&
+	    dbus_message->interface && dbus_message->member &&
+	    dbus_message->arg0 &&
+	    strcmp(dbus_message->sender, "org.freedesktop.DBus") == 0 &&
+	    strcmp(dbus_message->interface, "org.freedesktop.DBus") == 0 &&
+	    strcmp(dbus_message->path, "/org/freedesktop/DBus") == 0) {
+		if (strcmp(dbus_message->member, "NameAcquired") == 0)
+			dbus_message->name_acquired = dbus_message->arg0;
+		else if (strcmp(dbus_message->member, "NameLost") == 0)
+			dbus_message->name_lost = dbus_message->arg0;
+	}
+
+	return 0;
+}
diff --git a/net/netfilter/nfdbus/message.h b/net/netfilter/nfdbus/message.h
new file mode 100644
index 0000000..e3ea4d3
--- /dev/null
+++ b/net/netfilter/nfdbus/message.h
@@ -0,0 +1,71 @@
+/*
+ * message.h  Basic D-Bus message parsing
+ *
+ * Copyright (C) 2010  Collabora Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#ifndef DBUS_MESSAGE_H
+#define DBUS_MESSAGE_H
+
+#include <linux/list.h>
+
+#define DBUS_MAXIMUM_MATCH_RULE_LENGTH 1024
+
+/* Types of message */
+
+#define DBUS_MESSAGE_TYPE_INVALID       0
+#define DBUS_MESSAGE_TYPE_METHOD_CALL   1
+#define DBUS_MESSAGE_TYPE_METHOD_RETURN 2
+#define DBUS_MESSAGE_TYPE_ERROR         3
+#define DBUS_MESSAGE_TYPE_SIGNAL        4
+#define DBUS_NUM_MESSAGE_TYPES          5
+
+/* No need to implement a feature-complete parser. It only implement what is
+ * needed by the bus. */
+struct dbus_message {
+	char *message;
+	size_t len;
+	size_t new_len;
+
+	/* direct pointers to the fields */
+	int type;
+	char *path;
+	char *interface;
+	char *member;
+	char *destination;
+	char *sender;
+	char *body_signature;
+	int body_length;
+	char *arg0;
+	char *arg1;
+	char *arg2;
+	char *name_acquired;
+	char *name_lost;
+
+	/* How to add the 'sender' field in the headers */
+	int new_header_offset;
+	int len_offset;
+	int padding_end;
+};
+
+int dbus_message_type_from_string(const char *type_str);
+
+int dbus_message_parse(unsigned char *message, size_t len,
+		       struct dbus_message *dbus_message);
+
+#endif /* DBUS_MESSAGE_H */
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 14/15] netfilter: nfdbus: Add D-bus match rule implementation
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (12 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-06-29 16:45 ` [PATCH net-next 15/15] netfilter: add netfilter D-Bus module Vincent Sanders
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller
  Cc: Javier Martinez Canillas, Alban Crequy

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>

The D-Bus netfilter module needs to decode D-Bus match rules to decide
if a given peer can receive or not a D-Bus message. Add a match rule
implementation to be used by the netfilter D-Bus module.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
---
 net/netfilter/nfdbus/matchrule.c | 1132 ++++++++++++++++++++++++++++++++++++++
 net/netfilter/nfdbus/matchrule.h |   82 +++
 2 files changed, 1214 insertions(+)
 create mode 100644 net/netfilter/nfdbus/matchrule.c
 create mode 100644 net/netfilter/nfdbus/matchrule.h

diff --git a/net/netfilter/nfdbus/matchrule.c b/net/netfilter/nfdbus/matchrule.c
new file mode 100644
index 0000000..4106bd5
--- /dev/null
+++ b/net/netfilter/nfdbus/matchrule.c
@@ -0,0 +1,1132 @@
+/*
+ * matchrule.c  D-Bus match rule implementation
+ *
+ * Based on signals.c from dbus
+ *
+ * Copyright (C) 2010  Collabora, Ltd.
+ * Copyright (C) 2003, 2005  Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#include "matchrule.h"
+
+#include <linux/rbtree.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+
+#include "message.h"
+
+enum bus_match_flags {
+	BUS_MATCH_MESSAGE_TYPE            = 1 << 0,
+	BUS_MATCH_INTERFACE               = 1 << 1,
+	BUS_MATCH_MEMBER                  = 1 << 2,
+	BUS_MATCH_SENDER                  = 1 << 3,
+	BUS_MATCH_DESTINATION             = 1 << 4,
+	BUS_MATCH_PATH                    = 1 << 5,
+	BUS_MATCH_ARGS                    = 1 << 6,
+	BUS_MATCH_PATH_NAMESPACE          = 1 << 7,
+	BUS_MATCH_CLIENT_IS_EAVESDROPPING = 1 << 8
+};
+
+struct bus_match_rule {
+	/* For debugging only*/
+	char *rule_text;
+
+	unsigned int flags; /**< BusMatchFlags */
+
+	int   message_type;
+	char *interface;
+	char *member;
+	char *sender;
+	char *destination;
+	char *path;
+
+	unsigned int *arg_lens;
+	char **args;
+	int args_len;
+
+	/* bus_match_rule is attached to rule_pool, either in a simple
+	 * double-linked list if the rule does not have any interface, or in a
+	 * red-black tree sorted by interface. If several rules can have the
+	 * same interface, the first one is attached with struct rb_node and the
+	 * next ones are in the list
+	 */
+
+	struct rb_node node;
+	/* Doubly-linked non-circular list. If the rule has an interface, it is
+	 * in the rb tree and the single head is right here. Otherwise, the
+	 * single head is in rule_pool->rules_without_iface. With this data
+	 * structure, we don't need any allocation to insert or remove the rule.
+	 */
+	struct hlist_head first;
+	struct hlist_node list;
+
+	/* used to delete all names from the tree */
+	struct list_head del_list;
+};
+
+struct dbus_name {
+	struct rb_node node;
+	char *name;
+
+	/* used to delete all names from the tree */
+	struct list_head del_list;
+};
+
+#define BUS_MATCH_ARG_IS_PATH  0x8000000u
+
+#define DBUS_STRING_MAX_LENGTH 1024
+
+/** Max length of a match rule string; to keep people from hosing the
+ * daemon with some huge rule
+ */
+#define DBUS_MAXIMUM_MATCH_RULE_LENGTH 1024
+
+struct bus_match_rule *bus_match_rule_new(gfp_t gfp_flags)
+{
+	struct bus_match_rule *rule;
+
+	rule = kzalloc(sizeof(struct bus_match_rule), gfp_flags);
+	if (rule == NULL)
+		return NULL;
+
+	return rule;
+}
+
+void bus_match_rule_free(struct bus_match_rule *rule)
+{
+	kfree(rule->rule_text);
+	kfree(rule->interface);
+	kfree(rule->member);
+	kfree(rule->sender);
+	kfree(rule->destination);
+	kfree(rule->path);
+	kfree(rule->arg_lens);
+
+	/* can't use dbus_free_string_array() since there
+	 * are embedded NULL
+	 */
+	if (rule->args) {
+		int i;
+
+		i = 0;
+		while (i < rule->args_len) {
+			kfree(rule->args[i]);
+			++i;
+		}
+
+		kfree(rule->args);
+	}
+
+	kfree(rule);
+}
+
+static int
+bus_match_rule_set_message_type(struct bus_match_rule *rule,
+				int type,
+				gfp_t gfp_flags)
+{
+	rule->flags |= BUS_MATCH_MESSAGE_TYPE;
+
+	rule->message_type = type;
+
+	return 1;
+}
+
+static int
+bus_match_rule_set_interface(struct bus_match_rule *rule,
+			     const char *interface,
+			     gfp_t gfp_flags)
+{
+	char *new;
+
+	WARN_ON(!interface);
+
+	new = kstrdup(interface, gfp_flags);
+	if (new == NULL)
+		return 0;
+
+	rule->flags |= BUS_MATCH_INTERFACE;
+	kfree(rule->interface);
+	rule->interface = new;
+
+	return 1;
+}
+
+static int
+bus_match_rule_set_member(struct bus_match_rule *rule,
+			  const char *member,
+			  gfp_t gfp_flags)
+{
+	char *new;
+
+	WARN_ON(!member);
+
+	new = kstrdup(member, gfp_flags);
+	if (new == NULL)
+		return 0;
+
+	rule->flags |= BUS_MATCH_MEMBER;
+	kfree(rule->member);
+	rule->member = new;
+
+	return 1;
+}
+
+static int
+bus_match_rule_set_sender(struct bus_match_rule *rule,
+			  const char *sender,
+			  gfp_t gfp_flags)
+{
+	char *new;
+
+	WARN_ON(!sender);
+
+	new = kstrdup(sender, gfp_flags);
+	if (new == NULL)
+		return 0;
+
+	rule->flags |= BUS_MATCH_SENDER;
+	kfree(rule->sender);
+	rule->sender = new;
+
+	return 1;
+}
+
+static int
+bus_match_rule_set_destination(struct bus_match_rule *rule,
+			       const char   *destination,
+			       gfp_t gfp_flags)
+{
+	char *new;
+
+	WARN_ON(!destination);
+
+	new = kstrdup(destination, gfp_flags);
+	if (new == NULL)
+		return 0;
+
+	rule->flags |= BUS_MATCH_DESTINATION;
+	kfree(rule->destination);
+	rule->destination = new;
+
+	return 1;
+}
+
+#define ISWHITE(c) (((c) == ' ') || ((c) == '\t') || ((c) == '\n') || \
+		    ((c) == '\r'))
+
+static int find_key(const char *str, int start, char *key, int *value_pos)
+{
+	const char *p;
+	const char *s;
+	const char *key_start;
+	const char *key_end;
+
+	s = str;
+
+	p = s + start;
+
+	while (*p && ISWHITE(*p))
+		++p;
+
+	key_start = p;
+
+	while (*p && *p != '=' && !ISWHITE(*p))
+		++p;
+
+	key_end = p;
+
+	while (*p && ISWHITE(*p))
+		++p;
+
+	if (key_start == key_end) {
+		/* Empty match rules or trailing whitespace are OK */
+		*value_pos = p - s;
+		return 1;
+	}
+
+	if (*p != '=') {
+		pr_warn("Match rule has a key with no subsequent '=' character");
+		return 0;
+	}
+	++p;
+
+	strncat(key, key_start, key_end - key_start);
+
+	*value_pos = p - s;
+
+	return 1;
+}
+
+static int find_value(const char *str, int start, const char *key, char *value,
+		      int *value_end)
+{
+	const char *p;
+	const char *s;
+	char quote_char;
+	int orig_len;
+
+	orig_len = strlen(value);
+
+	s = str;
+
+	p = s + start;
+
+	quote_char = '\0';
+
+	while (*p) {
+		if (quote_char == '\0') {
+			switch (*p) {
+			case '\0':
+				goto done;
+
+			case '\'':
+				quote_char = '\'';
+				goto next;
+
+			case ',':
+				++p;
+				goto done;
+
+			case '\\':
+				quote_char = '\\';
+				goto next;
+
+			default:
+				strncat(value, p, 1);
+			}
+		} else if (quote_char == '\\') {
+			/*\ only counts as an escape if escaping a quote mark */
+			if (*p != '\'')
+				strncat(value, "\\", 1);
+
+			strncat(value, p, 1);
+
+			quote_char = '\0';
+		} else {
+			if (*p == '\'')
+				quote_char = '\0';
+			else
+				strncat(value, p, 1);
+		}
+
+next:
+		++p;
+	}
+
+done:
+
+	if (quote_char == '\\')
+		strncat(value, "\\", 1);
+	else if (quote_char == '\'') {
+		pr_warn("Unbalanced quotation marks in match rule");
+		return 0;
+	}
+
+	/* Zero-length values are allowed */
+
+	*value_end = p - s;
+
+	return 1;
+}
+
+/* duplicates aren't allowed so the real legitimate max is only 6 or
+ * so. Leaving extra so we don't have to bother to update it.
+ * FIXME this is sort of busted now with arg matching, but we let
+ * you match on up to 10 args for now
+ */
+#define MAX_RULE_TOKENS 16
+
+/* this is slightly too high level to be termed a "token"
+ * but let's not be pedantic.
+ */
+struct rule_token {
+	char *key;
+	char *value;
+};
+
+static int tokenize_rule(const char *rule_text,
+			 struct rule_token tokens[MAX_RULE_TOKENS],
+			 gfp_t gfp_flags)
+{
+	int i;
+	int pos;
+	int retval;
+
+	retval = 0;
+
+	i = 0;
+	pos = 0;
+	while (i < MAX_RULE_TOKENS &&
+	       pos < strlen(rule_text)) {
+		char *key;
+		char *value;
+
+		key = kzalloc(DBUS_STRING_MAX_LENGTH, gfp_flags);
+		if (!key) {
+			pr_err("Out of memory");
+			return 0;
+		}
+
+		value = kzalloc(DBUS_STRING_MAX_LENGTH, gfp_flags);
+		if (!value) {
+			kfree(key);
+			pr_err("Out of memory");
+			return 0;
+		}
+
+		if (!find_key(rule_text, pos, key, &pos))
+			goto out;
+
+		if (strlen(key) == 0)
+			goto next;
+
+		tokens[i].key = key;
+
+		if (!find_value(rule_text, pos, tokens[i].key, value, &pos))
+			goto out;
+
+		tokens[i].value = value;
+
+next:
+		++i;
+	}
+
+	retval = 1;
+
+out:
+	if (!retval) {
+		i = 0;
+		while (tokens[i].key || tokens[i].value) {
+			kfree(tokens[i].key);
+			kfree(tokens[i].value);
+			tokens[i].key = NULL;
+			tokens[i].value = NULL;
+			++i;
+		}
+	}
+
+	return retval;
+}
+
+/*
+ * The format is comma-separated with strings quoted with single quotes
+ * as for the shell (to escape a literal single quote, use '\'').
+ *
+ * type='signal',sender='org.freedesktop.DBus',interface='org.freedesktop.DBus',
+ * member='Foo', path='/bar/foo',destination=':452345.34'
+ *
+ */
+struct bus_match_rule *bus_match_rule_parse(const char *rule_text,
+					    gfp_t gfp_flags)
+{
+	struct bus_match_rule *rule;
+	struct rule_token tokens[MAX_RULE_TOKENS+1]; /* NULL termination + 1 */
+	int i;
+
+	if (strlen(rule_text) > DBUS_MAXIMUM_MATCH_RULE_LENGTH) {
+		pr_warn("Match rule text is %ld bytes, maximum is %d",
+			    strlen(rule_text),
+			    DBUS_MAXIMUM_MATCH_RULE_LENGTH);
+		return NULL;
+	}
+
+	memset(tokens, '\0', sizeof(tokens));
+
+	rule = bus_match_rule_new(gfp_flags);
+	if (rule == NULL) {
+		pr_err("Out of memory");
+		goto failed;
+	}
+
+	rule->rule_text = kstrdup(rule_text, gfp_flags);
+	if (rule->rule_text == NULL) {
+		pr_err("Out of memory");
+		goto failed;
+	}
+
+	if (!tokenize_rule(rule_text, tokens, gfp_flags))
+		goto failed;
+
+	i = 0;
+	while (tokens[i].key != NULL) {
+		const char *key = tokens[i].key;
+		const char *value = tokens[i].value;
+
+		if (strcmp(key, "type") == 0) {
+			int t;
+
+			if (rule->flags & BUS_MATCH_MESSAGE_TYPE) {
+				pr_warn("Key %s specified twice in match rule\n",
+					key);
+				goto failed;
+			}
+
+			t = dbus_message_type_from_string(value);
+
+			if (t == DBUS_MESSAGE_TYPE_INVALID) {
+				pr_warn("Invalid message type (%s) in match rule\n",
+					value);
+				goto failed;
+			}
+
+			if (!bus_match_rule_set_message_type(rule, t,
+							     gfp_flags)) {
+				pr_err("Out of memeory");
+				goto failed;
+			}
+		} else if (strcmp(key, "sender") == 0) {
+			if (rule->flags & BUS_MATCH_SENDER) {
+				pr_warn("Key %s specified twice in match rule\n",
+					key);
+				goto failed;
+			}
+
+			if (!bus_match_rule_set_sender(rule, value,
+						       gfp_flags)) {
+				pr_err("Out of memeory");
+				goto failed;
+			}
+		} else if (strcmp(key, "interface") == 0) {
+			if (rule->flags & BUS_MATCH_INTERFACE) {
+				pr_warn("Key %s specified twice in match rule\n",
+					key);
+				goto failed;
+			}
+
+			if (!bus_match_rule_set_interface(rule, value,
+							  gfp_flags)) {
+				pr_err("Out of memeory");
+				goto failed;
+			}
+		} else if (strcmp(key, "member") == 0) {
+			if (rule->flags & BUS_MATCH_MEMBER) {
+				pr_warn("Key %s specified twice in match rule\n",
+					key);
+				goto failed;
+			}
+
+			if (!bus_match_rule_set_member(rule, value,
+						       gfp_flags)) {
+				pr_err("Out of memeory");
+				goto failed;
+			}
+		} else if (strcmp(key, "destination") == 0) {
+			if (rule->flags & BUS_MATCH_DESTINATION) {
+				pr_warn("Key %s specified twice in match rule\n",
+					key);
+				goto failed;
+			}
+
+			if (!bus_match_rule_set_destination(rule, value,
+							    gfp_flags)) {
+				pr_err("Out of memeory");
+				goto failed;
+			}
+		} else if (strcmp(key, "eavesdrop") == 0) {
+			if (strcmp(value, "true") == 0) {
+				rule->flags |= BUS_MATCH_CLIENT_IS_EAVESDROPPING;
+			} else if (strcmp(value, "false") == 0) {
+				rule->flags &= ~(BUS_MATCH_CLIENT_IS_EAVESDROPPING);
+			} else {
+				pr_warn("eavesdrop='%s' is invalid, " \
+					"it should be 'true' or 'false'\n",
+					value);
+				goto failed;
+			}
+		} else if (strncmp(key, "arg", 3) != 0) {
+			pr_warn("Unknown key \"%s\" in match rule\n",
+				   key);
+			goto failed;
+		}
+
+		++i;
+	}
+
+	goto out;
+
+failed:
+	if (rule) {
+		bus_match_rule_free(rule);
+		rule = NULL;
+	}
+
+out:
+
+	i = 0;
+	while (tokens[i].key || tokens[i].value) {
+		WARN_ON(i >= MAX_RULE_TOKENS);
+		kfree(tokens[i].key);
+		kfree(tokens[i].value);
+		++i;
+	}
+
+	return rule;
+}
+
+/* return the match rule containing the hlist_head. It may not be the first
+ * match rule in the list. */
+struct bus_match_rule *match_rule_search(struct rb_root *root,
+					 const char *interface)
+{
+	struct rb_node *node = root->rb_node;
+
+	while (node) {
+		struct bus_match_rule *data =
+			container_of(node, struct bus_match_rule, node);
+		int result;
+
+		result = strcmp(interface, data->interface);
+
+		if (result < 0)
+			node = node->rb_left;
+		else if (result > 0)
+			node = node->rb_right;
+		else
+			return data;
+	}
+	return NULL;
+}
+
+void match_rule_insert(struct rb_root *root, struct bus_match_rule *data)
+{
+	struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+	/* Figure out where to put new node */
+	while (*new) {
+		struct bus_match_rule *this =
+			container_of(*new, struct bus_match_rule, node);
+		int result = strcmp(data->interface, this->interface);
+
+		parent = *new;
+		if (result < 0)
+			new = &((*new)->rb_left);
+		else if (result > 0)
+			new = &((*new)->rb_right);
+		else {
+			/* the head is not used */
+			INIT_HLIST_HEAD(&data->first);
+			/* Add it at the beginning of the list */
+			hlist_add_head(&data->list, &this->first);
+			return;
+		}
+	}
+
+	/* this rule is single in its list */
+	INIT_HLIST_HEAD(&data->first);
+	hlist_add_head(&data->list, &data->first);
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&data->node, parent, new);
+	rb_insert_color(&data->node, root);
+}
+
+struct bus_match_maker *bus_matchmaker_new(gfp_t gfp_flags)
+{
+	struct bus_match_maker *matchmaker;
+	int i;
+
+	matchmaker = kzalloc(sizeof(struct bus_match_maker), gfp_flags);
+	if (matchmaker == NULL)
+		return NULL;
+
+	for (i = DBUS_MESSAGE_TYPE_INVALID; i < DBUS_NUM_MESSAGE_TYPES; i++) {
+		struct rule_pool *p = matchmaker->rules_by_type + i;
+
+		p->rules_by_iface = RB_ROOT;
+	}
+
+	kref_init(&matchmaker->kref);
+
+	return matchmaker;
+}
+
+void bus_matchmaker_free(struct kref *kref)
+{
+	struct bus_match_maker *matchmaker;
+	struct list_head del_list;
+	struct rb_node *n;
+	int i;
+
+	matchmaker = container_of(kref, struct bus_match_maker, kref);
+
+	/* free names */
+	INIT_LIST_HEAD(&del_list);
+	n = matchmaker->names.rb_node;
+	if (n) {
+		struct dbus_name *dbus_name, *cur, *tmp;
+
+		dbus_name = rb_entry(n, struct dbus_name, node);
+		list_add_tail(&dbus_name->del_list, &del_list);
+
+		list_for_each_entry(cur, &del_list, del_list) {
+			struct dbus_name *right, *left;
+			if (cur->node.rb_right) {
+				right = rb_entry(cur->node.rb_right,
+						 struct dbus_name, node);
+				list_add_tail(&right->del_list, &del_list);
+			}
+			if (cur->node.rb_left) {
+				left = rb_entry(cur->node.rb_left,
+						struct dbus_name, node);
+				list_add_tail(&left->del_list, &del_list);
+			}
+		}
+		list_for_each_entry_safe(dbus_name, tmp, &del_list, del_list) {
+			kfree(dbus_name->name);
+			list_del(&dbus_name->del_list);
+			kfree(dbus_name);
+		}
+	}
+	WARN_ON(!list_empty_careful(&del_list));
+
+	/* free match rules */
+	for (i = 0 ; i < DBUS_NUM_MESSAGE_TYPES ; i++) {
+		struct rule_pool *pool = matchmaker->rules_by_type + i;
+		struct bus_match_rule *match_rule, *cur, *tmp;
+		struct hlist_node *list_tmp, *list_tmp2;
+
+		/* free match rules from the list */
+		hlist_for_each_entry_safe(cur, list_tmp, list_tmp2,
+					  &pool->rules_without_iface, list) {
+			bus_match_rule_free(cur);
+		}
+
+		/* free match rules from the tree */
+		if (!pool->rules_by_iface.rb_node)
+			continue;
+		match_rule = rb_entry(pool->rules_by_iface.rb_node,
+				      struct bus_match_rule, node);
+		list_add_tail(&match_rule->del_list, &del_list);
+
+		list_for_each_entry(cur, &del_list, del_list) {
+			struct bus_match_rule *right, *left;
+			if (cur->node.rb_right) {
+				right = rb_entry(cur->node.rb_right,
+						 struct bus_match_rule, node);
+				list_add_tail(&right->del_list, &del_list);
+			}
+			if (cur->node.rb_left) {
+				left = rb_entry(cur->node.rb_left,
+						struct bus_match_rule, node);
+				list_add_tail(&left->del_list, &del_list);
+			}
+		}
+		list_for_each_entry_safe(match_rule, tmp, &del_list, del_list) {
+			/* keep a ref during the loop to ensure the first
+			 * iteration of the loop does not delete it */
+			hlist_for_each_entry_safe(cur, list_tmp, list_tmp2,
+						  &match_rule->first, list) {
+				if (cur != match_rule)
+					bus_match_rule_free(cur);
+			}
+			list_del(&match_rule->del_list);
+			bus_match_rule_free(match_rule);
+		}
+		WARN_ON(!list_empty_careful(&del_list));
+	}
+
+	kfree(matchmaker);
+}
+
+/* The rule can't be modified after it's added. */
+int bus_matchmaker_add_rule(struct bus_match_maker *matchmaker,
+			    struct bus_match_rule *rule)
+{
+	struct rule_pool *pool;
+
+	WARN_ON(rule->message_type < 0);
+	WARN_ON(rule->message_type >= DBUS_NUM_MESSAGE_TYPES);
+
+	pool = matchmaker->rules_by_type + rule->message_type;
+
+	if (rule->interface)
+		match_rule_insert(&pool->rules_by_iface, rule);
+	else
+		hlist_add_head(&rule->list, &pool->rules_without_iface);
+
+	return 1;
+}
+
+static int match_rule_equal(struct bus_match_rule *a,
+			    struct bus_match_rule *b)
+{
+	if (a->flags != b->flags)
+		return 0;
+
+	if ((a->flags & BUS_MATCH_MESSAGE_TYPE) &&
+	    a->message_type != b->message_type)
+		return 0;
+
+	if ((a->flags & BUS_MATCH_MEMBER) &&
+	    strcmp(a->member, b->member) != 0)
+		return 0;
+
+	if ((a->flags & BUS_MATCH_PATH) &&
+	    strcmp(a->path, b->path) != 0)
+		return 0;
+
+	if ((a->flags & BUS_MATCH_INTERFACE) &&
+	    strcmp(a->interface, b->interface) != 0)
+		return 0;
+
+	if ((a->flags & BUS_MATCH_SENDER) &&
+	    strcmp(a->sender, b->sender) != 0)
+		return 0;
+
+	if ((a->flags & BUS_MATCH_DESTINATION) &&
+	    strcmp(a->destination, b->destination) != 0)
+		return 0;
+
+	if (a->flags & BUS_MATCH_ARGS) {
+		int i;
+
+		if (a->args_len != b->args_len)
+			return 0;
+
+		i = 0;
+		while (i < a->args_len) {
+			int length;
+
+			if ((a->args[i] != NULL) != (b->args[i] != NULL))
+				return 0;
+
+			if (a->arg_lens[i] != b->arg_lens[i])
+				return 0;
+
+			length = a->arg_lens[i] & ~BUS_MATCH_ARG_IS_PATH;
+
+			if (a->args[i] != NULL) {
+				WARN_ON(!b->args[i]);
+				if (memcmp(a->args[i], b->args[i], length) != 0)
+					return 0;
+			}
+
+			++i;
+		}
+	}
+
+	return 1;
+}
+
+/* Remove a single rule which is equal to the given rule by value */
+void bus_matchmaker_remove_rule_by_value(struct bus_match_maker *matchmaker,
+					 struct bus_match_rule *rule)
+{
+	struct rule_pool *pool;
+
+	WARN_ON(rule->message_type < 0);
+	WARN_ON(rule->message_type >= DBUS_NUM_MESSAGE_TYPES);
+
+	pool = matchmaker->rules_by_type + rule->message_type;
+
+	if (rule->interface) {
+		struct bus_match_rule *head =
+			match_rule_search(&pool->rules_by_iface,
+					  rule->interface);
+
+		struct hlist_node *cur;
+		struct bus_match_rule *cur_rule;
+		hlist_for_each_entry(cur_rule, cur, &head->first, list) {
+			if (match_rule_equal(cur_rule, rule)) {
+				hlist_del(cur);
+				if (hlist_empty(&head->first))
+					rb_erase(&head->node,
+						 &pool->rules_by_iface);
+				bus_match_rule_free(cur_rule);
+				break;
+			}
+		}
+	} else {
+		struct hlist_head *head = &pool->rules_without_iface;
+
+		struct hlist_node *cur;
+		struct bus_match_rule *cur_rule;
+		hlist_for_each_entry(cur_rule, cur, head, list) {
+			if (match_rule_equal(cur_rule, rule)) {
+				hlist_del(cur);
+				bus_match_rule_free(cur_rule);
+				break;
+			}
+		}
+	}
+
+}
+
+static int connection_is_primary_owner(struct bus_match_maker *connection,
+				       const char *service_name)
+{
+	struct rb_node *node = connection->names.rb_node;
+
+	if (!service_name)
+		return 0;
+
+	while (node) {
+		struct dbus_name *data = container_of(node, struct dbus_name,
+						      node);
+		int result;
+
+		result = strcmp(service_name, data->name);
+
+		if (result < 0)
+			node = node->rb_left;
+		else if (result > 0)
+			node = node->rb_right;
+		else
+			return 1;
+	}
+	return 0;
+}
+
+static int match_rule_matches(struct bus_match_maker *matchmaker,
+			      struct bus_match_maker *sender,
+			      int eavesdrop,
+			      struct bus_match_rule *rule,
+			      const struct dbus_message *message)
+{
+	/* Don't consider the rule if this is a eavesdropping match rule
+	 * and eavesdropping is not allowed on that peer */
+	if ((rule->flags & BUS_MATCH_CLIENT_IS_EAVESDROPPING) && !eavesdrop)
+		return 0;
+
+	/* Since D-Bus 1.5.6, match rules do not match messages which have a
+	 * DESTINATION field unless the match rule specifically requests this
+	 * by specifying eavesdrop='true' in the match rule. */
+	if (message->destination &&
+	    !(rule->flags & BUS_MATCH_CLIENT_IS_EAVESDROPPING))
+		return 0;
+
+	if (rule->flags & BUS_MATCH_MEMBER) {
+		const char *member;
+
+		WARN_ON(!rule->member);
+
+		member = message->member;
+		if (member == NULL)
+			return 0;
+
+		if (strcmp(member, rule->member) != 0)
+			return 0;
+	}
+
+	if (rule->flags & BUS_MATCH_SENDER) {
+		WARN_ON(!rule->sender);
+
+		if (sender == NULL) {
+			if (strcmp(rule->sender,
+				   "org.freedesktop.DBus") != 0)
+				return 0;
+		} else
+			if (!connection_is_primary_owner(sender, rule->sender))
+				return 0;
+	}
+
+	if (rule->flags & BUS_MATCH_DESTINATION) {
+		const char *destination;
+
+		WARN_ON(!rule->destination);
+
+		destination = message->destination;
+		if (destination == NULL)
+			return 0;
+
+		/* This will not just work out of the box because it this is
+		 * an eavesdropping match rule. */
+		if (matchmaker == NULL) {
+			if (strcmp(rule->destination,
+				   "org.freedesktop.DBus") != 0)
+				return 0;
+		} else
+			if (!connection_is_primary_owner(matchmaker,
+							 rule->destination))
+				return 0;
+	}
+
+	if (rule->flags & BUS_MATCH_PATH) {
+		const char *path;
+
+		WARN_ON(!rule->path);
+
+		path = message->path;
+		if (path == NULL)
+			return 0;
+
+		if (strcmp(path, rule->path) != 0)
+			return 0;
+	}
+
+	return 1;
+}
+
+static bool get_recipients_from_list(struct bus_match_maker *matchmaker,
+				     struct bus_match_maker *sender,
+				     int eavesdrop,
+				     struct hlist_head *rules,
+				     const struct dbus_message *message)
+{
+	struct hlist_node *cur;
+	struct bus_match_rule *rule;
+
+	if (rules == NULL) {
+		pr_debug("no rules of this type\n");
+		return 0;
+	}
+
+	hlist_for_each_entry(rule, cur, rules, list) {
+		if (match_rule_matches(matchmaker, sender, eavesdrop, rule,
+					message)) {
+			pr_debug("[YES] deliver with match rule \"%s\"\n",
+				 rule->rule_text);
+			return 1;
+		} else {
+			pr_debug("[NO]  deliver with match rule \"%s\"\n",
+				 rule->rule_text);
+		}
+	}
+	pr_debug("[NO]  no match rules\n");
+	return 0;
+}
+
+static struct hlist_head
+*bus_matchmaker_get_rules(struct bus_match_maker *matchmaker,
+			  int message_type, const char *interface)
+{
+	static struct hlist_head empty = {0,};
+	struct rule_pool *p;
+
+	WARN_ON(message_type < 0);
+	WARN_ON(message_type >= DBUS_NUM_MESSAGE_TYPES);
+
+	p = matchmaker->rules_by_type + message_type;
+
+	if (interface == NULL)
+		return &p->rules_without_iface;
+	else {
+		struct bus_match_rule *rule =
+			match_rule_search(&p->rules_by_iface, interface);
+		if (rule)
+			return &rule->first;
+		else
+			return &empty;
+	}
+}
+
+bool bus_matchmaker_filter(struct bus_match_maker *matchmaker,
+			   struct bus_match_maker *sender,
+			   int eavesdrop,
+			   const struct dbus_message *message)
+{
+	int type;
+	const char *interface;
+	struct hlist_head *neither, *just_type, *just_iface, *both;
+
+	type = message->type;
+	interface = message->interface;
+
+	neither = bus_matchmaker_get_rules(matchmaker,
+					   DBUS_MESSAGE_TYPE_INVALID, NULL);
+	just_type = just_iface = both = NULL;
+
+	if (interface != NULL)
+		just_iface = bus_matchmaker_get_rules(matchmaker,
+						      DBUS_MESSAGE_TYPE_INVALID,
+						      interface);
+
+	if (type > DBUS_MESSAGE_TYPE_INVALID && type < DBUS_NUM_MESSAGE_TYPES) {
+		just_type = bus_matchmaker_get_rules(matchmaker, type, NULL);
+
+		if (interface != NULL)
+			both = bus_matchmaker_get_rules(matchmaker, type,
+							interface);
+	}
+
+	if (get_recipients_from_list(matchmaker, sender, eavesdrop, neither,
+				     message))
+		return 1;
+	if (get_recipients_from_list(matchmaker, sender, eavesdrop, just_iface,
+				     message))
+		return 1;
+	if (get_recipients_from_list(matchmaker, sender, eavesdrop, just_type,
+				     message))
+		return 1;
+	if (get_recipients_from_list(matchmaker, sender, eavesdrop, both,
+				     message))
+		return 1;
+
+	return connection_is_primary_owner(matchmaker, message->destination);
+}
+
+void bus_matchmaker_add_name(struct bus_match_maker *matchmaker,
+			     const char *name,
+			     gfp_t gfp_flags)
+{
+	struct dbus_name *dbus_name;
+	struct rb_node **new = &(matchmaker->names.rb_node), *parent = NULL;
+
+	dbus_name = kmalloc(sizeof(struct dbus_name), gfp_flags);
+	if (!dbus_name)
+		return;
+	dbus_name->name = kstrdup(name, gfp_flags);
+	if (!dbus_name->name)
+		return;
+
+	/* Figure out where to put new node */
+	while (*new) {
+		struct dbus_name *this = container_of(*new, struct dbus_name,
+						      node);
+		int result = strcmp(dbus_name->name, this->name);
+
+		parent = *new;
+		if (result < 0)
+			new = &((*new)->rb_left);
+		else if (result > 0)
+			new = &((*new)->rb_right);
+		else
+			return;
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&dbus_name->node, parent, new);
+	rb_insert_color(&dbus_name->node, &matchmaker->names);
+}
+
+void bus_matchmaker_remove_name(struct bus_match_maker *matchmaker,
+				const char *name)
+{
+	struct rb_node *node = matchmaker->names.rb_node;
+
+	while (node) {
+		struct dbus_name *data = container_of(node, struct dbus_name,
+						      node);
+		int result;
+
+		result = strcmp(name, data->name);
+
+		if (result < 0)
+			node = node->rb_left;
+		else if (result > 0)
+			node = node->rb_right;
+		else {
+			rb_erase(&data->node, &matchmaker->names);
+			kfree(data->name);
+			kfree(data);
+		}
+	}
+
+}
+
diff --git a/net/netfilter/nfdbus/matchrule.h b/net/netfilter/nfdbus/matchrule.h
new file mode 100644
index 0000000..e16580c
--- /dev/null
+++ b/net/netfilter/nfdbus/matchrule.h
@@ -0,0 +1,82 @@
+/*
+ * signals.h  Bus signal connection implementation
+ *
+ * Copyright (C) 2003  Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#ifndef BUS_SIGNALS_H
+#define BUS_SIGNALS_H
+
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+#include <net/af_bus.h>
+
+#include "message.h"
+
+struct bus_match_rule *bus_match_rule_new(gfp_t gfp_flags);
+void bus_match_rule_free(struct bus_match_rule *rule);
+
+struct bus_match_rule *bus_match_rule_parse(const char *rule_text,
+					    gfp_t gfp_flags);
+
+struct rule_pool {
+	/* Maps non-NULL interface names to a list of bus_match_rule */
+	struct rb_root rules_by_iface;
+
+	/* List of bus_match_rule which don't specify an interface */
+	struct hlist_head rules_without_iface;
+};
+
+struct bus_match_maker {
+	struct sockaddr_bus addr;
+
+	struct hlist_node table_node;
+
+	/* Pools of rules, grouped by the type of message they match. 0
+	 * (DBUS_MESSAGE_TYPE_INVALID) represents rules that do not specify a
+	 * message type.
+	 */
+	struct rule_pool rules_by_type[DBUS_NUM_MESSAGE_TYPES];
+
+	struct rb_root names;
+
+	struct kref kref;
+};
+
+
+struct bus_match_maker *bus_matchmaker_new(gfp_t gfp_flags);
+void bus_matchmaker_free(struct kref *kref);
+
+int bus_matchmaker_add_rule(struct bus_match_maker *matchmaker,
+			    struct bus_match_rule *rule);
+void bus_matchmaker_remove_rule_by_value(struct bus_match_maker *matchmaker,
+					 struct bus_match_rule *value);
+
+bool bus_matchmaker_filter(struct bus_match_maker *matchmaker,
+			   struct bus_match_maker *sender,
+			   int eavesdrop,
+			   const struct dbus_message *message);
+
+void bus_matchmaker_add_name(struct bus_match_maker *matchmaker,
+			     const char *name, gfp_t gfp_flags);
+void bus_matchmaker_remove_name(struct bus_match_maker *matchmaker,
+				const char *name);
+
+#endif /* BUS_SIGNALS_H */
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH net-next 15/15] netfilter: add netfilter D-Bus module
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (13 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 14/15] netfilter: nfdbus: Add D-bus match rule implementation Vincent Sanders
@ 2012-06-29 16:45 ` Vincent Sanders
  2012-06-29 18:16 ` AF_BUS socket address family Chris Friesen
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 16:45 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller; +Cc: Alban Crequy

From: Alban Crequy <alban.crequy@collabora.co.uk>

AF_BUS has netfilter hooks on the packet sending path. This allows the
netfilter subsystem to register netfilter hook handlers.

The netfilter_dbus module allows to inspect D-Bus messages and take
actions based on the information contained on these messages.

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
---
 net/netfilter/Kconfig         |    2 +
 net/netfilter/Makefile        |    3 +
 net/netfilter/nfdbus/Kconfig  |   12 ++
 net/netfilter/nfdbus/Makefile |    6 +
 net/netfilter/nfdbus/nfdbus.c |  456 +++++++++++++++++++++++++++++++++++++++++
 net/netfilter/nfdbus/nfdbus.h |   44 ++++
 6 files changed, 523 insertions(+)
 create mode 100644 net/netfilter/nfdbus/Kconfig
 create mode 100644 net/netfilter/nfdbus/Makefile
 create mode 100644 net/netfilter/nfdbus/nfdbus.c
 create mode 100644 net/netfilter/nfdbus/nfdbus.h

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c19b214..a105d9b 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -1187,3 +1187,5 @@ endmenu
 source "net/netfilter/ipset/Kconfig"
 
 source "net/netfilter/ipvs/Kconfig"
+
+source "net/netfilter/nfdbus/Kconfig"
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 1c5160f..6dd4ade 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -123,3 +123,6 @@ obj-$(CONFIG_IP_SET) += ipset/
 
 # IPVS
 obj-$(CONFIG_IP_VS) += ipvs/
+
+# Dbus
+obj-$(CONFIG_NETFILTER_DBUS) += nfdbus/
diff --git a/net/netfilter/nfdbus/Kconfig b/net/netfilter/nfdbus/Kconfig
new file mode 100644
index 0000000..25699a1
--- /dev/null
+++ b/net/netfilter/nfdbus/Kconfig
@@ -0,0 +1,12 @@
+#
+# Netfilter D-Bus module configuration
+#
+config NETFILTER_DBUS
+	tristate "Netfilter D-bus (EXPERIMENTAL)"
+	depends on AF_BUS && CONNECTOR && EXPERIMENTAL
+	---help---
+	  If you say Y here, you will include support for a netfilter hook to
+	  parse D-Bus messages sent using the AF_BUS socket address family.
+
+	  To compile this as a module, choose M here: the module will be
+	  called netfilter_dbus.
diff --git a/net/netfilter/nfdbus/Makefile b/net/netfilter/nfdbus/Makefile
new file mode 100644
index 0000000..1a825f8
--- /dev/null
+++ b/net/netfilter/nfdbus/Makefile
@@ -0,0 +1,6 @@
+#
+# Makefile for the netfilter D-Bus module
+#
+obj-$(CONFIG_NETFILTER_DBUS) += netfilter_dbus.o
+
+netfilter_dbus-y := nfdbus.o message.o matchrule.o
diff --git a/net/netfilter/nfdbus/nfdbus.c b/net/netfilter/nfdbus/nfdbus.c
new file mode 100644
index 0000000..f6642e2
--- /dev/null
+++ b/net/netfilter/nfdbus/nfdbus.c
@@ -0,0 +1,456 @@
+/*
+ *  nfdbus.c - Netfilter module for AF_BUS/BUS_PROTO_DBUS.
+ */
+
+#define DRIVER_AUTHOR "Alban Crequy"
+#define DRIVER_DESC   "Netfilter module for AF_BUS/BUS_PROTO_DBUS."
+
+#include "nfdbus.h"
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter.h>
+#include <linux/connector.h>
+#include <net/af_bus.h>
+
+#include "message.h"
+#include "matchrule.h"
+
+static struct nf_hook_ops nfho_dbus;
+
+static struct cb_id cn_cmd_id = { CN_IDX_NFDBUS, CN_VAL_NFDBUS };
+
+static unsigned int hash;
+
+/* Scoped by AF_BUS address */
+struct hlist_head matchrules_table[BUS_HASH_SIZE];
+DEFINE_SPINLOCK(matchrules_lock);
+
+static struct bus_match_maker *find_match_maker(struct sockaddr_bus *addr,
+		bool create, bool delete)
+{
+	u64 hash;
+	struct hlist_node *node;
+	struct bus_match_maker *matchmaker;
+	int path_len = strlen(addr->sbus_path);
+
+	hash = csum_partial(addr->sbus_path,
+			    strlen(addr->sbus_path), 0);
+	hash ^= addr->sbus_addr.s_addr;
+	hash ^= hash >> 32;
+	hash ^= hash >> 16;
+	hash ^= hash >> 8;
+	hash &= 0xff;
+
+	spin_lock(&matchrules_lock);
+	hlist_for_each_entry(matchmaker, node, &matchrules_table[hash],
+			     table_node) {
+		if (addr->sbus_family == matchmaker->addr.sbus_family &&
+		    addr->sbus_addr.s_addr == matchmaker->addr.sbus_addr.s_addr &&
+		    !memcmp(addr->sbus_path, matchmaker->addr.sbus_path,
+			   path_len)) {
+			kref_get(&matchmaker->kref);
+			if (delete)
+				hlist_del(&matchmaker->table_node);
+			spin_unlock(&matchrules_lock);
+			pr_debug("Found matchmaker for hash %llu", hash);
+			return matchmaker;
+		}
+	}
+	spin_unlock(&matchrules_lock);
+
+	if (!create) {
+		pr_debug("Matchmaker for hash %llu not found", hash);
+		return NULL;
+	}
+
+	matchmaker = bus_matchmaker_new(GFP_ATOMIC);
+	matchmaker->addr.sbus_family = addr->sbus_family;
+	matchmaker->addr.sbus_addr.s_addr = addr->sbus_addr.s_addr;
+	memcpy(matchmaker->addr.sbus_path, addr->sbus_path, BUS_PATH_MAX);
+
+	pr_debug("Create new matchmaker for hash %llu\n", hash);
+	spin_lock(&matchrules_lock);
+	hlist_add_head(&matchmaker->table_node, &matchrules_table[hash]);
+	kref_get(&matchmaker->kref);
+	spin_unlock(&matchrules_lock);
+	return matchmaker;
+}
+
+static unsigned int dbus_filter(unsigned int hooknum,
+				struct sk_buff *skb,
+				const struct net_device *in,
+				const struct net_device *out,
+				int (*okfn)(struct sk_buff *))
+{
+	struct bus_send_context	*sendctx;
+	struct bus_match_maker *matchmaker = NULL;
+	struct bus_match_maker *sender = NULL;
+	struct dbus_message msg = {0,};
+	unsigned char *data;
+	size_t len;
+	int err;
+	int ret;
+
+	if (!skb->sk || skb->sk->sk_family != PF_BUS) {
+		WARN(1, "netfilter_dbus received an invalid skb");
+		return NF_DROP;
+	}
+
+	data = skb->data;
+	sendctx = BUSCB(skb).sendctx;
+	if (!sendctx || !sendctx->sender || !sendctx->sender_socket) {
+		WARN(1, "netfilter_dbus received an AF_BUS packet" \
+		     " without context. This is a bug. Dropping the"
+			" packet.");
+		return NF_DROP;
+	}
+
+	if (sendctx->sender_socket->sk->sk_protocol != BUS_PROTO_DBUS) {
+		/* This kernel module is for D-Bus. It must not
+		 * interfere with other users of AF_BUS. */
+		return NF_ACCEPT;
+	}
+	if (sendctx->recipient)
+		matchmaker = find_match_maker(sendctx->recipient, false, false);
+
+	len =  skb_tail_pointer(skb) - data;
+
+	if (sendctx->to_master && sendctx->main_recipient) {
+		pr_debug("AF_BUS packet to the bus master. ACCEPT.\n");
+		ret = NF_ACCEPT;
+		goto out;
+	}
+
+	if (sendctx->main_recipient && !sendctx->bus_master_side) {
+		pr_debug("AF_BUS packet from a peer to a peer (unicast). ACCEPT.\n");
+		ret = NF_ACCEPT;
+		goto out;
+	}
+
+	err = dbus_message_parse(data, len, &msg);
+	if (err) {
+		if (!sendctx->main_recipient) {
+			pr_debug("AF_BUS packet for an eavesdropper or " \
+				 "multicast is not parsable. DROP.\n");
+			ret = NF_DROP;
+			goto out;
+		} else if (sendctx->bus_master_side) {
+			pr_debug("AF_BUS packet from bus master is not parsable. ACCEPT.\n");
+			ret = NF_ACCEPT;
+			goto out;
+		} else {
+			pr_debug("AF_BUS packet from peer is not parsable. DROP.\n");
+			ret = NF_DROP;
+			goto out;
+		}
+	}
+
+	if (sendctx->bus_master_side && !sendctx->main_recipient) {
+		pr_debug("AF_BUS packet '%s' from the bus master is for an " \
+			 "eavesdropper. DROP.\n",
+		       msg.member ? msg.member : "");
+		ret = NF_DROP;
+		goto out;
+	}
+	if (sendctx->bus_master_side) {
+		if (msg.name_acquired) {
+			pr_debug("New name: %s [%p %p].\n",
+				 msg.name_acquired, sendctx->sender,
+				 sendctx->recipient);
+
+			sender = find_match_maker(sendctx->sender, true, false);
+			bus_matchmaker_add_name(sender, msg.name_acquired,
+						GFP_ATOMIC);
+		}
+		if (msg.name_lost) {
+			pr_debug("Lost name: %s [%p %p].\n",
+				 msg.name_lost, sendctx->sender,
+				 sendctx->recipient);
+
+			sender = find_match_maker(sendctx->sender, true, false);
+			bus_matchmaker_remove_name(sender, msg.name_acquired);
+		}
+
+		pr_debug("AF_BUS packet '%s' from the bus master. ACCEPT.\n",
+			 msg.member ? msg.member : "");
+		ret = NF_ACCEPT;
+		goto out;
+	}
+
+	pr_debug("Multicast AF_BUS packet, %ld bytes, " \
+		 "considering recipient %lld...\n", len,
+		 sendctx->recipient ? sendctx->recipient->sbus_addr.s_addr : 0);
+
+	pr_debug("Message type %d %s->%s [iface: %s][member: %s][matchmaker=%p]...\n",
+		 msg.type,
+		 msg.sender ? msg.sender : "",
+		 msg.destination ? msg.destination : "",
+		 msg.interface ? msg.interface : "",
+		 msg.member ? msg.member : "",
+		 matchmaker);
+
+	if (!matchmaker) {
+		pr_debug("No match rules for this recipient. DROP.\n");
+		ret = NF_DROP;
+		goto out;
+	}
+
+	sender = find_match_maker(sendctx->sender, true, false);
+	err = bus_matchmaker_filter(matchmaker, sender, sendctx->eavesdropper,
+				    &msg);
+	if (err) {
+		pr_debug("Matchmaker: ACCEPT.\n");
+		ret = NF_ACCEPT;
+		goto out;
+	} else {
+		pr_debug("Matchmaker: DROP.\n");
+		ret = NF_DROP;
+		goto out;
+	}
+
+out:
+	if (matchmaker)
+		kref_put(&matchmaker->kref, bus_matchmaker_free);
+	if (sender)
+		kref_put(&sender->kref, bus_matchmaker_free);
+	return ret;
+}
+
+/* Taken from drbd_nl_send_reply() */
+static void nfdbus_nl_send_reply(struct cn_msg *msg, int ret_code)
+{
+	char buffer[sizeof(struct cn_msg)+sizeof(struct nfdbus_nl_cfg_reply)];
+	struct cn_msg *cn_reply = (struct cn_msg *) buffer;
+	struct nfdbus_nl_cfg_reply *reply =
+		(struct nfdbus_nl_cfg_reply *)cn_reply->data;
+	int rr;
+
+	memset(buffer, 0, sizeof(buffer));
+	cn_reply->id = msg->id;
+
+	cn_reply->seq = msg->seq;
+	cn_reply->ack = msg->ack  + 1;
+	cn_reply->len = sizeof(struct nfdbus_nl_cfg_reply);
+	cn_reply->flags = 0;
+
+	reply->ret_code = ret_code;
+
+	rr = cn_netlink_send(cn_reply, 0, GFP_NOIO);
+	if (rr && rr != -ESRCH)
+		pr_debug("nfdbus: cn_netlink_send()=%d\n", rr);
+}
+
+/**
+ * nfdbus_check_perm - check if a pid is allowed to update match rules
+ * @sockaddr_bus: the socket address of the bus
+ * @pid: the process id that wants to update the match rules set
+ *
+ * Test if a given process id is allowed to update the match rules set
+ * for this bus. Only the process that owns the bus master listen socket
+ * is allowed to update the match rules set for the bus.
+ */
+static bool nfdbus_check_perm(struct sockaddr_bus *sbusname, pid_t pid)
+{
+	struct net *net = get_net_ns_by_pid(pid);
+	struct sock *s;
+	struct bus_address *addr;
+	struct hlist_node *node;
+	int offset = (sbusname->sbus_path[0] == '\0');
+	int path_len = strnlen(sbusname->sbus_path + offset, BUS_PATH_MAX);
+	int len;
+	if (!net)
+		return false;
+
+	len = path_len + 1 + sizeof(__kernel_sa_family_t) +
+	      sizeof(struct bus_addr);
+
+	spin_lock(&bus_address_lock);
+
+	hlist_for_each_entry(addr, node, &bus_address_table[hash],
+			     table_node) {
+		s = addr->sock;
+
+		if (s->sk_protocol != BUS_PROTO_DBUS)
+			continue;
+
+		if (!net_eq(sock_net(s), net))
+			continue;
+
+		if (addr->len == len &&
+		    addr->name->sbus_family == sbusname->sbus_family &&
+		    addr->name->sbus_addr.s_addr == BUS_MASTER_ADDR &&
+		    bus_same_bus(addr->name, sbusname) &&
+		    pid_nr(s->sk_peer_pid) == pid) {
+			spin_unlock(&bus_address_lock);
+			return true;
+		}
+	}
+
+	spin_unlock(&bus_address_lock);
+
+	return false;
+}
+
+static void cn_cmd_cb(struct cn_msg *msg, struct netlink_skb_parms *nsp)
+{
+	struct nfdbus_nl_cfg_req *nlp = (struct nfdbus_nl_cfg_req *)msg->data;
+	struct cn_msg *cn_reply;
+	struct nfdbus_nl_cfg_reply *reply;
+	int retcode, rr;
+	pid_t pid = task_tgid_vnr(current);
+	int reply_size = sizeof(struct cn_msg)
+		+ sizeof(struct nfdbus_nl_cfg_reply);
+
+	pr_debug("nfdbus: %s nsp->pid=%d pid=%d\n", __func__, nsp->pid, pid);
+
+	if (!nfdbus_check_perm(&nlp->addr, pid)) {
+		pr_debug(KERN_ERR "nfdbus: pid=%d is not allowed!\n", pid);
+		retcode = EPERM;
+		goto fail;
+	}
+
+	cn_reply = kzalloc(reply_size, GFP_KERNEL);
+	if (!cn_reply) {
+		retcode = ENOMEM;
+		goto fail;
+	}
+	reply = (struct nfdbus_nl_cfg_reply *) cn_reply->data;
+
+	if (msg->len < sizeof(struct nfdbus_nl_cfg_req)) {
+		reply->ret_code = EINVAL;
+	} else if (nlp->cmd == NFDBUS_CMD_ADDMATCH) {
+		struct bus_match_rule *rule;
+		struct bus_match_maker *matchmaker;
+		reply->ret_code = 0;
+
+		if (msg->len == 0)
+			reply->ret_code = EINVAL;
+
+		rule = bus_match_rule_parse(nlp->data, GFP_ATOMIC);
+		if (rule) {
+			matchmaker = find_match_maker(&nlp->addr, true, false);
+			pr_debug("Add match rule for matchmaker %p\n",
+				 matchmaker);
+			bus_matchmaker_add_rule(matchmaker, rule);
+			kref_put(&matchmaker->kref, bus_matchmaker_free);
+		} else {
+			reply->ret_code = EINVAL;
+		}
+	} else if (nlp->cmd == NFDBUS_CMD_REMOVEMATCH) {
+		struct bus_match_rule *rule;
+		struct bus_match_maker *matchmaker;
+
+		rule = bus_match_rule_parse(nlp->data, GFP_ATOMIC);
+		matchmaker = find_match_maker(&nlp->addr, false, false);
+		if (!matchmaker) {
+			reply->ret_code = EINVAL;
+		} else {
+			pr_debug("Remove match rule for matchmaker %p\n",
+				 matchmaker);
+			bus_matchmaker_remove_rule_by_value(matchmaker, rule);
+			kref_put(&matchmaker->kref, bus_matchmaker_free);
+			reply->ret_code = 0;
+		}
+		bus_match_rule_free(rule);
+
+	} else if (nlp->cmd == NFDBUS_CMD_REMOVEALLMATCH) {
+		struct bus_match_maker *matchmaker;
+
+		matchmaker = find_match_maker(&nlp->addr, false, true);
+		if (!matchmaker) {
+			reply->ret_code = EINVAL;
+		} else {
+			pr_debug("Remove matchmaker %p\n", matchmaker);
+			kref_put(&matchmaker->kref, bus_matchmaker_free);
+			kref_put(&matchmaker->kref, bus_matchmaker_free);
+			reply->ret_code = 0;
+		}
+
+	} else {
+		reply->ret_code = EINVAL;
+	}
+
+	cn_reply->id = msg->id;
+	cn_reply->seq = msg->seq;
+	cn_reply->ack = msg->ack  + 1;
+	cn_reply->len = sizeof(struct nfdbus_nl_cfg_reply);
+	cn_reply->flags = 0;
+
+	rr = cn_netlink_reply(cn_reply, nsp->pid, GFP_KERNEL);
+	if (rr && rr != -ESRCH)
+		pr_debug("nfdbus: cn_netlink_send()=%d\n", rr);
+	pr_debug("nfdbus: cn_netlink_reply(pid=%d)=%d\n", nsp->pid, rr);
+
+	kfree(cn_reply);
+	return;
+fail:
+	nfdbus_nl_send_reply(msg, retcode);
+}
+
+static int __init nfdbus_init(void)
+{
+	int err;
+	struct bus_addr master_addr;
+
+	master_addr.s_addr = BUS_MASTER_ADDR;
+	hash = bus_compute_hash(master_addr);
+
+	pr_debug("Loading netfilter_dbus\n");
+
+	/* Install D-Bus netfilter hook */
+	nfho_dbus.hook     = dbus_filter;
+	nfho_dbus.hooknum  = NF_BUS_SENDING;
+	nfho_dbus.pf       = NFPROTO_BUS; /* Do not use PF_BUS, you fool! */
+	nfho_dbus.priority = 0;
+	nfho_dbus.owner = THIS_MODULE;
+	err = nf_register_hook(&nfho_dbus);
+	if (err)
+		return err;
+	pr_debug("Netfilter hook for D-Bus: installed.\n");
+
+	/* Install connector hook */
+	err = cn_add_callback(&cn_cmd_id, "nfdbus", cn_cmd_cb);
+	if (err)
+		goto err_cn_cmd_out;
+	pr_debug("Connector hook: installed.\n");
+
+	return 0;
+
+err_cn_cmd_out:
+	nf_unregister_hook(&nfho_dbus);
+
+	return err;
+}
+
+static void __exit nfdbus_cleanup(void)
+{
+	int i;
+	struct hlist_node *node, *tmp;
+	struct bus_match_maker *matchmaker;
+	nf_unregister_hook(&nfho_dbus);
+
+	cn_del_callback(&cn_cmd_id);
+
+	spin_lock(&matchrules_lock);
+	for (i = 0; i < BUS_HASH_SIZE; i++) {
+		hlist_for_each_entry_safe(matchmaker, node, tmp,
+					  &matchrules_table[i], table_node) {
+			hlist_del(&matchmaker->table_node);
+			kref_put(&matchmaker->kref, bus_matchmaker_free);
+		}
+	}
+	spin_unlock(&matchrules_lock);
+
+	pr_debug("Unloading netfilter_dbus\n");
+}
+
+module_init(nfdbus_init);
+module_exit(nfdbus_cleanup);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
+MODULE_ALIAS_NET_PF_PROTO(PF_BUS, BUS_PROTO_DBUS);
diff --git a/net/netfilter/nfdbus/nfdbus.h b/net/netfilter/nfdbus/nfdbus.h
new file mode 100644
index 0000000..477bde3
--- /dev/null
+++ b/net/netfilter/nfdbus/nfdbus.h
@@ -0,0 +1,44 @@
+/*
+ * nfdbus.h  Netfilter module for AF_BUS/BUS_PROTO_DBUS.
+ *
+ * Copyright (C) 2012  Collabora Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#ifndef NETFILTER_DBUS_H
+#define NETFILTER_DBUS_H
+
+#include <linux/types.h>
+#include <linux/bus.h>
+
+#define NFDBUS_CMD_ADDMATCH        0x01
+#define NFDBUS_CMD_REMOVEMATCH     0x02
+#define NFDBUS_CMD_REMOVEALLMATCH  0x03
+
+struct nfdbus_nl_cfg_req {
+	__u32 cmd;
+	__u32 len;
+	struct sockaddr_bus addr;
+	__u64 pad;
+	unsigned char data[0];
+};
+
+struct nfdbus_nl_cfg_reply {
+	__u32 ret_code;
+};
+
+#endif /* NETFILTER_DBUS_H */
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing
  2012-06-29 16:45 ` [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing Vincent Sanders
@ 2012-06-29 17:11   ` Pablo Neira Ayuso
  2012-07-02 15:43     ` Javier Martinez Canillas
  0 siblings, 1 reply; 53+ messages in thread
From: Pablo Neira Ayuso @ 2012-06-29 17:11 UTC (permalink / raw)
  To: Vincent Sanders
  Cc: netdev, linux-kernel, David S. Miller, Javier Martinez Canillas,
	Alban Crequy

On Fri, Jun 29, 2012 at 05:45:52PM +0100, Vincent Sanders wrote:
> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> 
> The netfilter D-Bus module needs to parse D-bus messages sent by
> applications to decide whether a peer can receive or not a D-Bus
> message. Add D-bus message parsing logic to be able to analyze.

Not talking about the entire patchset, only about the part I'm
responsible for.

I don't see why you think this belong to netfilter at all.

This doesn't integrate into the existing filtering infrastructure,
neither it extends it in any way.

> Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
> ---
>  net/netfilter/nfdbus/message.c |  194 ++++++++++++++++++++++++++++++++++++++++
>  net/netfilter/nfdbus/message.h |   71 +++++++++++++++
>  2 files changed, 265 insertions(+)
>  create mode 100644 net/netfilter/nfdbus/message.c
>  create mode 100644 net/netfilter/nfdbus/message.h
> 
> diff --git a/net/netfilter/nfdbus/message.c b/net/netfilter/nfdbus/message.c
> new file mode 100644
> index 0000000..93c409c
> --- /dev/null
> +++ b/net/netfilter/nfdbus/message.c
> @@ -0,0 +1,194 @@
> +/*
> + * message.c  Basic D-Bus message parsing
> + *
> + * Copyright (C) 2010-2012  Collabora Ltd
> + * Authors:	Alban Crequy <alban.crequy@collabora.co.uk>
> + * Copyright (C) 2002, 2003, 2004, 2005  Red Hat Inc.
> + * Copyright (C) 2002, 2003  CodeFactory AB
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + *
> + */
> +
> +#include <linux/slab.h>
> +
> +#include "message.h"
> +
> +int dbus_message_type_from_string(const char *type_str)
> +{
> +	if (strcmp(type_str, "method_call") == 0)
> +		return DBUS_MESSAGE_TYPE_METHOD_CALL;
> +	if (strcmp(type_str, "method_return") == 0)
> +		return DBUS_MESSAGE_TYPE_METHOD_RETURN;
> +	else if (strcmp(type_str, "signal") == 0)
> +		return DBUS_MESSAGE_TYPE_SIGNAL;
> +	else if (strcmp(type_str, "error") == 0)
> +		return DBUS_MESSAGE_TYPE_ERROR;
> +	else
> +		return DBUS_MESSAGE_TYPE_INVALID;
> +}
> +
> +int dbus_message_parse(unsigned char *message, size_t len,
> +		       struct dbus_message *dbus_message)
> +{
> +	unsigned char *cur;
> +	int array_header_len;
> +
> +	dbus_message->message = message;
> +
> +	if (len < 4 + 4 + 4 + 4 || message[1] == 0 || message[1] > 4)
> +		return -EINVAL;
> +
> +	dbus_message->type = message[1];
> +	dbus_message->body_length = *((u32 *)(message + 4));
> +	cur = message + 12;
> +	array_header_len = *(u32 *)cur;
> +	dbus_message->len_offset = 12;
> +	cur += 4;
> +	while (cur < message + len
> +	       && cur < message + 12 + 4 + array_header_len) {
> +		int header_code;
> +		int signature_len;
> +		unsigned char *signature;
> +		int str_len;
> +		unsigned char *str;
> +
> +		/* D-Bus alignment craziness */
> +		if ((cur - message) % 8 != 0)
> +			cur += 8 - (cur - message) % 8;
> +
> +		header_code = *(char *)cur;
> +		cur++;
> +		signature_len = *(char *)cur;
> +		/* All header fields of the current D-Bus spec have a simple
> +		 * type, either o, s, g, or u */
> +		if (signature_len != 1)
> +			return -EINVAL;
> +		cur++;
> +		signature = cur;
> +		cur += signature_len + 1;
> +		if (signature[0] != 'o' &&
> +		    signature[0] != 's' &&
> +		    signature[0] != 'g' &&
> +		    signature[0] != 'u')
> +			return -EINVAL;
> +
> +		if (signature[0] == 'u') {
> +			cur += 4;
> +			continue;
> +		}
> +
> +		if (signature[0] != 'g') {
> +			str_len = *(u32 *)cur;
> +			cur += 4;
> +		} else {
> +			str_len = *(char *)cur;
> +			cur += 1;
> +		}
> +
> +		str = cur;
> +		switch (header_code) {
> +		case 1:
> +			dbus_message->path = str;
> +			break;
> +		case 2:
> +			dbus_message->interface = str;
> +			break;
> +		case 3:
> +			dbus_message->member = str;
> +			break;
> +		case 6:
> +			dbus_message->destination = str;
> +			break;
> +		case 7:
> +			dbus_message->sender = str;
> +			break;
> +		case 8:
> +			dbus_message->body_signature = str;
> +			break;
> +		}
> +		cur += str_len + 1;
> +	}
> +
> +	dbus_message->padding_end = (8 - (cur - message) % 8) % 8;
> +
> +	/* Jump to body D-Bus alignment craziness */
> +	if ((cur - message) % 8 != 0)
> +		cur += 8 - (cur - message) % 8;
> +	dbus_message->new_header_offset = cur - message;
> +
> +	if (dbus_message->new_header_offset
> +	    + dbus_message->body_length != len) {
> +		pr_warn("Message truncated? " \
> +			"Header %d + Body %d != Length %zd\n",
> +			dbus_message->new_header_offset,
> +			dbus_message->body_length, len);
> +		return -EINVAL;
> +	}
> +
> +	if (dbus_message->body_signature &&
> +	    dbus_message->body_signature[0] == 's') {
> +		int str_len;
> +		str_len = *(u32 *)cur;
> +		cur += 4;
> +		dbus_message->arg0 = cur;
> +		cur += str_len + 1;
> +	}
> +
> +	if ((cur - message) % 4 != 0)
> +		cur += 4 - (cur - message) % 4;
> +
> +	if (dbus_message->body_signature &&
> +	    dbus_message->body_signature[0] == 's' &&
> +	    dbus_message->body_signature[1] == 's') {
> +		int str_len;
> +		str_len = *(u32 *)cur;
> +		cur += 4;
> +		dbus_message->arg1 = cur;
> +		cur += str_len + 1;
> +	}
> +
> +	if ((cur - message) % 4 != 0)
> +		cur += 4 - (cur - message) % 4;
> +
> +	if (dbus_message->body_signature &&
> +	    dbus_message->body_signature[0] == 's' &&
> +	    dbus_message->body_signature[1] == 's' &&
> +	    dbus_message->body_signature[2] == 's') {
> +		int str_len;
> +		str_len = *(u32 *)cur;
> +		cur += 4;
> +		dbus_message->arg2 = cur;
> +		cur += str_len + 1;
> +	}
> +
> +	if ((cur - message) % 4 != 0)
> +		cur += 4 - (cur - message) % 4;
> +
> +	if (dbus_message->type == DBUS_MESSAGE_TYPE_SIGNAL &&
> +	    dbus_message->sender && dbus_message->path &&
> +	    dbus_message->interface && dbus_message->member &&
> +	    dbus_message->arg0 &&
> +	    strcmp(dbus_message->sender, "org.freedesktop.DBus") == 0 &&
> +	    strcmp(dbus_message->interface, "org.freedesktop.DBus") == 0 &&
> +	    strcmp(dbus_message->path, "/org/freedesktop/DBus") == 0) {
> +		if (strcmp(dbus_message->member, "NameAcquired") == 0)
> +			dbus_message->name_acquired = dbus_message->arg0;
> +		else if (strcmp(dbus_message->member, "NameLost") == 0)
> +			dbus_message->name_lost = dbus_message->arg0;
> +	}
> +
> +	return 0;
> +}
> diff --git a/net/netfilter/nfdbus/message.h b/net/netfilter/nfdbus/message.h
> new file mode 100644
> index 0000000..e3ea4d3
> --- /dev/null
> +++ b/net/netfilter/nfdbus/message.h
> @@ -0,0 +1,71 @@
> +/*
> + * message.h  Basic D-Bus message parsing
> + *
> + * Copyright (C) 2010  Collabora Ltd
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + *
> + */
> +
> +#ifndef DBUS_MESSAGE_H
> +#define DBUS_MESSAGE_H
> +
> +#include <linux/list.h>
> +
> +#define DBUS_MAXIMUM_MATCH_RULE_LENGTH 1024
> +
> +/* Types of message */
> +
> +#define DBUS_MESSAGE_TYPE_INVALID       0
> +#define DBUS_MESSAGE_TYPE_METHOD_CALL   1
> +#define DBUS_MESSAGE_TYPE_METHOD_RETURN 2
> +#define DBUS_MESSAGE_TYPE_ERROR         3
> +#define DBUS_MESSAGE_TYPE_SIGNAL        4
> +#define DBUS_NUM_MESSAGE_TYPES          5
> +
> +/* No need to implement a feature-complete parser. It only implement what is
> + * needed by the bus. */
> +struct dbus_message {
> +	char *message;
> +	size_t len;
> +	size_t new_len;
> +
> +	/* direct pointers to the fields */
> +	int type;
> +	char *path;
> +	char *interface;
> +	char *member;
> +	char *destination;
> +	char *sender;
> +	char *body_signature;
> +	int body_length;
> +	char *arg0;
> +	char *arg1;
> +	char *arg2;
> +	char *name_acquired;
> +	char *name_lost;
> +
> +	/* How to add the 'sender' field in the headers */
> +	int new_header_offset;
> +	int len_offset;
> +	int padding_end;
> +};
> +
> +int dbus_message_type_from_string(const char *type_str);
> +
> +int dbus_message_parse(unsigned char *message, size_t len,
> +		       struct dbus_message *dbus_message);
> +
> +#endif /* DBUS_MESSAGE_H */
> -- 
> 1.7.10
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (14 preceding siblings ...)
  2012-06-29 16:45 ` [PATCH net-next 15/15] netfilter: add netfilter D-Bus module Vincent Sanders
@ 2012-06-29 18:16 ` Chris Friesen
  2012-06-29 19:33   ` Ben Hutchings
  2012-06-29 18:45 ` Casey Schaufler
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 53+ messages in thread
From: Chris Friesen @ 2012-06-29 18:16 UTC (permalink / raw)
  To: Vincent Sanders; +Cc: netdev, linux-kernel, David S. Miller

On 06/29/2012 10:45 AM, Vincent Sanders wrote:
> This series adds the bus address family (AF_BUS) it is against
> net-next as of yesterday.
>
> AF_BUS is a message oriented inter process communication system.
>
> The principle features are:
>
>   - Reliable datagram based communication (all sockets are of type
>     SOCK_SEQPACKET)
>
>   - Multicast message delivery (one to many, unicast as a subset)
>
>   - Strict ordering (messages are delivered to every client in the same order)
>
>   - Ability to pass file descriptors
>
>   - Ability to pass credentials
>

I haven't had time to look at the code yet, but if you haven't already 
I'd like to propose adding the ability for someone with suitable 
privileges to eavesdrop on all communications.  We've been using 
something similar to this (essentially a simplified multicast unix 
datagram protocol) for many years now and having a tcpdump-like ability 
is very useful for debugging.

Chris



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (15 preceding siblings ...)
  2012-06-29 18:16 ` AF_BUS socket address family Chris Friesen
@ 2012-06-29 18:45 ` Casey Schaufler
  2012-06-29 23:22   ` Vincent Sanders
  2012-06-29 22:36 ` David Miller
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 53+ messages in thread
From: Casey Schaufler @ 2012-06-29 18:45 UTC (permalink / raw)
  To: Vincent Sanders; +Cc: netdev, linux-kernel, David S. Miller

On 6/29/2012 9:45 AM, Vincent Sanders wrote:
> This series adds the bus address family (AF_BUS) it is against
> net-next as of yesterday.
>  
> AF_BUS is a message oriented inter process communication system. 
>
> The principle features are:
>
>  - Reliable datagram based communication (all sockets are of type
>    SOCK_SEQPACKET)
>
>  - Multicast message delivery (one to many, unicast as a subset)
>
>  - Strict ordering (messages are delivered to every client in the same order)
>
>  - Ability to pass file descriptors
>
>  - Ability to pass credentials
>
> The basic concept is to provide a virtual bus on which multiple
> processes can communicate and policy is imposed by a "bus master".
>
> Introduction
> ------------
>
> AF_BUS is based upon AF_UNIX but extended for multicast operation and
> removes stream operation, responding to extensive feedback on previous
> approaches we have made the implementation as isolated as
> possible. There are opportunities in the future to integrate the
> socket garbage collector with that of the unix socket implementation.
>
> The impetus for creating this IPC mechanism is to replace the
> underlying transport for D-Bus. The D-Bus system currently emulates this
> IPC mechanism using AF_UNIX sockets in userspace and has numerous
> undesirable behaviours. D-Bus is now widely deployed in many areas and
> has become a de-facto IPC standard. Using this IPC mechanism as a
> transport gives a significant (100% or more) improvement to throughput
> with comparable improvement to latency.
>
> This work was undertaken by Collabora for the GENIVI Alliance and we
> are committed to responding to feedback promptly and intend to continue
> to support this feature into the future.
>
> Operation
> ---------
>
> A bus is created by processes connecting on an AF_BUS socket. The
> "bus master" binds itself instead of connecting to the NULL address.
>
> The socket address is made up of a path component and a numeric
> component. The path component is either a pathname or an abstract
> socket similar to a unix socket. The numeric component is used to
> uniquely identify each connection to the bus. Thus the path identifies
> a specific bus and the numeric component the attachment to that bus.
>
> The numeric component of the address is divided into two fixed parts a
> prefix to identify multicast groups and a suffix which identifies the
> attachment. The kernel allocates a single address in prefix 0 to each
> socket upon connection.
>
> Connections are initially limited to communicating with address the
> bus master (address 0) . The bus master is responsible for making all
> policy decisions around manipulating other attachments including
> building multicast groups. 
>
> It is expected that connecting clients use protocol specific messages
> to communicate with the bus master to negotiate differing
> configurations although a bus master might implement a fixed
> behaviour.
>
> AF_BUS itself is protocol agnostic and implements the configured
> policy between attachments which allows for a bus master to leave a
> bus and communication between clients to continue.
>
> Some test code has been written [1] which demonstrates the usage of
> AF_BUS.
>
> Use with BUS_PROTO_DBUS
> -----------------------
>
> The initial aim of AF_BUS is to provide a IPC mechanism suitable for
> use to provide the underlying transport for D-Bus. 
>
> A socket created using BUS_PROTO_DBUS indicates that the messages
> passed will be in the D-Bus format. The userspace libraries have been
> updated to use this transport with an updated D-Bus daemon [2] as a bus
> master.

Why don't you go whole hog and put all of D-Bus into the kernel?

>
> The D-Bus protocol allows for multicast groups to be filtered depending
> on message contents. These filters are configured by the bus master
> but need to be enforced on message delivery. 
>
> We have simply used the standard kernel netfilter mechanism to achieve
> this. This is used to filter delivery to clients that may be part of a
> multicast group where they are not receiving all messages according to
> policy. If a client wishes to further filter its input provision has
> been made to allow them to use BPF.
>
> The kernel based IPC has several benefits for D-Bus over the userspace
> emulation:
>
>  - Context switching between userspace processes is reduced.
>  - Message data copying is reduced.
>  - System call overheads are reduced.
>  - The userspace D-Bus daemon was subject to resource starvation,
>    client contention and priority inversion.
>  - Latency is reduced
>  - Throughput is increased.
>
> The tools for testing these assertions are available [3] and
> consistently show a doubling in throughput and better than halving of
> latency.

Please cross-post Patches 04/15 and 05/15 to the linux-security-module list.
Please cross-post Patch 05/15 to the selinux list.

Where is the analogous patch for the Smack LSM?



>
> [1] http://cgit.collabora.com/git/user/javier/check-unix-multicast.git/log/?h=af-bus
> [2] http://cgit.collabora.com/git/user/rodrigo/dbus.git/
>
> [3] git://github.com/kanchev/dbus-ping.git
>     https://github.com/kanchev/dbus-ping/blob/master/dbus-genivi-benchmarking.sh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 18:16 ` AF_BUS socket address family Chris Friesen
@ 2012-06-29 19:33   ` Ben Hutchings
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Hutchings @ 2012-06-29 19:33 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Vincent Sanders, netdev, linux-kernel, David S. Miller

On Fri, 2012-06-29 at 12:16 -0600, Chris Friesen wrote:
> On 06/29/2012 10:45 AM, Vincent Sanders wrote:
> > This series adds the bus address family (AF_BUS) it is against
> > net-next as of yesterday.
> >
> > AF_BUS is a message oriented inter process communication system.
> >
> > The principle features are:
> >
> >   - Reliable datagram based communication (all sockets are of type
> >     SOCK_SEQPACKET)
> >
> >   - Multicast message delivery (one to many, unicast as a subset)
> >
> >   - Strict ordering (messages are delivered to every client in the same order)
> >
> >   - Ability to pass file descriptors
> >
> >   - Ability to pass credentials
> >
> 
> I haven't had time to look at the code yet, but if you haven't already 
> I'd like to propose adding the ability for someone with suitable 
> privileges to eavesdrop on all communications.  We've been using 
> something similar to this (essentially a simplified multicast unix 
> datagram protocol) for many years now and having a tcpdump-like ability 
> is very useful for debugging.

It's in there (look for 'eavesdrop' in 08/15).

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (16 preceding siblings ...)
  2012-06-29 18:45 ` Casey Schaufler
@ 2012-06-29 22:36 ` David Miller
  2012-06-29 23:12   ` Vincent Sanders
  2012-06-30 20:41 ` Hans-Peter Jansen
  2012-07-05  7:59 ` Linus Walleij
  19 siblings, 1 reply; 53+ messages in thread
From: David Miller @ 2012-06-29 22:36 UTC (permalink / raw)
  To: vincent.sanders; +Cc: netdev, linux-kernel


There is no extensive text describing why using IPv4 for this cannot
be done.  I can almost bet that nobody really, honestly, tried.

Basically this means all of our feedback from the last time we had
discussions on kernel IPC for DBUS are being completely ignored.

Therefore, I will completely ignore this patch submission.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 22:36 ` David Miller
@ 2012-06-29 23:12   ` Vincent Sanders
  2012-06-29 23:18     ` David Miller
  2012-07-05 21:06     ` Jan Engelhardt
  0 siblings, 2 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 23:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel

On Fri, Jun 29, 2012 at 03:36:56PM -0700, David Miller wrote:
> 
> There is no extensive text describing why using IPv4 for this cannot
> be done.  I can almost bet that nobody really, honestly, tried.
> 

I can assure you that the team has tried no fewer than six differing
approaches, including using IP and attempting to bend several of the
existing address families.

> Basically this means all of our feedback from the last time we had
> discussions on kernel IPC for DBUS are being completely ignored.

Absolutely not, we listened hard and did extensive research, please do
not ascribe thoughtlessness to our actions. Certainly I would not
presume to waste your time and present something which has not been
thoroughly considered.

I had hoped you would have at least read the opening list where I
outlined the underlying features which explain why none of the
existing IPC match the requirements.

Firstly it is intended is an interprocess mechanism and not to rely on
a configured IP system, indeed one of its primary usages is to
provide mechanism for various tools to set up IP networking.

Leaving that aside the requirements for multicast, strict ordering, fd
passing and credential passing are simply not available in any other
single transport. It was made plain to us that AF_UNIX would not be
expanded to encompass multicast so we are left with adding AF_BUS.

If we are wrong I hope you will explain to me how we can achieve fd and
credential passing to multicast groups within existing protocols.

> 
> Therefore, I will completely ignore this patch submission.
> 

I do hope you will reconsider, or at least educate us appropriately. 
I understand you are a busy maintainer and appreciate your time in this matter.

Best regards
--
Vincent Sanders <vincent.sanders@collabora.co.uk>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 23:12   ` Vincent Sanders
@ 2012-06-29 23:18     ` David Miller
  2012-06-29 23:42       ` Vincent Sanders
  2012-07-02  4:49       ` Chris Friesen
  2012-07-05 21:06     ` Jan Engelhardt
  1 sibling, 2 replies; 53+ messages in thread
From: David Miller @ 2012-06-29 23:18 UTC (permalink / raw)
  To: vincent.sanders; +Cc: netdev, linux-kernel

From: Vincent Sanders <vincent.sanders@collabora.co.uk>
Date: Sat, 30 Jun 2012 00:12:37 +0100

> I had hoped you would have at least read the opening list where I
> outlined the underlying features which explain why none of the
> existing IPC match the requirements.

I had hoped that you had read the part we told you last time where
we explained why multicast and "reliable delivery" are fundamentally
incompatible attributes.

We are not creating a full address family in the kernel which exists
for one, and only one, specific and difficult user.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 18:45 ` Casey Schaufler
@ 2012-06-29 23:22   ` Vincent Sanders
  0 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 23:22 UTC (permalink / raw)
  To: Casey Schaufler; +Cc: netdev, linux-kernel, David S. Miller

On Fri, Jun 29, 2012 at 11:45:10AM -0700, Casey Schaufler wrote:
> On 6/29/2012 9:45 AM, Vincent Sanders wrote:

<snip>

> >
> > A socket created using BUS_PROTO_DBUS indicates that the messages
> > passed will be in the D-Bus format. The userspace libraries have been
> > updated to use this transport with an updated D-Bus daemon [2] as a bus
> > master.
> 
> Why don't you go whole hog and put all of D-Bus into the kernel?
> 

That would be ridiculously excessive. This work represents what we
feel is the minimum required functionlity for the underlying IPC
mechanism. 

The minimal filtering performed by the netfilter module is what is
required to enforce security as used in existing deployments and no more.

<snip>

> >
> > The tools for testing these assertions are available [3] and
> > consistently show a doubling in throughput and better than halving of
> > latency.
> 
> Please cross-post Patches 04/15 and 05/15 to the linux-security-module list.
> Please cross-post Patch 05/15 to the selinux list.
> 
> Where is the analogous patch for the Smack LSM?

we have not tested or built this with the Smack LSM, I would, of
course, be pleased to accept a patch to add this functionality if you
are knowladgeable in this area.

<snip>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 23:18     ` David Miller
@ 2012-06-29 23:42       ` Vincent Sanders
  2012-06-29 23:50         ` David Miller
  2012-06-30  0:13         ` Benjamin LaHaise
  2012-07-02  4:49       ` Chris Friesen
  1 sibling, 2 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-29 23:42 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel

On Fri, Jun 29, 2012 at 04:18:21PM -0700, David Miller wrote:
> From: Vincent Sanders <vincent.sanders@collabora.co.uk>
> Date: Sat, 30 Jun 2012 00:12:37 +0100
> 
> > I had hoped you would have at least read the opening list where I
> > outlined the underlying features which explain why none of the
> > existing IPC match the requirements.
> 
> I had hoped that you had read the part we told you last time where
> we explained why multicast and "reliable delivery" are fundamentally
> incompatible attributes.
> 

I do not beleive we indicated reliable delivery, mearly ordered and
idempotent. eitehr everyone gets the message in the same order or
noone gets it.

> We are not creating a full address family in the kernel which exists
> for one, and only one, specific and difficult user.

Basically you are indicating you would be completely opposed to any
mechanism involving D-Bus IPC and the kernel? 

Is there were a way to convince you that this is of real value to a
great many of the users of Linux systems in use today. I can assert
with some confidence that there are many, many more users of D-Bus IPC
than there are for several of the other address families that are
present within the kernel already. 

The current users are suffering from the issues outlined in my
introductory mail all the time. These issues are caused by emulating an
IPC system over AF_UNIX in userspace.

All we are trying to do is make things better for our users, is there
a way to do that which will satisfy you technically and them? Honestly
I am just looking for a viable solution here.

-- 
Regards Vincent


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 23:42       ` Vincent Sanders
@ 2012-06-29 23:50         ` David Miller
  2012-06-30  0:09           ` Vincent Sanders
  2012-06-30 13:12           ` Alan Cox
  2012-06-30  0:13         ` Benjamin LaHaise
  1 sibling, 2 replies; 53+ messages in thread
From: David Miller @ 2012-06-29 23:50 UTC (permalink / raw)
  To: vincent.sanders; +Cc: netdev, linux-kernel

From: Vincent Sanders <vincent.sanders@collabora.co.uk>
Date: Sat, 30 Jun 2012 00:42:30 +0100

> Basically you are indicating you would be completely opposed to any
> mechanism involving D-Bus IPC and the kernel? 

I would not oppose existing mechanisms, which I do not believe is
impossible to use in your scenerio.

What you really don't get is that packet drops and event losses are
absolutely fundamental.

As long as receivers lack infinite receive queue this will always be
the case.

Multicast operates in non-reliable transports only so that one stuck
or malfunctioning receiver doesn't screw things over for everyone nor
unduly brudon the sender.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 23:50         ` David Miller
@ 2012-06-30  0:09           ` Vincent Sanders
  2012-06-30 13:12           ` Alan Cox
  1 sibling, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-06-30  0:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel

On Fri, Jun 29, 2012 at 04:50:23PM -0700, David Miller wrote:
> From: Vincent Sanders <vincent.sanders@collabora.co.uk>
> Date: Sat, 30 Jun 2012 00:42:30 +0100
> 
> > Basically you are indicating you would be completely opposed to any
> > mechanism involving D-Bus IPC and the kernel? 
> 
> I would not oppose existing mechanisms, which I do not believe is
> impossible to use in your scenerio.
> 

You keep saying that yet have offered no concrete way to achive the
semantics we require. To pass fd and credentials currently *requires*
the use of AF_UNIX does it not? And D-Bus already emulates its IPC
over AF_UNIX because of that.

> What you really don't get is that packet drops and event losses are
> absolutely fundamental.

not within an IPC surely? there cannot be packet drops within AF_BUS
we simply do not do it. The rrecive queues are checked for capability
of reciving the message before it is delivered to them all or none.

> 
> As long as receivers lack infinite receive queue this will always be
> the case.

Indeed, I would not question that.

> 
> Multicast operates in non-reliable transports only so that one stuck
> or malfunctioning receiver doesn't screw things over for everyone nor
> unduly brudon the sender.
> 

We have addressed this within AF_BUS by the reciver and bus master
being told if all recepients cannot receive the message (and therefor
it cannot be sent). 

The policy decision of how to handle this situation is therefore
handled by the userspace clients on a protocol level. D-Bus *already*
has to handle this situation, its just currently done over AF_UNIX
sockets so once it occours the problem is harder to rectify as the
ordering constraint is broken (which causes even more issues).

I am afraid it is rather late here and I may not be able to continue
this conversation untill the morning, I apologise if this is
inconveniant, but I must sleep.

-- 
Regards Vincent


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 23:42       ` Vincent Sanders
  2012-06-29 23:50         ` David Miller
@ 2012-06-30  0:13         ` Benjamin LaHaise
  2012-06-30 12:52           ` Alan Cox
  1 sibling, 1 reply; 53+ messages in thread
From: Benjamin LaHaise @ 2012-06-30  0:13 UTC (permalink / raw)
  To: Vincent Sanders; +Cc: David Miller, netdev, linux-kernel

On Sat, Jun 30, 2012 at 12:42:30AM +0100, Vincent Sanders wrote:
> The current users are suffering from the issues outlined in my
> introductory mail all the time. These issues are caused by emulating an
> IPC system over AF_UNIX in userspace.

Nothing in your introductory statements indicate how your requirements 
can't be met through a hybrid socket + shared memory solution.  The IPC 
facilities of the kernel are already quite rich, and sufficient for 
building many kinds of complex systems.  What's so different about DBus' 
requirements?

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-30  0:13         ` Benjamin LaHaise
@ 2012-06-30 12:52           ` Alan Cox
  2012-07-02 14:51             ` Vincent Sanders
  0 siblings, 1 reply; 53+ messages in thread
From: Alan Cox @ 2012-06-30 12:52 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Vincent Sanders, David Miller, netdev, linux-kernel

On Fri, 29 Jun 2012 20:13:50 -0400
Benjamin LaHaise <bcrl@kvack.org> wrote:

> On Sat, Jun 30, 2012 at 12:42:30AM +0100, Vincent Sanders wrote:
> > The current users are suffering from the issues outlined in my
> > introductory mail all the time. These issues are caused by emulating an
> > IPC system over AF_UNIX in userspace.
> 
> Nothing in your introductory statements indicate how your requirements 
> can't be met through a hybrid socket + shared memory solution.  The IPC 
> facilities of the kernel are already quite rich, and sufficient for 
> building many kinds of complex systems.  What's so different about DBus' 
> requirements?

dbus wants to
- multicast
- pass file handles
- never lose an event
- be fast
- have a security model

The security model makes a shared memory hack impractical, the file
handle passing means at least some of it needs to be AF_UNIX. The event
loss handling/speed argue for putting it in kernel.

I'm not convinced AF_BUS entirely sorts this either. In particular the
failure case dbus currently has to handle for not losing events allows it
to identify who in a "group" has jammed the bus by not listening (eg by
locking up). This information appears to be lost in the AF_BUS case and
that's slightly catastrophic for error recovery.

Alan

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 23:50         ` David Miller
  2012-06-30  0:09           ` Vincent Sanders
@ 2012-06-30 13:12           ` Alan Cox
  2012-07-01  0:33             ` David Miller
  1 sibling, 1 reply; 53+ messages in thread
From: Alan Cox @ 2012-06-30 13:12 UTC (permalink / raw)
  To: David Miller; +Cc: vincent.sanders, netdev, linux-kernel

> What you really don't get is that packet drops and event losses are
> absolutely fundamental.

The world is full of "receiver reliable" multicast transport providers
which provide ordered defined message delivery properties.

They are reliable in the sense that a message is either queued to the
other ends or is not queued. They are not reliable in the sense of "we
wait forever".

In fact if you look up the stack you'll find a large number of multicast
messaging systems which do reliable transport built on top of IP. In fact
Red Hat provides a high level messaging cluster service that does exactly
this. (as well as dbus which does it on the deskop level) plus a ton of
stuff on top of that (JGroups etc)

Everybody at the application level has been using these 'receiver
reliable'  multicast services for years (Websphere MQ, TIBCO, RTPGM,
OpenPGM, MS-PGM, you name it). There are even accelerators for PGM based
protocols in things like Cisco routers and Solarflare can do much of it
on the card for 10Gbit.

> As long as receivers lack infinite receive queue this will always be
> the case.
> 
> Multicast operates in non-reliable transports only so that one stuck
> or malfunctioning receiver doesn't screw things over for everyone nor
> unduly brudon the sender.

All the world is not IP. Dealing with a malfunctioning receiver is
something dbus already has to deal with. "Unduly burden the sender" is
you talking out of your underwear. The sender is already implementing
this property set - in user space. So there can't be any more burdening,
in fact the point of this is to get rid of excess burdens caused by lack
of kernel support.

This is a latency issue not a throughput one so you can't hide it with
buffers. A few ms shaved off desktop behaviour here and there makes a
massive difference to perceived responsiveness. Less task switches and
daemons means a lot less tasks bouncing around processors which means
less power consumption.

Alan

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (17 preceding siblings ...)
  2012-06-29 22:36 ` David Miller
@ 2012-06-30 20:41 ` Hans-Peter Jansen
  2012-07-02 16:46   ` Alban Crequy
  2012-07-05  7:59 ` Linus Walleij
  19 siblings, 1 reply; 53+ messages in thread
From: Hans-Peter Jansen @ 2012-06-30 20:41 UTC (permalink / raw)
  To: Vincent Sanders; +Cc: netdev, linux-kernel, David S. Miller

Dear Vincent,

On Friday 29 June 2012, 18:45:39 Vincent Sanders wrote:
> This series adds the bus address family (AF_BUS) it is against
> net-next as of yesterday.
>
> AF_BUS is a message oriented inter process communication system.
>
> The principle features are:
>
>  - Reliable datagram based communication (all sockets are of type
>    SOCK_SEQPACKET)
>
>  - Multicast message delivery (one to many, unicast as a subset)
>
>  - Strict ordering (messages are delivered to every client in the
> same order)
>
>  - Ability to pass file descriptors
>
>  - Ability to pass credentials
>
> The basic concept is to provide a virtual bus on which multiple
> processes can communicate and policy is imposed by a "bus master".
>
> Introduction
> ------------
>
> AF_BUS is based upon AF_UNIX but extended for multicast operation and
> removes stream operation, responding to extensive feedback on
> previous approaches we have made the implementation as isolated as
> possible. There are opportunities in the future to integrate the
> socket garbage collector with that of the unix socket implementation.
>
> The impetus for creating this IPC mechanism is to replace the
> underlying transport for D-Bus. The D-Bus system currently emulates
> this IPC mechanism using AF_UNIX sockets in userspace and has
> numerous undesirable behaviours. D-Bus is now widely deployed in many
> areas and has become a de-facto IPC standard. Using this IPC
> mechanism as a transport gives a significant (100% or more)
> improvement to throughput with comparable improvement to latency.

Your introduction is missing a comprehensive "Discussion" section, where 
you compare the AF_UNIX based implementation with AF_BUS ones. 

You should elaborate on each of the above noted undesirable behaviours, 
why and how AF_BUS is advantageous. Show the workarounds, that are 
needed by AF_UNIX to operate (properly?!?) and how the new 
implementation is going to improve this situation.

This will help to get some progress into the indurated discussion here.

Please also note, that, while your aims are nice and sound, it's even 
more important for IPC mechanisms to operate properly - even during 
persisting error conditions (crashed bus master and clients, 
misbehaving or even abusing members). It would be cool to create a 
D-BUS test rig, that not only measures performance numbers, but also 
checks for dead locks, corner cases and abuse attempts in both IPC 
implementations.

It's a juggling act: while AF_UNIX might suffer from downsides, the code 
is heavily exercised in every aspect. Your implementation will only be 
exercised by a handful of users (basically one lib), but in order to 
rectify its existence in kernel space, such extensions need different 
kinds of users, and the basic concepts need to fit in the whole kernel 
picture as well, or you need to call it AF_DBUS with even less chance 
to get it into mainstream.

Wishing you all the best and good luck,
Pete

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-30 13:12           ` Alan Cox
@ 2012-07-01  0:33             ` David Miller
  2012-07-01 14:16               ` Alan Cox
  0 siblings, 1 reply; 53+ messages in thread
From: David Miller @ 2012-07-01  0:33 UTC (permalink / raw)
  To: alan; +Cc: vincent.sanders, netdev, linux-kernel

From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Sat, 30 Jun 2012 14:12:22 +0100

> In fact if you look up the stack you'll find a large number of multicast
> messaging systems which do reliable transport built on top of IP. In fact
> Red Hat provides a high level messaging cluster service that does exactly
> this. (as well as dbus which does it on the deskop level) plus a ton of
> stuff on top of that (JGroups etc)
> 
> Everybody at the application level has been using these 'receiver
> reliable'  multicast services for years (Websphere MQ, TIBCO, RTPGM,
> OpenPGM, MS-PGM, you name it). There are even accelerators for PGM based
> protocols in things like Cisco routers and Solarflare can do much of it
> on the card for 10Gbit.

The issue is that what to do when a receiver goes deaf is a policy
issue.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH net-next 06/15] netfilter: Add NFPROTO_BUS hook constant for AF_BUS socket family
  2012-06-29 16:45 ` [PATCH net-next 06/15] netfilter: Add NFPROTO_BUS hook constant for AF_BUS socket family Vincent Sanders
@ 2012-07-01  2:15   ` Jan Engelhardt
  0 siblings, 0 replies; 53+ messages in thread
From: Jan Engelhardt @ 2012-07-01  2:15 UTC (permalink / raw)
  To: Vincent Sanders
  Cc: netdev, linux-kernel, David S. Miller, Javier Martinez Canillas

On Friday 2012-06-29 18:45, Vincent Sanders wrote:

>AF_BUS sockets add a netfilter NF_HOOK() on the packet sending path.
>This allows packet to be mangled by registered netfilter hooks.

If you do touch netfiler, consider adding that mailing list as well.

>diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
>index c613cf0..0698924 100644
>--- a/include/linux/netfilter.h
>+++ b/include/linux/netfilter.h
>@@ -67,6 +67,7 @@ enum {
> 	NFPROTO_BRIDGE =  7,
> 	NFPROTO_IPV6   = 10,
> 	NFPROTO_DECNET = 12,
>+	NFPROTO_BUS,
> 	NFPROTO_NUMPROTO,
> };

Make use of the holes that were left.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-07-01  0:33             ` David Miller
@ 2012-07-01 14:16               ` Alan Cox
  2012-07-01 21:45                 ` David Miller
  0 siblings, 1 reply; 53+ messages in thread
From: Alan Cox @ 2012-07-01 14:16 UTC (permalink / raw)
  To: David Miller; +Cc: vincent.sanders, netdev, linux-kernel

> The issue is that what to do when a receiver goes deaf is a policy
> issue.

Something all these protocols alredy recognize and have done for years.
The real issue for AF_BUS versus the current slow AF_UNIX approach is
that you really need to know *who* is blocked up.

That's no different to UDP multicast and needing to know how errored the
frame.

Alan

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-07-01 14:16               ` Alan Cox
@ 2012-07-01 21:45                 ` David Miller
  0 siblings, 0 replies; 53+ messages in thread
From: David Miller @ 2012-07-01 21:45 UTC (permalink / raw)
  To: alan; +Cc: vincent.sanders, netdev, linux-kernel

From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Sun, 1 Jul 2012 15:16:59 +0100

>> The issue is that what to do when a receiver goes deaf is a policy
>> issue.
> 
> Something all these protocols alredy recognize and have done for years.
> The real issue for AF_BUS versus the current slow AF_UNIX approach is
> that you really need to know *who* is blocked up.
> 
> That's no different to UDP multicast and needing to know how errored the
> frame.

And the policy on what to do about such UDP multicast errors is in
userspace, which is precisely my point.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 23:18     ` David Miller
  2012-06-29 23:42       ` Vincent Sanders
@ 2012-07-02  4:49       ` Chris Friesen
  1 sibling, 0 replies; 53+ messages in thread
From: Chris Friesen @ 2012-07-02  4:49 UTC (permalink / raw)
  To: David Miller; +Cc: vincent.sanders, netdev, linux-kernel

On 06/29/2012 05:18 PM, David Miller wrote:
> From: Vincent Sanders<vincent.sanders@collabora.co.uk>
> Date: Sat, 30 Jun 2012 00:12:37 +0100
>
>> I had hoped you would have at least read the opening list where I
>> outlined the underlying features which explain why none of the
>> existing IPC match the requirements.
> I had hoped that you had read the part we told you last time where
> we explained why multicast and "reliable delivery" are fundamentally
> incompatible attributes.
>
> We are not creating a full address family in the kernel which exists
> for one, and only one, specific and difficult user.

For what it's worth, the company I work for (and a number of other 
companies) currently use an out-of-tree datagram multicast messaging 
protocol family based on AF_UNIX.

If AF_BUS were to be accepted, it would be essentially trivial for us to 
port our existing userspace messaging library to use it instead of our 
current protocol family, and we would almost certainly do so.

I'd love to see AF_BUS go in.

Chris Friesen


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-30 12:52           ` Alan Cox
@ 2012-07-02 14:51             ` Vincent Sanders
  0 siblings, 0 replies; 53+ messages in thread
From: Vincent Sanders @ 2012-07-02 14:51 UTC (permalink / raw)
  To: Alan Cox; +Cc: Benjamin LaHaise, David Miller, netdev, linux-kernel

On Sat, Jun 30, 2012 at 01:52:40PM +0100, Alan Cox wrote:
> On Fri, 29 Jun 2012 20:13:50 -0400
> Benjamin LaHaise <bcrl@kvack.org> wrote:
> 
> > On Sat, Jun 30, 2012 at 12:42:30AM +0100, Vincent Sanders wrote:
> > > The current users are suffering from the issues outlined in my
> > > introductory mail all the time. These issues are caused by emulating an
> > > IPC system over AF_UNIX in userspace.
> > 
> > Nothing in your introductory statements indicate how your requirements 
> > can't be met through a hybrid socket + shared memory solution.  The IPC 
> > facilities of the kernel are already quite rich, and sufficient for 
> > building many kinds of complex systems.  What's so different about DBus' 
> > requirements?
> 
> dbus wants to
> - multicast
> - pass file handles
> - never lose an event
> - be fast
> - have a security model
> 
> The security model makes a shared memory hack impractical, the file
> handle passing means at least some of it needs to be AF_UNIX. The event
> loss handling/speed argue for putting it in kernel.

Thankyou for making this point more eloquently than I had previously
been able to.

> 
> I'm not convinced AF_BUS entirely sorts this either. In particular the
> failure case dbus currently has to handle for not losing events allows it
> to identify who in a "group" has jammed the bus by not listening (eg by
> locking up). This information appears to be lost in the AF_BUS case and
> that's slightly catastrophic for error recovery.
> 

The strategy the existing AF_UNIX D-Bus daemon implements is simply to
have huge queues and thus rarely encounters the situation. When It
does the bus daemon crafts an error message as a reply to the sender.

The AF_BUS solution is more direct in that the sender gets either
EAGAIN for a direct send or EPOLLOUT from poll. Whatever the response
the sender can use this information to implement a userspace policy
decision.

Your feedback sparked a discussion and we have considered this in more
depth and propose implementing a userspace policy of:

 - sending a message to the bus master and let it "deal" with the
   blocking client.

 - The bus master might choose to isolate the offending client or
    perhaps even cause a service restart etc. 

   The bus master is a privileged client and has state information
    about the bus allowing an optimal decision. Though we intend to
    add a socket option to query the queue lengths so it can make a
    better decisions.

   Regardless this is all userspace policy for the D-Bus client
    library / bus master daemon which I believe addresses David Miller's
    concerns about such decisions being made in userspace.

--
Best Regards 
Vincent Sanders <vincent.sanders@collabora.co.uk>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing
  2012-06-29 17:11   ` Pablo Neira Ayuso
@ 2012-07-02 15:43     ` Javier Martinez Canillas
  2012-07-04 17:30       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 53+ messages in thread
From: Javier Martinez Canillas @ 2012-07-02 15:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Vincent Sanders, netdev, linux-kernel, David S. Miller, Alban Crequy

On 06/29/2012 07:11 PM, Pablo Neira Ayuso wrote:
> On Fri, Jun 29, 2012 at 05:45:52PM +0100, Vincent Sanders wrote:
>> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
>> 
>> The netfilter D-Bus module needs to parse D-bus messages sent by
>> applications to decide whether a peer can receive or not a D-Bus
>> message. Add D-bus message parsing logic to be able to analyze.
> 
> Not talking about the entire patchset, only about the part I'm
> responsible for.
> 
> I don't see why you think this belong to netfilter at all.
> 
> This doesn't integrate into the existing filtering infrastructure,
> neither it extends it in any way.
> 

Hello Pablo,

Thanks a lot for your feedback.

This is the first of a set of patches that adds a netfilter module to parse
D-Bus messages, the complete patch-set is:

[PATCH 13/15] netfilter: nfdbus: Add D-bus message parsing
[PATCH 14/15] netfilter: nfdbus: Add D-bus match rule implementation
[PATCH 15/15] netfilter: add netfilter D-Bus module

patches 13 and 14 just include D-Bus helper code to be used by the netfilter
module (added on patch 15) and specially the dbus_filter netfilter hook function.

For the next post version we will reorganize the patches so first the D-Bus
netfilter module is added with an empty dbus_filter function and then added the
D-Bus helper code.

Also, we will move the nfdbus netfilter module to net/bus so is not inside the
netfilter core code.

Thanks a lot and best regards,
Javier

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-30 20:41 ` Hans-Peter Jansen
@ 2012-07-02 16:46   ` Alban Crequy
  0 siblings, 0 replies; 53+ messages in thread
From: Alban Crequy @ 2012-07-02 16:46 UTC (permalink / raw)
  To: Hans-Peter Jansen; +Cc: Vincent Sanders, netdev, linux-kernel, David S. Miller

Sat, 30 Jun 2012 22:41:08 +0200,
"Hans-Peter Jansen" <hpj@urpla.net> wrote :

> Dear Vincent,
> 
> On Friday 29 June 2012, 18:45:39 Vincent Sanders wrote:
> > This series adds the bus address family (AF_BUS) it is against
> > net-next as of yesterday.
> >
> > AF_BUS is a message oriented inter process communication system.
> >
> > The principle features are:
> >
> >  - Reliable datagram based communication (all sockets are of type
> >    SOCK_SEQPACKET)
> >
> >  - Multicast message delivery (one to many, unicast as a subset)
> >
> >  - Strict ordering (messages are delivered to every client in the
> > same order)
> >
> >  - Ability to pass file descriptors
> >
> >  - Ability to pass credentials
> >
> > The basic concept is to provide a virtual bus on which multiple
> > processes can communicate and policy is imposed by a "bus master".
> >
> > Introduction
> > ------------
> >
> > AF_BUS is based upon AF_UNIX but extended for multicast operation and
> > removes stream operation, responding to extensive feedback on
> > previous approaches we have made the implementation as isolated as
> > possible. There are opportunities in the future to integrate the
> > socket garbage collector with that of the unix socket implementation.
> >
> > The impetus for creating this IPC mechanism is to replace the
> > underlying transport for D-Bus. The D-Bus system currently emulates
> > this IPC mechanism using AF_UNIX sockets in userspace and has
> > numerous undesirable behaviours. D-Bus is now widely deployed in many
> > areas and has become a de-facto IPC standard. Using this IPC
> > mechanism as a transport gives a significant (100% or more)
> > improvement to throughput with comparable improvement to latency.
> 
> Your introduction is missing a comprehensive "Discussion" section, where 
> you compare the AF_UNIX based implementation with AF_BUS ones. 
> 
> You should elaborate on each of the above noted undesirable behaviours, 
> why and how AF_BUS is advantageous. Show the workarounds, that are 
> needed by AF_UNIX to operate (properly?!?) and how the new 
> implementation is going to improve this situation.

Hi Hans-Peter,

Thanks for your feedback. I would like to elaborate on the priority
inversion and on the latency.

Priority inversion:
===================

A bus can have users with different priorities. The classical example was
Nokia's N900 phone. A incoming phone call should query the contact 
database, start the correct ringtone, display the correct avatar very
quickly. Other background tasks don't have the same priority. Since all
messages go through dbus-daemon, it is a single bottleneck and the
kernel has no way to schedule the processes with the correct
priorities. Low priority messages are waking up dbus-daemon as much as
high priority messages.

A workaround was to set the nice level of dbus-daemon to -5. It didn't
really address the priority inversion, but it reduces the number of
context switches on multicast messages, and that helped a bit. The
diagram "Experiment #3" on this blog post shows dbus-daemon is no
longer context switched for every recipient of a multicast message:
http://alban-apinc.blogspot.co.uk/2011/12/importance-of-scheduling-priority-in-d.html

With AF_BUS, there is no single process who has to receive all messages
from low priority processes and high priority processes. The kernel can
schedule the high priority processes and they can progress in their
communication without having dbus-daemon involved.

Latency:
========

On AF_UNIX, a message round-trip would go like this:
- the sender sends a message to dbus-daemon
- dbus-daemon receives it and forward it to the correct recipient
- the recipient receives it and reply with a new message sent to
  dbus-daemon
- dbus-daemon receives the reply and forward it to the initial sender
- the sender receives the reply.
There is a total of 4 context switches.

On AF_BUS, the messages are most of the time not routed by dbus-daemon,
this halves the number of context switches. It reduced the latency and
brought the performance improvement mentioned by Vincent.

> This will help to get some progress into the indurated discussion here.
> 
> Please also note, that, while your aims are nice and sound, it's even 
> more important for IPC mechanisms to operate properly - even during 
> persisting error conditions (crashed bus master and clients, 
> misbehaving or even abusing members). It would be cool to create a 
> D-BUS test rig, that not only measures performance numbers, but also 
> checks for dead locks, corner cases and abuse attempts in both IPC 
> implementations.
> 
> It's a juggling act: while AF_UNIX might suffer from downsides, the code 
> is heavily exercised in every aspect. Your implementation will only be 
> exercised by a handful of users (basically one lib), but in order to 
> rectify its existence in kernel space, such extensions need different 
> kinds of users, and the basic concepts need to fit in the whole kernel 
> picture as well, or you need to call it AF_DBUS with even less chance 
> to get it into mainstream.

I am hoping there will be more users with different use-cases and it
should help to improve AF_BUS and fix the unavoidable bugs in a young
code. I would be happy if AF_BUS reduces the cost of maintaining the
out-of-tree multicast messaging protocol family based on AF_UNIX
mentioned by Chris Friesen.

Thank you!
Alban

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH net-next 09/15] net: bus: Add garbage collector for AF_BUS sockets.
  2012-06-29 16:45 ` [PATCH net-next 09/15] net: bus: Add garbage collector for AF_BUS sockets Vincent Sanders
@ 2012-07-02 17:44   ` Ben Hutchings
  2012-07-03 12:11     ` Alban Crequy
  0 siblings, 1 reply; 53+ messages in thread
From: Ben Hutchings @ 2012-07-02 17:44 UTC (permalink / raw)
  To: Vincent Sanders
  Cc: netdev, linux-kernel, David S. Miller, Javier Martinez Canillas

On Fri, 2012-06-29 at 17:45 +0100, Vincent Sanders wrote:
> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> 
> This patch adds a garbage collector for AF_BUS sockets.
[...]
> +struct sock *bus_get_socket(struct file *filp)
> +{
> +	struct sock *u_sock = NULL;
> +	struct inode *inode = filp->f_path.dentry->d_inode;
> +
> +	/*
> +	 *	Socket ?
> +	 */
> +	if (S_ISSOCK(inode->i_mode) && !(filp->f_mode & FMODE_PATH)) {
> +		struct socket *sock = SOCKET_I(inode);
> +		struct sock *s = sock->sk;
> +
> +		/*
> +		 *	PF_BUS ?
> +		 */
> +		if (s && sock->ops && sock->ops->family == PF_BUS)
> +			u_sock = s;
> +	}
> +	return u_sock;
> +}
[...]

What about references cycles involving both AF_BUS and AF_UNIX sockets?
I think you must either specifically prevent passing AF_UNIX sockets
through AF_BUS sockets, or make a single garbage collector handle them
both.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH net-next 09/15] net: bus: Add garbage collector for AF_BUS sockets.
  2012-07-02 17:44   ` Ben Hutchings
@ 2012-07-03 12:11     ` Alban Crequy
  0 siblings, 0 replies; 53+ messages in thread
From: Alban Crequy @ 2012-07-03 12:11 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Vincent Sanders, netdev, linux-kernel, David S. Miller,
	Javier Martinez Canillas

Mon, 2 Jul 2012 18:44:23 +0100,
Ben Hutchings <bhutchings@solarflare.com> wrote :

> On Fri, 2012-06-29 at 17:45 +0100, Vincent Sanders wrote:
> > From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> > 
> > This patch adds a garbage collector for AF_BUS sockets.
> [...]
> > +struct sock *bus_get_socket(struct file *filp)
> > +{
> > +	struct sock *u_sock = NULL;
> > +	struct inode *inode = filp->f_path.dentry->d_inode;
> > +
> > +	/*
> > +	 *	Socket ?
> > +	 */
> > +	if (S_ISSOCK(inode->i_mode) && !(filp->f_mode &
> > FMODE_PATH)) {
> > +		struct socket *sock = SOCKET_I(inode);
> > +		struct sock *s = sock->sk;
> > +
> > +		/*
> > +		 *	PF_BUS ?
> > +		 */
> > +		if (s && sock->ops && sock->ops->family == PF_BUS)
> > +			u_sock = s;
> > +	}
> > +	return u_sock;
> > +}
> [...]
> 
> What about references cycles involving both AF_BUS and AF_UNIX
> sockets? I think you must either specifically prevent passing AF_UNIX
> sockets through AF_BUS sockets, or make a single garbage collector
> handle them both.

Indeed. Thanks for the feedback.

As far as I know, the current users of fd passing in D-Bus are Bluez
and Ofono and they pass AF_BLUETOOTH sockets. There might be
others, I am not sure what is in the wild. Passing AF_UNIX sockets in
D-Bus would be useful for Telepathy (for Tubes and File Transfer).

So I would like to be able to pass AF_UNIX sockets through AF_BUS
sockets.

I wrote the following small program to test this bug based on the
previous code <https://lkml.org/lkml/2010/11/25/8> referred by commit
25888e30:
http://people.collabora.com/~alban/d/2012/07/fd-passing/cmsg.c

The effect of the bug is to reach the file limit
in /proc/sys/fs/file-max and print "VFS: file-max limit 101771 reached"
even when the maximum of file descriptors per process ("ulimit -n") is
small.

I am not sure what is the best way to fix this. The easiest could be to
move the garbage collector related fields (recursion_level,
gc_candidate, etc.) from struct unix_sock and struct bus_sock to
struct sock and make a generic garbage collector for all sockets.

Best regards,
Alban

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing
  2012-07-02 15:43     ` Javier Martinez Canillas
@ 2012-07-04 17:30       ` Pablo Neira Ayuso
  2012-07-05 17:54         ` Javier Martinez Canillas
  0 siblings, 1 reply; 53+ messages in thread
From: Pablo Neira Ayuso @ 2012-07-04 17:30 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: Vincent Sanders, netdev, linux-kernel, David S. Miller, Alban Crequy

On Mon, Jul 02, 2012 at 05:43:43PM +0200, Javier Martinez Canillas wrote:
> On 06/29/2012 07:11 PM, Pablo Neira Ayuso wrote:
> > On Fri, Jun 29, 2012 at 05:45:52PM +0100, Vincent Sanders wrote:
> >> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> >> 
> >> The netfilter D-Bus module needs to parse D-bus messages sent by
> >> applications to decide whether a peer can receive or not a D-Bus
> >> message. Add D-bus message parsing logic to be able to analyze.
> > 
> > Not talking about the entire patchset, only about the part I'm
> > responsible for.
> > 
> > I don't see why you think this belong to netfilter at all.
> > 
> > This doesn't integrate into the existing filtering infrastructure,
> > neither it extends it in any way.
> > 
> 
> Hello Pablo,
> 
> Thanks a lot for your feedback.
> 
> This is the first of a set of patches that adds a netfilter module to parse
> D-Bus messages, the complete patch-set is:
> 
> [PATCH 13/15] netfilter: nfdbus: Add D-bus message parsing
> [PATCH 14/15] netfilter: nfdbus: Add D-bus match rule implementation
> [PATCH 15/15] netfilter: add netfilter D-Bus module
> 
> patches 13 and 14 just include D-Bus helper code to be used by the netfilter
> module (added on patch 15) and specially the dbus_filter netfilter hook function.

I see, the use of the netfilter hooks seems to be the only reason why
you consider these chunks belong to netfilter.

> For the next post version we will reorganize the patches so first the D-Bus
> netfilter module is added with an empty dbus_filter function and then added the
> D-Bus helper code.
> 
> Also, we will move the nfdbus netfilter module to net/bus so is not inside the
> netfilter core code.

Yes, please, remove this stuff from my directory tree, I believe this
filtering infrastructure has not much to do with Netfilter itself.

It uses the connector to communicate kernel <-> userspace instead of
nfnetlink and, as said, it does neither integrate into existing
filtering kernel/userspace infrastructure nor integrates into it.

So, please, if you plan to give another try to this patchset, move
this to your net/bus directory as you propose and find a different
(better) name for the filtering part (just to avoid confusion in the
future).

Thanks.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
                   ` (18 preceding siblings ...)
  2012-06-30 20:41 ` Hans-Peter Jansen
@ 2012-07-05  7:59 ` Linus Walleij
  2012-07-05 16:01   ` Daniel Walker
  19 siblings, 1 reply; 53+ messages in thread
From: Linus Walleij @ 2012-07-05  7:59 UTC (permalink / raw)
  To: Vincent Sanders
  Cc: netdev, linux-kernel, David S. Miller, Arve Hjønnevåg,
	Daniel Walker, John Stultz, Anton Vorontsov, Greg Kroah-Hartman

2012/6/29 Vincent Sanders <vincent.sanders@collabora.co.uk>:

> AF_BUS is a message oriented inter process communication system.

We have a very huge and important in-kernel IPC message passer
in drivers/staging/android/binder.c

It's deployed in some 400 million devices according to latest reports.
John Stultz & Anton Vorontsov are trying to look after these Android
drivers a bit...

I and others discussed this in the past with the Android folks. Dianne
makes an excellent summary of how it works here:
https://lkml.org/lkml/2009/6/25/3

If we could all be convinced that this thing also fulfills the needs
of what binder does, this is a pretty solid case for it too. I can
sure see that some of the shortcuts that Android is taking with
binder try to address the same issue of high-speed IPC loopholes
through the kernel and some kind of security model.

Whether Android would actually use it (or wrap it) is a totally
different question, but what I think we need to know is whether it
*could*. And staging code has to move forward, maybe this
is the direction it should move?

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-07-05  7:59 ` Linus Walleij
@ 2012-07-05 16:01   ` Daniel Walker
  0 siblings, 0 replies; 53+ messages in thread
From: Daniel Walker @ 2012-07-05 16:01 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Vincent Sanders, netdev, linux-kernel, David S. Miller,
	Arve Hjønnevåg, John Stultz, Anton Vorontsov,
	Greg Kroah-Hartman

On Thu, Jul 05, 2012 at 09:59:53AM +0200, Linus Walleij wrote:
> 2012/6/29 Vincent Sanders <vincent.sanders@collabora.co.uk>:
> 
> > AF_BUS is a message oriented inter process communication system.
> 
> We have a very huge and important in-kernel IPC message passer
> in drivers/staging/android/binder.c
> 
> It's deployed in some 400 million devices according to latest reports.
> John Stultz & Anton Vorontsov are trying to look after these Android
> drivers a bit...
> 
> I and others discussed this in the past with the Android folks. Dianne
> makes an excellent summary of how it works here:
> https://lkml.org/lkml/2009/6/25/3
> 
> If we could all be convinced that this thing also fulfills the needs
> of what binder does, this is a pretty solid case for it too. I can
> sure see that some of the shortcuts that Android is taking with
> binder try to address the same issue of high-speed IPC loopholes
> through the kernel and some kind of security model.
> 
> Whether Android would actually use it (or wrap it) is a totally
> different question, but what I think we need to know is whether it
> *could*. And staging code has to move forward, maybe this
> is the direction it should move?

I'm all for alternatives.. I haven't read this thread at all, but I read an LWN
article comparing Binder and other implementations .. So there are for sure
alternatives. It would be nice if things were included that did whatever binder
needs .. Since I did the logger performance analysis the big questions to me is
if Binder is actually fast (or faster than the alternatives). Whatever this
AF_BUS things is reviewing the performance of the major alternative(s) is probably a
good idea.

In terms of Android using anything we produce or incorporate, don't get
your hopes up .. They will always just use Binder .. (John is good cop, I'm bad
cop)

Daniel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing
  2012-07-04 17:30       ` Pablo Neira Ayuso
@ 2012-07-05 17:54         ` Javier Martinez Canillas
  0 siblings, 0 replies; 53+ messages in thread
From: Javier Martinez Canillas @ 2012-07-05 17:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Javier Martinez Canillas, Vincent Sanders, netdev, linux-kernel,
	David S. Miller, Alban Crequy

On Wed, Jul 4, 2012 at 7:30 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Jul 02, 2012 at 05:43:43PM +0200, Javier Martinez Canillas wrote:
>> On 06/29/2012 07:11 PM, Pablo Neira Ayuso wrote:
>> > On Fri, Jun 29, 2012 at 05:45:52PM +0100, Vincent Sanders wrote:
>> >> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
>> >>
>> >> The netfilter D-Bus module needs to parse D-bus messages sent by
>> >> applications to decide whether a peer can receive or not a D-Bus
>> >> message. Add D-bus message parsing logic to be able to analyze.
>> >
>> > Not talking about the entire patchset, only about the part I'm
>> > responsible for.
>> >
>> > I don't see why you think this belong to netfilter at all.
>> >
>> > This doesn't integrate into the existing filtering infrastructure,
>> > neither it extends it in any way.
>> >
>>
>> Hello Pablo,
>>
>> Thanks a lot for your feedback.
>>
>> This is the first of a set of patches that adds a netfilter module to parse
>> D-Bus messages, the complete patch-set is:
>>
>> [PATCH 13/15] netfilter: nfdbus: Add D-bus message parsing
>> [PATCH 14/15] netfilter: nfdbus: Add D-bus match rule implementation
>> [PATCH 15/15] netfilter: add netfilter D-Bus module
>>
>> patches 13 and 14 just include D-Bus helper code to be used by the netfilter
>> module (added on patch 15) and specially the dbus_filter netfilter hook function.
>
> I see, the use of the netfilter hooks seems to be the only reason why
> you consider these chunks belong to netfilter.
>
>> For the next post version we will reorganize the patches so first the D-Bus
>> netfilter module is added with an empty dbus_filter function and then added the
>> D-Bus helper code.
>>
>> Also, we will move the nfdbus netfilter module to net/bus so is not inside the
>> netfilter core code.
>
> Yes, please, remove this stuff from my directory tree, I believe this
> filtering infrastructure has not much to do with Netfilter itself.
>
> It uses the connector to communicate kernel <-> userspace instead of
> nfnetlink and, as said, it does neither integrate into existing
> filtering kernel/userspace infrastructure nor integrates into it.
>
> So, please, if you plan to give another try to this patchset, move
> this to your net/bus directory as you propose and find a different
> (better) name for the filtering part (just to avoid confusion in the
> future).
>
> Thanks.

Hi Pablo,

Thanks a lot for your feedback and comments.

On a next patch-set post I'll move to net/bus and change the name as
you suggest.

Best regards,

-- 
Javier Martínez Canillas
(+34) 682 39 81 69
Barcelona, Spain

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-06-29 23:12   ` Vincent Sanders
  2012-06-29 23:18     ` David Miller
@ 2012-07-05 21:06     ` Jan Engelhardt
  2012-07-06 18:27       ` Chris Friesen
  1 sibling, 1 reply; 53+ messages in thread
From: Jan Engelhardt @ 2012-07-05 21:06 UTC (permalink / raw)
  To: Vincent Sanders; +Cc: David Miller, netdev, linux-kernel


On Saturday 2012-06-30 01:12, Vincent Sanders wrote:
>
>Firstly it is intended is an interprocess mechanism and not to rely on
>a configured IP system, indeed one of its primary usages is to
>provide mechanism for various tools to set up IP networking.

Using IP as a localhost IPC is not uncommon (independent of
software preferring AF_UNIX, if so available). Distro boot
scripts have been running `ip addr add ::1/128 dev lo`
all these years along.

And now we suddently need a DBUS program just to configure
IP-based localhost IPC? I can see the flaw in that.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-07-05 21:06     ` Jan Engelhardt
@ 2012-07-06 18:27       ` Chris Friesen
  0 siblings, 0 replies; 53+ messages in thread
From: Chris Friesen @ 2012-07-06 18:27 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Vincent Sanders, David Miller, netdev, linux-kernel

> On Saturday 2012-06-30 01:12, Vincent Sanders wrote:
>> Firstly it is intended is an interprocess mechanism and not to rely on
>> a configured IP system, indeed one of its primary usages is to
>> provide mechanism for various tools to set up IP networking.
> Using IP as a localhost IPC is not uncommon (independent of
> software preferring AF_UNIX, if so available). Distro boot
> scripts have been running `ip addr add ::1/128 dev lo`
> all these years along.
>
> And now we suddently need a DBUS program just to configure
> IP-based localhost IPC? I can see the flaw in that.
>

I haven't tried it in a while but it used to be that you couldn't use IP 
multicast on the "lo" device.  Has that been fixed?

Chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH net-next 04/15] security: Add Linux Security Modules hook for AF_BUS sockets
  2012-06-29 16:45 ` [PATCH net-next 04/15] security: Add Linux Security Modules hook for AF_BUS sockets Vincent Sanders
@ 2012-07-09  3:32   ` James Morris
  2012-07-09 18:02   ` Paul Moore
  1 sibling, 0 replies; 53+ messages in thread
From: James Morris @ 2012-07-09  3:32 UTC (permalink / raw)
  To: Vincent Sanders
  Cc: netdev, linux-kernel, David S. Miller, Javier Martinez Canillas,
	Paul Moore

On Fri, 29 Jun 2012, Vincent Sanders wrote:

> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> 
> AF_BUS implements a security hook bus_connect() to be used by LSM to
> enforce connectivity security policies.

Nack.

Changes to LSM should be cc'd the LSM list and Paul Moore.



- James
-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH net-next 04/15] security: Add Linux Security Modules hook for AF_BUS sockets
  2012-06-29 16:45 ` [PATCH net-next 04/15] security: Add Linux Security Modules hook for AF_BUS sockets Vincent Sanders
  2012-07-09  3:32   ` James Morris
@ 2012-07-09 18:02   ` Paul Moore
  1 sibling, 0 replies; 53+ messages in thread
From: Paul Moore @ 2012-07-09 18:02 UTC (permalink / raw)
  To: Vincent Sanders
  Cc: netdev, linux-kernel, David S. Miller, Javier Martinez Canillas, casey

On Friday, June 29, 2012 05:45:43 PM Vincent Sanders wrote:
> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> 
> AF_BUS implements a security hook bus_connect() to be used by LSM to
> enforce connectivity security policies.
> 
> Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>

In future postings, please reorder the patchset so that this patch (and the 
LSM specific patches) are applied after the actual AF_BUS implementation 
(patch 08/15 in this patchset).  This makes it easier to quickly understand 
how the LSM hooks/implementation interacts with the AF_BUS code.

A good rule of thumb that I try to follow when submitting large patchsets is 
that each patch should contain code that won't be optimized away during the 
build because there is no caller.  Sometimes that isn't possible without 
making things overly awkward, but in this particular case it shouldn't cause a 
problem.

> ---
>  include/linux/security.h |   11 +++++++++++
>  security/capability.c    |    7 +++++++
>  security/security.c      |    7 +++++++
>  3 files changed, 25 insertions(+)
> 
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 4e5a73c..d30dc4a 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h

...

> +static inline int security_bus_connect(struct socket *sock,
> +				       struct sock *other,
> +				       struct sock *newsk)
> +{
> +	return 0;
> +}
> +

Other than the AF_UNIX specific name, is there a reason why you chose not to 
reuse the unix_stream_connect() LSM hook?  The arguments are the same, and 
based on an initial quick review of the SELinux hook implementations they 
appear to do almost identical things; the permissions are different but it 
should be trivial to make that conditional on the parent socket's address 
family (SELinux does similar things with other socket operations).  Looking at 
the Smack implementation, I don't think it would be a problem there either 
(CC'd Casey for his thoughts).

I'm still reviewing the rest of the AF_BUS patches but wanted to ask this now 
in case I was missing something.

-- 
paul moore
www.paul-moore.com


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH net-next 05/15] security: selinux: Add AF_BUS socket SELinux hooks
  2012-06-29 16:45 ` [PATCH net-next 05/15] security: selinux: Add AF_BUS socket SELinux hooks Vincent Sanders
@ 2012-07-09 18:38   ` Paul Moore
  0 siblings, 0 replies; 53+ messages in thread
From: Paul Moore @ 2012-07-09 18:38 UTC (permalink / raw)
  To: Vincent Sanders
  Cc: netdev, linux-kernel, David S. Miller, Javier Martinez Canillas

On Friday, June 29, 2012 05:45:44 PM Vincent Sanders wrote:
> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> 
> Add Security-Enhanced Linux (SELinux) hook for AF_BUS socket address family.
> 
> Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> Signed-off-by: Vincent Sanders <vincent.sanders@collabora.co.uk>

It would be very helpful to include a description of how the access controls 
would work.

>From looking at the other patches, it would appear that when a new socket 
tries to connect to the AF_BUS bus it is checked against the security label of 
the bus master, yes?  Further, if no bus master is present, the connect() is 
denied at the AF_BUS level in the bus_connect() function, yes?

Have you considered the socket_getpeersec_dgram() hook?  Since AF_BUS does not 
appear to be stream oriented I think you can safely ignore 
socket_getpeersec_stream().

Have you considered the unix_may_send() hook?  Ignoring the AF_UNIX specific 
name, it seems like a reasonable hook for AF_BUS; unless you don't expect to 
have any read-only AF_BUS clients in which case the connect() hook should be 
enough (it would implicitly grant read/write access to each socket in that 
case).

Finally, as others have said, you need to ensure that you CC the LSM and 
SELinux lists on the relevant patches as well as provide LSM hook 
implementations for LSMs other than SELinux where it makes sense (not all LSMs 
will require implementations for the new hooks). 

> ---
>  security/selinux/hooks.c |   35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 4ee6f23..5bacbe2 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -67,6 +67,7 @@
>  #include <linux/quota.h>
>  #include <linux/un.h>		/* for Unix socket types */
>  #include <net/af_unix.h>	/* for Unix socket types */
> +#include <net/af_bus.h>	/* for Bus socket types */
>  #include <linux/parser.h>
>  #include <linux/nfs_mount.h>
>  #include <net/ipv6.h>
> @@ -4101,6 +4102,39 @@ static int selinux_socket_unix_may_send(struct socket
> *sock, &ad);
>  }
> 
> +static int selinux_socket_bus_connect(struct sock *sock, struct sock
> *other, +				      struct sock *newsk)
> +{
> +	struct sk_security_struct *sksec_sock = sock->sk_security;
> +	struct sk_security_struct *sksec_other = other->sk_security;
> +	struct sk_security_struct *sksec_new = newsk->sk_security;
> +	struct common_audit_data ad;
> +	struct lsm_network_audit net = {0,};
> +	int err;
> +
> +	ad.type = LSM_AUDIT_DATA_NET;
> +	ad.u.net = &net;
> +	ad.u.net->sk = other;
> +
> +	err = avc_has_perm(sksec_sock->sid, sksec_other->sid,
> +			   sksec_other->sclass,
> +			   UNIX_STREAM_SOCKET__CONNECTTO, &ad);

See my earlier comments about the similarities between this new hook and the 
existing AF_UNIX hook.  The fact that you are reusing the 
UNIX_STREAM_SOCKET__CONNECTTO permission (which is likely a no-no BTW) only 
reinforces the similarities between the two.

> +	if (err)
> +		return err;
> +
> +	/* server child socket */
> +	sksec_new->peer_sid = sksec_sock->sid;
> +	err = security_sid_mls_copy(sksec_other->sid, sksec_sock->sid,
> +				    &sksec_new->sid);
> +	if (err)
> +		return err;
> +
> +	/* connecting socket */
> +	sksec_sock->peer_sid = sksec_new->sid;
> +
> +	return 0;
> +}
> +
>  static int selinux_inet_sys_rcv_skb(int ifindex, char *addrp, u16 family,
>  				    u32 peer_sid,
>  				    struct common_audit_data *ad)
> @@ -5643,6 +5677,7 @@ static struct security_operations selinux_ops = {
> 
>  	.unix_stream_connect =		selinux_socket_unix_stream_connect,
>  	.unix_may_send =		selinux_socket_unix_may_send,
> +	.bus_connect =		        selinux_socket_bus_connect,
> 
>  	.socket_create =		selinux_socket_create,
>  	.socket_post_create =		selinux_socket_post_create,

-- 
paul moore
www.paul-moore.com


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-07-03 16:52 ` Chris Friesen
@ 2012-07-03 17:18   ` Chris Friesen
  0 siblings, 0 replies; 53+ messages in thread
From: Chris Friesen @ 2012-07-03 17:18 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: David Miller, vincent.sanders, netdev, linux-kernel

On 07/03/2012 10:52 AM, Chris Friesen wrote:

> To be fair, since it was implemented as a separate protocol family the
> maintenance burden actually hasn't been large--it's been fairly simple
> to port between versions. Also, we do embedded telecom stuff and don't
> jump kernel versions all that often. (It's a big headache, requires
> coordinating between multiple vendors, etc.)
>
> In our case we typically send small (100-200 byte) messages to a
> smallish (1-10) number of listeners, though there are exceptions of
> course. Back before I started the original implementation used a
> userspace daemon, but it had a number of issues. Originally I was
> focussed on the performance gains but I must admit that since then other
> factors have made that less of an issue.

I should point out that some of the other factors that have been 
discussed for AF_BUS also hold true for our implementation:

--strict ordering
--reliable (in our case, if the sender has space in the tx buffer then 
messages get to all recipients with buffer space, there are kernel logs 
if recipients don't have space)

Also, the fact that it's in the kernel rather than a userspace daemon 
reduces priority inversion type issues.  Presumably this would apply to 
an IP-multicast based solution as well.

One problem that I ran into back when I was experimenting with this 
stuff was trying to isolate host-local IP multicast from the rest of the 
network.  It would be suboptimal to need to set up filtering and such 
before being able to use the communication protocol.

Chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
  2012-07-02 15:18 Javier Martinez Canillas
@ 2012-07-03 16:52 ` Chris Friesen
  2012-07-03 17:18   ` Chris Friesen
  0 siblings, 1 reply; 53+ messages in thread
From: Chris Friesen @ 2012-07-03 16:52 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: David Miller, vincent.sanders, netdev, linux-kernel

On 07/02/2012 09:18 AM, Javier Martinez Canillas wrote:

> We tried different approaches before developing the AF_BUS socket family and one
> of them was extending AF_UNIX to support multicast. We posted our patches [1]
> and the feedback was that the AF_UNIX code was already a complex and difficult
> code to maintain. So, we decided to implement a new family (AF_BUS) that is
> orthogonal to the rest of the networking stack and no added complexity nor
> performance penalty would pay a user not using our IPC solution.

That's what I ended up doing as well.  In our case it's basically a 
stripped-down AF_UNIX with only datagram support, no security, no fd 
passing, etc., but with with the addition of multicast and wildcard (for 
debugging).

> Looking at netdev archives I saw that you both raised the question about
> multicast on unix sockets and post an implementation on early 2003. So if I
> understand correctly you are maintaining an out-of-tree solution for around 9
> years now.

That's correct.

> It would be a great help if you can join the discussion and explain the
> arguments of your company (and the others companies you were talking about) in
> favor of a simpler multicast socket family.
>
> The fact that your company spent lots of engineering resources to maintain an
> out-of-tree patch-set for 9 years should raise some eyebrows and convince more
> than one people that a simpler local multicast solution is needed on the Linux
> kernel (which was one of the reasons why Google also developed Binder I guess).

To be fair, since it was implemented as a separate protocol family the 
maintenance burden actually hasn't been large--it's been fairly simple 
to port between versions.  Also, we do embedded telecom stuff and don't 
jump kernel versions all that often.  (It's a big headache, requires 
coordinating between multiple vendors, etc.)

In our case we typically send small (100-200 byte) messages to a 
smallish (1-10) number of listeners, though there are exceptions of 
course.  Back before I started the original implementation used a 
userspace daemon, but it had a number of issues.  Originally I was 
focussed on the performance gains but I must admit that since then other 
factors have made that less of an issue.

Among other things, this messaging is used on some systems to configure 
the IP addressing for the system, so it does simplify things to not use 
an IP-based protocol for this purpose.

Also, back when I did my original implementation IP multicast wasn't 
supported on the loopback device--David, has that changed since then? 
If it has, then we probably could figure out a way to make it work using 
IP multicast, but I don't know that it would be worth the effort given 
the minimal ongoing maintenance costs for our patch.

Chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: AF_BUS socket address family
@ 2012-07-02 15:18 Javier Martinez Canillas
  2012-07-03 16:52 ` Chris Friesen
  0 siblings, 1 reply; 53+ messages in thread
From: Javier Martinez Canillas @ 2012-07-02 15:18 UTC (permalink / raw)
  To: Chris Friesen, David Miller, vincent.sanders, netdev, linux-kernel



On Mon, Jul 2, 2012 at 6:49 AM, Chris Friesen <chris.friesen@genband.com> wrote:
> On 06/29/2012 05:18 PM, David Miller wrote:
>>
>> From: Vincent Sanders<vincent.sanders@collabora.co.uk>
>> Date: Sat, 30 Jun 2012 00:12:37 +0100
>>
>>> I had hoped you would have at least read the opening list where I
>>> outlined the underlying features which explain why none of the
>>> existing IPC match the requirements.
>>
>> I had hoped that you had read the part we told you last time where
>> we explained why multicast and "reliable delivery" are fundamentally
>> incompatible attributes.
>>
>> We are not creating a full address family in the kernel which exists
>> for one, and only one, specific and difficult user.
>
>
> For what it's worth, the company I work for (and a number of other
> companies) currently use an out-of-tree datagram multicast messaging
> protocol family based on AF_UNIX.
>
> If AF_BUS were to be accepted, it would be essentially trivial for us to
> port our existing userspace messaging library to use it instead of our
> current protocol family, and we would almost certainly do so.
>
> I'd love to see AF_BUS go in.
>
> Chris Friesen
>

Hi Chris,

Thanks a lot for your comments and feedback.

We tried different approaches before developing the AF_BUS socket family and one
of them was extending AF_UNIX to support multicast. We posted our patches [1]
and the feedback was that the AF_UNIX code was already a complex and difficult
code to maintain. So, we decided to implement a new family (AF_BUS) that is
orthogonal to the rest of the networking stack and no added complexity nor
performance penalty would pay a user not using our IPC solution.

Looking at netdev archives I saw that you both raised the question about
multicast on unix sockets and post an implementation on early 2003. So if I
understand correctly you are maintaining an out-of-tree solution for around 9
years now.

We developed AF_BUS to improve the performance of the D-Bus IPC system (and our
results show us a 2X speedup) but design it to be as generic as possible so
other users can take advantage of it.

It would be a great help if you can join the discussion and explain the
arguments of your company (and the others companies you were talking about) in
favor of a simpler multicast socket family.

The fact that your company spent lots of engineering resources to maintain an
out-of-tree patch-set for 9 years should raise some eyebrows and convince more
than one people that a simpler local multicast solution is needed on the Linux
kernel (which was one of the reasons why Google also developed Binder I guess).

[1]: https://lkml.org/lkml/2012/2/20/84
[2]: https://lkml.org/lkml/2003/2/27/150
[3]: http://lwn.net/Articles/27001/

Thanks a lot and best regards,

Javier

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2012-07-09 18:38 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-29 16:45 AF_BUS socket address family Vincent Sanders
2012-06-29 16:45 ` [PATCH net-next 01/15] net: bus: Add " Vincent Sanders
2012-06-29 16:45 ` [PATCH net-next 02/15] net: bus: Add documentation for AF_BUS Vincent Sanders
2012-06-29 16:45 ` [PATCH net-next 03/15] net: bus: Add AF_BUS socket and address definitions Vincent Sanders
2012-06-29 16:45 ` [PATCH net-next 04/15] security: Add Linux Security Modules hook for AF_BUS sockets Vincent Sanders
2012-07-09  3:32   ` James Morris
2012-07-09 18:02   ` Paul Moore
2012-06-29 16:45 ` [PATCH net-next 05/15] security: selinux: Add AF_BUS socket SELinux hooks Vincent Sanders
2012-07-09 18:38   ` Paul Moore
2012-06-29 16:45 ` [PATCH net-next 06/15] netfilter: Add NFPROTO_BUS hook constant for AF_BUS socket family Vincent Sanders
2012-07-01  2:15   ` Jan Engelhardt
2012-06-29 16:45 ` [PATCH net-next 07/15] scm: allow AF_BUS sockets to send ancillary data Vincent Sanders
2012-06-29 16:45 ` [PATCH net-next 08/15] net: bus: Add implementation of Bus domain sockets Vincent Sanders
2012-06-29 16:45 ` [PATCH net-next 09/15] net: bus: Add garbage collector for AF_BUS sockets Vincent Sanders
2012-07-02 17:44   ` Ben Hutchings
2012-07-03 12:11     ` Alban Crequy
2012-06-29 16:45 ` [PATCH net-next 10/15] net: bus: Add the AF_BUS socket address family to KBuild Vincent Sanders
2012-06-29 16:45 ` [PATCH net-next 11/15] netlink: connector: implement cn_netlink_reply Vincent Sanders
2012-06-29 16:45 ` [PATCH net-next 12/15] netlink: connector: Add idx and val identifiers for netfilter D-Bus Vincent Sanders
2012-06-29 16:45 ` [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing Vincent Sanders
2012-06-29 17:11   ` Pablo Neira Ayuso
2012-07-02 15:43     ` Javier Martinez Canillas
2012-07-04 17:30       ` Pablo Neira Ayuso
2012-07-05 17:54         ` Javier Martinez Canillas
2012-06-29 16:45 ` [PATCH net-next 14/15] netfilter: nfdbus: Add D-bus match rule implementation Vincent Sanders
2012-06-29 16:45 ` [PATCH net-next 15/15] netfilter: add netfilter D-Bus module Vincent Sanders
2012-06-29 18:16 ` AF_BUS socket address family Chris Friesen
2012-06-29 19:33   ` Ben Hutchings
2012-06-29 18:45 ` Casey Schaufler
2012-06-29 23:22   ` Vincent Sanders
2012-06-29 22:36 ` David Miller
2012-06-29 23:12   ` Vincent Sanders
2012-06-29 23:18     ` David Miller
2012-06-29 23:42       ` Vincent Sanders
2012-06-29 23:50         ` David Miller
2012-06-30  0:09           ` Vincent Sanders
2012-06-30 13:12           ` Alan Cox
2012-07-01  0:33             ` David Miller
2012-07-01 14:16               ` Alan Cox
2012-07-01 21:45                 ` David Miller
2012-06-30  0:13         ` Benjamin LaHaise
2012-06-30 12:52           ` Alan Cox
2012-07-02 14:51             ` Vincent Sanders
2012-07-02  4:49       ` Chris Friesen
2012-07-05 21:06     ` Jan Engelhardt
2012-07-06 18:27       ` Chris Friesen
2012-06-30 20:41 ` Hans-Peter Jansen
2012-07-02 16:46   ` Alban Crequy
2012-07-05  7:59 ` Linus Walleij
2012-07-05 16:01   ` Daniel Walker
2012-07-02 15:18 Javier Martinez Canillas
2012-07-03 16:52 ` Chris Friesen
2012-07-03 17:18   ` Chris Friesen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.