* [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: Javier Martinez Canillas @ 2012-02-20 15:57 UTC
To: David S. Miller
Cc: Eric Dumazet, Lennart Poettering, Kay Sievers, Alban Crequy, Bart Cerneels, Rodrigo Moya, Sjoerd Simons, netdev, linux-kernel

This patch-set adds multicast support to the Unix domain socket family for
datagram and seqpacket sockets. This work was done by Alban Crequy as a result
of research we have been doing to improve the performance of the D-bus IPC
system.

The first approach was to create a new AF_DBUS socket address family and move
the routing logic of the D-bus daemon to the kernel. The motivations behind
that approach and the thread of the posted patches can be found in [1] and [2].

The feedback was that having D-bus specific code in the kernel is a bad idea,
so the second approach was to implement multicast Unix domain sockets so that
clients can send messages directly to peers, bypassing the D-bus daemon.

A previous version of the patches was already posted by Alban [3], who also
has a good explanation of the implementation on his blog [4].

[1] http://alban-apinc.blogspot.com/2011/12/d-bus-in-kernel-faster.html
[2] http://thread.gmane.org/gmane.linux.kernel/1040481
[3] http://thread.gmane.org/gmane.linux.network/178772
[4] http://alban-apinc.blogspot.com/2011/12/introducing-multicast-unix-sockets.html

The patch-set is composed of the following patches:

[PATCH 01/10] af_unix: Documentation on multicast unix sockets
[PATCH 02/10] af_unix: Add constant for unix socket options level
[PATCH 03/10] af_unix: add setsockopt on unix sockets
[PATCH 04/10] af_unix: create, join and leave multicast groups with setsockopt
[PATCH 05/10] af_unix: find the recipients of a multicast group
[PATCH 06/10] af_unix: Deliver message to several recipients in case of multicast
[PATCH 07/10] af_unix: implement poll(POLLOUT) for multicast sockets
[PATCH 08/10] af_unix: Unsubscribe sockets from their multicast groups on RCV_SHUTDOWN
[PATCH 09/10] Allow server side of SOCK_SEQPACKET sockets to accept a new member
[PATCH 10/10] af_unix: Add a peer BPF for multicast Unix sockets

* [PATCH 01/10] af_unix: Documentation on multicast unix sockets
From: Javier Martinez Canillas @ 2012-02-20 15:57 UTC
To: David S. Miller
Cc: Eric Dumazet, Lennart Poettering, Kay Sievers, Alban Crequy, Bart Cerneels, Rodrigo Moya, Sjoerd Simons, netdev, linux-kernel

From: Alban Crequy <alban.crequy@collabora.co.uk>

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
---
 .../networking/multicast-unix-sockets.txt          |  180 ++++++++++++++++++++
 1 files changed, 180 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/networking/multicast-unix-sockets.txt

diff --git a/Documentation/networking/multicast-unix-sockets.txt b/Documentation/networking/multicast-unix-sockets.txt
new file mode 100644
index 0000000..ec9a19c
--- /dev/null
+++ b/Documentation/networking/multicast-unix-sockets.txt
@@ -0,0 +1,180 @@
+Multicast Unix sockets
+======================
+
+Multicast is implemented on SOCK_DGRAM and SOCK_SEQPACKET Unix sockets.
+
+A userspace application can create a multicast group with:
+
+  struct unix_mreq mreq = {0,};
+  mreq.address.sun_family = AF_UNIX;
+  mreq.address.sun_path[0] = '\0';
+  strcpy(mreq.address.sun_path + 1, "socket-address");
+
+  sockfd = socket(AF_UNIX, SOCK_DGRAM, 0);
+  ret = setsockopt(sockfd, SOL_UNIX, UNIX_CREATE_GROUP, &mreq, sizeof(mreq));
+
+This allocates a struct unix_mcast_group, which is reference counted and exists
+as long as the socket that created it exists or the group has at least one
+member.
+
+SOCK_DGRAM sockets can join a multicast group with:
+
+  ret = setsockopt(sockfd, SOL_UNIX, UNIX_JOIN_GROUP, &mreq, sizeof(mreq));
+
+This allocates a struct unix_mcast, which holds the settings of the membership,
+mainly whether loopback is enabled. A socket can be a member of several
+multicast groups.
+
+Since SOCK_SEQPACKET sockets are connection-oriented the semantics are
+different. A client cannot join a group itself; it can only connect, and the
+multicast accept socket is used to allow the peer to join the group with:
+
+  ret = setsockopt(groupfd, SOL_UNIX, UNIX_CREATE_GROUP, &val, vallen);
+  ret = listen(groupfd, 10);
+  connfd = accept(groupfd, NULL, 0);
+  ret = setsockopt(connfd, SOL_UNIX, UNIX_ACCEPT_GROUP, &mreq, sizeof(mreq));
+
+The socket is part of the multicast group until it is released, shut down with
+RCV_SHUTDOWN or it explicitly leaves the group:
+
+  ret = setsockopt(sockfd, SOL_UNIX, UNIX_LEAVE_GROUP, &mreq, sizeof(mreq));
+
+Struct unix_mcast nodes are linked in two RCU lists:
+- (struct unix_sock)->mcast_subscriptions
+- (struct unix_mcast_group)->mcast_members
+
+              unix_mcast_group  unix_mcast_group
+                      |                 |
+                      v                 v
+unix_sock  ---->  unix_mcast  ---->  unix_mcast
+                      |
+                      v
+unix_sock  ---->  unix_mcast
+                      |
+                      v
+unix_sock  ---->  unix_mcast
+
+
+SOCK_DGRAM semantics
+====================
+
+          G          The socket which created the group
+       /  |  \
+    P1    P2    P3   The member sockets
+
+Messages sent to the group are received by all members except the sender itself
+unless the sending socket has UNIX_MREQ_LOOPBACK set.
+
+Non-members can also send to the group socket G and the message will be
+broadcast to the group members, however socket G does not receive messages sent
+to the group, via it, itself.
+
+
+SOCK_SEQPACKET semantics
+========================
+
+When a connection is performed on a SOCK_SEQPACKET multicast socket, a new
+socket is created and its file descriptor is received by accept().
+
+          L          The listening socket
+       /  |  \
+    A1    A2    A3   The accepted sockets
+    |     |     |
+    C1    C2    C3   The connected sockets
+
+Messages sent on the C1 socket are received by:
+- C1 itself if UNIX_MREQ_LOOPBACK is set.
+- The peer socket A1 if UNIX_MREQ_SEND_TO_PEER is set.
+- The other members of the multicast group C2 and C3.
+
+Only members can send to the group in this case.
+
+
+Atomic delivery and ordering
+============================
+
+Each message sent is delivered atomically to either none of the recipients or
+all the recipients, even with interruptions and errors.
+
+Locking is used in order to keep the ordering consistent on all recipients. We
+want to avoid the following scenario. Two emitters A and B, and 2 recipients, C
+and D:
+
+           C    D
+A -------->|    |    Step 1: A's message is delivered to C
+B -------->|    |    Step 2: B's message is delivered to C
+B ---------|--->|    Step 3: B's message is delivered to D
+A ---------|--->|    Step 4: A's message is delivered to D
+
+Result: - C received (A, B)
+        - D received (B, A)
+
+Although A and B had a list of recipients (C, D) in the same order, C and D
+received the messages in a different order. To avoid this scenario, we need a
+locking mechanism while the messages are being delivered with skb_queue_tail().
+
+Solution 1:
+The easiest implementation would be to use a global spinlock on the group, but
+it creates avoidable contention, especially when there are two independent
+streams set up with socket filters; e.g. if A sends messages received only by
+C, and B sends messages received only by D.
+
+Solution 2:
+Fine-grained locking could be implemented with a spinlock on each recipient.
+Before delivering the message to the recipients, the sender takes a spinlock on
+each recipient at the same time.
+
+Taking several spinlocks on the same struct can be dangerous and lead to
+deadlocks. This is prevented by sorting the list of sockets by memory address
+and taking the spinlocks in that order. The ordered list of recipients is
+computed on demand when a message is sent and the list is cached for
+performance. When the group membership changes, the generation of the
+membership is incremented and the ordered recipient list is invalidated.
+
+With this solution, the number of spinlocks taken simultaneously can be
+arbitrarily large. Whilst it works, it breaks the lockdep mechanism.
+
+Solution 3:
+The current implementation is similar to solution 2 but with a limit on the
+number of spinlocks taken simultaneously (8), so lockdep works fine. A hash
+function and bit array with n=8 specify which spinlocks to take. Contention
+on independent streams can still happen but it is less likely.
+
+
+Flow control
+============
+
+When a socket's receiving queue is full, the default behavior is to block
+senders (or to return -EAGAIN on non-blocking sockets). The socket can also
+join a multicast group with the flag UNIX_MREQ_DROP_WHEN_FULL. In this case,
+messages sent to the group will not be delivered to that socket when its
+receiving queue is full.
+
+Messages are still delivered atomically to all members who don't have the flag
+UNIX_MREQ_DROP_WHEN_FULL. If send() returns -EAGAIN, nobody received the
+message. If send() blocks because of one member, the other members don't
+receive the message until all sockets (except those with
+UNIX_MREQ_DROP_WHEN_FULL set) can receive at the same time.
+
+poll/epoll/select on POLLOUT events have consistent behavior; they block if
+at least one member of the multicast group without UNIX_MREQ_DROP_WHEN_FULL has
+a full receiving queue.
+
+
+Multicast socket reference counting
+===================================
+
+A poller for POLLOUT events can block for any member of the group. The poller
+can use the wait queue "peer_wait" of any member. So it is important that Unix
+sockets are not released before all pollers exit. This is achieved by:
+
+- Incrementing the reference counter of a socket when it joins a multicast
+  group.
+- Decrementing it when the group is destroyed, that is when all
+  sockets keeping a reference on the group released their reference on the
+  group.
+
+struct unix_mcast_group keeps track of both current members and previous
+members. When a socket leaves a group, it is removed from the members list and
+put in the dead members list. This is done in order to take advantage of RCU
+lists, which reduces lock contention.
-- 
1.7.7.6

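Editorial sketch (not part of the posted patches): the documentation's
"Solution 3" describes a hash function and an 8-bit array that select which
locks a sender takes. The userspace model below illustrates that idea with
pthread mutexes instead of kernel spinlocks. The names bucket_lock, bucket_of,
lock_mask, lock_set and unlock_set, and the toy hash itself, are assumptions
made for illustration; the patches may well do this differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NLOCKS 8   /* the documentation's limit on simultaneously held locks */
#define MTX_INIT PTHREAD_MUTEX_INITIALIZER

/* Hypothetical pool of locks shared by all recipients of one group. */
static pthread_mutex_t bucket_lock[NLOCKS] = {
    MTX_INIT, MTX_INIT, MTX_INIT, MTX_INIT,
    MTX_INIT, MTX_INIT, MTX_INIT, MTX_INIT
};

/* Toy hash (an assumption): drop low alignment bits, fold into NLOCKS. */
static unsigned bucket_of(uintptr_t key)
{
    return (unsigned)((key >> 4) % NLOCKS);
}

/* Build the bit array: bit i set means bucket_lock[i] must be held. */
static uint8_t lock_mask(const uintptr_t *keys, size_t n)
{
    uint8_t mask = 0;

    for (size_t i = 0; i < n; i++)
        mask |= (uint8_t)(1u << bucket_of(keys[i]));
    return mask;
}

/* Acquire in ascending index order so that two senders cannot deadlock. */
static void lock_set(uint8_t mask)
{
    for (unsigned i = 0; i < NLOCKS; i++) {
        if (mask & (1u << i)) {
            int rc = pthread_mutex_lock(&bucket_lock[i]);

            assert(rc == 0);
            (void)rc;
        }
    }
}

/* Release in the reverse order of acquisition. */
static void unlock_set(uint8_t mask)
{
    for (unsigned i = NLOCKS; i-- > 0; ) {
        if (mask & (1u << i))
            pthread_mutex_unlock(&bucket_lock[i]);
    }
}
```

A sender would compute the mask over its recipients, call lock_set(), queue
the message on every recipient, then call unlock_set(). Two recipients that
hash to the same bucket share a lock, which is the residual contention the
documentation mentions for independent streams.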
* [PATCH 02/10] af_unix: Add constant for unix socket options level
From: Javier Martinez Canillas @ 2012-02-20 15:57 UTC
To: David S. Miller
Cc: Eric Dumazet, Lennart Poettering, Kay Sievers, Alban Crequy, Bart Cerneels, Rodrigo Moya, Sjoerd Simons, netdev, linux-kernel

From: Alban Crequy <alban.crequy@collabora.co.uk>

Assign the next free socket options level to be used by the unix
protocol and address family.

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
---
 include/linux/socket.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index d0e77f6..a6b8f35 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -312,6 +312,7 @@ struct ucred {
 #define SOL_IUCV	277
 #define SOL_CAIF	278
 #define SOL_ALG		279
+#define SOL_UNIX	280
 
 /* IPX options */
 #define IPX_TYPE	1
-- 
1.7.7.6

* [PATCH 03/10] af_unix: add setsockopt on unix sockets
From: Javier Martinez Canillas @ 2012-02-20 15:57 UTC
To: David S. Miller
Cc: Eric Dumazet, Lennart Poettering, Kay Sievers, Alban Crequy, Bart Cerneels, Rodrigo Moya, Sjoerd Simons, netdev, linux-kernel

From: Alban Crequy <alban.crequy@collabora.co.uk>

unix_setsockopt() is called only on SOCK_DGRAM and SOCK_SEQPACKET unix sockets.

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
---
 net/unix/af_unix.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 85d3bb7..3537f20 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -515,6 +515,8 @@ static unsigned int unix_dgram_poll(struct file *, struct socket *,
 				    poll_table *);
 static int unix_ioctl(struct socket *, unsigned int, unsigned long);
 static int unix_shutdown(struct socket *, int);
+static int unix_setsockopt(struct socket *, int, int,
+			   char __user *, unsigned int);
 static int unix_stream_sendmsg(struct kiocb *, struct socket *,
 			       struct msghdr *, size_t);
 static int unix_stream_recvmsg(struct kiocb *, struct socket *,
@@ -564,7 +566,7 @@ static const struct proto_ops unix_dgram_ops = {
 	.ioctl =	unix_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	unix_shutdown,
-	.setsockopt =	sock_no_setsockopt,
+	.setsockopt =	unix_setsockopt,
 	.getsockopt =	sock_no_getsockopt,
 	.sendmsg =	unix_dgram_sendmsg,
 	.recvmsg =	unix_dgram_recvmsg,
@@ -585,7 +587,7 @@ static const struct proto_ops unix_seqpacket_ops = {
 	.ioctl =	unix_ioctl,
 	.listen =	unix_listen,
 	.shutdown =	unix_shutdown,
-	.setsockopt =	sock_no_setsockopt,
+	.setsockopt =	unix_setsockopt,
 	.getsockopt =	sock_no_getsockopt,
 	.sendmsg =	unix_seqpacket_sendmsg,
 	.recvmsg =	unix_seqpacket_recvmsg,
@@ -1583,6 +1585,13 @@ out:
 }
 
 
+static int unix_setsockopt(struct socket *sock, int level, int optname,
+			   char __user *optval, unsigned int optlen)
+{
+	return -EOPNOTSUPP;
+}
+
+
 static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 			       struct msghdr *msg, size_t len)
 {
-- 
1.7.7.6

* Re: [PATCH 03/10] af_unix: add setsockopt on unix sockets
From: David Miller @ 2012-02-20 16:20 UTC
To: javier
Cc: eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, rodrigo.moya, sjoerd.simons, netdev, linux-kernel

Well, where's the rest?

* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: Colin Walters @ 2012-02-20 19:13 UTC
To: Javier Martinez Canillas
Cc: David S. Miller, Eric Dumazet, Lennart Poettering, Kay Sievers, Alban Crequy, Bart Cerneels, Rodrigo Moya, Sjoerd Simons, netdev, linux-kernel

On Mon, 2012-02-20 at 16:57 +0100, Javier Martinez Canillas wrote:
> This patch-set adds multicast support to the Unix domain socket family for
> datagram and seqpacket sockets. This work was done by Alban Crequy as a result
> of research we have been doing to improve the performance of the D-bus IPC
> system.

Do you have links to any modifications to userspace dbus to take
advantage of this?

* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: Rodrigo Moya @ 2012-02-21 8:07 UTC
To: Colin Walters
Cc: Javier Martinez Canillas, David S. Miller, Eric Dumazet, Lennart Poettering, Kay Sievers, Alban Crequy, Bart Cerneels, Sjoerd Simons, netdev, linux-kernel

On Mon, 2012-02-20 at 14:13 -0500, Colin Walters wrote:
> On Mon, 2012-02-20 at 16:57 +0100, Javier Martinez Canillas wrote:
> > This patch-set adds multicast support to the Unix domain socket family for
> > datagram and seqpacket sockets. This work was done by Alban Crequy as a
> > result of research we have been doing to improve the performance of the
> > D-bus IPC system.
>
> Do you have links to any modifications to userspace dbus to take
> advantage of this?

We have a work in progress at
http://cgit.collabora.com/git/user/rodrigo/dbus.git/ in the
unix-sockets-multicast branch.

* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: David Miller @ 2012-02-24 20:36 UTC
To: javier
Cc: eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, rodrigo.moya, sjoerd.simons, netdev, linux-kernel

My first impression is that I'm amazed at how much complicated new
code you have to add to support groups of receivers of AF_UNIX
messages.

I can't see how this is better than doing multicast over ipv4 using
UDP or something like that, code which we have already and has been
tested for decades.

I really don't want to apply this stuff, it looks bloated,
complicated, and there is another avenue for doing what you want to
do.

Applications have to change to support the new multicast facilities,
so they can equally be changed to use a real transport that already
supports multicasting.

* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: Javier Martinez Canillas @ 2012-02-27 14:00 UTC
To: David Miller
Cc: javier, eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, rodrigo.moya, sjoerd.simons, netdev, linux-kernel

On 02/24/2012 09:36 PM, David Miller wrote:
>
> My first impression is that I'm amazed at how much complicated new
> code you have to add to support groups of receivers of AF_UNIX
> messages.
>
> I can't see how this is better than doing multicast over ipv4 using
> UDP or something like that, code which we have already and has been
> tested for decades.
>

Primarily for performance reasons. D-bus is an IPC system for processes on
the same machine, so traversing the whole TCP/IP stack seems a little
overkill to me. We will try it, though, to have numbers on the actual
overhead of using UDP multicast over IP instead of multicast Unix domain
sockets.

We also thought of using Netlink sockets, since Netlink already supports
multicast and should be more lightweight than IP multicast. But even Netlink
doesn't meet our needs, since our multicast on Unix sockets implementation
has different semantics needed for D-bus:

- total order is guaranteed: If sender A sends a message before B, then
  receivers C and D should both get message A first and then B.

- slow readers: dropping packets vs blocking the sender. Although
  datagrams are not reliable on IP, datagrams on Unix sockets are never
  lost. So if one receiver has its buffer full the sender is blocked
  instead of dropping packets. That way we guarantee a reliable
  communication channel.

- multicast group access control: controlling who can join the multicast
  group.

- multicast on loopback is not supported: which means we have to use a
  NIC (i.e: eth0).

> I really don't want to apply this stuff, it looks bloated,
> complicated, and there is another avenue for doing what you want to
> do.
>

We can work to reduce the implementation complexity and make it less
bloated. Or do you not like the idea in general?

> Applications have to change to support the new multicast facilities,
> so they can equally be changed to use a real transport that already
> supports multicasting.

Yes, this is not about minimizing user-space application changes but about
improving the performance of D-bus, or of any other framework that relies on
multicast communication on a single machine.

Best regards,
Javier

* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: David Miller @ 2012-02-27 19:05 UTC
To: javier.martinez
Cc: javier, eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, rodrigo.moya, sjoerd.simons, netdev, linux-kernel

From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Date: Mon, 27 Feb 2012 15:00:06 +0100

> Primarily for performance reasons. D-bus is an IPC system for processes on
> the same machine, so traversing the whole TCP/IP stack seems a little
> overkill to me.

You haven't actually tested what the cost of this actually is, so what
you're saying is mere speculation. In many cases TCP/UDP over
loopback is actually faster than AF_UNIX.

Since this is the premise of your whole rebuttal, I'll simply stop
reading here.

* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: Rodrigo Moya @ 2012-02-28 10:47 UTC
To: David Miller
Cc: javier.martinez, javier, eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel

Hi David

On Mon, 2012-02-27 at 14:05 -0500, David Miller wrote:
> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
> Date: Mon, 27 Feb 2012 15:00:06 +0100
>
> > Primarily for performance reasons. D-bus is an IPC system for processes on
> > the same machine, so traversing the whole TCP/IP stack seems a little
> > overkill to me.
>
> You haven't actually tested what the cost of this actually is, so what
> you're saying is mere speculation. In many cases TCP/UDP over
> loopback is actually faster than AF_UNIX.
>

You're right that we haven't tested this; that is because of the other
points in Javier's mail, which are the special semantics we need for this
to fit the D-Bus usage:

> - total order is guaranteed: If sender A sends a message before B, then
>   receivers C and D should both get message A first and then B.
>
> - slow readers: dropping packets vs blocking the sender. Although
>   datagrams are not reliable on IP, datagrams on Unix sockets are never
>   lost. So if one receiver has its buffer full the sender is blocked
>   instead of dropping packets. That way we guarantee a reliable
>   communication channel.
>
> - multicast group access control: controlling who can join the multicast
>   group.
>
> - multicast on loopback is not supported: which means we have to use a
>   NIC (i.e: eth0).

Because of all of this, UDP/IP multicast wasn't even considered as an
option. We might be wrong in some/all of those, so could you please
comment on them to check if that's so?

thanks

* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: David Lamparter @ 2012-02-28 14:28 UTC
To: Rodrigo Moya
Cc: David Miller, javier.martinez, javier, eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel

On Tue, Feb 28, 2012 at 11:47:39AM +0100, Rodrigo Moya wrote:
> > - slow readers: dropping packets vs blocking the sender. Although
> >   datagrams are not reliable on IP, datagrams on Unix sockets are never
> >   lost. So if one receiver has its buffer full the sender is blocked
> >   instead of dropping packets. That way we guarantee a reliable
> >   communication channel.

This sounds like a terribly nice way to f*ck the entire D-Bus system by
having one broken (or malicious) desktop application. What's the
intended way of coping with users that block the socket by not reading?


-David L.

* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: Javier Martinez Canillas @ 2012-02-28 15:24 UTC
To: David Lamparter
Cc: Rodrigo Moya, David Miller, javier, eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel

On 02/28/2012 03:28 PM, David Lamparter wrote:
> On Tue, Feb 28, 2012 at 11:47:39AM +0100, Rodrigo Moya wrote:
>> > - slow readers: dropping packets vs blocking the sender. Although
>> >   datagrams are not reliable on IP, datagrams on Unix sockets are never
>> >   lost. So if one receiver has its buffer full the sender is blocked
>> >   instead of dropping packets. That way we guarantee a reliable
>> >   communication channel.
>
> This sounds like a terribly nice way to f*ck the entire D-Bus system by
> having one broken (or malicious) desktop application. What's the
> intended way of coping with users that block the socket by not reading?
>
>
> -David L.

The problem is that D-bus expects a reliable transport method (TCP or
SOCK_STREAM Unix sockets), but this is not the case with multicast Unix
sockets, since our implementation is for the SOCK_SEQPACKET and SOCK_DGRAM
socket types.

So, you have to either add another layer to the D-bus protocol to make
it reliable (acks, retransmissions, flow control, etc) or avoid losing
D-bus messages (by blocking the sender if one of the receivers has its
buffer full).

Regards,
Javier

* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: Javier Martinez Canillas @ 2012-02-28 16:33 UTC
To: David Lamparter
Cc: Rodrigo Moya, David Miller, javier, eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel

On 02/28/2012 04:24 PM, Javier Martinez Canillas wrote:
> On 02/28/2012 03:28 PM, David Lamparter wrote:
>> On Tue, Feb 28, 2012 at 11:47:39AM +0100, Rodrigo Moya wrote:
>>> > - slow readers: dropping packets vs blocking the sender. Although
>>> >   datagrams are not reliable on IP, datagrams on Unix sockets are never
>>> >   lost. So if one receiver has its buffer full the sender is blocked
>>> >   instead of dropping packets. That way we guarantee a reliable
>>> >   communication channel.
>>
>> This sounds like a terribly nice way to f*ck the entire D-Bus system by
>> having one broken (or malicious) desktop application. What's the
>> intended way of coping with users that block the socket by not reading?
>>
>>
>> -David L.
>
> The problem is that D-bus expects a reliable transport method (TCP or
> SOCK_STREAM Unix sockets), but this is not the case with multicast Unix
> sockets, since our implementation is for the SOCK_SEQPACKET and SOCK_DGRAM
> socket types.
>
> So, you have to either add another layer to the D-bus protocol to make
> it reliable (acks, retransmissions, flow control, etc) or avoid losing
> D-bus messages (by blocking the sender if one of the receivers has its
> buffer full).
>

Also, this problem exists with the current D-bus implementation. If a
malicious desktop application doesn't read its socket, then the messages
sent to it will be buffered in the daemon:

https://bugs.freedesktop.org/show_bug.cgi?id=33606

dbus-daemon memory usage will balloon until the
max_incoming_bytes/max_outgoing_bytes limit is reached (1GB for the session
bus in the default configuration):

    <limit name="max_incoming_bytes">1000000000</limit>
    <limit name="max_outgoing_bytes">1000000000</limit>

It only works because not many applications are broken and user-space
memory is virtualized.

But if you bypass the daemon and use a multicast transport layer (as in
our multicast Unix socket implementation) you don't have that much
memory to buffer the packets. So you have to either block the senders or:

- drop the slow reader
- kill the spammer
- have an infinite amount of memory

Regards,
Javier

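Editorial sketch (not from any message in this thread): the trade-off
discussed above, blocking the sender versus dropping messages for a slow
reader, matches the "Flow control" section of the documentation patch. The
userspace model below illustrates that rule under stated assumptions: struct
member and mcast_send() are hypothetical names, a queue is reduced to a
counter, and drop_when_full stands in for the UNIX_MREQ_DROP_WHEN_FULL flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one group member's receive queue. */
struct member {
    int queued;           /* messages currently queued */
    int capacity;         /* queue limit */
    bool drop_when_full;  /* models UNIX_MREQ_DROP_WHEN_FULL */
};

/*
 * Deliver one message to the group, all-or-nothing for the members that
 * must not lose messages. Returns 0 on success, or -EAGAIN when a member
 * without drop_when_full is full; in that case no queue is modified.
 */
static int mcast_send(struct member *m, size_t n)
{
    assert(m != NULL || n == 0);

    /* Pass 1: refuse before touching any queue. */
    for (size_t i = 0; i < n; i++) {
        if (m[i].queued >= m[i].capacity && !m[i].drop_when_full)
            return -EAGAIN;
    }

    /* Pass 2: enqueue; full members that allow drops are skipped. */
    for (size_t i = 0; i < n; i++) {
        if (m[i].queued < m[i].capacity)
            m[i].queued++;
    }
    return 0;
}
```

In this model a single full member without the flag stalls every sender,
which is the concern David Lamparter raises; setting the flag on that member
trades its reliability for the progress of the rest of the group. A real
implementation would also need the locking described in the documentation so
that the check and the enqueue happen atomically.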
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: David Miller @ 2012-02-28 19:05 UTC
To: rodrigo.moya
Cc: javier.martinez, javier, eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel

From: Rodrigo Moya <rodrigo.moya@collabora.co.uk>
Date: Tue, 28 Feb 2012 11:47:39 +0100

> Because of all of this, UDP/IP multicast wasn't even considered as an
> option. We might be wrong in some/all of those, so could you please
> comment on them to check if that's so?

You guys seem to want something that isn't AF_UNIX, ordering guarantees
and whatnot, it really has no place in these protocols.

You've designed a userlevel subsystem with requirements that no existing
socket layer can give, and you just figured you'd work that out later.

I think you rather should have reconsidered these premises and designed
something that could handle reality which is AF_UNIX can't do multicast
and nobody guarantees those strange ordering requirements you seem to
have.

* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: Javier Martinez Canillas @ 2012-03-01 11:57 UTC
To: David Miller
Cc: rodrigo.moya, javier, eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel

On 02/28/2012 08:05 PM, David Miller wrote:
> From: Rodrigo Moya <rodrigo.moya@collabora.co.uk>
> Date: Tue, 28 Feb 2012 11:47:39 +0100
>
>> Because of all of this, UDP/IP multicast wasn't even considered as an
>> option. We might be wrong in some/all of those, so could you please
>> comment on them to check if that's so?
>
> You guys seem to want something that isn't AF_UNIX, ordering guarantees
> and whatnot, it really has no place in these protocols.
>
> You've designed a userlevel subsystem with requirements that no existing
> socket layer can give, and you just figured you'd work that out later.
>
> I think you rather should have reconsidered these premises and designed
> something that could handle reality which is AF_UNIX can't do multicast
> and nobody guarantees those strange ordering requirements you seem to
> have.

Yes, you are right: it doesn't follow AF_UNIX semantics, so Unix sockets
are not the best place to add our multicast implementation.

So now we are trying a different approach: creating a new address family,
AF_MCAST. That way we can have more control over the semantics of the
socket interface for that family.

We expect to have some patches in a few days and we will resend.

Does this make more sense to you?

Best regards,
Javier

* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 11:57 ` Javier Martinez Canillas @ 2012-03-01 12:26 ` Eric Dumazet 2012-03-01 12:33 ` David Laight 2012-03-01 20:44 ` David Miller 2012-03-01 12:57 ` Luiz Augusto von Dentz 2012-03-01 20:42 ` David Miller 2 siblings, 2 replies; 55+ messages in thread From: Eric Dumazet @ 2012-03-01 12:26 UTC (permalink / raw) To: Javier Martinez Canillas Cc: David Miller, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On Thursday, 01 March 2012 at 12:57 +0100, Javier Martinez Canillas wrote: > Yes, you are right it doesn't follow AF_UNIX semantics so Unix sockets > is not the best place to add our multicast implementation. > Right, AF_UNIX is already a nightmare to maintain. > So, now we are trying a different approach. To create a new address > family AF_MCAST. That way we can have more control over the semantics of > the socket interface for that family. > > We expect to have some patches in a few days and we will resend. > > Does this makes more sense to you? > Why add an obscure set of IPC mechanisms to the network tree, rather than using (maybe extending) traditional IPC (message queues, semaphores, shared memory, pipes, futexes, ...)?
* RE: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 12:26 ` Eric Dumazet @ 2012-03-01 12:33 ` David Laight 2012-03-01 12:50 ` Rodrigo Moya 2012-03-01 20:44 ` David Miller 1 sibling, 1 reply; 55+ messages in thread From: David Laight @ 2012-03-01 12:33 UTC (permalink / raw) To: Eric Dumazet, Javier Martinez Canillas Cc: David Miller, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel > > So, now we are trying a different approach. To create a new address > > family AF_MCAST. That way we can have more control over the semantics of > > the socket interface for that family. > > > > We expect to have some patches in a few days and we will resend. > > > > Does this makes more sense to you? > > > > Why adding an obscure set of IPC mechanism in network tree, and not > using (maybe extending) traditional IPC (Messages queues, semaphores, > Shared memory, pipes, futexes, ...). If it isn't a totally silly suggestion, why not write a simple device driver that just does what you want? Which (I think) is named pipes with multiple readers. David ^ permalink raw reply [flat|nested] 55+ messages in thread
* RE: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 12:33 ` David Laight @ 2012-03-01 12:50 ` Rodrigo Moya 2012-03-01 12:59 ` Eric Dumazet 0 siblings, 1 reply; 55+ messages in thread From: Rodrigo Moya @ 2012-03-01 12:50 UTC (permalink / raw) To: David Laight Cc: Eric Dumazet, Javier Martinez Canillas, David Miller, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On Thu, 2012-03-01 at 12:33 +0000, David Laight wrote: > > > So, now we are trying a different approach. To create a new address > > > family AF_MCAST. That way we can have more control over the > semantics of > > > the socket interface for that family. > > > > > > We expect to have some patches in a few days and we will resend. > > > > > > Does this makes more sense to you? > > > > > > > Why adding an obscure set of IPC mechanism in network tree, and not > > using (maybe extending) traditional IPC (Messages queues, semaphores, > > Shared memory, pipes, futexes, ...). > > If it isn't a totally silly suggestion, why not write a simple > device driver that just does what you want? > Which (I think) is named pipes with multiple readers. > the main problem in D-Bus we are trying to solve is the context switches, since right now, there is a daemon, which listens on a UNIX socket, and all traffic in the bus goes through it, and then the daemon has to route the messages it gets on that socket to the corresponding place(s). So, every time someone sends a message to D-Bus, since all traffic goes through the daemon, dbus-daemon gets waked-up, which is one of the biggest bottlenecks we are trying to fix. That's why we are thinking about using multicast with socket filters, so that the daemon only gets traffic it cares about and thus is not waked up and context switches don't happen when not needed. Using message queues, AFAICS, we would have the same problem, as the daemon would create the message queue and would get all traffic, right? 
cheers
* RE: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 12:50 ` Rodrigo Moya @ 2012-03-01 12:59 ` Eric Dumazet 2012-03-01 13:56 ` Javier Martinez Canillas 0 siblings, 1 reply; 55+ messages in thread From: Eric Dumazet @ 2012-03-01 12:59 UTC (permalink / raw) To: Rodrigo Moya Cc: David Laight, Javier Martinez Canillas, David Miller, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On Thursday, 01 March 2012 at 13:50 +0100, Rodrigo Moya wrote: > the main problem in D-Bus we are trying to solve is the context > switches, since right now, there is a daemon, which listens on a UNIX > socket, and all traffic in the bus goes through it, and then the daemon > has to route the messages it gets on that socket to the corresponding > place(s). So, every time someone sends a message to D-Bus, since all > traffic goes through the daemon, dbus-daemon gets waked-up, which is one > of the biggest bottlenecks we are trying to fix. > > That's why we are thinking about using multicast with socket filters, so > that the daemon only gets traffic it cares about and thus is not waked > up and context switches don't happen when not needed. > > Using message queues, AFAICS, we would have the same problem, as the > daemon would create the message queue and would get all traffic, right? > This is why I mentioned extensions. Anyway, if you think multicast sockets are the way to go, then you could set up a virtual network just to be able to use AF_INET multicast. That's probably doable without kernel patching.
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 12:59 ` Eric Dumazet @ 2012-03-01 13:56 ` Javier Martinez Canillas 2012-03-01 16:00 ` Eric Dumazet ` (2 more replies) 0 siblings, 3 replies; 55+ messages in thread From: Javier Martinez Canillas @ 2012-03-01 13:56 UTC (permalink / raw) To: Eric Dumazet Cc: Rodrigo Moya, David Laight, David Miller, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On 03/01/2012 01:59 PM, Eric Dumazet wrote: > Le jeudi 01 mars 2012 à 13:50 +0100, Rodrigo Moya a écrit : >> the main problem in D-Bus we are trying to solve is the context >> switches, since right now, there is a daemon, which listens on a UNIX >> socket, and all traffic in the bus goes through it, and then the daemon >> has to route the messages it gets on that socket to the corresponding >> place(s). So, every time someone sends a message to D-Bus, since all >> traffic goes through the daemon, dbus-daemon gets waked-up, which is one >> of the biggest bottlenecks we are trying to fix. >> >> That's why we are thinking about using multicast with socket filters, so >> that the daemon only gets traffic it cares about and thus is not waked >> up and context switches don't happen when not needed. >> >> Using message queues, AFAICS, we would have the same problem, as the >> daemon would create the message queue and would get all traffic, right? >> > > This is why I mentioned extensions. > > Anyway, if you think multicast sockets is the way to go, then you could > setup a virtual network just to be able to use AF_INET multicast. > > Thats probably doable without kernel patching. > We could use AF_INET multicast on a local machine but we need some ordering and control flow requirements that are not guaranteed on UDP multicast over IP. That's why we thought to add a new address family AF_MCAST. 
To make it a general local multicast solution rather than something too specific, we added some flags to control its behavior, such as MCAST_MREQ_DROP_WHEN_FULL, which decides whether to block the sender or drop the packet when a receiver's queue is full. Regards, Javier
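The block-or-drop choice described above has a rough analogue on an ordinary connected AF_UNIX datagram socket today, sketched below in C: a plain send() blocks while the kernel cannot queue more data for a receiver that is not draining its socket, whereas send() with MSG_DONTWAIT fails with EAGAIN and lets the sender drop the message. The enum and the function name send_with_policy() are hypothetical; MCAST_MREQ_DROP_WHEN_FULL exists only in the proposed patches, and its exact semantics are not shown in this thread.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical policy names, for illustration only. */
enum full_policy {
    POLICY_BLOCK,   /* wait until the message can be queued */
    POLICY_DROP     /* give up immediately if it cannot be queued */
};

/*
 * Send one datagram on a connected socket.
 * Returns 1 if the message was queued, 0 if it was dropped because the
 * kernel could not queue it right now, and -1 on any other error.
 */
int send_with_policy(int fd, const void *buf, size_t len, enum full_policy p)
{
    int flags = (p == POLICY_DROP) ? MSG_DONTWAIT : 0;
    ssize_t n = send(fd, buf, len, flags);

    if (n >= 0)
        return 1;
    if (p == POLICY_DROP && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}
```

With several receivers, as in the multicast proposal, the same decision has to be made per receiver, which is where the ordering and flow control questions raised in this thread come from.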
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 13:56 ` Javier Martinez Canillas @ 2012-03-01 16:00 ` Eric Dumazet 2012-03-01 16:02 ` Luiz Augusto von Dentz 2012-03-01 20:55 ` David Miller 2 siblings, 0 replies; 55+ messages in thread From: Eric Dumazet @ 2012-03-01 16:00 UTC (permalink / raw) To: Javier Martinez Canillas Cc: Rodrigo Moya, David Laight, David Miller, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On Thursday, 01 March 2012 at 14:56 +0100, Javier Martinez Canillas wrote: > We could use AF_INET multicast on a local machine but we need some > ordering and control flow requirements that are not guaranteed on UDP > multicast over IP. That's why we thought to add a new address family > AF_MCAST. > This looks like application logic and complexity being pushed into the kernel for a single user (even if it is used in a lot of products): D-Bus > To make it a general local multicast solution and not being too specific > we added some flags to control its behavior like > MCAST_MREQ_DROP_WHEN_FULL to decide to either block the sender or drop > the packet when one receiver has its queue full. I am only wondering how many lines this is going to add to the kernel for a complete implementation, given your performance expectations, flow control and reliability, not counting all the security issues (ancillary messages and so on). In the case of IP_MULTICAST_LOOP, we could allow the sender to sleep if the receiver queue is full, with a bit of tweaking in the stack (the current implementation uses loopback re-inject, so it requires softirq handling). In fact, we could use a new IP_MULTICAST_LOCAL option, so that sender processing doesn't trigger a softirq handler at all and is allowed to sleep if needed. For example, skb allocations could use GFP_KERNEL instead of the current GFP_ATOMIC ones in UDP mcast. I don't know, maybe it would be a smaller patch.
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 13:56 ` Javier Martinez Canillas 2012-03-01 16:00 ` Eric Dumazet @ 2012-03-01 16:02 ` Luiz Augusto von Dentz 2012-03-01 17:06 ` Javier Martinez Canillas ` (2 more replies) 2012-03-01 20:55 ` David Miller 2 siblings, 3 replies; 55+ messages in thread From: Luiz Augusto von Dentz @ 2012-03-01 16:02 UTC (permalink / raw) To: Javier Martinez Canillas Cc: Eric Dumazet, Rodrigo Moya, David Laight, David Miller, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel Hi Javier, On Thu, Mar 1, 2012 at 3:56 PM, Javier Martinez Canillas <javier.martinez@collabora.co.uk> wrote: >> >> Anyway, if you think multicast sockets is the way to go, then you could >> setup a virtual network just to be able to use AF_INET multicast. >> >> Thats probably doable without kernel patching. >> > > We could use AF_INET multicast on a local machine but we need some > ordering and control flow requirements that are not guaranteed on UDP > multicast over IP. That's why we thought to add a new address family > AF_MCAST. I don't want to sound like a broken record, but Im afraid I have to, what about Ancillary Messages, how you are going to support passing fd? Actually the whole virtual network sounds like a bad idea, are we going to give ips to each and every application connected to the bus, actually it is necessary to have one virtual network for each bus. Contrary to someones believes I don't think AF_INET is that fast (e.g. http://scottmoonen.com/2008/04/05/a-performance-comparison-of-af_unix-with-loopback-on-linux/) -- Luiz Augusto von Dentz ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 16:02 ` Luiz Augusto von Dentz @ 2012-03-01 17:06 ` Javier Martinez Canillas 2012-03-01 17:59 ` Eric Dumazet 2012-03-01 18:53 ` David Dillow 2 siblings, 0 replies; 55+ messages in thread From: Javier Martinez Canillas @ 2012-03-01 17:06 UTC (permalink / raw) To: Luiz Augusto von Dentz Cc: Eric Dumazet, Rodrigo Moya, David Laight, David Miller, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On 03/01/2012 05:02 PM, Luiz Augusto von Dentz wrote: > Hi Javier, > > On Thu, Mar 1, 2012 at 3:56 PM, Javier Martinez Canillas > <javier.martinez@collabora.co.uk> wrote: >>> >>> Anyway, if you think multicast sockets is the way to go, then you could >>> setup a virtual network just to be able to use AF_INET multicast. >>> >>> Thats probably doable without kernel patching. >>> >> >> We could use AF_INET multicast on a local machine but we need some >> ordering and control flow requirements that are not guaranteed on UDP >> multicast over IP. That's why we thought to add a new address family >> AF_MCAST. > > I don't want to sound like a broken record, but Im afraid I have to, > what about Ancillary Messages, how you are going to support passing > fd? Actually the whole virtual network sounds like a bad idea, are we > going to give ips to each and every application connected to the bus, > actually it is necessary to have one virtual network for each bus. > > Contrary to someones believes I don't think AF_INET is that fast (e.g. > http://scottmoonen.com/2008/04/05/a-performance-comparison-of-af_unix-with-loopback-on-linux/) > > You are right. Ancillary messages are PF_UNIX specific and also some D-bus applications use fd passing for out-of-band communication. So, using multicast on AF_INET will break these applications. Regards, Javier ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 16:02 ` Luiz Augusto von Dentz 2012-03-01 17:06 ` Javier Martinez Canillas @ 2012-03-01 17:59 ` Eric Dumazet 2012-03-01 18:10 ` Alan Cox 2012-03-01 19:02 ` Javier Martinez Canillas 2012-03-01 18:53 ` David Dillow 2 siblings, 2 replies; 55+ messages in thread From: Eric Dumazet @ 2012-03-01 17:59 UTC (permalink / raw) To: Luiz Augusto von Dentz Cc: Javier Martinez Canillas, Rodrigo Moya, David Laight, David Miller, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On 1 March 2012 08:02, Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: > > Contrary to someones believes I don't think AF_INET is that fast (e.g. > http://scottmoonen.com/2008/04/05/a-performance-comparison-of-af_unix-with-loopback-on-linux/) > Oh you mention a recent zork it seems ;) Are we speaking of performance problems apart from the scheduler problems for D-Bus (each message waking all receivers, and all receivers but the target reading and dropping the message)? I am actually one of the few people working to improve performance on both the AF_INET and AF_UNIX parts. Just take a look at recent commits. Right now you can send/receive millions of UDP messages per second on your Linux machine, if you have figured out how to avoid process scheduler costs. If D-Bus wants more, I highly suggest using shared memory instead of passing messages.
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 17:59 ` Eric Dumazet @ 2012-03-01 18:10 ` Alan Cox 2012-03-01 19:02 ` Javier Martinez Canillas 1 sibling, 0 replies; 55+ messages in thread From: Alan Cox @ 2012-03-01 18:10 UTC (permalink / raw) To: Eric Dumazet Cc: Luiz Augusto von Dentz, Javier Martinez Canillas, Rodrigo Moya, David Laight, David Miller, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel > Right now you can send/receive millions of udp messages per second on > your linux machine, if you figured out how to avoid process scheduler > costs. If D-Bus wants more, I highly suggest using shared memory > instead of passing messages. Or some rather artful use of BPF ? Alan ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 17:59 ` Eric Dumazet 2012-03-01 18:10 ` Alan Cox @ 2012-03-01 19:02 ` Javier Martinez Canillas 2012-03-01 19:29 ` Javier Martinez Canillas 1 sibling, 1 reply; 55+ messages in thread From: Javier Martinez Canillas @ 2012-03-01 19:02 UTC (permalink / raw) To: Eric Dumazet Cc: Luiz Augusto von Dentz, Javier Martinez Canillas, Rodrigo Moya, David Laight, David Miller, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On Thu, Mar 1, 2012 at 6:59 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On 1 March 2012 08:02, Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: >> >> Contrary to someones believes I don't think AF_INET is that fast (e.g. >> http://scottmoonen.com/2008/04/05/a-performance-comparison-of-af_unix-with-loopback-on-linux/) >> > > Oh you mention a recent zork it seems ;) > > Are we speaking of performance problems, apart from scheduler problems > for D-Bus (each message wakeing all receivers, all receivers read and > drop message but the target) ? > Hi Eric, The only performance problem we are talking about is the scheduling for D-bus (a context switch to the daemon for each message). With today's implementation the receivers only get messages that were sent to them, but the D-bus daemon has to be woken up for every message so it can do the routing. For multicast messages (i.e. D-bus signals) this is even worse, since the daemon has to do a send() for each receiver. > I am actually one of the few people working to improve performance on > both AF_INET and AF_UNIX parts. Just take a look at recent commits. > > Right now you can send/receive millions of udp messages per second on > your linux machine, if you figured out how to avoid process scheduler > costs. If D-Bus wants more, I highly suggest using shared memory > instead of passing messages. 
> -- Yes, I also thought that AF_UNIX would be more efficient than AF_INET but I was wrong. Yesterday I wrote some tests using our multicast unix socket, UDP multicast over IP on a single machine and even multicast using AF_NETLINK sockets, and got very similar performance results. The only problem is the ordering and flow control requirements for D-bus. Best regards, Javier
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 19:02 ` Javier Martinez Canillas @ 2012-03-01 19:29 ` Javier Martinez Canillas 0 siblings, 0 replies; 55+ messages in thread From: Javier Martinez Canillas @ 2012-03-01 19:29 UTC (permalink / raw) To: Javier Martinez Canillas Cc: Eric Dumazet, Luiz Augusto von Dentz, Rodrigo Moya, David Laight, David Miller, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On 03/01/2012 08:02 PM, Javier Martinez Canillas wrote: > On Thu, Mar 1, 2012 at 6:59 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> Le 1 mars 2012 08:02, Luiz Augusto von Dentz <luiz.dentz@gmail.com> a écrit : >>> >>> Contrary to someones believes I don't think AF_INET is that fast (e.g. >>> http://scottmoonen.com/2008/04/05/a-performance-comparison-of-af_unix-with-loopback-on-linux/) >>> >> >> Oh you mention a recent zork it seems ;) >> >> Are we speaking of performance problems, apart from scheduler problems >> for D-Bus (each message wakeing all receivers, all receivers read and >> drop message but the target) ? >> > > Hi Eric, > > The only performance problem we are talking about is the scheduling > for D-bus (context switch to the daemon for each message). With today > implementation the receivers only gets messages that were sent to it > but the D-bus daemon has to be wake it up for every message to he can > do the routing. For multicast messages (i.e: D-bus signals) this is > even worse since the daemon has to do a send() for each receiver. > >> I am actually one of the few people working to improve performance on >> both AF_INET and AF_UNIX parts. Just take a look at recent commits. >> >> Right now you can send/receive millions of udp messages per second on >> your linux machine, if you figured out how to avoid process scheduler >> costs. If D-Bus wants more, I highly suggest using shared memory >> instead of passing messages. 
>> -- > > Yes, I also thought that AF_UNIX would be more efficient than AF_INET > but I was wrong. Yesterday I wrote some tests using our multicast unix > socket, UDP multicast over IP on a single machine and even multicast > using AF_NETLINK sockets and got very similar performance results. > > The only problem is the ordering and control flow requirements for D-bus. > And the fd passing for out-of-band communication used by some D-bus applications such as oFono and BlueZ. Regards, Javier
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 16:02 ` Luiz Augusto von Dentz 2012-03-01 17:06 ` Javier Martinez Canillas 2012-03-01 17:59 ` Eric Dumazet @ 2012-03-01 18:53 ` David Dillow 2 siblings, 0 replies; 55+ messages in thread From: David Dillow @ 2012-03-01 18:53 UTC (permalink / raw) To: Luiz Augusto von Dentz Cc: Javier Martinez Canillas, Eric Dumazet, Rodrigo Moya, David Laight, David Miller, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On Thu, 2012-03-01 at 18:02 +0200, Luiz Augusto von Dentz wrote: > Contrary to someones believes I don't think AF_INET is that fast (e.g. > http://scottmoonen.com/2008/04/05/a-performance-comparison-of-af_unix-with-loopback-on-linux/) There has been a huge amount of work on the stack in the four years since that was written, and even longer since 2.6.18 was considered current. Have anything more recent? ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 13:56 ` Javier Martinez Canillas 2012-03-01 16:00 ` Eric Dumazet 2012-03-01 16:02 ` Luiz Augusto von Dentz @ 2012-03-01 20:55 ` David Miller 2012-03-02 4:40 ` Stephen Hemminger 2 siblings, 1 reply; 55+ messages in thread From: David Miller @ 2012-03-01 20:55 UTC (permalink / raw) To: javier.martinez Cc: eric.dumazet, rodrigo.moya, David.Laight, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel From: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Date: Thu, 01 Mar 2012 14:56:11 +0100 > We could use AF_INET multicast on a local machine but we need some > ordering and control flow requirements that are not guaranteed on UDP > multicast over IP. That's why we thought to add a new address family > AF_MCAST. None of this makes any sense to me. Unless you have infinite amounts of memory you have to handle packet drops, and the same things that handle packet drops on a protocol level can handle out-of-order delivery too. Stop reinventing the wheel, use facilities that exist already. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 20:55 ` David Miller @ 2012-03-02 4:40 ` Stephen Hemminger 0 siblings, 0 replies; 55+ messages in thread From: Stephen Hemminger @ 2012-03-02 4:40 UTC (permalink / raw) To: David Miller Cc: javier.martinez, eric.dumazet, rodrigo.moya, David.Laight, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On Thu, 01 Mar 2012 15:55:05 -0500 (EST) David Miller <davem@davemloft.net> wrote: > From: Javier Martinez Canillas <javier.martinez@collabora.co.uk> > Date: Thu, 01 Mar 2012 14:56:11 +0100 > > > We could use AF_INET multicast on a local machine but we need some > > ordering and control flow requirements that are not guaranteed on UDP > > multicast over IP. That's why we thought to add a new address family > > AF_MCAST. > > None of this makes any sense to me. > > Unless you have infinite amounts of memory you have to handle packet > drops, and the same things that handle packet drops on a protocol > level can handle out-of-order delivery too. > > Stop reinventing the wheel, use facilities that exist already. Look at ZeroMQ (http://www.zeromq.org/); the library seems to be a good fit for what D-bus wants, and it supports multiple protocols.
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 12:26 ` Eric Dumazet 2012-03-01 12:33 ` David Laight @ 2012-03-01 20:44 ` David Miller 2012-03-01 22:01 ` Luiz Augusto von Dentz 1 sibling, 1 reply; 55+ messages in thread From: David Miller @ 2012-03-01 20:44 UTC (permalink / raw) To: eric.dumazet Cc: javier.martinez, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 01 Mar 2012 04:26:42 -0800 > Why adding an obscure set of IPC mechanism in network tree, and not > using (maybe extending) traditional IPC (Messages queues, semaphores, > Shared memory, pipes, futexes, ...). I actually don't understand why there is so much resistance to using a real bona fide on-the-wire protocol. That way, if you ever wanted to connect dbus instances on multiple machines or log dbus transactions remotely for debugging, you could just do it.
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 20:44 ` David Miller @ 2012-03-01 22:01 ` Luiz Augusto von Dentz 2012-03-01 22:08 ` David Miller 0 siblings, 1 reply; 55+ messages in thread From: Luiz Augusto von Dentz @ 2012-03-01 22:01 UTC (permalink / raw) To: David Miller Cc: eric.dumazet, javier.martinez, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel Hi David, On Thu, Mar 1, 2012 at 10:44 PM, David Miller <davem@davemloft.net> wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Thu, 01 Mar 2012 04:26:42 -0800 > >> Why adding an obscure set of IPC mechanism in network tree, and not >> using (maybe extending) traditional IPC (Messages queues, semaphores, >> Shared memory, pipes, futexes, ...). > > I actually don't understand why there is so much resistence to using a > real bonafide on-the-wire protocol, and that way if you ever wanted to > connect dbus instances on multiple machines or log dbus transactions > remotely for debugging, you could just do it. I don't think you understood the problem, we want something that scale for less powerful devices, why do you think Android have all the trouble to create binder? Besides what is really the point in having AF_UNIX if you can't use for what it is for? "The AF_UNIX (also known as AF_LOCAL) socket family is used to communicate between processes on the same machine efficiently." -- Luiz Augusto von Dentz ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 22:01 ` Luiz Augusto von Dentz @ 2012-03-01 22:08 ` David Miller 2012-03-02 8:39 ` Luiz Augusto von Dentz 0 siblings, 1 reply; 55+ messages in thread From: David Miller @ 2012-03-01 22:08 UTC (permalink / raw) To: luiz.dentz Cc: eric.dumazet, javier.martinez, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel From: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Date: Fri, 2 Mar 2012 00:01:40 +0200 > I don't think you understood the problem, we want something that scale > for less powerful devices, why do you think Android have all the > trouble to create binder? So our protocol stack is so cpu hungry compared to AF_UNIX that it's unusable on low power devices? I can't take you seriously if you say this after showing us the thousands of lines of code you guys think we should add to the AF_UNIX socket layer. > Besides what is really the point in having AF_UNIX if you can't use > for what it is for? Because it doesn't have the handful of extra features you absolutely require of it. AF_UNIX is a complicated socket layer which is already extremely hard to maintain. We're still finding bugs in it even after all these years, and that's without adding major new functionality. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 22:08 ` David Miller @ 2012-03-02 8:39 ` Luiz Augusto von Dentz 2012-03-02 8:55 ` David Miller 0 siblings, 1 reply; 55+ messages in thread From: Luiz Augusto von Dentz @ 2012-03-02 8:39 UTC (permalink / raw) To: David Miller Cc: eric.dumazet, javier.martinez, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel Hi David, On Fri, Mar 2, 2012 at 12:08 AM, David Miller <davem@davemloft.net> wrote: > From: Luiz Augusto von Dentz <luiz.dentz@gmail.com> > Date: Fri, 2 Mar 2012 00:01:40 +0200 > >> I don't think you understood the problem, we want something that scale >> for less powerful devices, why do you think Android have all the >> trouble to create binder? > > So our protocol stack is so cpu hungry compared to AF_UNIX that it's > unusable on low power devices? I never said unusable, it will drastically increase latency of message which translates in less responsive applications. > I can't take you seriously if you say this after showing us the > thousands of lines of code you guys think we should add to the AF_UNIX > socket layer. But what you are suggesting transforms dbus-daemon in a ip router just to do multicast, actually how many lines of code do you think we gonna need to implement that? Probably much more than adding this much to the kernel and is not necessarily useful for anybody else. Like I said before there is many projects using AF_UNIX as IPC transport, the documentation actually induces people to use for this purpose, and many would benefit from being able to do multicast. Btw Im not involved with the implementation and perhaps it need some extra work, but IMO the idea is very useful. >> Besides what is really the point in having AF_UNIX if you can't use >> for what it is for? > > Because it doesn't have the handful of extra features you absolutely > require of it. 
You mean multicast; that is the one and only feature, though it comes with many implementation details, I agree with that. > AF_UNIX is a complicated socket layer which is already extremely hard > to maintain. We're still finding bugs in it even after all these > years, and that's without adding major new functionality. I understand your concern that this could make things even more unstable, but on the other hand, hacking multicast support into loopback would also mess with AF_INET, so one way or the other the kernel will have to be involved. Also note that AF_UNIX has key features of an efficient IPC, like the ability to pass an fd to another process with SCM_RIGHTS. -- Luiz Augusto von Dentz
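For readers unfamiliar with the SCM_RIGHTS mechanism mentioned above, the following is a minimal sketch in C of passing one file descriptor over a connected AF_UNIX socket with sendmsg()/recvmsg() ancillary data. The helper names send_fd() and recv_fd() and the one-byte dummy payload are choices made for this example, not something taken from D-Bus or from the patch set.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send descriptor `fd` over the connected AF_UNIX socket `sock`.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 'F';                       /* dummy payload byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                                /* union ensures cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`.
 * Returns the new descriptor, or -1 if none arrived or on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is what AF_INET sockets cannot offer.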
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-02 8:39 ` Luiz Augusto von Dentz @ 2012-03-02 8:55 ` David Miller 2012-03-02 9:27 ` Javier Martinez Canillas ` (2 more replies) 0 siblings, 3 replies; 55+ messages in thread From: David Miller @ 2012-03-02 8:55 UTC (permalink / raw) To: luiz.dentz Cc: eric.dumazet, javier.martinez, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel From: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Date: Fri, 2 Mar 2012 10:39:24 +0200 > Like I said before there is many projects using AF_UNIX as IPC > transport, the documentation actually induces people to use for this > purpose, and many would benefit from being able to do multicast. You can't have it both ways. If it's useful for many applications, then many applications would benefit from a userland library that solved the problem using existing facilities such as IP multicast. If it's only useful for dbus, then that absolutely means we should not add thousands of lines of code to the kernel specifically for that application. So either way, kernel changes are not justified.
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-02 8:55 ` David Miller @ 2012-03-02 9:27 ` Javier Martinez Canillas 2012-03-02 9:39 ` David Miller ` (2 more replies) 2012-03-02 10:08 ` Luiz Augusto von Dentz 2012-03-02 22:19 ` david 2 siblings, 3 replies; 55+ messages in thread From: Javier Martinez Canillas @ 2012-03-02 9:27 UTC (permalink / raw) To: David Miller, shemminger, ying.xue Cc: luiz.dentz, eric.dumazet, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On 03/02/2012 09:55 AM, David Miller wrote: > From: Luiz Augusto von Dentz <luiz.dentz@gmail.com> > Date: Fri, 2 Mar 2012 10:39:24 +0200 > >> Like I said before there is many projects using AF_UNIX as IPC >> transport, the documentation actually induces people to use for this >> purpose, and many would benefit from being able to do multicast. > > You can't have it both ways. > > If it's useful for many applications, then many applications would > benefit from a userland library that solved the problem using > existing facilities such as IP multicast. > > If it's only useful for dbus that that absoltely means we should > not add thousands of lines of code to the kernel specifically for > that application. > You are right that D-bus is the only one that will use it but D-bus is more than an application is an IPC system that is used for almost every single application that runs on your Linux desktop. > So either way, kernel changes are not justified. Yes, you are right that packets drops, out-of-order delivery and flow control could be handled in another layer (i.e: the D-bus library in user-space). Also I won't argue about performance since we did some stress test and found that AF_INET, AF_UNIX and AF_NETLINK performs very similar for multicast. > Stop reinventing the wheel, use facilities that exist already. 
We are the most interested in using a facility already found in the kernel, we will try ZeroMQ as Stephen suggested and TIPC but really didn't find an IPC mechanism that fits our needs. The most important issue right now is the fd passing for D-bus application doing out-of-band communication. Another approach that we are trying is to use Netlink sockets using the Generic Netlink kernel API and develop a kernel module that does the routing. That way if you don't accept our code at least it will be easier for us to maintain. Not sure if netlink supports fd passing though. Do you think that a simpler AF_UNIX multicast implementation without the locking to guarantee order delivery and the flow control that blocks the sender can be resend to you to reconsider merging it? Regards, Javier ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-02 9:27 ` Javier Martinez Canillas @ 2012-03-02 9:39 ` David Miller 2012-03-02 13:13 ` Eric Dumazet 2012-03-05 18:55 ` David Lamparter 2 siblings, 0 replies; 55+ messages in thread From: David Miller @ 2012-03-02 9:39 UTC (permalink / raw) To: javier.martinez Cc: shemminger, ying.xue, luiz.dentz, eric.dumazet, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel From: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Date: Fri, 02 Mar 2012 10:27:16 +0100 > Do you think that a simpler AF_UNIX multicast implementation without the > locking to guarantee order delivery and the flow control that blocks the > sender can be resend to you to reconsider merging it? No. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-02 9:27 ` Javier Martinez Canillas 2012-03-02 9:39 ` David Miller @ 2012-03-02 13:13 ` Eric Dumazet 2012-03-02 16:34 ` Javier Martinez Canillas 2012-03-05 18:55 ` David Lamparter 2 siblings, 1 reply; 55+ messages in thread From: Eric Dumazet @ 2012-03-02 13:13 UTC (permalink / raw) To: Javier Martinez Canillas Cc: David Miller, shemminger, ying.xue, luiz.dentz, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel Le vendredi 02 mars 2012 à 10:27 +0100, Javier Martinez Canillas a écrit : > We are the most interested in using a facility already found in the > kernel, we will try ZeroMQ as Stephen suggested and TIPC but really > didn't find an IPC mechanism that fits our needs. The most important > issue right now is the fd passing for D-bus application doing > out-of-band communication. Why on earth the needed D-Bus IPC should use a single kernel mechanism ? I mean, of course AF_INET cannot pass fd around and never will. Of course AF_UNIX cannot use multicast and never will. Of course shared memory wont pass fds around and never will. ... Add other impossible combinations as you want. There are reasons fd passing is hard to implement. I find stuffing this functionality in AF_UNIX was a bad design choice from the very beginning. Instead of pushing extra complexity to a single kernel component, why not trying to use a combination of existing, well designed and supported ones ? ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-02 13:13 ` Eric Dumazet @ 2012-03-02 16:34 ` Javier Martinez Canillas 2012-03-02 17:08 ` Alan Cox 0 siblings, 1 reply; 55+ messages in thread From: Javier Martinez Canillas @ 2012-03-02 16:34 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, shemminger, ying.xue, luiz.dentz, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel, Marcel Holtmann On 03/02/2012 02:13 PM, Eric Dumazet wrote: > Le vendredi 02 mars 2012 à 10:27 +0100, Javier Martinez Canillas a > écrit : > >> We are the most interested in using a facility already found in the >> kernel, we will try ZeroMQ as Stephen suggested and TIPC but really >> didn't find an IPC mechanism that fits our needs. The most important >> issue right now is the fd passing for D-bus application doing >> out-of-band communication. > > Why on earth the needed D-Bus IPC should use a single kernel mechanism ? > > I mean, of course AF_INET cannot pass fd around and never will. > Of course AF_UNIX cannot use multicast and never will. > Of course shared memory wont pass fds around and never will. > ... Add other impossible combinations as you want. > > There are reasons fd passing is hard to implement. I find stuffing this > functionality in AF_UNIX was a bad design choice from the very > beginning. > Yes, can't say that everyone is happy with fd passing. It seems like a workaround since D-bus didn't scale for big chunks of data IMHO. > Instead of pushing extra complexity to a single kernel component, why > not trying to use a combination of existing, well designed and supported > ones ? > You are right, maybe a combination of IPC mechanism could be used. Basically we have this scenario: 1- Most applications today uses D-bus as an IPC system and is a central part of the Linux desktop. 
2- The transport layer used by D-bus is not performance sensitive basically due: a) high number of context switches required to send messages between peer. b) the D-bus daemon doing the routing and being a bottleneck of the whole. c) amount of messages copied between kernel space and user space. 3- We still haven't found a single kernel IPC mechanism or a combination of IPC mechanism that can address this issue. This is a real concern in the Linux embedded world. Since Linux based products wants to use well probed software components found in Linux distros such as oFono, BlueZ, Pulseaudio, Connman and Telepathy to name a few. All of them uses D-bus to expose its API to other applications. I'm not saying that extending AF_UNIX for supporting multicast is the best approach but what I'm saying is that we should find a solution to this problem. PD: I'm cc'ing Marcel Holtmann so hopefully he can add his point of view to the problem (and possible solutions). I know that Marcel is also working on improving the D-bus system but moving to the kernel some tasks made by the D-bus daemon today. Regards, Javier ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-02 16:34 ` Javier Martinez Canillas @ 2012-03-02 17:08 ` Alan Cox 2012-03-05 8:38 ` Luiz Augusto von Dentz 0 siblings, 1 reply; 55+ messages in thread From: Alan Cox @ 2012-03-02 17:08 UTC (permalink / raw) To: Javier Martinez Canillas Cc: Eric Dumazet, David Miller, shemminger, ying.xue, luiz.dentz, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel, Marcel Holtmann > 2- The transport layer used by D-bus is not performance sensitive > basically due: > > a) high number of context switches required to send messages between peer. This is a user space design issue. The fact dbus wakes up so much stuff wants fixing at the dbus level. > b) the D-bus daemon doing the routing and being a bottleneck of the whole. This is a userspace design issue. > c) amount of messages copied between kernel space and user space. This is mostly a userspace design issue and fixing a would fix much of c because you wouldn't keep sending people crap they didn't need. You've already got multicast facilities in kernel (if dbus must work by shouting not state change subscription like saner setups), and you've got BPF filtering facilities to try and cure some of the wakeups even doing multicast. Beyond that I don't see what the kernel can do given its mostly an architectural problem. Your model appears to be "since its causing enormous amounts of work we should do the work faster". The right model would appear to me to be "We shouldn't cause enormous amounts of work" Alan ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-02 17:08 ` Alan Cox @ 2012-03-05 8:38 ` Luiz Augusto von Dentz 2012-03-05 14:05 ` Martin Mares 0 siblings, 1 reply; 55+ messages in thread From: Luiz Augusto von Dentz @ 2012-03-05 8:38 UTC (permalink / raw) To: Alan Cox Cc: Javier Martinez Canillas, Eric Dumazet, David Miller, shemminger, ying.xue, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel, Marcel Holtmann Hi Alan, On Fri, Mar 2, 2012 at 7:08 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote: >> 2- The transport layer used by D-bus is not performance sensitive >> basically due: >> >> a) high number of context switches required to send messages between peer. > > This is a user space design issue. The fact dbus wakes up so much > stuff wants fixing at the dbus level. Can you be more specific, afaik centralizing the message subscription on the daemon minimize the wakeups of the applications, in the other hand BPF might be a better solution to filter the packets but is more recent than D-Bus itself. If you have a suggestion of a better design could you please let us know. >> b) the D-bus daemon doing the routing and being a bottleneck of the whole. > > This is a userspace design issue. But do you think letting the clients manage their connections to each other client it talk would have been better? The number of fd per client would sky rocketed. >> c) amount of messages copied between kernel space and user space. > > This is mostly a userspace design issue and fixing a would fix much of c > because you wouldn't keep sending people crap they didn't need. Afaik this is not a problem in D-Bus, perhaps if you have eavesdrop enabled but that is your configuration, the client only gets signals they subscribe to and messages addressed to its connection (method call). 
That doesn't mean that are bad implement client who subscribe for everything and which translate in more data being copied and wakeups, but that is not D-Bus fault and even with BPF the client can do that too. > You've already got multicast facilities in kernel (if dbus must work by > shouting not state change subscription like saner setups), and you've got > BPF filtering facilities to try and cure some of the wakeups even doing > multicast. Isn't that what is this all about? The problem seems to be that with BPF alone it would not be possible to implement multicast without sacrificing security, at least method call and reply messages should be private to the peers involved without eavesdrop being enabled. Btw this was posted in detail here: http://blogs.gnome.org/rodrigo/2012/02/27/d-bus-optimizations/ > Beyond that I don't see what the kernel can do given its mostly an > architectural problem. > > Your model appears to be "since its causing enormous amounts of work we > should do the work faster". The right model would appear to me to be "We > shouldn't cause enormous amounts of work" Please check the link above and tell me if that different than the model you suggested using BPF, apparently we are talking about the very same solution but the implementation detail are getting in the way because a lot of code was added. -- Luiz Augusto von Dentz ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-05 8:38 ` Luiz Augusto von Dentz @ 2012-03-05 14:05 ` Martin Mares 2012-03-05 15:11 ` Javier Martinez Canillas 0 siblings, 1 reply; 55+ messages in thread From: Martin Mares @ 2012-03-05 14:05 UTC (permalink / raw) To: Luiz Augusto von Dentz Cc: Alan Cox, Javier Martinez Canillas, Eric Dumazet, David Miller, shemminger, ying.xue, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel, Marcel Holtmann Hi! > Please check the link above and tell me if that different than the > model you suggested using BPF, apparently we are talking about the > very same solution but the implementation detail are getting in the > way because a lot of code was added. ... First of all, you should come up with some real data confirming that the problem you are trying to solve really exist -- i.e., that in some real (and sensible) setup, routing all messages through DBUS daemon is a bottleneck. Have a nice fortnight -- Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth More memory available, but not for you! ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-05 14:05 ` Martin Mares @ 2012-03-05 15:11 ` Javier Martinez Canillas 2012-03-05 15:49 ` Martin Mares 0 siblings, 1 reply; 55+ messages in thread From: Javier Martinez Canillas @ 2012-03-05 15:11 UTC (permalink / raw) To: Martin Mares Cc: Luiz Augusto von Dentz, Alan Cox, Eric Dumazet, David Miller, shemminger, ying.xue, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel, Marcel Holtmann On 03/05/2012 03:05 PM, Martin Mares wrote: > Hi! > >> Please check the link above and tell me if that different than the >> model you suggested using BPF, apparently we are talking about the >> very same solution but the implementation detail are getting in the >> way because a lot of code was added. > > ... > > First of all, you should come up with some real data confirming that > the problem you are trying to solve really exist -- i.e., that in some > real (and sensible) setup, routing all messages through DBUS daemon > is a bottleneck. > > Have a nice fortnight We still don't have performance numbers for D-bus using AF_UNIX multicast since our D-bus daemon branch is still not stable. But Alban did some tests for the first approach (creating a new socket address family AF_DBUS) and the performance gain was x1.8 for KVM/i386 and x3 for N900/ARM. Alban's blog entry can be found here: http://alban-apinc.blogspot.com/2011/12/d-bus-in-kernel-faster.html Yes, D-bus has many architectural flaws that has to be addressed. The out-of-order delivery requirement maybe is not even important in real world and the control flow is something that probably we can fix in user-space too. That every message has to pass through the D-bus daemon is something that can also be fixed without requiring any kernel modification. But there is one problem that we can't solve without Linux kernel support. 
The fact that multicast messages have to be directly sent to the receivers. The problem is that Linux lacks of an easy IPC mechanism to send multicast messages to processes in the same machine. We can use UDP multicast over IP but even when the sending/receiving performance is similar to our AF_UNIX multicast implementation, the connection setup is much more complex. We will investigate if we can use Netlink sockets as an multicast IPC mechanism even when it is designed for the kernel-space/user-space use case and not well suited to user-space/user-space communication. Best regards, Javier ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-05 15:11 ` Javier Martinez Canillas @ 2012-03-05 15:49 ` Martin Mares 0 siblings, 0 replies; 55+ messages in thread From: Martin Mares @ 2012-03-05 15:49 UTC (permalink / raw) To: Javier Martinez Canillas Cc: Luiz Augusto von Dentz, Alan Cox, Eric Dumazet, David Miller, shemminger, ying.xue, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel, Marcel Holtmann Hello! > We still don't have performance numbers for D-bus using AF_UNIX > multicast since our D-bus daemon branch is still not stable. But Alban > did some tests for the first approach (creating a new socket address > family AF_DBUS) and the performance gain was x1.8 for KVM/i386 and x3 > for N900/ARM. I did not ask for the performance improvement in artificial benchmarks, they will obviously show some :) What I am interested in is a test showing that _in_real_world_, the system spends considerable amount of time by passing messages. That is, a reason for optimizing the thing at all. Have a nice fortnight -- Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth American patent law: two monkeys, fourteen days. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-02 9:27 ` Javier Martinez Canillas 2012-03-02 9:39 ` David Miller 2012-03-02 13:13 ` Eric Dumazet @ 2012-03-05 18:55 ` David Lamparter 2 siblings, 0 replies; 55+ messages in thread From: David Lamparter @ 2012-03-05 18:55 UTC (permalink / raw) To: Javier Martinez Canillas Cc: David Miller, shemminger, ying.xue, luiz.dentz, eric.dumazet, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On Fri, Mar 02, 2012 at 10:27:16AM +0100, Javier Martinez Canillas wrote: > Do you think that a simpler AF_UNIX multicast implementation without the > locking to guarantee order delivery and the flow control that blocks the > sender can be resend to you to reconsider merging it? I still don't get how blocking the sender when the receiver doesn't empty his socket queue can possibly ever be a good idea. All I see is a very nice way to choke the entire D-Bus from one malicious or broken app. Note that originally we were talking about blocking delivery for _multicast_. In that case you can't even poll on writability on a granularity finer than group level. Yet, this still comes up here and there as a requirement for IPC mechanisms to back D-Bus. When the buffers at the receiver are fully filled, IMHO that's the point to cut off the client. If this becomes an issue, the buffers can be increased in size, but at some point it's a sign that you're using D-Bus for too much? -David ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-02 8:55 ` David Miller 2012-03-02 9:27 ` Javier Martinez Canillas @ 2012-03-02 10:08 ` Luiz Augusto von Dentz 2012-03-03 12:20 ` Martin Mares 2012-03-02 22:19 ` david 2 siblings, 1 reply; 55+ messages in thread From: Luiz Augusto von Dentz @ 2012-03-02 10:08 UTC (permalink / raw) To: David Miller Cc: eric.dumazet, javier.martinez, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel Hi David, On Fri, Mar 2, 2012 at 10:55 AM, David Miller <davem@davemloft.net> wrote: > From: Luiz Augusto von Dentz <luiz.dentz@gmail.com> > Date: Fri, 2 Mar 2012 10:39:24 +0200 > >> Like I said before there is many projects using AF_UNIX as IPC >> transport, the documentation actually induces people to use for this >> purpose, and many would benefit from being able to do multicast. > > You can't have it both ways. > > If it's useful for many applications, then many applications would > benefit from a userland library that solved the problem using > existing facilities such as IP multicast. > > If it's only useful for dbus that that absoltely means we should > not add thousands of lines of code to the kernel specifically for > that application. Instead we should add many times that into dbus-daemon and do IP multicast, am I missing something? > So either way, kernel changes are not justified. I respect your opinion, but I don't agree with it, you are pushing userspace to a much more complex solution. At this point it would probably better to just use shared memory and forget about any security, eavesdrop all the way. -- Luiz Augusto von Dentz ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-02 10:08 ` Luiz Augusto von Dentz @ 2012-03-03 12:20 ` Martin Mares 0 siblings, 0 replies; 55+ messages in thread From: Martin Mares @ 2012-03-03 12:20 UTC (permalink / raw) To: Luiz Augusto von Dentz Cc: David Miller, eric.dumazet, javier.martinez, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel Hello! > Instead we should add many times that into dbus-daemon and do IP > multicast, am I missing something? I completely agree with Alan that if routing all messages through DBUS daemon is a bottleneck, then something is seriously wrong with the way the applications use the message bus. Also, you mentioned the need of passing fd's between applications. If I understand correctly, it is a rare case and if you handle such messages in the same way as before, it won't hurt performance. Have a nice fortnight -- Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth Even nostalgia isn't what it used to be. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-02 8:55 ` David Miller 2012-03-02 9:27 ` Javier Martinez Canillas 2012-03-02 10:08 ` Luiz Augusto von Dentz @ 2012-03-02 22:19 ` david 2 siblings, 0 replies; 55+ messages in thread From: david @ 2012-03-02 22:19 UTC (permalink / raw) To: David Miller Cc: luiz.dentz, eric.dumazet, javier.martinez, rodrigo.moya, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel On Fri, 2 Mar 2012, David Miller wrote: > From: Luiz Augusto von Dentz <luiz.dentz@gmail.com> > Date: Fri, 2 Mar 2012 10:39:24 +0200 > >> Like I said before there is many projects using AF_UNIX as IPC >> transport, the documentation actually induces people to use for this >> purpose, and many would benefit from being able to do multicast. > > You can't have it both ways. > > If it's useful for many applications, then many applications would > benefit from a userland library that solved the problem using > existing facilities such as IP multicast. I missed the start of this discussion (but did see the lwn.net article on it) as I understand it, they are looking for some features that are not in IP multicast (or at least not as I understand it) 1. reliable delivery 2. in-order delivery 3. sender blocking on recipients rather than dropping messages when the channel is full. 
IP multicast definitely does not do #3, and as far as I understand it, is essentially UDP to multiple recipients, and UDP does not provide either #1 or #2. Yes, this could be done entirely in userspace (with something like 0MQ as I see others mentioning), and I don't understand the Android aversion to any userspace daemons, but with all of that being said, I do think that a kernel-based mechanism that supports having iptables type filters on it would be a very nice thing to have (and should be able to re-use a lot of existing code that would end up being duplicated if this is done in a userspace daemon). Now it may be that some of the requirements may result in error O_PONY or O_SANITY (the sender blocking seems like a potential problem, but that may possibly make sense as a configurable option). David Lang ^ permalink raw reply [flat|nested] 55+ messages in thread
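The "block the sender instead of dropping" requirement in the list above can be observed on an ordinary one-to-one AF_UNIX datagram socketpair, which is a rough sketch of the behavior the proposal wanted to carry over to multicast. The assumptions are loud ones: this is Linux behavior as I understand it, the function name fill_then_drain and the 64-byte payload are invented for the example, and it says nothing about how the patch-set itself implements flow control.

```python
import socket


def fill_then_drain(limit=100_000):
    """Queue datagrams until the kernel refuses more, then count what arrives."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Non-blocking mode turns "the sender would block" into BlockingIOError (EAGAIN).
    sender.setblocking(False)
    receiver.setblocking(False)
    sent = 0
    received = 0
    try:
        while sent < limit:
            try:
                sender.send(b"x" * 64)
            except BlockingIOError:
                break  # a blocking sender would sleep here; nothing is discarded
            sent += 1
        while True:
            try:
                receiver.recv(64)
            except BlockingIOError:
                break  # queue is empty
            received += 1
    finally:
        sender.close()
        receiver.close()
    return sent, received
```

Every datagram that was accepted is later read by the receiver, so back-pressure lands on the sender. That is also exactly the property David Lamparter objects to for multicast, since one slow receiver would stall the whole group.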
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 11:57 ` Javier Martinez Canillas 2012-03-01 12:26 ` Eric Dumazet @ 2012-03-01 12:57 ` Luiz Augusto von Dentz 2012-03-01 20:42 ` David Miller 2 siblings, 0 replies; 55+ messages in thread From: Luiz Augusto von Dentz @ 2012-03-01 12:57 UTC (permalink / raw) To: Javier Martinez Canillas Cc: David Miller, rodrigo.moya, javier, eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel Hi Javier, On Thu, Mar 1, 2012 at 1:57 PM, Javier Martinez Canillas <javier.martinez@collabora.co.uk> wrote: > On 02/28/2012 08:05 PM, David Miller wrote: >> From: Rodrigo Moya <rodrigo.moya@collabora.co.uk> >> Date: Tue, 28 Feb 2012 11:47:39 +0100 >> >>> Because of all of this, UDP/IP multicast wasn't even considered as an >>> option. We might be wrong in some/all of those, so could you please >>> comment on them to check if that's so? >> >> You guys seem to want something that isn't AF_UNIX, ordering guarentees >> and whatnot, it really has no place in these protocols. >> >> You've designed a userlevel subsystem with requirements that no existing >> socket layer can give, and you just figured you'd work that out later. >> >> I think you rather should have reconsidered these premises and designed >> something that could handle reality which is AF_UNIX can't do multicast >> and nobody guarentees those strange ordering requirements you seem to >> have. > > Yes, you are right it doesn't follow AF_UNIX semantics so Unix sockets > is not the best place to add our multicast implementation. > > So, now we are trying a different approach. To create a new address > family AF_MCAST. That way we can have more control over the semantics of > the socket interface for that family. > > We expect to have some patches in a few days and we will resend. Lets say AF_MCAST is acceptable, wouldn't it make AF_UNIX obsolete? 
From what I can tell, a lot, if not most, of the users of AF_UNIX use it to implement some kind of IPC, be it D-Bus, chromium or wayland, and eventually all of them run into the same problems. Actually the article on lwn puts it together nicely: http://lwn.net/Articles/466304/ What about SCM_RIGHTS and other Ancillary Messages, would that be acceptable in other socket families? -- Luiz Augusto von Dentz ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 11:57 ` Javier Martinez Canillas 2012-03-01 12:26 ` Eric Dumazet 2012-03-01 12:57 ` Luiz Augusto von Dentz @ 2012-03-01 20:42 ` David Miller 2 siblings, 0 replies; 55+ messages in thread From: David Miller @ 2012-03-01 20:42 UTC (permalink / raw) To: javier.martinez Cc: rodrigo.moya, javier, eric.dumazet, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel From: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Date: Thu, 01 Mar 2012 12:57:18 +0100 > Does this makes more sense to you? No, creating an entire new socket family for one user doesn't make any sense. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX @ 2012-03-01 14:25 Erik Hugne 2012-03-01 17:18 ` Rodrigo Moya 0 siblings, 1 reply; 55+ messages in thread From: Erik Hugne @ 2012-03-01 14:25 UTC (permalink / raw) To: netdev-owner Cc: rodrigo.moya, David.Laight, davem, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel, eric.dumazet Hi Have you considered using TIPC instead? It already provides multicast messaging with guaranteed ordering, and reliable delivery (SOCK _RDM) //E --Original message--- Sender: "netdev-owner@vger.kernel.org" <netdev-owner@vger.kernel.org> Sent time: Thu Mar 01 14:56:00 CET 2012 To: eric.dumazet@gmail.com Cc: rodrigo.moya@collabora.co.uk, David.Laight@ACULAB.COM, davem@davemloft.net, javier@collabora.co.uk, lennart@poettering.net, kay.sievers@vrfy.org, alban.crequy@collabora.co.uk, bart.cerneels@collabora.co.uk, sjoerd.simons@collabora.co.uk, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX On 03/01/2012 01:59 PM, Eric Dumazet wrote: > Le jeudi 01 mars 2012 à 13:50 +0100, Rodrigo Moya a écrit : >> the main problem in D-Bus we are trying to solve is the context >> switches, since right now, there is a daemon, which listens on a UNIX >> socket, and all traffic in the bus goes through it, and then the daemon >> has to route the messages it gets on that socket to the corresponding >> place(s). So, every time someone sends a message to D-Bus, since all >> traffic goes through the daemon, dbus-daemon gets waked-up, which is one >> of the biggest bottlenecks we are trying to fix. >> >> That's why we are thinking about using multicast with socket filters, so >> that the daemon only gets traffic it cares about and thus is not waked >> up and context switches don't happen when not needed. 
>> >> Using message queues, AFAICS, we would have the same problem, as the >> daemon would create the message queue and would get all traffic, right? >> > > This is why I mentioned extensions. > > Anyway, if you think multicast sockets is the way to go, then you could > setup a virtual network just to be able to use AF_INET multicast. > > Thats probably doable without kernel patching. > We could use AF_INET multicast on a local machine but we need some ordering and control flow requirements that are not guaranteed on UDP multicast over IP. That's why we thought to add a new address family AF_MCAST. To make it a general local multicast solution and not being too specific we added some flags to control its behavior like MCAST_MREQ_DROP_WHEN_FULL to decide to either block the sender or drop the packet when one receiver has its queue full. Regards, Javier ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX 2012-03-01 14:25 Erik Hugne @ 2012-03-01 17:18 ` Rodrigo Moya 2012-03-02 7:01 ` Ying Xue [not found] ` <4F506ABC.8050807@windriver.com> 0 siblings, 2 replies; 55+ messages in thread From: Rodrigo Moya @ 2012-03-01 17:18 UTC (permalink / raw) To: Erik Hugne Cc: netdev-owner, David.Laight, davem, javier, lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons, netdev, linux-kernel, eric.dumazet Hi Erik On Thu, 2012-03-01 at 15:25 +0100, Erik Hugne wrote: > Hi > Have you considered using TIPC instead? > It already provides multicast messaging with guaranteed ordering, and reliable delivery (SOCK _RDM) > I didn't know about TIPC, so have been having a quick look over it, and have some questions about it: * since it's for cluster use, I guess it's based on AF_INET sockets? if so, see the messages from Luis Augusto and Javier about this breaking current D-Bus apps, that use fd passing, for out-of-band data * D-Bus works locally, with all processes on the same machine, but there are 2 buses (daemons), one for system-related interfaces, and one per user, so how would this work with TIPC. Can you create several clusters/networks (as in TIPC addressing semantics) on the same machine on the loopback device? * I installed tipcutils on my machine, and it asked me if I wanted to setup the machine as a TIPC node. Does this mean every machine needs to be setup as a TIPC node before any app makes use of it? That is, can I just create a AF_TIPC socket on this machine and just make it work without any further setup? * I guess it is easy to prevent any TIPC-enabled machine to get into the local communication channel, right? That is, what's the security mechanism for allowing local-only communications? I'll stop asking questions and have a deeper look at it :) ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: Ying Xue @ 2012-03-02 7:01 UTC
To: Rodrigo Moya
Cc: Erik Hugne, netdev-owner, David.Laight@ACULAB.COM, davem, javier,
    lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons,
    netdev, linux-kernel, eric.dumazet

Hi Rodrigo,

I'll try to answer your questions about TIPC; please see my comments
inline.

Rodrigo Moya wrote:
> Hi Erik
>
> On Thu, 2012-03-01 at 15:25 +0100, Erik Hugne wrote:
>
>> Hi
>> Have you considered using TIPC instead?
>> It already provides multicast messaging with guaranteed ordering, and
>> reliable delivery (SOCK_RDM)
>>
> I didn't know about TIPC, so have been having a quick look over it, and
> have some questions about it:
>
> * since it's for cluster use, I guess it's based on AF_INET sockets? if
> so, see the messages from Luis Augusto and Javier about this breaking
> current D-Bus apps, that use fd passing, for out-of-band data
>
No, TIPC doesn't depend on AF_INET sockets; it uses a separate address
family (AF_TIPC).

> * D-Bus works locally, with all processes on the same machine, but there
> are 2 buses (daemons), one for system-related interfaces, and one per
> user, so how would this work with TIPC. Can you create several
> clusters/networks (as in TIPC addressing semantics) on the same machine
> on the loopback device?
>
TIPC supports two modes: single node mode and network mode. If we want
all applications to talk to each other locally, we can just let TIPC
work in single node mode. Of course, when it is in network mode, it
still supports communication within a single node.

How do you put TIPC into single node mode? It's very easy, and no
specific configuration is needed: after the TIPC module is inserted, it
enters that mode by default.

As Erik pointed out, the TIPC multicast mechanism is very useful for
D-Bus. It has several powerful features:

1. It guarantees that multicast messages are delivered reliably and in
order.

2. It supports one-to-many and many-to-many real-time communication
within a node or a network.

3. It supports functional addressing, that is, location-transparent
addressing that allows a client application to access a server without
having to know its precise location in the node or the network. The
basic unit of functional addressing within TIPC is the port name, which
is typically denoted as {type,instance}. A port name consists of a
32-bit type field and a 32-bit instance field, both of which are chosen
by the application. Often, the type field indicates the class of service
provided by the port, while the instance field can be used as a
sub-class indicator.

Further support for service partitioning is provided by an address type
called port name sequence. This is a three-integer structure defining a
range of port names, i.e., a name type plus the lower and upper boundary
of the instance range. This addressing scheme is very useful for
multicast communication. For instance, as you mentioned, D-Bus may need
two different buses, one for the system and another for the user. With
TIPC this requirement is easy to meet: we can assign one name type to
the system bus and another name type to the user bus. Within one bus, we
can also divide it into many sub-buses using the lower and upper bounds.
For example, once one application publishes a service/port name such as
{1000, 0, 1000} as the system bus channel, any application can send
messages to {1000, 0, 100} simultaneously. Alternatively, one
application can publish {1000, 0, 500} as one sub-bus of the system bus,
and another can publish {1000, 501, 1000} as another system sub-bus.
Then, when an application sends a message to {1000, 0, 1000}, both
applications, the one that published {1000, 0, 500} and the one that
published {1000, 501, 1000}, receive the message.

If D-Bus used this scheme, I believe the central D-Bus daemon would no
longer be necessary. Applications could talk to each other directly in a
one-to-one, one-to-many, or many-to-many way.

4. TIPC has another important and useful feature: client applications
can subscribe to a service port name and receive information about which
port names exist within the node or network. For example, if one
application publishes a system bus service such as {1000, 0, 500}, any
client application that subscribes to the service is notified promptly
if the publishing application crashes.

TIPC has other useful features as well; for more detailed information,
please refer to its official web site: http://tipc.sourceforge.net/

> * I installed tipcutils on my machine, and it asked me if I wanted to
> setup the machine as a TIPC node. Does this mean every machine needs to
> be setup as a TIPC node before any app makes use of it? That is, can I
> just create a AF_TIPC socket on this machine and just make it work
> without any further setup?
>
No, as I indicated above, no extra configuration is needed if you just
want it to work in single node mode. There are also several demos in the
tipcutils package that you can use to learn more about its functions and
how it works.

> * I guess it is easy to prevent any TIPC-enabled machine to get into the
> local communication channel, right? That is, what's the security
> mechanism for allowing local-only communications?
>
When publishing a service name, you can specify the level of visibility,
or scope, that the name has within the TIPC network: node scope, cluster
scope, or zone scope. So if you want the name to be valid only locally,
you can designate it as node scope; TIPC then ensures that only
applications within the same node can access the port using that name.

Regards,
Ying

> I'll stop asking questions and have a deeper look at it :)
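The sub-bus example in the message above comes down to range overlap: a message addressed to a port name sequence reaches every publication of the same type whose instance range intersects the destination range. Below is a small Python model of that rule, written as a reading of the example rather than as TIPC's actual name-table code. `NameSeq`, `overlaps`, `recipients`, the application labels, and the type value 2000 for a user bus are all made up for the illustration.

```python
from collections import namedtuple

# {type, lower, upper}, following the notation used in the message
NameSeq = namedtuple("NameSeq", ["type", "lower", "upper"])


def overlaps(a, b):
    """True if both sequences share a type and their instance ranges intersect."""
    return a.type == b.type and a.lower <= b.upper and b.lower <= a.upper


def recipients(publications, destination):
    """Return the labels of publishers whose sequence overlaps the destination.

    publications maps a hypothetical application label to its NameSeq.
    """
    return sorted(app for app, seq in publications.items()
                  if overlaps(seq, destination))


published = {
    "sub-bus-a": NameSeq(1000, 0, 500),
    "sub-bus-b": NameSeq(1000, 501, 1000),
    "user-bus": NameSeq(2000, 0, 1000),  # assumed type for a user bus
}

print(recipients(published, NameSeq(1000, 0, 1000)))
print(recipients(published, NameSeq(1000, 0, 100)))
```

The first call lists both system sub-buses, matching the {1000, 0, 1000} case described in the message; the second lists only the publisher of {1000, 0, 500}, and the user bus never matches because its type differs.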
* Re: [PATCH 0/10] af_unix: add multicast and filtering features to AF_UNIX
From: Erik Hugne @ 2012-03-05 15:49 UTC
To: Ying Xue
Cc: Rodrigo Moya, netdev-owner, David.Laight@ACULAB.COM, davem, javier,
    lennart, kay.sievers, alban.crequy, bart.cerneels, sjoerd.simons,
    netdev, linux-kernel, eric.dumazet

netdev is probably not the right channel to discuss how to use service
partitioning in TIPC, but I think that Ying's suggestion of using a
"system-bus" publication and separate D-Bus user publications is sound.

One problem is that TIPC does not support passing FDs between processes
(SCM_RIGHTS ancillary data). But adding support for this in TIPC should
have a relatively small code footprint.

//E
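For readers unfamiliar with the fd passing mentioned in this reply, this is roughly what it looks like on AF_UNIX sockets today, using Python's standard `socket` module. It shows existing AF_UNIX behaviour only and says nothing about how TIPC might implement it; `send_fd`, `recv_fd`, and `demo` are illustrative names, and the pipe is just a convenient descriptor to pass.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one file descriptor as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _data, ancdata, _flags, _addr = sock.recvmsg(
        1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the control data.
            fds.frombytes(cdata[:len(cdata) - (len(cdata) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo():
    """Pass the read end of a pipe across a socketpair and read through it."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    read_end, write_end = os.pipe()
    try:
        send_fd(left, read_end)
        received = recv_fd(right)  # a new descriptor for the same pipe
        os.write(write_end, b"hello")
        data = os.read(received, 5)
        os.close(received)
        return data
    finally:
        left.close()
        right.close()
        os.close(read_end)
        os.close(write_end)
```

`demo()` runs both ends in one process for brevity; in the D-Bus case the two sockets would belong to different processes, which is what makes the kernel's duplication of the descriptor useful.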