linux-kernel.vger.kernel.org archive mirror
* New Address Family: Inter Process Networking (IPN)
@ 2007-12-05 16:40 Renzo Davoli
  2007-12-05 21:55 ` Stephen Hemminger
  2007-12-05 23:39 ` Andi Kleen
  0 siblings, 2 replies; 31+ messages in thread
From: Renzo Davoli @ 2007-12-05 16:40 UTC (permalink / raw)
  To: linux-kernel

Inter Process Networking: 
a kernel module (and some simple kernel patches) to provide 
AF_IPN: a new address family for process networking, i.e. multipoint,
multicast/broadcast communication among processes (and networks).

WHAT IS IT?
-----------
Berkeley sockets have been designed for client-server or point-to-point
communication. All existing Address Families implement this idea.
IPN is a new address family designed for one-to-many, many-to-many and 
peer-to-peer communication among processes.
IPN is an Inter Process Communication paradigm where all the processes
appear as if they were connected by a networking bus.
On IPN, processes can interoperate using real networking protocols 
(e.g. Ethernet) but also using application-defined protocols (maybe
just sending ASCII strings, video or audio frames, etc.).
IPN provides networking (in the broadest sense you can imagine) to
the processes. Processes can be Ethernet nodes, run their own TCP-IP stacks
if they like (e.g. virtual machines), mount ATA-over-Ethernet disks, etc.
IPN networks can be interconnected with real networks or IPN networks
running on different computers can interoperate (can be connected by
virtual cables).
IPN is part of the Virtual Square Project (vde, lwipv6, view-os, 
umview/kmview, see wiki.virtualsquare.org).

WHY?
----
Many applications can benefit from IPN.
First of all VDE (Virtual Distributed Ethernet): one service of IPN is a
kernel implementation of VDE.
IPN can be useful for applications where one or a few processes feed
data to several consuming processes (which may join the stream at run time).
IPN sockets can also be connected to tap (tuntap)-like interfaces or
to real interfaces (like "brctl addif").
There are specific ioctls to define a tap interface or grab an existing
one.
Several existing services could be implemented (often with extended
features) on top of IPN:
- kernel bridge
- tuntap
- macvlan
IPN could be used (IMHO) to provide multicast services to processes.
Audio or video frames could be multiplexed so that multiple
applications can use them. I think that something like Jack could be
implemented on top of IPN. Something like a VideoJack could
provide video frames to several applications: e.g. the same image from a
camera can be viewed by xawtv, recorded, and sent to a streaming service.
IPN sockets can be used wherever there is the idea of a broadcasting
channel, i.e. where processes can "join (and leave) the information
flow" at runtime.
Different delivery policies can be defined as IPN protocols (loaded
as submodules of ipn.ko).
E.g. an ethernet switch is a policy (kvde_switch.ko: packets are unicast
delivered if the MAC address is already in the switching hash table).
We are designing an extended switch, full of interesting features like
our userland vde_switch (with vlan/fst/management etc.), and a layer-3
switch, but other policies can be defined to implement the specific
requirements of other services. I feel that there are no limits to
creativity in multicast services for processes.
Userspace services (like vde or jack) do exist, but IPN provides
a faster and unified support.

HOW?
----
The complete specifications for IPN can be found here:
http://wiki.virtualsquare.org/index.php/IPN

bind() creates the socket (if it does not already exist). When bind() succeeds, 
the process has the right to manage the "network". 
No data can be sent or received if the socket is not connected
(only get/setsockopt and ioctl work on bound unconnected sockets).

connect() is used to join the network. When the socket is connected it
is possible to send/receive data. If the socket is already bound it is
not necessary to specify the address again (you can use NULL, or specify
the same address).
connect() can also be used without bind(). In this case the process sends
and receives data but cannot manage the network (here the socket address
specification is required).

listen() and accept() are for servers, thus they do not exist for IPN.

Examples:
1- Peer-to-Peer Communication:
Several processes run the same code:

  struct sockaddr_un sun={.sun_family=AF_IPN,.sun_path="/tmp/sockipn"};
  int s=socket(AF_IPN,SOCK_RAW,IPN_BROADCAST); 
  err=bind(s,(struct sockaddr *)&sun,sizeof(sun));
  err=connect(s,NULL,0);

In this case all the messages sent by each process get received by all the
other processes (IPN_BROADCAST). 
The processes need to be able to receive data when there are pending
packets, e.g. by using poll/select and event-driven programming, or
multithreading.

2- (One or) Some senders/many receivers
The sender runs the following code:

  struct sockaddr_un sun={.sun_family=AF_IPN,.sun_path="/tmp/sockipn"};
  int s=socket(AF_IPN,SOCK_RAW,IPN_BROADCAST);
  err=shutdown(s,SHUT_RD);
  err=bind(s,(struct sockaddr *)&sun,sizeof(sun));
  err=connect(s,NULL,0);

The receivers do not need to define the network, thus they skip the bind():

  struct sockaddr_un sun={.sun_family=AF_IPN,.sun_path="/tmp/sockipn"};
  int s=socket(AF_IPN,SOCK_RAW,IPN_BROADCAST); 
  err=shutdown(s,SHUT_WR);
  err=connect(s,(struct sockaddr *)&sun,sizeof(sun));

In the previous examples processes can send and receive every kind of
data. 

When messages are ethernet packets (maybe from virtual machines), IPN
works like a hub when using the IPN_BROADCAST protocol.
Different protocols (delivery policies) can be specified by replacing
IPN_BROADCAST with a different tag.
An IPN protocol-specific submodule must have registered the protocol
tag in advance (e.g. when kvde_switch.ko is loaded, IPN_VDESWITCH can
be used too).
The basic broadcasting protocol IPN_BROADCAST is built-in (all the 
messages get delivered to all the connected processes but the sender). 

IPN sockets use the filesystem for naming and access control.
srwxr-xr-x 1 renzo renzo 0 2007-12-04 22:28 /tmp/sockipn
An IPN socket appears in the filesystem like a UNIX socket.
The r/w permissions represent the right to receive data from/send data
to the socket. The 'x' permission represents the right to manage the
socket.
connect() automatically shuts down SHUT_WR or SHUT_RD if the user does
not have the corresponding right.

WHAT WE NEED FROM THE LINUX KERNEL COMMUNITY
--------------------------------------------
0- (Constructive) comments.

1- The "official" assignment of an Address Family.
(It is enough for everything but interface grabbing, see 2)

in include/linux/net.h:
- #define NPROTO          34              /* should be enough for now..  */
+ #define NPROTO          35              /* should be enough for now..  */

in include/linux/socket.h
+ #define AF_IPN 34
+ #define PF_IPN AF_IPN
- #define AF_MAX          34      /* For now.. */
+ #define AF_MAX          35      /* For now.. */

This seems to be quite simple.

2- Another "grabbing hook" for interfaces (like the ones already
existing for the kernel bridge and for the macvlan).

In include/linux/netdevice.h:
among the fields of struct net_device:

        /* bridge stuff */
	struct net_bridge_port  *br_port;
	/* macvlan */
	struct macvlan_port     *macvlan_port;
+        /* ipn */
+        struct ipn_node        *ipn_port;
		 
	/* class/net/name entry */
	struct device           dev;

In net/core/dev.c, we need another section for grabbing packets....
like the ones defined for CONFIG_BRIDGE and CONFIG_MACVLAN.
I can write the patch (it needs just tens of minutes of cut&paste).
We are studying some way to register/deregister grabbing services,
I feel this would be the cleanest way. 

WHERE?
------
There is an experimental version in the VDE svn tree.
http://sourceforge.net/projects/vde

The current implementation can be compiled as a module on linux >= 2.6.22.
We have currently "stolen" the AF_RXRPC number and the kernel bridge
hook, thus this experimental implementation is incompatible with RXRPC
and the kernel bridge (it shares the same data structures). This is just
to show the effectiveness of the idea; this way it can be compiled as a
module without patching the kernel.
We'll migrate IPN to its specific AF and grabbing hook as soon as they
have been defined. 

renzo
(V^2 project)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-05 16:40 New Address Family: Inter Process Networking (IPN) Renzo Davoli
@ 2007-12-05 21:55 ` Stephen Hemminger
  2007-12-06  5:38   ` Renzo Davoli
  2007-12-05 23:39 ` Andi Kleen
  1 sibling, 1 reply; 31+ messages in thread
From: Stephen Hemminger @ 2007-12-05 21:55 UTC (permalink / raw)
  To: linux-kernel

On Wed, 5 Dec 2007 17:40:55 +0100
renzo@cs.unibo.it (Renzo Davoli) wrote:

> 
> WHAT WE NEED FROM THE LINUX KERNEL COMMUNITY
> --------------------------------------------
> 0- (Constructive) comments.
> 
> 1- The "official" assignment of an Address Family.
> (It is enough for everything but interface grabbing, see 2)
> 
> in include/linux/net.h:
> - #define NPROTO          34              /* should be enough for now..  */
> + #define NPROTO          35              /* should be enough for now..  */
> 
> in include/linux/socket.h
> + #define AF_IPN 34
> + #define PF_IPN AF_IPN
> - #define AF_MAX          34      /* For now.. */
> + #define AF_MAX          35      /* For now.. */
> 
> This seems to be quite simple.
> 
> 2- Another "grabbing hook" for interfaces (like the ones already
> existing for the kernel bridge and for the macvlan).
> 
> In include/linux/netdevice.h:
> among the fields of struct net_device:
> 
>         /* bridge stuff */
> 	struct net_bridge_port  *br_port;
> 	/* macvlan */
> 	struct macvlan_port     *macvlan_port;
> +        /* ipn */
> +        struct ipn_node        *ipn_port;
> 		 
> 	/* class/net/name entry */
> 	struct device           dev;
> 
> In net/core/dev.c, we need another section for grabbing packets....
> like the ones defined for CONFIG_BRIDGE and CONFIG_MACVLAN.
> I can write the patch (it needs just tens of minutes of cut&paste).
> We are studying some way to register/deregister grabbing services,
> I feel this would be the cleanest way. 
> 
> WHERE?
> ------
> There is an experimental version in the VDE svn tree.
> http://sourceforge.net/projects/vde
>

Post complete source code for kernel part to netdev@vger.kernel.org.
If you want the hooks, you need to include the full source code for inclusion
in mainline. All the Documentation/SubmittingPatches rules apply;
you can't just ask for "facilitators" and expect to keep your stuff out of tree.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-05 16:40 New Address Family: Inter Process Networking (IPN) Renzo Davoli
  2007-12-05 21:55 ` Stephen Hemminger
@ 2007-12-05 23:39 ` Andi Kleen
  2007-12-06  5:30   ` Renzo Davoli
  1 sibling, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2007-12-05 23:39 UTC (permalink / raw)
  To: Renzo Davoli; +Cc: linux-kernel

renzo@cs.unibo.it (Renzo Davoli) writes:

> Berkeley socket have been designed for client server or point to point
> communication. All existing Address Families implement this idea.

Netlink is multicast/broadcast by default for once. And BC/MC certainly
works for IPv[46] and a couple of other protocols too.

> IPN is an Inter Process Communication paradigm where all the processes
> appear as they were connected by a networking bus.

Sounds like netlink. See also RFC 3549

Haven't read further I admit.

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-05 23:39 ` Andi Kleen
@ 2007-12-06  5:30   ` Renzo Davoli
  2007-12-06  6:19     ` Kyle Moffett
  2007-12-06 16:35     ` Andi Kleen
  0 siblings, 2 replies; 31+ messages in thread
From: Renzo Davoli @ 2007-12-06  5:30 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

On Thu, Dec 06, 2007 at 12:39:22AM +0100, Andi Kleen wrote:
> renzo@cs.unibo.it (Renzo Davoli) writes:
> 
> > Berkeley socket have been designed for client server or point to point
> > communication. All existing Address Families implement this idea.
> Netlink is multicast/broadcast by default for once. And BC/MC certainly
> works for IPv[46] and a couple of other protocols too.
> 
> > IPN is an Inter Process Communication paradigm where all the processes
> > appear as they were connected by a networking bus.
> 
> Sounds like netlink. See also RFC 3549

RFC 3549 says:
"This document describes Linux Netlink, which is used in Linux both as
   an intra-kernel messaging system as well as between kernel and user
   space."

We know AF_NETLINK, our user-space stack lwipv6 supports it.

AF_IPN is different. 
AF_IPN is the broadcast and peer-to-peer extension of AF_UNIX.
It supports communication among *user* processes. 

Example:

Qemu, User-Mode Linux, Kvm, our umview machines can use IPN as an
Ethernet Hub and communicate among themselves with the hosting computer 
and the world by a tap like interface.

You can also grab an interface (say eth1) and use eth0 for your hosting
computer and eth1 for the IPN network of virtual machines.

If you load the kvde_switch submodule IPN can be a virtual Ethernet switch.

This example is already working using the svn versions of ipn and
vdeplug.

Another Example:

You have a continuous stream of data packets generated by a process,
and you want to send this data to many processes.
Maybe the set of processes is not known in advance, you want to send the
data to any interested process. Some kind of publish&subscribe
communication service (among unix processes, not over TCP-IP).
Without IPN you need a server. With IPN the sender creates the socket,
connects to it and feeds it with data packets. All the interested
receivers connect to it and start reading. That's all.

I hope that this message gives a better understanding of what IPN is.

	renzo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-05 21:55 ` Stephen Hemminger
@ 2007-12-06  5:38   ` Renzo Davoli
  2007-12-06  5:43     ` Renzo Davoli
  2007-12-06  6:04     ` Stephen Hemminger
  0 siblings, 2 replies; 31+ messages in thread
From: Renzo Davoli @ 2007-12-06  5:38 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-kernel

On Wed, Dec 05, 2007 at 04:55:52PM -0500, Stephen Hemminger wrote:
> On Wed, 5 Dec 2007 17:40:55 +0100
> renzo@cs.unibo.it (Renzo Davoli) wrote:
> > 0- (Constructive) comments.
> > 1- The "official" assignment of an Address Family.
> > 2- Another "grabbing hook" for interfaces (like the ones already
> > We are studying some way to register/deregister grabbing services,
> > I feel this would be the cleanest way. 
> 
> Post complete source code for kernel part to netdev@vger.kernel.org.
I'll do it as soon as possible.
> If you want the hooks, you need to include the full source code for inclusion
> in mainline. All the Documentation/SubmittingPatches rules apply;
> you can't just ask for "facilitators" and expect to keep your stuff out of tree.
I am sorry if I was misunderstood.
I did not want any "facilitator", nor did I want to keep my code outside
the kernel; on the contrary.
It is perfectly okay for me to provide the entire code for inclusion.
The purposes of my message were the following:
- I wanted to introduce the idea and say to the linux kernel community
  that a team is working on it.
- Address family: is it okay to send a patch that adds a new AF?
Is there an "AF registry" somewhere? (like the device major/minor
registry or the well-known port assignments for TCP-IP).
- Hook: we have two different options. We can add another grabbing
inline function like those used by the bridge and macvlan or we can
design a grabbing-service registration facility. Which one is preferable?
The former is simpler; the latter is more elegant but requires some
changes in the kernel bridge code.
So the choice is between the former (less invasive, safer, inelegant)
and the latter (more invasive, less safe, elegant).

We need a bit of time to stabilize the code: deeply testing the existing
features and implementing some more ideas we have on it.
In the meanwhile we would be grateful if the community could kindly ask to the
questions above.

renzo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06  5:38   ` Renzo Davoli
@ 2007-12-06  5:43     ` Renzo Davoli
  2007-12-06  6:04     ` Stephen Hemminger
  1 sibling, 0 replies; 31+ messages in thread
From: Renzo Davoli @ 2007-12-06  5:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-kernel

> In the meanwhile we would be grateful if the community could kindly ask to the
> questions above.
Obviously I meant:
In the meanwhile we would be grateful if the community could kindly *answer*
to the questions above

sorry (it is early morning here, it happens ;-)

	renzo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06  5:38   ` Renzo Davoli
  2007-12-06  5:43     ` Renzo Davoli
@ 2007-12-06  6:04     ` Stephen Hemminger
  1 sibling, 0 replies; 31+ messages in thread
From: Stephen Hemminger @ 2007-12-06  6:04 UTC (permalink / raw)
  To: Renzo Davoli; +Cc: linux-kernel

On Thu, 6 Dec 2007 06:38:21 +0100
renzo@cs.unibo.it (Renzo Davoli) wrote:

> On Wed, Dec 05, 2007 at 04:55:52PM -0500, Stephen Hemminger wrote:
> > On Wed, 5 Dec 2007 17:40:55 +0100
> > renzo@cs.unibo.it (Renzo Davoli) wrote:
> > > 0- (Constructive) comments.
> > > 1- The "official" assignment of an Address Family.
> > > 2- Another "grabbing hook" for interfaces (like the ones already
> > > We are studying some way to register/deregister grabbing services,
> > > I feel this would be the cleanest way. 
> > 
> > Post complete source code for kernel part to netdev@vger.kernel.org.
> I'll do it as soon as possible.
> > If you want the hooks, you need to include the full source code for inclusion
> > in mainline. All the Documentation/SubmittingPatches rules apply;
> > you can't just ask for "facilitators" and expect to keep your stuff out of tree.
> I am sorry if I was misunderstood.
> I did not want any "facilitator", nor I wanted to keep my code outside
> the kernel, on the contrary.

Great

> It is perfectly okay for me to provide the entire code for inclusion.
> The purposes of my message were the following:
> - I wanted to introduce the idea and say to the linux kernel community
>   that a team is working on it.
> - Address family: is it okay to send a patch that add a new AF?
> is there a "AF registry" somewhere? (like the device major/minor
> registry or the well-known port assignment for TCP-IP).

The usual process is to just add the value as part of the patchset.
You then need to tell the glibc maintainers so it gets included appropriately
in userspace.

> - Hook: we have two different options. We can add another grabbing
> inline function like those used by the bridge and macvlan or we can
> design a grabbing service registration facility. Which one is preferrable?

The problems with making it a registration facility are:
 * risk of making it easier for non-GPL out-of-tree abuse
 * possible ordering issues: i.e. by hardcoding each hook, the
    behaviour is defined in the case of multiple usages on the same
    machine.

> The former is simpler, the latter is more elegant but it requires some 
> changes in the kernel bridge code.

Not a big deal, but see above

> So the former choice is between less-invasive,safer,inelegant, the
> latter is more-invasive,less safe,elegant.

 
> We need a bit of time to stabilize the code: deeply testing the existing
> features and implementing some more ideas we have on it.
> In the meanwhile we would be grateful if the community could kindly ask to the
> questions above.

I am a believer in review early and often. It is easier to just deal with
the nuisance issues (style, naming, configuration) at the beginning rather
than the final stage of the project.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06  5:30   ` Renzo Davoli
@ 2007-12-06  6:19     ` Kyle Moffett
  2007-12-06  6:59       ` David Newall
  2007-12-06 16:35     ` Andi Kleen
  1 sibling, 1 reply; 31+ messages in thread
From: Kyle Moffett @ 2007-12-06  6:19 UTC (permalink / raw)
  To: Renzo Davoli; +Cc: Andi Kleen, linux-kernel

On Dec 06, 2007, at 00:30:16, Renzo Davoli wrote:
> AF_IPN is different.  AF_IPN is the broadcast and peer-to-peer  
> extension of AF_UNIX. It supports communication among *user*  
> processes.

Ok, you say it's different, but then you describe how IP unicast and  
broadcast work.  Both are frequently used for communication among  
"*user* processes".  Please provide significantly more details about  
exactly *how* it's different.


> Example:
>
> Qemu, User-Mode Linux, Kvm, our umview machines can use IPN as an  
> Ethernet Hub and communicate among themselves with the hosting  
> computer and the world by a tap like interface.

You say "tap like" interface, but people do this already with  
existing infrastructure.  You can connect Qemu, UML, and KVM to a  
standard Linux "tap" interface, and then use the standard Linux  
bridging code to connect the "tap" interface to your existing network  
interfaces.  Alternatively you could use the standard and well-tested  
IP routing/firewalling/NAT code to move your packets around.  None of  
this requires new network infrastructure in the slightest.  If you  
have problems with the existing code, please improve it instead of  
creating a slightly incompatible replacement which has different bugs  
and workarounds.


> You can also grab an interface (say eth1) and use eth0 for your  
> hosting computer and eth1 for the IPN network of virtual machines.

You can do that already with the bridging code.


> If you load the kvde_switch submodule IPN can be a virtual Ethernet  
> switch.

As I described above, this can be done with the existing bridging and  
tun/tap code.


> Another Example:
>
> You have a continuous stream of data packets generated by a  
> process, and you want to send this data to many processes.  Maybe  
> the set of processes is not known in advance, you want to send the  
> data to any interested process. Some kind of publish&subscribe  
> communication service (among unix processes not on TCP-IP). Without  
> IPN you need a server. With IPN the sender creates the socket  
> connects to it and feed it with data packets. All the interested  
> receivers connects to it and start reading. That's all.

This is already done frequently in userspace.  Just register a port  
number with IANA on which to implement a "registration" server and  
write a little daemon to listen on 127.0.0.1:${YOUR_PORT}.  Your  
interconnecting programs then use either unicast or multicast sockets  
to bind, then report to the registration server what service you are  
offering and what port it's on.  Your "receivers" then connect to the  
registration server, ask what port a given service is on, and then  
multicast-listen or unicast-connect to access that service.  The best  
part is that all of the performance implications are already  
thoroughly understood.  Furthermore, if you want to extend your  
communication protocol to other hosts as well, you just have to  
replace the 127.0.0.1 bind with a global bind.  This is exactly how  
the standard-specified multiple-participant "SIP" protocol works, for  
example.


So if you really think this is something that belongs in the kernel  
you need to provide much more detailed descriptions and use-cases for  
why it cannot be implemented in user-space or with small  
modifications to existing UDP/TCP networking.

Cheers,
Kyle Moffett


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06  6:19     ` Kyle Moffett
@ 2007-12-06  6:59       ` David Newall
  2007-12-06 16:34         ` Andi Kleen
  0 siblings, 1 reply; 31+ messages in thread
From: David Newall @ 2007-12-06  6:59 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Renzo Davoli, Andi Kleen, linux-kernel

Kyle Moffett wrote:
> On Dec 06, 2007, at 00:30:16, Renzo Davoli wrote:
>> AF_IPN is different.  AF_IPN is the broadcast and peer-to-peer 
>> extension of AF_UNIX. It supports communication among *user* processes.
>
> Ok, you say it's different, but then you describe how IP unicast and 
> broadcast work.

Renzo also described something new (in the socket() arena): the 
multi-reader, multi-writer model is just not available in IP.

I wonder if this solves the same problem as d-bus?


> So if you really think this is something that belongs in the kernel 
> you need to provide much more detailed descriptions and use-cases for 
> why it cannot be implemented in user-space or with small modifications 
> to existing UDP/TCP networking. 

I would strengthen this sentiment: If you think something belongs in the 
kernel, you need to argue your case (provide much more detailed 
descriptions and use-cases.)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06  6:59       ` David Newall
@ 2007-12-06 16:34         ` Andi Kleen
  2007-12-06 22:21           ` David Newall
  0 siblings, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2007-12-06 16:34 UTC (permalink / raw)
  To: David Newall; +Cc: Kyle Moffett, Renzo Davoli, Andi Kleen, linux-kernel

> Renzo also described something new (in the socket() arena): the 
> multi-reader, multi-writer is just not available in IP.

How is that different from a multicast group?

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06  5:30   ` Renzo Davoli
  2007-12-06  6:19     ` Kyle Moffett
@ 2007-12-06 16:35     ` Andi Kleen
  2007-12-06 20:36       ` Chris Friesen
  1 sibling, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2007-12-06 16:35 UTC (permalink / raw)
  To: Renzo Davoli; +Cc: Andi Kleen, linux-kernel

> "This document describes Linux Netlink, which is used in Linux both as
>    an intra-kernel messaging system as well as between kernel and user
>    space."

It can be used between user-space daemons as well, and in fact it is:
e.g. they often listen to each other's messages.

-Andi


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 16:35     ` Andi Kleen
@ 2007-12-06 20:36       ` Chris Friesen
  2007-12-06 21:26         ` Andi Kleen
  2007-12-07  3:41         ` David Miller
  0 siblings, 2 replies; 31+ messages in thread
From: Chris Friesen @ 2007-12-06 20:36 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Renzo Davoli, linux-kernel

Andi Kleen wrote:
>>"This document describes Linux Netlink, which is used in Linux both as
>>   an intra-kernel messaging system as well as between kernel and user
>>   space."
> 
> 
> It can be used between user space daemons as well. In fact it is.
> e.g. they often listen to each other's messages.

One problem we ran into was that there are only 32 multicast groups per 
netlink protocol family.

We had a situation where we could have used netlink, but we needed the 
equivalent of thousands of multicast groups.  Latency was very 
important, so we ended up doing essentially a multicast unix socket 
rather than taking the extra penalty for UDP multicast.

Chris

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 20:36       ` Chris Friesen
@ 2007-12-06 21:26         ` Andi Kleen
  2007-12-06 21:49           ` Chris Friesen
  2007-12-07  3:41         ` David Miller
  1 sibling, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2007-12-06 21:26 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Andi Kleen, Renzo Davoli, linux-kernel

> Latency was very 
> important, so we ended up doing essentially a multicast unix socket 
> rather than taking the extra penalty for UDP multicast.

What extra penalty? Local UDP shouldn't be much more expensive than Unix.

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 21:26         ` Andi Kleen
@ 2007-12-06 21:49           ` Chris Friesen
  2007-12-06 22:07             ` Andi Kleen
  0 siblings, 1 reply; 31+ messages in thread
From: Chris Friesen @ 2007-12-06 21:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Renzo Davoli, linux-kernel

Andi Kleen wrote:
>>Latency was very 
>>important, so we ended up doing essentially a multicast unix socket 
>>rather than taking the extra penalty for UDP multicast.
> 
> 
> What extra penalty? Local UDP shouldn't be much more expensive than Unix.

On a 1.4GHz P4 I measured a 44% increase in latency between a unix 
datagram and a UDP datagram.

For UDP it has to go down the udp stack, then the ip stack, then through 
the routing tables and back up the receive side.

The unix socket just hashes to get the destination and delivers the message.

Chris

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 21:49           ` Chris Friesen
@ 2007-12-06 22:07             ` Andi Kleen
  2007-12-06 22:18               ` Renzo Davoli
  2007-12-06 23:02               ` Chris Friesen
  0 siblings, 2 replies; 31+ messages in thread
From: Andi Kleen @ 2007-12-06 22:07 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Andi Kleen, Renzo Davoli, linux-kernel

On Thu, Dec 06, 2007 at 03:49:51PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> >>Latency was very 
> >>important, so we ended up doing essentially a multicast unix socket 
> >>rather than taking the extra penalty for UDP multicast.
> >
> >
> >What extra penalty? Local UDP shouldn't be much more expensive than Unix.
> 
> On a 1.4GHz P4 I measured a 44% increase in latency between a unix 
> datagram and a UDP datagram.

That's weird.

> 
> For UDP it has to go down the udp stack, then the ip stack, then through 

UDP doesn't really have much stack. IP is also very little, assuming a
cached route (connect called first).

I would expect the copies to dominate in both cases.

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 22:07             ` Andi Kleen
@ 2007-12-06 22:18               ` Renzo Davoli
  2007-12-06 22:38                 ` Andi Kleen
  2007-12-06 23:02               ` Chris Friesen
  1 sibling, 1 reply; 31+ messages in thread
From: Renzo Davoli @ 2007-12-06 22:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Chris Friesen, linux-kernel

Some more explanations trying to describe what IPN is and what it is
useful for.  We are writing the complete patch....

Summary:
* IPN is for inter-process communication. It is *not* directly related
to TCP-IP or Ethernet.
* IPN itself is a *level 1* virtual physical network.
* IPN services (like AF_UNIX) do not require root privileges.
* TAP and GRAB are just extra features for IPN delivering Ethernet
frames.
----

* IPN is for inter-process communication. It is *not* directly related 
to TCP-IP or Ethernet.

If you want you can call it Inter Process Bus Communication.  It is an
extension of AF_UNIX.  Comments saying that some services can be
implemented by using TCP-IP multicast protocols are unrelated to IPN.
All AF_UNIX services could be implemented as TCP-IP services on
127.0.0.1. Should we abolish AF_UNIX, then?  The problem is that to use
TCP-IP you'd need to wrap the packets with TCP or UDP, IP and Ethernet
headers, and the stack would lose time managing useless protocols.  If
you just want to send strings to a set of local processes, TCP-IP is an
overweight solution.  Even X-Window uses AF_UNIX sockets to talk with
local clients; it is a performance issue... I think Chris is right.

* IPN itself is a *level 1* virtual physical network.

Like any physical network, you can run higher-level protocols on it;
thus Ethernet, and then TCP-IP, can be services running on IPN, but
there can also be IPN networks running neither TCP-IP nor Ethernet.

* IPN services (like AF_UNIX) do not require root privileges.

There are many communication services where users need broadcast or
p2p communication among user processes.  If a user (not root) wants to
run several User-Mode Linux, Qemu, or Kvm VMs, the only way to have them
connected together is our Virtual Distributed Ethernet.  (For this
reason VDE exists in almost all the distros, has been ported to other
OSs, and is already supported in the Linux kernel for User-Mode Linux.)
VDE is a userland daemon, hence requires two context switches to deliver
a packet: VM1 -> K -> Daemon -> K -> VM2. Kvde running on IPN needs just
one: VM1 -> K -> VM2.  I think D-Bus could use IPN, too; the same saving
of context switches applies.  May I speculate that there would be a
noticeable increase in performance?  *nix systems are multiuser: there
do exist people who need to set up services without root access.  And
even if you have root access, the less you need to work as root, the
safer your system is.

* TAP and GRAB are just extra features for IPN delivering Ethernet frames.

Some IPN networks do use Ethernet as their data-link protocol.  It is
useful to provide means to connect the IPN socket to a virtual (TAP)
interface or to a real (GRAB) interface.  I know that a lot of people
use tap interfaces and the kernel bridge to connect virtual machines.
Access can be restricted to some users or processes by iptables, but
that is not as simple as a chmod on the socket.  A lot of people also
use tunctl to define a priori tap interfaces for users.  They must
define as many tuntap interfaces as the number of VMs the users may
want; each user has his/her own taps.  Some users run a userland VDE
switch to interconnect their VMs.  IPN itself could use a userland
process to define a standard TAP interface and waste time and CPU
cycles moving packets from tap to ipn and vice versa, but IPN is
already kernel code, so all those context switches and CPU cycles can
be saved by accessing the tap or grabbed interface directly from the
kernel.  (TAP and GRAB obviously require CAP_NET_ADMIN.)  Using IPN
with TAP you can define one single TAP interface connected to an IPN
socket. Several VMs can use that IPN socket; in this way the VMs are
connected by a (virtual Ethernet) network which includes the TAP
interface.  Access control to the network (and thus to the TAP) is done
by setting the permissions on the socket.  tunctl is *not* able to
create a tap where all the users belonging to a group can start their
VMs. IPN can.


* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 16:34         ` Andi Kleen
@ 2007-12-06 22:21           ` David Newall
  2007-12-06 22:42             ` Andi Kleen
  0 siblings, 1 reply; 31+ messages in thread
From: David Newall @ 2007-12-06 22:21 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Kyle Moffett, Renzo Davoli, linux-kernel

Andi Kleen wrote:
>> Renzo also described something new (in the socket() arena): the 
>> multi-reader, multi-writer is just not available in IP.
>>     
>
> How is that different from a multicast group?
>   

Good question.  I don't know much about multicast IP; it's a bit new 
to me.  All I knew was that it uses Martian addresses!  After a little 
reading, I now know that it does allow many-to-many communication.

Renzo's IPN is a local protocol--you can't multicast to localhost.


* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 22:18               ` Renzo Davoli
@ 2007-12-06 22:38                 ` Andi Kleen
  2007-12-07  0:18                   ` Renzo Davoli
  0 siblings, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2007-12-06 22:38 UTC (permalink / raw)
  To: Renzo Davoli; +Cc: Andi Kleen, Chris Friesen, linux-kernel

On Thu, Dec 06, 2007 at 11:18:37PM +0100, Renzo Davoli wrote:
> * IPN is for inter-process communication. It is *not* directly related 
> to TCP-IP or Ethernet.
> 
> If you want you can call it Inter Process Bus Communication.  It is an
> extension of AF_UNIX.  Comments saying that some services can be
> implemented by using TCP-IP multicast protocols are unrelated to IPN.
> All AF_UNIX services could be implemented as TCP-IP services on
> 127.0.0.1. Do we abolish AF_UNIX, then?  The problem is that to use
> TCP-IP, you'd need to wrap the packets with TCP or UDP, IP and Ethernet

No ethernet headers on localhost. Just to give you a perspective:
IP+TCP headers are 50 bytes (with timestamps) and IP+UDP is 28 bytes.
On the other hand, the sk_buff+skb_shared_info headers, which are used
for all socket communication in Linux and mostly have to be set up in
every case, are 192+312 bytes on 64-bit [part of the 312 bytes is an
array that is typically only partly used] or 156+236 bytes on 32-bit.
So the internal data structures dwarf the network headers.

There might be other reasons why TCP/IP is slower, but arguing 
with the size of the headers is just bogus.

My personal feeling would be that if TCP/IP is too slow for something,
it is better to improve the stack than to add a completely new socket
family. That would benefit many more applications without requiring
them to change.

About the only good reason to use UNIX sockets is when you need to use
file system permissions. 

> * IPN services (like AF_UNIX) do not require root privileges.
> 
> There are many communication services where the user need broadcast or
> p2p among user processes.  If a user (not root) wants to run several

IP multicast, when properly set up, also doesn't need root.

Broadcast is kind of obsolete anyway.

> User-Mode Linux, Qemu, Kvm VM the only way to have them connected
> together is our Virtual Distributed Ethernet.  (For this reason VDE

They could easily just tunnel over a local multicast group for example.

-Andi



* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 22:21           ` David Newall
@ 2007-12-06 22:42             ` Andi Kleen
  0 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2007-12-06 22:42 UTC (permalink / raw)
  To: David Newall; +Cc: Andi Kleen, Kyle Moffett, Renzo Davoli, linux-kernel

> Renzo's IPN is a local protocol--you can't multicast to localhost.

You don't need to. All local clients can join the same group without
using localhost.

-Andi


* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 22:07             ` Andi Kleen
  2007-12-06 22:18               ` Renzo Davoli
@ 2007-12-06 23:02               ` Chris Friesen
  2007-12-06 23:06                 ` Andi Kleen
  1 sibling, 1 reply; 31+ messages in thread
From: Chris Friesen @ 2007-12-06 23:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Renzo Davoli, linux-kernel

Andi Kleen wrote:

>>On a 1.4GHz P4 I measured a 44% increase in latency between a unix 
>>datagram and a UDP datagram.

> That's weird.

I just reran on a 3.2GHz P4 running 2.6.11 (Fedora Core 4).  42% latency 
increase.

For stream sockets, unix gives approximately a 62% bandwidth increase 
over tcp.   (Although tcp could probably be tuned to do better than this.)

Chris


* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 23:02               ` Chris Friesen
@ 2007-12-06 23:06                 ` Andi Kleen
  2007-12-06 23:42                   ` Chris Friesen
  0 siblings, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2007-12-06 23:06 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Andi Kleen, Renzo Davoli, linux-kernel

On Thu, Dec 06, 2007 at 05:02:40PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> 
> >>On a 1.4GHz P4 I measured a 44% increase in latency between a unix 
> >>datagram and a UDP datagram.
> 
> >That's weird.
> 
> I just reran on a 3.2GHZ P4 running 2.6.11 (Fedora Core 4).  42% latency 
> increase.

Sounds like something that should be looked into. I know of no
fundamental reason for it.

> For stream sockets, unix gives approximately a 62% bandwidth increase 
> over tcp.   (Although tcp could probably be tuned to do better than this.)

How long a stream did you test? You might be measuring slow start.

-Andi


* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 23:06                 ` Andi Kleen
@ 2007-12-06 23:42                   ` Chris Friesen
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Friesen @ 2007-12-06 23:42 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Renzo Davoli, linux-kernel

Andi Kleen wrote:
> On Thu, Dec 06, 2007 at 05:02:40PM -0600, Chris Friesen wrote:

>>I just reran on a 3.2GHZ P4 running 2.6.11 (Fedora Core 4).  42% latency 
>>increase.

> Sounds like something that should be looked into. I know of no
> principal reasons for that.

>>For stream sockets, unix gives approximately a 62% bandwidth increase 
>>over tcp.   (Although tcp could probably be tuned to do better than this.)

> How long a stream did you test? You might be measuring slow start.

No idea.  These are just the standard local networking tests in lmbench 
v2.  In our case the latency was the big concern and we were using 
datagrams anyway.

Chris


* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 22:38                 ` Andi Kleen
@ 2007-12-07  0:18                   ` Renzo Davoli
  0 siblings, 0 replies; 31+ messages in thread
From: Renzo Davoli @ 2007-12-07  0:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Chris Friesen, linux-kernel

I have done some raw tests.
(you can read the code here: http://www.cs.unibo.it/~renzo/rawperftest/)

The programs are quite simple. The sender sends "Hello World" as fast as
it can, while the receiver prints time() for every million messages
received.

On my laptop, in tests with 20,000,000 "Hello World" packets:

One receiver:
multicast	244,000 msg/sec
IPN             333,000 msg/sec  (36% faster)

Two receivers:
multicast       174,000 msg/sec
IPN             250,000 msg/sec  (43% faster)

Apart from this, how could I implement policies over a multicast socket?
E.g., how would a kernel vde_switch work on multicast sockets?

If I send an Ethernet packet over a multicast socket it can emulate just
a hub (although it seems quite unnatural to have TCP-UDP over IP over
Ethernet over UDP over IP; okay, we can skip the Ethernet on localhost,
and long Ethernet frames get fragmented, but... details).

On a multicast socket you cannot use policies. I mean that an IPN
network (or bus, or group) can have a policy that reads some info in the
packet to decide the set of recipients.
For a vde_switch it is the destination MAC address, when found in the
MAC hash table, that selects the recipient port. For MIDI communication
it could be the channel number....

With the switching fabric moved to userland, the performance figures
are quite different.

renzo



* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-06 20:36       ` Chris Friesen
  2007-12-06 21:26         ` Andi Kleen
@ 2007-12-07  3:41         ` David Miller
  2007-12-07  4:21           ` Chris Friesen
  1 sibling, 1 reply; 31+ messages in thread
From: David Miller @ 2007-12-07  3:41 UTC (permalink / raw)
  To: cfriesen; +Cc: andi, renzo, linux-kernel

From: "Chris Friesen" <cfriesen@nortel.com>
Date: Thu, 06 Dec 2007 14:36:54 -0600

> One problem we ran into was that there are only 32 multicast groups per 
> netlink protocol family.

I'm pretty sure we've removed this limitation.


* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-07  3:41         ` David Miller
@ 2007-12-07  4:21           ` Chris Friesen
  2007-12-07  4:54             ` Ben Pfaff
  2007-12-07  6:40             ` David Miller
  0 siblings, 2 replies; 31+ messages in thread
From: Chris Friesen @ 2007-12-07  4:21 UTC (permalink / raw)
  To: David Miller; +Cc: andi, renzo, linux-kernel

David Miller wrote:
> From: "Chris Friesen" <cfriesen@nortel.com>
> Date: Thu, 06 Dec 2007 14:36:54 -0600
> 
> 
>>One problem we ran into was that there are only 32 multicast groups per 
>>netlink protocol family.
> 
> 
> I'm pretty sure we've removed this limitation.

As of 2.6.23 nl_groups is a 32-bit bitmask with one bit per group. 
Also, it appears that only root is allowed to use multicast netlink.

Chris


* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-07  4:21           ` Chris Friesen
@ 2007-12-07  4:54             ` Ben Pfaff
  2007-12-07  6:40             ` David Miller
  1 sibling, 0 replies; 31+ messages in thread
From: Ben Pfaff @ 2007-12-07  4:54 UTC (permalink / raw)
  To: linux-kernel

"Chris Friesen" <cfriesen@nortel.com> writes:

> David Miller wrote:
>> From: "Chris Friesen" <cfriesen@nortel.com>
>>> One problem we ran into was that there are only 32 multicast groups
>>> per netlink protocol family.
>> I'm pretty sure we've removed this limitation.
> As of 2.6.23 nl_groups is a 32-bit bitmask with one bit per
> group. 

Use setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, ...) to
join an arbitrary Netlink multicast group.
-- 
"A computer is a state machine.
 Threads are for people who cant [sic] program state machines."
--Alan Cox



* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-07  4:21           ` Chris Friesen
  2007-12-07  4:54             ` Ben Pfaff
@ 2007-12-07  6:40             ` David Miller
  2007-12-07 10:03               ` Andi Kleen
  2007-12-10 16:05               ` New Address Family: Inter Process Networking (IPN) Chris Friesen
  1 sibling, 2 replies; 31+ messages in thread
From: David Miller @ 2007-12-07  6:40 UTC (permalink / raw)
  To: cfriesen; +Cc: andi, renzo, linux-kernel

From: "Chris Friesen" <cfriesen@nortel.com>
Date: Thu, 06 Dec 2007 22:21:39 -0600

> David Miller wrote:
> > From: "Chris Friesen" <cfriesen@nortel.com>
> > Date: Thu, 06 Dec 2007 14:36:54 -0600
> > 
> > 
> >>One problem we ran into was that there are only 32 multicast groups per 
> >>netlink protocol family.
> > 
> > 
> > I'm pretty sure we've removed this limitation.
> 
> As of 2.6.23 nl_groups is a 32-bit bitmask with one bit per group. 
> Also, it appears that only root is allowed to use multicast netlink.

The kernel supports many more than 32 groups; see nlk->groups, which is
a bitmap that can be sized arbitrarily.  nlk->nl_groups is for
backwards compatibility only.

netlink_change_ngroups() does the bitmap resizing when necessary.

The root multicast listening restriction can be relaxed in some
circumstances, whatever is needed to fit your needs.

Stop making excuses, with minor adjustments we have the facilities to
meet your needs.  There is no need for yet-another-protocol to do what
you're trying to do, we already have too much duplicated
functionality.


* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-07  6:40             ` David Miller
@ 2007-12-07 10:03               ` Andi Kleen
  2007-12-07 21:18                 ` AF_IPN: Inter Process Networking, try these Renzo Davoli
  2007-12-10 16:05               ` New Address Family: Inter Process Networking (IPN) Chris Friesen
  1 sibling, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2007-12-07 10:03 UTC (permalink / raw)
  To: David Miller; +Cc: cfriesen, andi, renzo, linux-kernel

> Stop making excuses, with minor adjustments we have the facilities to
> meet your needs.  There is no need for yet-another-protocol to do what

I suspect they would be better off just using IP multicast. But the
localhost latency penalty vs. Unix that Chris was talking about
probably needs to be investigated.

-Andi


* AF_IPN: Inter Process Networking, try these...
  2007-12-07 10:03               ` Andi Kleen
@ 2007-12-07 21:18                 ` Renzo Davoli
  2007-12-08  2:07                   ` David Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Renzo Davoli @ 2007-12-07 21:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David Miller, cfriesen, linux-kernel

Andi, David,

I disagree. If you suspect we would be better off using IP multicast, I
think that suspicion is unsupported.
Try the following exercises, please.... Can you provide better solutions
without IPN?

	renzo

Exercise #1.
I am a user (NOT ROOT), I like kvm, qemu etc. I want an efficient
network between my VMs.

My solution:
I create an IPN socket with protocol IPN_VDESWITCH, and all the VMs can
communicate.

Your solution:
- I am condemned by two kernel developers to run the switch in userland.
- I beg the sysadmin to give me some pre-allocated taps connected
together by a kernel bridge.
- I create a multicast socket limited to this host (TTL=0) and use it
like a hub. It cannot switch packets.

Exercise #2.
I am a sysadm (maybe a lab administrator). I want my users (not root)
in the group "vmenabled" to run their VMs connected to a network.
I have hundreds of users in vmenabled (say, students).

My solution:
I create an IPN socket, with protocol IPN_VDESWITCH, connected to a
virtual interface, say ipn0. I give the socket permission 760, owner
root:vmenabled.

Your solution:
- I am condemned by two kernel developers to run the switch in userland.
- I create a multicast socket connected to a tap and then define
iptables filters to prevent unauthorized users from joining the net.
- I create hundreds of preallocated tap interfaces, at least one per user.

Exercise #3.
I am a user (NOT ROOT) and I have a heavy stream of *very private data* 
generated by some processes that must be received by several processes.
I am looking for an efficient solution.
The data can be ASCII strings or a binary stream.
It is not a "networking" issue, it is just IPC.

My solution:
I create an IPN socket with permission 700, protocol IPN_BROADCAST. All
the processes connect to the socket for writing, for reading, or both.

Your solution:
- I am condemned by two kernel developers to use inefficient userland
solutions like named pipes, tee, or a user daemon among AF_UNIX sockets.
- If I use multicast, others can read the stream.
(Security by obscurity? The attacker does not know the address?)
- I use a multicast socket with SSL (it sounds funny to use encryption
  to talk with myself, exposing the stream to cryptographic attack).


* Re: AF_IPN: Inter Process Networking, try these...
  2007-12-07 21:18                 ` AF_IPN: Inter Process Networking, try these Renzo Davoli
@ 2007-12-08  2:07                   ` David Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David Miller @ 2007-12-08  2:07 UTC (permalink / raw)
  To: renzo; +Cc: andi, cfriesen, linux-kernel

From: renzo@cs.unibo.it (Renzo Davoli)
Date: Fri, 7 Dec 2007 22:18:05 +0100

> I disagree. If you suspect we would be better using IP multicast, I think
> your suspects are not supported.
> Try the following exercises, please.... Can you provide better solutions
> without IPN?

I personally have not purely advocated IP, although the performance
difference between UDP and AF_UNIX should be investigated.

Instead I advocated using AF_NETLINK with some minor multicast
permission modifications to suit your needs.


* Re: New Address Family: Inter Process Networking (IPN)
  2007-12-07  6:40             ` David Miller
  2007-12-07 10:03               ` Andi Kleen
@ 2007-12-10 16:05               ` Chris Friesen
  1 sibling, 0 replies; 31+ messages in thread
From: Chris Friesen @ 2007-12-10 16:05 UTC (permalink / raw)
  To: David Miller; +Cc: andi, renzo, linux-kernel

David Miller wrote:

> The kernel supports much more than 32 groups, see nlk->groups which is
> a bitmap which can be sized to arbitrary sizes.  nlk->nl_groups is
> for backwards compatability only.
> 
> netlink_change_ngroups() does the bitmap resizing when necessary.

Thanks for the explanation.  Given that it's a bitmap, doesn't that 
result in a cost of O(number of groups) when processing messages?  In 
our case we potentially need thousands of groups.

> The root multicast listening restriction can be relaxed in some
> circumstances, whatever is needed to fill your needs.

Also, good to know.

> Stop making excuses, with minor adjustments we have the facilities to
> meet your needs.  There is no need for yet-another-protocol to do what
> you're trying to do, we already have too much duplicated
> functionality.

You may have confused me with the OP...I just chimed in because of some 
of the limitations we found when we wanted to do similar things.  In our 
case we created a new unix-like protocol to allow multicast, and have 
been using it for a few years.

However, if we could use netlink instead in our next release that would 
be a good thing.  A couple questions:

1) Is it possible to register to receive all netlink messages for a 
particular netlink family?  This is useful for debugging--it allows a 
tcpdump equivalent.

2) Is there any up-to-date netlink programming guide?  I found this one:

http://people.redhat.com/nhorman/papers/netlink.pdf

but it's three years old now.


Thanks,

Chris


end of thread, other threads:[~2007-12-10 16:06 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-05 16:40 New Address Family: Inter Process Networking (IPN) Renzo Davoli
2007-12-05 21:55 ` Stephen Hemminger
2007-12-06  5:38   ` Renzo Davoli
2007-12-06  5:43     ` Renzo Davoli
2007-12-06  6:04     ` Stephen Hemminger
2007-12-05 23:39 ` Andi Kleen
2007-12-06  5:30   ` Renzo Davoli
2007-12-06  6:19     ` Kyle Moffett
2007-12-06  6:59       ` David Newall
2007-12-06 16:34         ` Andi Kleen
2007-12-06 22:21           ` David Newall
2007-12-06 22:42             ` Andi Kleen
2007-12-06 16:35     ` Andi Kleen
2007-12-06 20:36       ` Chris Friesen
2007-12-06 21:26         ` Andi Kleen
2007-12-06 21:49           ` Chris Friesen
2007-12-06 22:07             ` Andi Kleen
2007-12-06 22:18               ` Renzo Davoli
2007-12-06 22:38                 ` Andi Kleen
2007-12-07  0:18                   ` Renzo Davoli
2007-12-06 23:02               ` Chris Friesen
2007-12-06 23:06                 ` Andi Kleen
2007-12-06 23:42                   ` Chris Friesen
2007-12-07  3:41         ` David Miller
2007-12-07  4:21           ` Chris Friesen
2007-12-07  4:54             ` Ben Pfaff
2007-12-07  6:40             ` David Miller
2007-12-07 10:03               ` Andi Kleen
2007-12-07 21:18                 ` AF_IPN: Inter Process Networking, try these Renzo Davoli
2007-12-08  2:07                   ` David Miller
2007-12-10 16:05               ` New Address Family: Inter Process Networking (IPN) Chris Friesen
