* socket.c added support for unix domain socket datagram transport
@ 2021-04-23  6:56 Ralph Schmieder
  2021-04-23  9:16 ` Daniel P. Berrangé
  2021-04-23 15:29 ` Stefano Brivio
  0 siblings, 2 replies; 15+ messages in thread
From: Ralph Schmieder @ 2021-04-23  6:56 UTC (permalink / raw)
  To: qemu-devel

Hey...  new to this list.  I was looking for a way to use Unix domain sockets as a network transport between local VMs.

I'm part of a team where we run dozens if not hundreds of VMs on a single compute instance which are highly interconnected.

In the current implementation, I use UDP sockets (e.g. something like 

-netdev id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 

which works great.

The downside of this approach is that I need to keep track of all the UDP ports in use and that there's a potential for clashes.  Clearly, having Unix domain sockets where I could store the sockets in the VM's directory would be much easier to manage.

However, even though there is some AF_UNIX support in net/socket.c, it's

- not configurable
- it doesn't work

As a side note, I tried to pass in an already open FD, but that didn't work either.

So, I added some code which does work for me... e.g.

- can specify the socket paths like -netdev id=bla,type=socket,unix=/tmp/in:/tmp/out
- it does forward packets between two Qemu instances running back-to-back
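For illustration, a back-to-back pair under this proposed (not-yet-merged) syntax might be started as follows — assuming the first path is the socket the instance receives on and the second the one it sends to, with the pair mirrored on the peer (the elided arguments stand in for the rest of each VM's command line):

```shell
# VM A receives on /tmp/a and sends to /tmp/b...
qemu-system-x86_64 ... -netdev id=net0,type=socket,unix=/tmp/a:/tmp/b

# ...while VM B mirrors the pair
qemu-system-x86_64 ... -netdev id=net0,type=socket,unix=/tmp/b:/tmp/a
```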

I'm wondering if this is of interest for the wider community and, if so, how to proceed.

Thanks,
-ralph

Commit https://github.com/rschmied/qemu/commit/73f02114e718ec898c7cd8e855d0d5d5d7abe362



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: socket.c added support for unix domain socket datagram transport
  2021-04-23  6:56 socket.c added support for unix domain socket datagram transport Ralph Schmieder
@ 2021-04-23  9:16 ` Daniel P. Berrangé
  2021-04-23 13:38   ` Ralph Schmieder
  2021-04-23 15:29 ` Stefano Brivio
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2021-04-23  9:16 UTC (permalink / raw)
  To: Ralph Schmieder; +Cc: qemu-devel

On Fri, Apr 23, 2021 at 08:56:48AM +0200, Ralph Schmieder wrote:
> Hey...  new to this list.  I was looking for a way to use Unix domain sockets as a network transport between local VMs.
> 
> I'm part of a team where we run dozens if not hundreds of VMs on a single compute instance which are highly interconnected.
> 
> In the current implementation, I use UDP sockets (e.g. something like 
> 
> -netdev id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
> 
> which works great.
> 
> The downside of this approach is that I need to keep track of all the UDP ports in use and that there's a potential for clashes.  Clearly, having Unix domain sockets where I could store the sockets in the VM's directory would be much easier to manage.
> 
> However, even though there is some AF_UNIX support in net/socket.c, it's
> 
> - not configurable
> - it doesn't work
> 
> As a side note, I tried to pass in an already open FD, but that didn't work either.
> 
> So, I added some code which does work for me... e.g.
> 
> - can specify the socket paths like -netdev id=bla,type=socket,unix=/tmp/in:/tmp/out
> - it does forward packets between two Qemu instances running back-to-back
> 
> I'm wondering if this is of interest for the wider community and, if so, how to proceed.

As a general rule, any place in QEMU that supports sockets ought to
support all of IPv4, IPv6 and UNIX sockets.  Where there are gaps,
it generally just needs someone motivated to provide a patch.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-23  9:16 ` Daniel P. Berrangé
@ 2021-04-23 13:38   ` Ralph Schmieder
  0 siblings, 0 replies; 15+ messages in thread
From: Ralph Schmieder @ 2021-04-23 13:38 UTC (permalink / raw)
  To: "Daniel P. Berrangé"; +Cc: qemu-devel



> On Apr 23, 2021, at 11:16, Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> On Fri, Apr 23, 2021 at 08:56:48AM +0200, Ralph Schmieder wrote:
>> Hey...  new to this list.  I was looking for a way to use Unix domain sockets as a network transport between local VMs.
>> 
>> I'm part of a team where we run dozens if not hundreds of VMs on a single compute instance which are highly interconnected.
>> 
>> In the current implementation, I use UDP sockets (e.g. something like 
>> 
>> -netdev id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
>> 
>> which works great.
>> 
>> The downside of this approach is that I need to keep track of all the UDP ports in use and that there's a potential for clashes.  Clearly, having Unix domain sockets where I could store the sockets in the VM's directory would be much easier to manage.
>> 
>> However, even though there is some AF_UNIX support in net/socket.c, it's
>> 
>> - not configurable
>> - it doesn't work
>> 
>> As a side note, I tried to pass in an already open FD, but that didn't work either.
>> 
>> So, I added some code which does work for me... e.g.
>> 
>> - can specify the socket paths like -netdev id=bla,type=socket,unix=/tmp/in:/tmp/out
>> - it does forward packets between two Qemu instances running back-to-back
>> 
>> I'm wondering if this is of interest for the wider community and, if so, how to proceed.
> 
> As a general rule, any place in QEMU that supports sockets, ought to
> support all of IPv4, IPv6 and UNIX sockets.   Where there are gaps
> it generally just needs someone motivated to provide a patch.

OK, great... The code basically works afaict, so I'm wondering whether you're OK with the "unix=path1:path2" CLI approach, and also whether patches should be sent to this list or whether it's better to open a PR on GitHub/GitLab?


Thanks,
-ralph




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-23  6:56 socket.c added support for unix domain socket datagram transport Ralph Schmieder
  2021-04-23  9:16 ` Daniel P. Berrangé
@ 2021-04-23 15:29 ` Stefano Brivio
  2021-04-23 15:48   ` Ralph Schmieder
  2021-04-23 16:21   ` Daniel P. Berrangé
  1 sibling, 2 replies; 15+ messages in thread
From: Stefano Brivio @ 2021-04-23 15:29 UTC (permalink / raw)
  To: Ralph Schmieder; +Cc: Daniel P. Berrangé, qemu-devel

Hi Ralph,

On Fri, 23 Apr 2021 08:56:48 +0200
Ralph Schmieder <ralph.schmieder@gmail.com> wrote:

> Hey...  new to this list.  I was looking for a way to use Unix domain
> sockets as a network transport between local VMs.
> 
> I'm part of a team where we run dozens if not hundreds of VMs on a
> single compute instance which are highly interconnected.
> 
> In the current implementation, I use UDP sockets (e.g. something like 
> 
> -netdev
> id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
> 
> which works great.
> 
> The downside of this approach is that I need to keep track of all the
> UDP ports in use and that there's a potential for clashes.  Clearly,
> having Unix domain sockets where I could store the sockets in the
> VM's directory would be much easier to manage.
> 
> However, even though there is some AF_UNIX support in net/socket.c,
> it's
> 
> - not configurable
> - it doesn't work

I hate to say this, but I've been working on something very similar
just in the past days, with the notable difference that I'm using
stream-oriented AF_UNIX sockets instead of datagram-oriented.

I have a similar use case, and after some experiments I picked a
stream-oriented socket over datagram-oriented because:

- overhead appears to be the same

- message boundaries make little sense here -- you already have a
  32-bit vnet header with the message size defining the message
  boundaries

- datagram-oriented AF_UNIX sockets are actually reliable and there's
  no packet reordering on Linux, but this is not "formally" guaranteed

- it's helpful for me to know when a qemu instance disconnects for any
  reason

> As a side note, I tried to pass in an already open FD, but that
> didn't work either.

This actually worked for me as a quick work-around, either with:
	https://github.com/StevenVanAcker/udstools

or with a trivial C implementation of that, that does essentially:

	fd = strtol(argv[1], NULL, 0);
	if (fd < 3 || fd > INT_MAX || errno)
		usage(argv[0]);

	s = socket(AF_UNIX, SOCK_STREAM, 0);
	if (s < 0) {
		perror("socket");
		exit(EXIT_FAILURE);
	}

	if (connect(s, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("connect");
		exit(EXIT_FAILURE);
	}

	if (dup2(s, (int)fd) < 0) {
		perror("dup");
		exit(EXIT_FAILURE);
	}

	close(s);

	execvp(argv[2], argv + 2);
	perror("execvp");

where argv[1] is the socket number you pass in the qemu command line
(-net socket,fd=X) and argv[2] is the path to qemu.

> So, I added some code which does work for me... e.g.
> 
> - can specify the socket paths like -netdev
> id=bla,type=socket,unix=/tmp/in:/tmp/out
> - it does forward packets between two Qemu instances running
> back-to-back
> 
> I'm wondering if this is of interest for the wider community and, if
> so, how to proceed.
> 
> Thanks,
> -ralph
> 
> Commit
> https://github.com/rschmied/qemu/commit/73f02114e718ec898c7cd8e855d0d5d5d7abe362

I think your patch could be a bit simpler, as you could mostly reuse
net_socket_udp_init() for your initialisation, and perhaps rename it to
net_socket_dgram_init().

Anyway, I wonder, would my approach work for you instead? I'm posting my
patch in a minute.

-- 
Stefano




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-23 15:29 ` Stefano Brivio
@ 2021-04-23 15:48   ` Ralph Schmieder
  2021-04-23 16:39     ` Stefano Brivio
  2021-04-23 16:21   ` Daniel P. Berrangé
  1 sibling, 1 reply; 15+ messages in thread
From: Ralph Schmieder @ 2021-04-23 15:48 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: "Daniel P. Berrangé", qemu-devel

Hi, Stefano... Thanks for the detailed response... inline
Thanks,
-ralph


> On Apr 23, 2021, at 17:29, Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> Hi Ralph,
> 
> On Fri, 23 Apr 2021 08:56:48 +0200
> Ralph Schmieder <ralph.schmieder@gmail.com> wrote:
> 
>> Hey...  new to this list.  I was looking for a way to use Unix domain
>> sockets as a network transport between local VMs.
>> 
>> I'm part of a team where we run dozens if not hundreds of VMs on a
>> single compute instance which are highly interconnected.
>> 
>> In the current implementation, I use UDP sockets (e.g. something like 
>> 
>> -netdev
>> id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
>> 
>> which works great.
>> 
>> The downside of this approach is that I need to keep track of all the
>> UDP ports in use and that there's a potential for clashes.  Clearly,
>> having Unix domain sockets where I could store the sockets in the
>> VM's directory would be much easier to manage.
>> 
>> However, even though there is some AF_UNIX support in net/socket.c,
>> it's
>> 
>> - not configurable
>> - it doesn't work
> 
> I hate to say this, but I've been working on something very similar
> just in the past days, with the notable difference that I'm using
> stream-oriented AF_UNIX sockets instead of datagram-oriented.
> 
> I have a similar use case, and after some experiments I picked a
> stream-oriented socket over datagram-oriented because:
> 
> - overhead appears to be the same
> 
> - message boundaries make little sense here -- you already have a
>  32-bit vnet header with the message size defining the message
>  boundaries
> 
> - datagram-oriented AF_UNIX sockets are actually reliable and there's
>  no packet reordering on Linux, but this is not "formally" guaranteed
> 
> - it's helpful for me to know when a qemu instance disconnects for any
>  reason
> 

IMO, dgram is the right tool for this, as it is symmetrical to using a UDP transport... Since I need to pick up those packets from outside of Qemu (inside a custom networking fabric), I'd otherwise have to make assumptions about the data, and I don't know the length of a packet in advance. The datagram approach fits nicely into this concept. So yes, in my case the transport IS truly connectionless, and VMs just keep sending packets if the fabric isn't there or doesn't pick up their packets.

And maybe there's use for both, as there's currently already support for connection oriented (TCP) and connectionless (UDP) inet transports. 

>> As a side note, I tried to pass in an already open FD, but that
>> didn't work either.
> 
> This actually worked for me as a quick work-around, either with:
> 	https://github.com/StevenVanAcker/udstools
> 
> or with a trivial C implementation of that, that does essentially:
> 
> 	fd = strtol(argv[1], NULL, 0);
> 	if (fd < 3 || fd > INT_MAX || errno)
> 		usage(argv[0]);
> 
> 	s = socket(AF_UNIX, SOCK_STREAM, 0);
> 	if (s < 0) {
> 		perror("socket");
> 		exit(EXIT_FAILURE);
> 	}
> 
> 	if (connect(s, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
> 		perror("connect");
> 		exit(EXIT_FAILURE);
> 	}
> 
> 	if (dup2(s, (int)fd) < 0) {
> 		perror("dup");
> 		exit(EXIT_FAILURE);
> 	}
> 
> 	close(s);
> 
> 	execvp(argv[2], argv + 2);
> 	perror("execvp");
> 
> where argv[1] is the socket number you pass in the qemu command line
> (-net socket,fd=X) and argv[2] is the path to qemu.
> 


As I was looking for dgram support I didn't even try with a stream socket ;)


>> So, I added some code which does work for me... e.g.
>> 
>> - can specify the socket paths like -netdev
>> id=bla,type=socket,unix=/tmp/in:/tmp/out
>> - it does forward packets between two Qemu instances running
>> back-to-back
>> 
>> I'm wondering if this is of interest for the wider community and, if
>> so, how to proceed.
>> 
>> Thanks,
>> -ralph
>> 
>> Commit
>> https://github.com/rschmied/qemu/commit/73f02114e718ec898c7cd8e855d0d5d5d7abe362
> 
> I think your patch could be a bit simpler, as you could mostly reuse
> net_socket_udp_init() for your initialisation, and perhaps rename it to
> net_socket_dgram_init().

Thanks... I agree that my code can likely be shortened... it was just a PoC that I cobbled together yesterday and it still has a lot of to-be-removed lines.


> Anyway, I wonder, would my approach work for you instead? I'm posting my
> patch in a minute.
> 
> -- 
> Stefano
> 




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-23 15:29 ` Stefano Brivio
  2021-04-23 15:48   ` Ralph Schmieder
@ 2021-04-23 16:21   ` Daniel P. Berrangé
  2021-04-23 16:54     ` Stefano Brivio
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2021-04-23 16:21 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Ralph Schmieder, qemu-devel

On Fri, Apr 23, 2021 at 05:29:40PM +0200, Stefano Brivio wrote:
> Hi Ralph,
> 
> On Fri, 23 Apr 2021 08:56:48 +0200
> Ralph Schmieder <ralph.schmieder@gmail.com> wrote:
> 
> > Hey...  new to this list.  I was looking for a way to use Unix domain
> > sockets as a network transport between local VMs.
> > 
> > I'm part of a team where we run dozens if not hundreds of VMs on a
> > single compute instance which are highly interconnected.
> > 
> > In the current implementation, I use UDP sockets (e.g. something like 
> > 
> > -netdev
> > id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
> > 
> > which works great.
> > 
> > The downside of this approach is that I need to keep track of all the
> > UDP ports in use and that there's a potential for clashes.  Clearly,
> > having Unix domain sockets where I could store the sockets in the
> > VM's directory would be much easier to manage.
> > 
> > However, even though there is some AF_UNIX support in net/socket.c,
> > it's
> > 
> > - not configurable
> > - it doesn't work
> 
> I hate to say this, but I've been working on something very similar
> just in the past days, with the notable difference that I'm using
> stream-oriented AF_UNIX sockets instead of datagram-oriented.
> 
> I have a similar use case, and after some experiments I picked a
> stream-oriented socket over datagram-oriented because:
> 
> - overhead appears to be the same
> 
> - message boundaries make little sense here -- you already have a
>   32-bit vnet header with the message size defining the message
>   boundaries
> 
> - datagram-oriented AF_UNIX sockets are actually reliable and there's
>   no packet reordering on Linux, but this is not "formally" guaranteed
> 
> - it's helpful for me to know when a qemu instance disconnects for any
>   reason

The current IP socket impl for the net socket backend uses SOCK_DGRAM,
so from a consistency POV it feels sensible to do the same for UNIX
sockets too.

Nonetheless, your last point in particular about wanting to know
about disconnects feels valid, and if it's useful to you for UNIX
sockets, then it ought to be useful for IP sockets too.

IOW, I wonder if we should use DGRAM for UNIX sockets too by default
to match current behaviour, but then also add a CLI option that allows
a choice of DGRAM vs STREAM, and wire that up for both IP and UNIX sockets.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-23 15:48   ` Ralph Schmieder
@ 2021-04-23 16:39     ` Stefano Brivio
  2021-04-26 11:14       ` Ralph Schmieder
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Brivio @ 2021-04-23 16:39 UTC (permalink / raw)
  To: Ralph Schmieder; +Cc: Daniel P. Berrangé, qemu-devel

On Fri, 23 Apr 2021 17:48:08 +0200
Ralph Schmieder <ralph.schmieder@gmail.com> wrote:

> Hi, Stefano... Thanks for the detailed response... inline
> Thanks,
> -ralph
> 
> 
> > On Apr 23, 2021, at 17:29, Stefano Brivio <sbrivio@redhat.com>
> > wrote:
> > 
> > Hi Ralph,
> > 
> > On Fri, 23 Apr 2021 08:56:48 +0200
> > Ralph Schmieder <ralph.schmieder@gmail.com> wrote:
> >   
> >> Hey...  new to this list.  I was looking for a way to use Unix
> >> domain sockets as a network transport between local VMs.
> >> 
> >> I'm part of a team where we run dozens if not hundreds of VMs on a
> >> single compute instance which are highly interconnected.
> >> 
> >> In the current implementation, I use UDP sockets (e.g. something
> >> like 
> >> 
> >> -netdev
> >> id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
> >> 
> >> which works great.
> >> 
> >> The downside of this approach is that I need to keep track of all
> >> the UDP ports in use and that there's a potential for clashes.
> >> Clearly, having Unix domain sockets where I could store the
> >> sockets in the VM's directory would be much easier to manage.
> >> 
> >> However, even though there is some AF_UNIX support in net/socket.c,
> >> it's
> >> 
> >> - not configurable
> >> - it doesn't work  
> > 
> > I hate to say this, but I've been working on something very similar
> > just in the past days, with the notable difference that I'm using
> > stream-oriented AF_UNIX sockets instead of datagram-oriented.
> > 
> > I have a similar use case, and after some experiments I picked a
> > stream-oriented socket over datagram-oriented because:
> > 
> > - overhead appears to be the same
> > 
> > - message boundaries make little sense here -- you already have a
> >  32-bit vnet header with the message size defining the message
> >  boundaries
> > 
> > - datagram-oriented AF_UNIX sockets are actually reliable and
> > there's no packet reordering on Linux, but this is not "formally"
> > guaranteed
> > 
> > - it's helpful for me to know when a qemu instance disconnects for
> > any reason
> >   
> 
> IMO, dgram is the right tool for this as it is symmetrical to using a
> UDP transport... Since I need to pick up those packets from outside
> of Qemu (inside of a custom networking fabric) I'd have to make
> assumptions about the data and don't know the length of the packet in
> advance.

Okay, so it doesn't seem to fit your case, but this specific point is
where you actually have a small advantage using a stream-oriented
socket. If you receive a packet and have a smaller receive buffer, you
can read the length of the packet from the vnet header and then read
the rest of the packet at a later time.

With a datagram-oriented socket, you would need to know the maximum
packet size in advance, and use a receive buffer that's large enough to
contain it, because if you don't, you'll discard data.

The same reasoning applies to a receive buffer that's larger than the
maximum packet size you can get -- you can then read multiple packets at
a time, filling your buffer, partially reading a packet at the end of
it, and reading the rest later.

With a datagram-oriented socket you need to resort to recvmmsg() to
receive multiple packets with one syscall (nothing against it, it's
just slightly more tedious).

> Using the datagram approach fits nicely into this concept.
> So, yes, in my instance the transport IS truly connectionless and VMs
> just keep sending packets if the fabric isn't there or doesn't pick
> up their packets.

I see, in that case I guess you really need a datagram-oriented
socket... even though what happens with my patch (just like with the
existing TCP support) is that your fabric would need to be there when
qemu starts, but if it disappears later, qemu will simply close the
socket. Indeed, it's not "hotplug", which is probably what you need.

> And maybe there's use for both, as there's currently already support
> for connection oriented (TCP) and connectionless (UDP) inet
> transports. 

Yes, I think so.

> >> As a side note, I tried to pass in an already open FD, but that
> >> didn't work either.  
> > 
> > This actually worked for me as a quick work-around, either with:
> > 	https://github.com/StevenVanAcker/udstools
> > 
> > or with a trivial C implementation of that, that does essentially:
> > 
> > 	fd = strtol(argv[1], NULL, 0);
> > 	if (fd < 3 || fd > INT_MAX || errno)
> > 		usage(argv[0]);
> > 
> > 	s = socket(AF_UNIX, SOCK_STREAM, 0);
> > 	if (s < 0) {
> > 		perror("socket");
> > 		exit(EXIT_FAILURE);
> > 	}
> > 
> > 	if (connect(s, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
> >		perror("connect");
> > 		exit(EXIT_FAILURE);
> > 	}
> > 
> > 	if (dup2(s, (int)fd) < 0) {
> > 		perror("dup");
> > 		exit(EXIT_FAILURE);
> > 	}
> > 
> > 	close(s);
> > 
> > 	execvp(argv[2], argv + 2);
> > 	perror("execvp");
> > 
> > where argv[1] is the socket number you pass in the qemu command line
> > (-net socket,fd=X) and argv[2] is the path to qemu.
> 
> As I was looking for dgram support I didn't even try with a stream
> socket ;)

Mind that it also works with a SOCK_DGRAM ;) ...that was my original
attempt, actually.

> >> So, I added some code which does work for me... e.g.
> >> 
> >> - can specify the socket paths like -netdev
> >> id=bla,type=socket,unix=/tmp/in:/tmp/out
> >> - it does forward packets between two Qemu instances running
> >> back-to-back
> >> 
> >> I'm wondering if this is of interest for the wider community and,
> >> if so, how to proceed.
> >> 
> >> Thanks,
> >> -ralph
> >> 
> >> Commit
> >> https://github.com/rschmied/qemu/commit/73f02114e718ec898c7cd8e855d0d5d5d7abe362
> >>  
> > 
> > I think your patch could be a bit simpler, as you could mostly reuse
> > net_socket_udp_init() for your initialisation, and perhaps rename
> > it to net_socket_dgram_init().  
> 
> Thanks... I agree that my code can likely be shortened... it was just
> a PoC that I cobbled together yesterday and it still has a lot of
> to-be-removed lines.

I'm not sure if it helps, but I guess you could "conceptually" recycle
my patch and in some sense "extend" the UDP parts to a generic datagram
interface, just like mine extends the TCP implementation to a generic
stream interface.

About command line and documentation, I guess it's clear that
"connect=" implies something stream-oriented, so I would prefer to
leave it like that for a stream-oriented AF_UNIX socket -- it behaves
just like TCP.

On the other hand, you can't recycle the current UDP "mcast=" stuff
because it's not multicast (AF_UNIX multicast support for Linux was
proposed some years ago, https://lwn.net/Articles/482523/, but not
merged), and of course not "udp="... would "unix_dgram=" make sense
to you?

On a side note, I wonder why you need two named sockets instead of
one -- I mean, they're bidirectional...

-- 
Stefano




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-23 16:21   ` Daniel P. Berrangé
@ 2021-04-23 16:54     ` Stefano Brivio
  2021-04-26 12:05       ` Ralph Schmieder
  2021-04-26 12:56       ` Daniel P. Berrangé
  0 siblings, 2 replies; 15+ messages in thread
From: Stefano Brivio @ 2021-04-23 16:54 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Ralph Schmieder, qemu-devel

On Fri, 23 Apr 2021 17:21:38 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Fri, Apr 23, 2021 at 05:29:40PM +0200, Stefano Brivio wrote:
> > Hi Ralph,
> > 
> > On Fri, 23 Apr 2021 08:56:48 +0200
> > Ralph Schmieder <ralph.schmieder@gmail.com> wrote:
> >   
> > > Hey...  new to this list.  I was looking for a way to use Unix domain
> > > sockets as a network transport between local VMs.
> > > 
> > > I'm part of a team where we run dozens if not hundreds of VMs on a
> > > single compute instance which are highly interconnected.
> > > 
> > > In the current implementation, I use UDP sockets (e.g. something like 
> > > 
> > > -netdev
> > > id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
> > > 
> > > which works great.
> > > 
> > > The downside of this approach is that I need to keep track of all the
> > > UDP ports in use and that there's a potential for clashes.  Clearly,
> > > having Unix domain sockets where I could store the sockets in the
> > > VM's directory would be much easier to manage.
> > > 
> > > However, even though there is some AF_UNIX support in net/socket.c,
> > > it's
> > > 
> > > - not configurable
> > > - it doesn't work  
> > 
> > I hate to say this, but I've been working on something very similar
> > just in the past days, with the notable difference that I'm using
> > stream-oriented AF_UNIX sockets instead of datagram-oriented.
> > 
> > I have a similar use case, and after some experiments I picked a
> > stream-oriented socket over datagram-oriented because:
> > 
> > - overhead appears to be the same
> > 
> > - message boundaries make little sense here -- you already have a
> >   32-bit vnet header with the message size defining the message
> >   boundaries
> > 
> > - datagram-oriented AF_UNIX sockets are actually reliable and there's
> >   no packet reordering on Linux, but this is not "formally" guaranteed
> > 
> > - it's helpful for me to know when a qemu instance disconnects for any
> >   reason  
> 
> The current IP socket impl for the net socket backend uses SOCK_DGRAM,
> so from a consistency POV it feels sensible todo the same for UNIX
> sockets too.

That's just for UDP though -- it also supports TCP with the "connect="
parameter, and given that a stream-oriented AF_UNIX socket behaves very
similarly, I recycled that parameter and just extended that bit of
documentation.

> None the less, your last point in particular about wanting to know
> about disconnects feels valid, and if its useful to you for UNIX
> sockets, then it ought to be useful for IP sockets too.
> 
> IOW, I wonder if  we should use DGRAM for UNIX sockets too by default
> to match current behaviour, but then also add a CLI option that allows
> choice of DGRAM vs STREAM, and wire that up for IP & UNIX sockets.

The choice would only apply to AF_UNIX (that is, not to TCP and UDP).

The current situation isn't entirely consistent, because for TCP you
have "connect=", for UDP it's "udp=" or "mcast=", and I'm extending the
"connect=" case to support stream-oriented AF_UNIX, which I think is
consistent.

However, to have it symmetric for the datagram-oriented case
(UDP and AF_UNIX), ideally it should be changed to
"dgram=host:port|path" -- which I guess we can't do.

I see two alternatives:

1.
  - "connect=" (TCP only)
  - "unix=path,type=dgram|stream"
  - "udp=" (UDP only)

2.
  - "connect=" (TCP and AF_UNIX stream)
  - "unix_dgram="
  - "udp=" (UDP only)

The major thing I like about 2. is that it saves some code and a further
option, but other than that I don't have a strong preference.
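To make the two alternatives concrete, hypothetical invocations could look like the following. None of these options exist at the time of writing; the spellings are just the proposals above, and the "type=" sub-option of alternative 1 would presumably need a spelling that doesn't clash with -netdev's own backend-type key:

```shell
# Alternative 1: dedicated "unix=" option with an explicit socket type
qemu-system-x86_64 ... -netdev socket,id=net0,unix=/tmp/vm0.sock,type=dgram
qemu-system-x86_64 ... -netdev socket,id=net0,unix=/tmp/vm0.sock,type=stream

# Alternative 2: "connect=" covers AF_UNIX stream alongside TCP,
# with a separate "unix_dgram=" for the datagram case
qemu-system-x86_64 ... -netdev socket,id=net0,connect=/tmp/vm0.sock
qemu-system-x86_64 ... -netdev socket,id=net0,unix_dgram=/tmp/vm0.sock
```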

-- 
Stefano




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-23 16:39     ` Stefano Brivio
@ 2021-04-26 11:14       ` Ralph Schmieder
  2021-04-27 21:51         ` Stefano Brivio
  0 siblings, 1 reply; 15+ messages in thread
From: Ralph Schmieder @ 2021-04-26 11:14 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: "Daniel P. Berrangé", qemu-devel



> On Apr 23, 2021, at 18:39, Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> On Fri, 23 Apr 2021 17:48:08 +0200
> Ralph Schmieder <ralph.schmieder@gmail.com> wrote:
> 
>> Hi, Stefano... Thanks for the detailed response... inline
>> Thanks,
>> -ralph
>> 
>> 
>>> On Apr 23, 2021, at 17:29, Stefano Brivio <sbrivio@redhat.com>
>>> wrote:
>>> 
>>> Hi Ralph,
>>> 
>>> On Fri, 23 Apr 2021 08:56:48 +0200
>>> Ralph Schmieder <ralph.schmieder@gmail.com> wrote:
>>> 
>>>> Hey...  new to this list.  I was looking for a way to use Unix
>>>> domain sockets as a network transport between local VMs.
>>>> 
>>>> I'm part of a team where we run dozens if not hundreds of VMs on a
>>>> single compute instance which are highly interconnected.
>>>> 
>>>> In the current implementation, I use UDP sockets (e.g. something
>>>> like 
>>>> 
>>>> -netdev
>>>> id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
>>>> 
>>>> which works great.
>>>> 
>>>> The downside of this approach is that I need to keep track of all
>>>> the UDP ports in use and that there's a potential for clashes.
>>>> Clearly, having Unix domain sockets where I could store the
>>>> sockets in the VM's directory would be much easier to manage.
>>>> 
>>>> However, even though there is some AF_UNIX support in net/socket.c,
>>>> it's
>>>> 
>>>> - not configurable
>>>> - it doesn't work  
>>> 
>>> I hate to say this, but I've been working on something very similar
>>> just in the past days, with the notable difference that I'm using
>>> stream-oriented AF_UNIX sockets instead of datagram-oriented.
>>> 
>>> I have a similar use case, and after some experiments I picked a
>>> stream-oriented socket over datagram-oriented because:
>>> 
>>> - overhead appears to be the same
>>> 
>>> - message boundaries make little sense here -- you already have a
>>> 32-bit vnet header with the message size defining the message
>>> boundaries
>>> 
>>> - datagram-oriented AF_UNIX sockets are actually reliable and
>>> there's no packet reordering on Linux, but this is not "formally"
>>> guaranteed
>>> 
>>> - it's helpful for me to know when a qemu instance disconnects for
>>> any reason
>>> 
>> 
>> IMO, dgram is the right tool for this as it is symmetrical to using a
>> UDP transport... Since I need to pick up those packets from outside
>> of Qemu (inside of a custom networking fabric) I'd have to make
>> assumptions about the data and don't know the length of the packet in
>> advance.
> 
> Okay, so it doesn't seem to fit your case, but this specific point is
> where you actually have a small advantage using a stream-oriented
> socket. If you receive a packet and have a smaller receive buffer, you
> can read the length of the packet from the vnet header and then read
> the rest of the packet at a later time.
> 
> With a datagram-oriented socket, you would need to know the maximum
> packet size in advance, and use a receive buffer that's large enough to
> contain it, because if you don't, you'll discard data.

For me, the maximum packet size is a jumbo frame (e.g. 9x1024) anyway -- everything must fit into an atomic write of that size.
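A minimal sketch of the stream-side read loop described above -- assuming a 32-bit length prefix in network byte order like the one the existing TCP code path in net/socket.c uses, and the 9 KiB jumbo-frame cap mentioned here; the helper names are invented for illustration:

```c
#include <arpa/inet.h>   /* ntohl() */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_FRAME (9 * 1024)    /* jumbo frame, as above */

/* Read exactly n bytes from a stream socket, retrying short reads. */
static int read_full(int fd, void *buf, size_t n)
{
    size_t got = 0;

    while (got < n) {
        ssize_t r = read(fd, (char *)buf + got, n - got);

        if (r <= 0)
            return -1;          /* EOF or error: the peer went away */
        got += r;
    }
    return 0;
}

/* Receive one length-prefixed frame; return its length, or -1. */
int recv_frame(int fd, void *buf, size_t size)
{
    uint32_t hdr;

    if (read_full(fd, &hdr, sizeof(hdr)) < 0)
        return -1;
    hdr = ntohl(hdr);
    if (hdr > size || hdr > MAX_FRAME)
        return -1;              /* oversized frame */
    if (read_full(fd, buf, hdr) < 0)
        return -1;
    return (int)hdr;
}
```

With a buffer smaller than the frame, the header still tells the reader how much remains to be read later -- the stream-side advantage discussed above; a datagram socket would simply have truncated the excess.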

> 
> The same reasoning applies to a receive buffer that's larger than the
> maximum packet size you can get -- you can then read multiple packets at
> a time, filling your buffer, partially reading a packet at the end of
> it, and reading the rest later.
> 
> With a datagram-oriented socket you need to resort to recvmmsg() to
> receive multiple packets with one syscall (nothing against it, it's
> just slightly more tedious).
> 
>> Using the datagram approach fits nicely into this concept.
>> So, yes, in my instance the transport IS truly connectionless and VMs
>> just keep sending packets if the fabric isn't there or doesn't pick
>> up their packets.
> 
> I see, in that case I guess you really need a datagram-oriented
> socket... even though what happens with my patch (just like with the
> existing TCP support) is that your fabric would need to be there when
> qemu starts, but if it disappears later, qemu will simply close the
> socket. Indeed, it's not "hotplug", which is probably what you need.

That's the point.  This is peer-to-peer/point-to-point and not client/server.

> 
>> And maybe there's use for both, as there's currently already support
>> for connection oriented (TCP) and connectionless (UDP) inet
>> transports. 
> 
> Yes, I think so.
> 
>>>> As a side note, I tried to pass in an already open FD, but that
>>>> didn't work either.  
>>> 
>>> This actually worked for me as a quick work-around, either with:
>>> 	https://github.com/StevenVanAcker/udstools
>>> 
>>> or with a trivial C implementation of that, that does essentially:
>>> 
>>> 	errno = 0;
>>> 	fd = strtol(argv[1], NULL, 0);
>>> 	if (fd < 3 || fd > INT_MAX || errno)
>>> 		usage(argv[0]);
>>> 
>>> 	s = socket(AF_UNIX, SOCK_STREAM, 0);
>>> 	if (s < 0) {
>>> 		perror("socket");
>>> 		exit(EXIT_FAILURE);
>>> 	}
>>> 
>>> 	if (connect(s, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
>>> 		perror("connect");
>>> 		exit(EXIT_FAILURE);
>>> 	}
>>> 
>>> 	if (dup2(s, (int)fd) < 0) {
>>> 		perror("dup");
>>> 		exit(EXIT_FAILURE);
>>> 	}
>>> 
>>> 	close(s);
>>> 
>>> 	execvp(argv[2], argv + 2);
>>> 	perror("execvp");
>>> 
>>> where argv[1] is the socket number you pass in the qemu command line
>>> (-net socket,fd=X) and argv[2] is the path to qemu.
>> 
>> As I was looking for dgram support I didn't even try with a stream
>> socket ;)
> 
> Mind that it also works with a SOCK_DGRAM ;) ...that was my original
> attempt, actually.
> 
>>>> So, I added some code which does work for me... e.g.
>>>> 
>>>> - can specify the socket paths like -netdev
>>>> id=bla,type=socket,unix=/tmp/in:/tmp/out
>>>> - it does forward packets between two Qemu instances running
>>>> back-to-back
>>>> 
>>>> I'm wondering if this is of interest for the wider community and,
>>>> if so, how to proceed.
>>>> 
>>>> Thanks,
>>>> -ralph
>>>> 
>>>> Commit
>>>> https://github.com/rschmied/qemu/commit/73f02114e718ec898c7cd8e855d0d5d5d7abe362
>>>> 
>>> 
>>> I think your patch could be a bit simpler, as you could mostly reuse
>>> net_socket_udp_init() for your initialisation, and perhaps rename
>>> it to net_socket_dgram_init().  
>> 
>> Thanks... I agree that my code can likely be shortened... it was just
>> a PoC that I cobbled together yesterday and it still has a lot of
>> to-be-removed lines.
> 
> I'm not sure if it helps, but I guess you could "conceptually" recycle
> my patch and in some sense "extend" the UDP parts to a generic datagram
> interface, just like mine extends the TCP implementation to a generic
> stream interface.
> 
> About command line and documentation, I guess it's clear that
> "connect=" implies something stream-oriented, so I would prefer to
> leave it like that for a stream-oriented AF_UNIX socket -- it behaves
> just like TCP.
> 
> On the other hand, you can't recycle the current UDP "mcast=" stuff
> because it's not multicast (AF_UNIX multicast support for Linux was
> proposed some years ago, https://lwn.net/Articles/482523/, but not
> merged), and of course not "udp="... would "unix_dgram=" make sense
> to you?
> 
> On a side note, I wonder why you need two named sockets instead of
> one -- I mean, they're bidirectional...


Hmm... each peer needs to send unsolicited frames/packets to the other end... and thus needs to bind to their socket.  Pretty much for the same reason as the UDP transport requires you to specify a local and a remote 5-tuple.  Even though for AF_INET, the local port does not have to be specified, the OS would assign an ephemeral port to make it unique. Am I missing something?
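A sketch of that two-path setup, mirroring the unix=/tmp/in:/tmp/out idea: each peer binds its own datagram socket so the other end has a named address for unsolicited frames (the paths and helper names below are made up for illustration, not taken from the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind an AF_UNIX datagram socket to its own path, so the peer has a
 * named endpoint to address unsolicited frames to. */
int bound_dgram(const char *path)
{
    struct sockaddr_un a = { .sun_family = AF_UNIX };
    int s = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (s < 0)
        return -1;

    unlink(path);               /* drop a stale socket from a previous run */
    snprintf(a.sun_path, sizeof(a.sun_path), "%s", path);
    if (bind(s, (struct sockaddr *)&a, sizeof(a)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Send one frame to the peer's named socket -- no handshake needed. */
ssize_t send_frame(int s, const char *peer, const void *buf, size_t len)
{
    struct sockaddr_un to = { .sun_family = AF_UNIX };

    snprintf(to.sun_path, sizeof(to.sun_path), "%s", peer);
    return sendto(s, buf, len, 0, (struct sockaddr *)&to, sizeof(to));
}
```

Two back-to-back instances would each call bound_dgram() on their own path and send_frame() to the other's -- the same symmetry as the existing udp=/localaddr= pair.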

Another thing: on Windows, there's an AF_UNIX/SOCK_STREAM implementation... So, technically it should be possible to use that code path on Windows, too.  Not a Windows guy, though... So, can't say whether it would simply work or not:

https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/



> 
> -- 
> Stefano
> 




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-23 16:54     ` Stefano Brivio
@ 2021-04-26 12:05       ` Ralph Schmieder
  2021-04-26 12:56       ` Daniel P. Berrangé
  1 sibling, 0 replies; 15+ messages in thread
From: Ralph Schmieder @ 2021-04-26 12:05 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: "Daniel P. Berrangé", qemu-devel



> On Apr 23, 2021, at 18:54, Stefano Brivio <sbrivio@redhat.com> wrote:
> 
> On Fri, 23 Apr 2021 17:21:38 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
>> On Fri, Apr 23, 2021 at 05:29:40PM +0200, Stefano Brivio wrote:
>>> Hi Ralph,
>>> 
>>> On Fri, 23 Apr 2021 08:56:48 +0200
>>> Ralph Schmieder <ralph.schmieder@gmail.com> wrote:
>>> 
>>>> Hey...  new to this list.  I was looking for a way to use Unix domain
>>>> sockets as a network transport between local VMs.
>>>> 
>>>> I'm part of a team where we run dozens if not hundreds of VMs on a
>>>> single compute instance which are highly interconnected.
>>>> 
>>>> In the current implementation, I use UDP sockets (e.g. something like 
>>>> 
>>>> -netdev
>>>> id=bla,type=socket,udp=localhost:1234,localaddr=localhost:5678) 
>>>> 
>>>> which works great.
>>>> 
>>>> The downside of this approach is that I need to keep track of all the
>>>> UDP ports in use and that there's a potential for clashes.  Clearly,
>>>> having Unix domain sockets where I could store the sockets in the
>>>> VM's directory would be much easier to manage.
>>>> 
>>>> However, even though there is some AF_UNIX support in net/socket.c,
>>>> it's
>>>> 
>>>> - not configurable
>>>> - it doesn't work  
>>> 
>>> I hate to say this, but I've been working on something very similar
>>> just in the past days, with the notable difference that I'm using
>>> stream-oriented AF_UNIX sockets instead of datagram-oriented.
>>> 
>>> I have a similar use case, and after some experiments I picked a
>>> stream-oriented socket over datagram-oriented because:
>>> 
>>> - overhead appears to be the same
>>> 
>>> - message boundaries make little sense here -- you already have a
>>>  32-bit vnet header with the message size defining the message
>>>  boundaries
>>> 
>>> - datagram-oriented AF_UNIX sockets are actually reliable and there's
>>>  no packet reordering on Linux, but this is not "formally" guaranteed
>>> 
>>> - it's helpful for me to know when a qemu instance disconnects for any
>>>  reason  
>> 
>> The current IP socket impl for the net socket backend uses SOCK_DGRAM,
>> so from a consistency POV it feels sensible to do the same for UNIX
>> sockets too.
> 
> That's just for UDP though -- it also supports TCP with the "connect="
> parameter, and given that a stream-oriented AF_UNIX socket behaves very
> similarly, I recycled that parameter and just extended that bit of
> documentation.
> 
>> None the less, your last point in particular about wanting to know
>> about disconnects feels valid, and if its useful to you for UNIX
>> sockets, then it ought to be useful for IP sockets too.
>> 
>> IOW, I wonder if  we should use DGRAM for UNIX sockets too by default
>> to match current behaviour, but then also add a CLI option that allows
>> choice of DGRAM vs STREAM, and wire that up for IP & UNIX sockets.
> 
> The choice would only apply to AF_UNIX (that is, not to TCP and UDP).
> 
> The current situation isn't entirely consistent, because for TCP you
> have "connect=", for UDP it's "udp=" or "mcast=", and I'm extending the
> "connect=" case to support stream-oriented AF_UNIX, which I think is
> consistent.
> 
> However, to have it symmetric for the datagram-oriented case
> (UDP and AF_UNIX), ideally it should be changed to
> "dgram=host:port|path" -- which I guess we can't do.

That's what I thought, too. It would be canonical but would break backwards compatibility... hence impossible to change the existing parameter from udp to dgram.

However, an alternative would be to use the dgram approach for both UDP and UDS dgram and keep the UDP option for backwards compatibility.

The code would have to ensure that the dgram param type matches the localaddr param type, and that localaddr is present whenever dgram is provided (as with UDP currently).
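A hypothetical sketch of that validation -- none of these names exist in qemu, and the classifier simply treats any value containing '/' as a filesystem path, which is exactly the kind of guessing that carries its own ambiguity risk:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { ADDR_INET, ADDR_UNIX } addr_type;

/* Guess the family of a "dgram="/"localaddr=" value: anything with a
 * '/' is treated as a filesystem path, everything else as host:port. */
addr_type classify(const char *val)
{
    return strchr(val, '/') ? ADDR_UNIX : ADDR_INET;
}

/* dgram= would require a localaddr= of the same family, just as the
 * existing udp= option requires localaddr= today. */
int dgram_args_ok(const char *dgram, const char *localaddr)
{
    if (!localaddr)
        return 0;
    return classify(dgram) == classify(localaddr);
}
```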


> 
> I see two alternatives:
> 
> 1.
>  - "connect=" (TCP only)
>  - "unix=path,type=dgram|stream"
>  - "udp=" (UDP only)
> 
> 2.
>  - "connect=" (TCP and AF_UNIX stream)
>  - "unix_dgram="
>  - "udp=" (UDP only)
> 

> The major thing I like of 2. is that we save some code and a further
> option, but other than that I don't have a strong preference.
> 
> -- 
> Stefano




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-23 16:54     ` Stefano Brivio
  2021-04-26 12:05       ` Ralph Schmieder
@ 2021-04-26 12:56       ` Daniel P. Berrangé
  2021-04-27 21:52         ` Stefano Brivio
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2021-04-26 12:56 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Ralph Schmieder, qemu-devel, Markus Armbruster

On Fri, Apr 23, 2021 at 06:54:08PM +0200, Stefano Brivio wrote:
> On Fri, 23 Apr 2021 17:21:38 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > The current IP socket impl for the net socket backend uses SOCK_DGRAM,
> > so from a consistency POV it feels sensible to do the same for UNIX
> > sockets too.
> 
> That's just for UDP though -- it also supports TCP with the "connect="
> parameter, and given that a stream-oriented AF_UNIX socket behaves very
> similarly, I recycled that parameter and just extended that bit of
> documentation.
> 
> > None the less, your last point in particular about wanting to know
> > about disconnects feels valid, and if its useful to you for UNIX
> > sockets, then it ought to be useful for IP sockets too.
> > 
> > IOW, I wonder if  we should use DGRAM for UNIX sockets too by default
> > to match current behaviour, but then also add a CLI option that allows
> > choice of DGRAM vs STREAM, and wire that up for IP & UNIX sockets.
> 
> The choice would only apply to AF_UNIX (that is, not to TCP and UDP).
> 
> The current situation isn't entirely consistent, because for TCP you
> have "connect=", for UDP it's "udp=" or "mcast=", and I'm extending the
> "connect=" case to support stream-oriented AF_UNIX, which I think is
> consistent.
> 
> However, to have it symmetric for the datagram-oriented case
> (UDP and AF_UNIX), ideally it should be changed to
> "dgram=host:port|path" -- which I guess we can't do.
> 
> I see two alternatives:
> 
> 1.
>   - "connect=" (TCP only)
>   - "unix=path,type=dgram|stream"
>   - "udp=" (UDP only)

This doesn't work when you need the UNIX server to be a
listener socket, as we've no way to express that, without
adding yet another parameter.

> 2.
>   - "connect=" (TCP and AF_UNIX stream)
>   - "unix_dgram="
>   - "udp=" (UDP only)

Also needs

   "listen=" (TCP and AF_UNIX stream)

"udp" has a corresponding optional "localaddr" for the sending
address.

Also overloading "connect" means we have to parse the value
to guess whether it's a UNIX path or an IP addr:port pair.

I doubt people will have UNIX paths called "127.0.0.1:3333"
but if we can avoid such ambiguity by design, it is better.

> The major thing I like of 2. is that we save some code and a further
> option, but other than that I don't have a strong preference.

The pain we're feeling is largely because the design of the net
option syntax is one of the oldest parts of QEMU and has only
been very partially improved upon. It is certainly not using
QAPI best practice, if we look at this:

  { 'struct': 'NetdevSocketOptions',
    'data': {
      '*fd':        'str',
      '*listen':    'str',
      '*connect':   'str',
      '*mcast':     'str',
      '*localaddr': 'str',
      '*udp':       'str' } }

Then some things come to mind

 - We're not providing a way to say what kind of "fd" is
   passed in - is it a UDP/TCP FD, is it a listener or
   client FD, is it unicast or multicast FD. Though QEMU
   can interrogate the socket to discover this I guess.

 - All of the properties there except "fd" are encoding two values
   in a single property - address + port. This is an anti-pattern

 - No support for ipv4=on|off and ipv6=on|off flags to control
   dual-stack usage.

 - Redundancy of separate parameters for "mcast" and "udp" when
   it is distinguishable based on the address given AFAIR.

 - No support for UNIX sockets


The "right" way to fix most of this long term is a radical change
to introduce use of the SocketAddress struct.

I could envisage something like this

  { 'enum': 'NetdevSocketMode',
    'data':  ['dgram', 'client', 'server'] }

  { 'struct': 'NetdevSocketOptions',
    'data': {
      'addr':      'SocketAddress',
      '*localaddr': 'SocketAddress',
      '*mode':      'NetdevSocketMode' } }


 - A TCP client

      addr.type = inet
      addr.host = 192.168.1.1
      mode = client

 - A TCP server on all interfaces

      addr.type = inet
      addr.host = 
      mode = server

 - A TCP server on a specific interface

      addr.type = inet
      addr.host = 192.168.1.1
      mode = server

 - A TCP server on all interfaces, without IPv4

      addr.type = inet
      addr.host = 
      addr.has_ipv4 = true
      addr.ipv4 = false
      mode = server

 - A UDP unicast transport

      addr.type = inet
      addr.host = 192.168.1.1
      mode = dgram

 - A UDP unicast transport with local addr

      addr.type = inet
      addr.host = 192.168.1.1
      localaddr.type = inet
      localaddr.host = 192.168.1.2
      mode = dgram

 - A UDP multicast transport

     addr.type = inet
     addr.host = 224.0.23.166
     mode = dgram

 - A UNIX stream client

      addr.type = unix
      addr.path = /some/socket
      mode = client

 - A UNIX stream server

      addr.type = unix
      addr.path = /some/socket
      mode = server

 - A UNIX dgram transport

      addr.type = unix
      addr.path = /some/socket
      mode = dgram


Now, of course you're just interested in adding UNIX socket support.

This design I've outlined above is very much "boiling the ocean".
Thus I'm not requesting you implement this, unless you have a strong
desire to get heavily involved in some QEMU refactoring work.

The key question is whether we try to graft UNIX socket support onto
the existing args ("connect"/"listen") or try to do something more
explicit.

Given the desire to have both dgram + stream support, I'd be inclined
to do something more explicit, that's slightly more aligned with a
possible future best practice QAPI design


IOW, we could take a simplified variant of the above as follows:


  { 'enum': 'NetdevSocketMode',
    'data':  ['dgram', 'client', 'server'] }

  { 'struct': 'NetdevSocketOptions',
    'data': {
      '*fd':        'str',
      '*listen':    'str',
      '*connect':   'str',
      '*mcast':     'str',
      '*localaddr': 'str',
      '*udp':       'str',
      '*path':      'str',
      '*localpath': 'str' } }


Cli examples:

 * Unix stream client

  -netdev socket,path=/wibble,mode=client


 * Unix stream server
 
  -netdev socket,path=/wibble,mode=server

 * Unix datagram 

  -netdev socket,path=/wibble,mode=dgram
  -netdev socket,path=/wibble,localpath=/fish,mode=dgram




Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-26 11:14       ` Ralph Schmieder
@ 2021-04-27 21:51         ` Stefano Brivio
  0 siblings, 0 replies; 15+ messages in thread
From: Stefano Brivio @ 2021-04-27 21:51 UTC (permalink / raw)
  To: Ralph Schmieder; +Cc: Daniel P. Berrangé, qemu-devel

On Mon, 26 Apr 2021 13:14:48 +0200
Ralph Schmieder <ralph.schmieder@gmail.com> wrote:

> > On Apr 23, 2021, at 18:39, Stefano Brivio <sbrivio@redhat.com>
> > wrote:
> > 
> > [...]
> >
> > Okay, so it doesn't seem to fit your case, but this specific point
> > is where you actually have a small advantage using a stream-oriented
> > socket. If you receive a packet and have a smaller receive buffer,
> > you can read the length of the packet from the vnet header and then
> > read the rest of the packet at a later time.
> > 
> > With a datagram-oriented socket, you would need to know the maximum
> > packet size in advance, and use a receive buffer that's large
> > enough to contain it, because if you don't, you'll discard data.  
> 
> For me, the maximum packet size is a jumbo frame (e.g. 9x1024) anyway
> -- everything must fit into an atomic write of that size.

Well, the day you want to do some batching... ;) but sure, I see your
point.

> > [...]
> > 
> > On a side note, I wonder why you need two named sockets instead of
> > one -- I mean, they're bidirectional...  
> 
> Hmm... each peer needs to send unsolicited frames/packets to the
> other end... and thus needs to bind to their socket.  Pretty much for
> the same reason as the UDP transport requires you to specify a local
> and a remote 5-tuple.  Even though for AF_INET, the local port does
> not have to be specified, the OS would assign an ephemeral port to
> make it unique. Am I missing something?

I see your point now. Well, I think it's different from the AF_INET case
due to the way AF_UNIX works: UNIX domain sockets don't necessarily
need to make the endpoint known or visible, see a more detailed
explanation at:
	https://comp.unix.admin.narkive.com/AhAOKP1s/lsof-find-both-endpoints-of-a-unix-socket

Even though, nowadays on Linux:

$ nc -luU my_path & (sleep 1; nc.openbsd -uU my_path & lsof +E -aUc nc)
[1] 373285
COMMAND      PID    USER   FD   TYPE             DEVICE SIZE/OFF    NODE NAME
nc        373285 sbrivio    3u  unix 0x000000004076431a      0t0 3957568 my_path type=DGRAM ->INO=3956394 373288,nc.openbs,4u
nc.openbs 373288 sbrivio    4u  unix 0x00000000f5b2e2e1      0t0 3956394 /tmp/nc.XXXXC0whUu type=DGRAM ->INO=3957568 373285,nc,3u

for datagram sockets, the endpoint is exported, and lsof can report that
the endpoint for "my_path" here (-luU binds to a UNIX domain datagram
socket, -uU connects to it). With a stream socket, by the way:

$ nc -lU my_path & (sleep 1; nc.openbsd -U my_path & lsof +E -aUc nc)
[1] 375445
COMMAND      PID    USER   FD   TYPE             DEVICE SIZE/OFF    NODE NAME
nc        375445 sbrivio    3u  unix 0x0000000053abf57c      0t0 3969787 my_path type=STREAM
nc        375445 sbrivio    4u  unix 0x000000001960c1ef      0t0 3969788 my_path type=STREAM ->INO=3970624 375448,nc.openbs,3u
nc.openbs 375448 sbrivio    3u  unix 0x000000000538fa63      0t0 3970624 type=STREAM ->INO=3969788 375445,nc,4u

so I think it should be optional. Even with datagram sockets, just like
the example above (I'm not suggesting that you do this, it's just
another possible choice), only one peer needs to bind to a named
socket, and yet they can exchange data.

> Another thing: on Windows, there's an AF_UNIX/SOCK_STREAM
> implementation... So, technically it should be possible to use that
> code path on Windows, too.  Not a Windows guy, though... So, can't
> say whether it would simply work or not:
> 
> https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/

Thanks for the pointer. I can't test this, so I wouldn't remove that
#ifndef, but perhaps I could add a link to this, in case somebody needs
it and stumbles upon this code path.

-- 
Stefano




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-26 12:56       ` Daniel P. Berrangé
@ 2021-04-27 21:52         ` Stefano Brivio
  2021-04-28  9:02           ` Daniel P. Berrangé
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Brivio @ 2021-04-27 21:52 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Ralph Schmieder, qemu-devel, Markus Armbruster

On Mon, 26 Apr 2021 13:56:40 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Fri, Apr 23, 2021 at 06:54:08PM +0200, Stefano Brivio wrote:
> > On Fri, 23 Apr 2021 17:21:38 +0100
> > Daniel P. Berrangé <berrange@redhat.com> wrote:  
> > > The current IP socket impl for the net socket backend uses SOCK_DGRAM,
> > > so from a consistency POV it feels sensible to do the same for UNIX
> > > sockets too.  
> > 
> > That's just for UDP though -- it also supports TCP with the "connect="
> > parameter, and given that a stream-oriented AF_UNIX socket behaves very
> > similarly, I recycled that parameter and just extended that bit of
> > documentation.
> >   
> > > None the less, your last point in particular about wanting to know
> > > about disconnects feels valid, and if its useful to you for UNIX
> > > sockets, then it ought to be useful for IP sockets too.
> > > 
> > > IOW, I wonder if  we should use DGRAM for UNIX sockets too by default
> > > to match current behaviour, but then also add a CLI option that allows
> > > choice of DGRAM vs STREAM, and wire that up for IP & UNIX sockets.  
> > 
> > The choice would only apply to AF_UNIX (that is, not to TCP and UDP).
> > 
> > The current situation isn't entirely consistent, because for TCP you
> > have "connect=", for UDP it's "udp=" or "mcast=", and I'm extending the
> > "connect=" case to support stream-oriented AF_UNIX, which I think is
> > consistent.
> > 
> > However, to have it symmetric for the datagram-oriented case
> > (UDP and AF_UNIX), ideally it should be changed to
> > "dgram=host:port|path" -- which I guess we can't do.
> > 
> > I see two alternatives:
> > 
> > 1.
> >   - "connect=" (TCP only)
> >   - "unix=path,type=dgram|stream"
> >   - "udp=" (UDP only)  
> 
> This doesn't work when you need the UNIX server to be a
> listener socket, as we've no way to express that, without
> adding yet another parameter.

Ah, right.

> > 2.
> >   - "connect=" (TCP and AF_UNIX stream)
> >   - "unix_dgram="
> >   - "udp=" (UDP only)  
> 
> Also needs
> 
>    "listen=" (TCP and AF_UNIX stream)

Yes, I forgot about this here, but it's actually already in my patch
(see the changes to net_socket_listen_init() and documentation).

> "udp" has a corresponding optional "localaddr" for the sending
> address.
> 
> Also overloading "connect" means we have to parse the value
> to guess whether it's a UNIX path or an IP addr:port pair.
> 
> I doubt people will have UNIX paths called "127.0.0.1:3333"
> but if we can avoid such ambiguity by design, it is better.

Agreed... I didn't actually consider that.

> > The major thing I like of 2. is that we save some code and a further
> > option, but other than that I don't have a strong preference.  
> 
> The pain we're feeling is largely because the design of the net
> option syntax is one of the oldest parts of QEMU and has only
> been very partially improved upon. It is certainly not using
> QAPI best practice, if we look at this:
> 
>   { 'struct': 'NetdevSocketOptions',
>     'data': {
>       '*fd':        'str',
>       '*listen':    'str',
>       '*connect':   'str',
>       '*mcast':     'str',
>       '*localaddr': 'str',
>       '*udp':       'str' } }
> 
> Then some things come to mind
> 
> >  - We're not providing a way to say what kind of "fd" is
>    passed in - is it a UDP/TCP FD, is it a listener or
>    client FD, is it unicast or multicast FD. Though QEMU
> >    can interrogate the socket to discover this I guess.

Some form of probing was already added in commit 894022e61601 ("net:
check if the file descriptor is valid before using it"). Does qemu need
to care, though, once the socket is connected? That is, what would it
do with that information?

>  - All of the properties there except "fd" are encoding two values
>    in a single property - address + port. This is an anti-pattern
> 
>  - No support for ipv4=on|off and ipv6=on|off flags to control
>    dual-stack usage.

I wonder if this needs to be explicit -- it might simply derive from
addressing.

>  - Redundancy of separate parameters for "mcast" and "udp" when
>    it is distinguishable based on the address given AFAIR.

Strictly speaking, for IPv4, certain addresses are reserved for
multicast usage (by RFC 5771), but as far as I can tell, other
addresses could also legitimately be used for multicast. I've never
seen that in practice and it's unlikely to be of any use, though.

For IPv6, things seem to be defined more strictly (RFC 5771 and
updates).

All in all, yes, I guess inferring this from the address would make the
usage more practical.

>  - No support for UNIX sockets
> 
> 
> The "right" way to fix most of this long term is a radical change
> to introduce use of the SocketAddress struct.
> 
> I could envisage something like this
> 
>   { 'enum': 'NetdevSocketMode',
>     'data':  ['dgram', 'client', 'server'] }
> 
>   { 'struct': 'NetdevSocketOptions',
>     'data': {
>       'addr':      'SocketAddress',
>       '*localaddr': 'SocketAddress',
>       '*mode':      'NetdevSocketMode' } }
> 
> 
>  - A TCP client
> 
>       addr.type = inet
>       addr.host = 192.168.1.1
>       mode = client
>
>  - A TCP server on all interfaces
> 
>       addr.type = inet
>       addr.host = 
>       mode = server
> 
>  - A TCP server on a specific interface
> 
>       addr.type = inet
>       addr.host = 192.168.1.1
>       mode = server
> 
>  - A TCP server on all interfaces, without IPv4
> 
>       addr.type = inet
>       addr.host = 
>       addr.has_ipv4 = true
>       addr.ipv4 = false
>       mode = server

...perhaps it would be more consistent with other consolidated
practices to have addr.type = inet | inet6... and perhaps call it
addr.family.

Also, for "mode" (that could be called "type" to reflect
parameters for socket(2)), we should probably support SOCK_SEQPACKET as
well ("seq"?).

>  - A UDP unicast transport
> 
>       addr.type = inet
>       addr.host = 192.168.1.1
>       mode = dgram
> 
>  - A UDP unicast transport with local addr
> 
>       addr.type = inet
>       addr.host = 192.168.1.1
>       localaddr.type = inet
>       localaddr.host = 192.168.1.2
>       mode = dgram
> 
>  - A UDP multicast transport
> 
>      addr.type = inet
>      addr.host = 224.0.23.166
>      mode = dgram
> 
>  - A UNIX stream client
> 
>       addr.type = unix
>       addr.path = /some/socket
>       mode = client
> 
>  - A UNIX stream server
> 
>       addr.type = unix
>       addr.path = /some/socket
>       mode = server
> 
>  - A UNIX dgram transport
> 
>       addr.type = unix
>       addr.path = /some/socket
>       mode = dgram
> 
> 
> Now, of course you're just interested in adding UNIX socket support.
> 
> This design I've outlined above is very much "boiling the ocean".
> Thus I'm not requesting you implement this, unless you have a strong
> desire to get heavily involved in some QEMU refactoring work.

I don't really have a strong desire to do that, but to my naive eyes it
doesn't look that complicated, I'll give it a try.

> The key question is whether we try to graft UNIX socket support onto
> the existing args ("connect"/"listen") or try to do something more
> explicit.
> 
> Given the desire to have both dgram + stream support, I'd be inclined
> to do something more explicit, that's slightly more aligned with a
> possible future best practice QAPI design
> 
> 
> IOW, we could take a simplified variant of the above as follows:
> 
> 
>   { 'enum': 'NetdevSocketMode',
>     'data':  ['dgram', 'client', 'server'] }
> 
>   { 'struct': 'NetdevSocketOptions',
>     'data': {
>       '*fd':        'str',
>       '*listen':    'str',
>       '*connect':   'str',
>       '*mcast':     'str',
>       '*localaddr': 'str',
>       '*udp':       'str',
>       '*path':      'str',
>       '*localpath': 'str' } }
> 
> 
> Cli examples:
> 
>  * Unix stream client
> 
>   -netdev socket,path=/wibble,mode=client
> 
>  * Unix stream server
>  
>   -netdev socket,path=/wibble,mode=server
> 
>  * Unix datagram 
> 
>   -netdev socket,path=/wibble,mode=dgram
>   -netdev socket,path=/wibble,localpath=/fish,mode=dgram

I think this looks reasonable, I'm not sure about "localpath",
because also /wibble is local in some sense.

I don't know what would be a good implementation practice to keep the
current options available -- should this be done with some new code
that applies a translation to the new parameters?

-- 
Stefano




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-27 21:52         ` Stefano Brivio
@ 2021-04-28  9:02           ` Daniel P. Berrangé
  2021-04-29 12:07             ` Markus Armbruster
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2021-04-28  9:02 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Ralph Schmieder, qemu-devel, Markus Armbruster

On Tue, Apr 27, 2021 at 11:52:29PM +0200, Stefano Brivio wrote:
> On Mon, 26 Apr 2021 13:56:40 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > The pain we're feeling is largely because the design of the net
> > option syntax is one of the oldest parts of QEMU and has only
> > been very partially improved upon. It is certainly not using
> > QAPI best practice, if we look at this:
> > 
> >   { 'struct': 'NetdevSocketOptions',
> >     'data': {
> >       '*fd':        'str',
> >       '*listen':    'str',
> >       '*connect':   'str',
> >       '*mcast':     'str',
> >       '*localaddr': 'str',
> >       '*udp':       'str' } }
> > 
> > Then some things come to mind
> > 
> > >  - We're not providing a way to say what kind of "fd" is
> >    passed in - is it a UDP/TCP FD, is it a listener or
> >    client FD, is it unicast or multicast FD. Though QEMU
> > >    can interrogate the socket to discover this I guess.
> 
> Some form of probing was already added in commit 894022e61601 ("net:
> check if the file descriptor is valid before using it"). Does qemu need
> to care, though, once the socket is connected? That is, what would it
> do with that information?

The only thing it really has to care about is the distinction between
a listener socket and a data socket.
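That one distinction can in fact be probed directly: the kernel reports whether a socket is in the listening state via SO_ACCEPTCONN. A sketch:

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if fd is a listening socket, 0 if it is a data socket,
 * -1 on error.  The kernel exposes this via SO_ACCEPTCONN. */
int is_listener(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) < 0)
        return -1;
    return val != 0;
}
```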

> >  - All of the properties there except "fd" are encoding two values
> >    in a single property - address + port. This is an anti-pattern
> > 
> >  - No support for ipv4=on|off and ipv6=on|off flags to control
> >    dual-stack usage.
> 
> I wonder if this needs to be explicit -- it might simply derive from
> addressing.

This is explicit everywhere we use sockets in QEMU - it is part
of the SocketAddress QAPI schema.

Consider an address "::", this is an IPv6 address, but depending on
how you configure the socket it can accept either IPv6-only or both
IPv6 and IPv4 incoming connections.

If passing a hostname instead of a raw address, then the ipv4/ipv6
flags control whether we use all the returned DNS entries.
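For the "::" case, the kernel knob behind those flags is IPV6_V6ONLY. A small sketch of the two behaviours (assumes an IPv6-capable host):

```python
import socket

def v6_socket(v6only):
    # A listener bound to "::" with IPV6_V6ONLY=0 also accepts IPv4
    # peers (as v4-mapped addresses); with 1 it is IPv6-only.
    s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1 if v6only else 0)
    return s

dual = v6_socket(False)
v6only = v6_socket(True)
print(dual.getsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY))    # 0
print(v6only.getsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY))  # 1
```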

> > The "right" way to fix most of this long term is a radical change
> > to introduce use of the SocketAddress struct.
> > 
> > I could envisage something like this
> > 
> >   { 'enum': 'NetdevSocketMode',
> >     'data':  ['dgram', 'client', 'server'] }
> > 
> >   { 'struct': 'NetdevSocketOptions',
> >     'data': {
> >       'addr':      'SocketAddress',
> >       '*localaddr': 'SocketAddress',
> >       '*mode':      'NetdevSocketMode' } }
> > 
> > 
> >  - A TCP client
> > 
> >       addr.type = inet
> >       addr.host = 192.168.1.1
> >       mode = client
> >
> >  - A TCP server on all interfaces
> > 
> >       addr.type = inet
> >       addr.host = 
> >       mode = server
> > 
> >  - A TCP server on a specific interface
> > 
> >       addr.type = inet
> >       addr.host = 192.168.1.1
> >       mode = server
> > 
> >  - A TCP server on all interfaces, without IPv4
> > 
> >       addr.type = inet
> >       addr.host = 
> >       addr.has_ipv4 = true
> >       addr.ipv4 = false
> >       mode = server
> 
> ...perhaps it would be more consistent with other consolidated
> practices to have addr.type = inet | inet6... and perhaps call it
> addr.family.
>
> Also, for "mode" (that could be called "type" to reflect
> parameters for socket(2)), we should probably support SOCK_SEQPACKET as
> well ("seq"?).

The naming I use here is determined by the QAPI 'SocketAddress'
struct which has a 'type' field, and separate 'ipv4' and 'ipv6'
flags.

> 
> >  - A UDP unicast transport
> > 
> >       addr.type = inet
> >       addr.host = 192.168.1.1
> >       mode = dgram
> > 
> >  - A UDP unicast transport with local addr
> > 
> >       addr.type = inet
> >       addr.host = 192.168.1.1
> >       localaddr.type = inet
> >       localaddr.host = 192.168.1.2
> >       mode = dgram
> > 
> >  - A UDP multicast transport
> > 
> >      addr.type = inet
> >      addr.host = 224.0.23.166
> >      mode = dgram
> > 
> >  - A UNIX stream client
> > 
> >       addr.type = unix
> >       addr.path = /some/socket
> >       mode = client
> > 
> >  - A UNIX stream server
> > 
> >       addr.type = unix
> >       addr.path = /some/socket
> >       mode = server
> > 
> >  - A UNIX dgram transport
> > 
> >       addr.type = unix
> >       addr.path = /some/socket
> >       mode = dgram
> > 
> > 
> > Now, of course you're just interested in adding UNIX socket support.
> > 
> > This design I've outlined above is very much "boiling the ocean".
> > Thus I'm not requesting you implement this, unless you have a strong
> > desire to get heavily involved in some QEMU refactoring work.
> 
> I don't really have a strong desire to do that, but to my naive eyes it
> doesn't look that complicated, so I'll give it a try.

The hard bit is always the backwards compatibility for existing usage....


> > The key question is whether we try to graft UNIX socket support onto
> > the existing args ("connect"/"listen") or try to do something more
> > explicit.
> > 
> > Given the desire to have both dgram + stream support, I'd be inclined
> > to do something more explicit, that's slightly more aligned with a
> > possible future best practice QAPI design
> > 
> > 
> > IOW, we could take a simplified variant of the above as follows:
> > 
> > 
> >   { 'enum': 'NetdevSocketMode',
> >     'data':  ['dgram', 'client', 'server'] }
> > 
> >   { 'struct': 'NetdevSocketOptions',
> >     'data': {
> >       '*fd':        'str',
> >       '*listen':    'str',
> >       '*connect':   'str',
> >       '*mcast':     'str',
> >       '*localaddr': 'str',
> >       '*udp':       'str',
> >       '*path':      'str',
> >       '*localpath': 'str' } }
> > 
> > 
> > CLI examples:
> > 
> >  * Unix stream client
> > 
> >   -netdev socket,path=/wibble,mode=client
> > 
> >  * Unix stream server
> >  
> >   -netdev socket,path=/wibble,mode=server
> > 
> >  * Unix datagram 
> > 
> >   -netdev socket,path=/wibble,mode=dgram
> >   -netdev socket,path=/wibble,localpath=/fish,mode=dgram
> 
> I think this looks reasonable, but I'm not sure about "localpath",
> because /wibble is also local in some sense.

"local" as in local to the process, rather than "local" as in
local to the host.
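In plain-sockets terms, the proposed path/localpath pairing is just two AF_UNIX datagram sockets, each bound to its own path and sending to the other's. The paths below are hypothetical stand-ins for /wibble and /fish:

```python
import os
import socket
import tempfile

tmp = tempfile.mkdtemp()
path_a = os.path.join(tmp, "a.sock")   # one instance's localpath
path_b = os.path.join(tmp, "b.sock")   # ...and its peer's path

a = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
a.bind(path_a)
b = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
b.bind(path_b)

# Each side sends frames to the path the other side bound.
a.sendto(b"frame", path_b)
payload, sender = b.recvfrom(65536)
print(payload, sender == path_a)  # b'frame' True
```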

> I don't know what would be a good implementation practice to keep the
> current options available -- should this be done with some new code
> that applies a translation to the new parameters?

At the CLI parser side we'd usually just do a translation to the new
QAPI style, but I'm not sure how to handle the existing QAPI stuff.

Perhaps just add new fields to "NetdevSocketOptions" and deprecate
existing ones that become obsolete.

The only other alternative is a parallel type to completely obsolete
NetdevSocketOptions, but I'm not sure what we'd call that.

I've added Markus / Eric to CC to get advice on the QAPI side here.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: socket.c added support for unix domain socket datagram transport
  2021-04-28  9:02           ` Daniel P. Berrangé
@ 2021-04-29 12:07             ` Markus Armbruster
  0 siblings, 0 replies; 15+ messages in thread
From: Markus Armbruster @ 2021-04-29 12:07 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Stefano Brivio, Ralph Schmieder, qemu-devel

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Tue, Apr 27, 2021 at 11:52:29PM +0200, Stefano Brivio wrote:
>> On Mon, 26 Apr 2021 13:56:40 +0100
>> Daniel P. Berrangé <berrange@redhat.com> wrote:
>> > The pain we're feeling is largely because the design of the net
>> > option syntax is one of the oldest parts of QEMU and has only
>> > been very partially improved upon. It is certainly not using
>> > QAPI best practice, if we look at this:
>> > 
>> >   { 'struct': 'NetdevSocketOptions',
>> >     'data': {
>> >       '*fd':        'str',
>> >       '*listen':    'str',
>> >       '*connect':   'str',
>> >       '*mcast':     'str',
>> >       '*localaddr': 'str',
>> >       '*udp':       'str' } }
>> > 
>> > Then some things come to mind
>> > 
>> >  - We're not providing a way to say what kind of "fd" is
>> >    passed in - is it a UDP/TCP FD, is it a listener or
>> >    client FD, is it unicast or multicast FD. Though QEMU
>> >    can interrogate the socket to discover this I guess.
>> 
>> Some form of probing was already added in commit 894022e61601 ("net:
>> check if the file descriptor is valid before using it"). Does qemu need
>> to care, though, once the socket is connected? That is, what would it
>> do with that information?
>
> The only thing it really has to care about is the distinction between
> a listener socket and a data socket.
>
>> >  - All of the properties there except "fd" are encoding two values
>> >    in a single property - address + port. This is an anti-pattern

Yes.

More anti-patterns:

      - Of multiple, nominally optional members, you must provide
        exactly one: listen, connect, mcast, udp.

      - Nominally optional member may only be provided together with
        another optional member: localaddr with mcast or udp.

      - Nominally optional member may be mandatory: localaddr with udp.

Such things can't always be avoided.  But they should always make you
think whether you could avoid them with judicious use of union types.
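As a hedged sketch of what "judicious use of union types" could look like here (names and shape hypothetical, not a proposal): a flat union keyed on an enum makes the exactly-one-of constraint explicit, and per-branch structs express "localaddr optional with mcast, mandatory with udp":

```
{ 'enum': 'NetdevSocketConnectMode',
  'data': [ 'listen', 'connect', 'mcast', 'udp' ] }

{ 'struct': 'NetdevSocketMcast',
  'data': { 'addr': 'str', '*localaddr': 'str' } }

{ 'struct': 'NetdevSocketUdp',
  'data': { 'remote': 'str', 'localaddr': 'str' } }

{ 'union': 'NetdevSocketEndpoint',
  'base': { 'mode': 'NetdevSocketConnectMode' },
  'discriminator': 'mode',
  'data': { 'listen':  'String',
            'connect': 'String',
            'mcast':   'NetdevSocketMcast',
            'udp':     'NetdevSocketUdp' } }
```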

>> >  - No support for ipv4=on|off and ipv6=on|off flags to control
>> >    dual-stack usage.
>> 
>> I wonder if this needs to be explicit -- it might simply derive from
>> addressing.
>
> This is explicit everywhere we use sockets in QEMU - it is part
> of the SocketAddress QAPI schema.
>
> Consider an address "::", this is an IPv6 address, but depending on
> how you configure the socket it can accept either IPv6-only or both
> IPv6 and IPv4 incoming connections.
>
> If passing a hostname instead of a raw address, then the ipv4/ipv6
> flags control whether we use all the returned DNS entries.

Yes.

>> > The "right" way to fix most of this long term is a radical change
>> > to introduce use of the SocketAddress struct.
>> > 
>> > I could envisage something like this
>> > 
>> >   { 'enum': 'NetdevSocketMode',
>> >     'data':  ['dgram', 'client', 'server'] }

I understand 'dgram' asks for passing SOCK_DGRAM to socket().  There are
no SOCK_CLIENT, SOCK_SERVER.  I guess they mean active SOCK_STREAM,
i.e. connect(), vs. passive SOCK_STREAM, i.e. listen().  In short:

    SOCK_DGRAM              'dgram'
    SOCK_STREAM, active     'client'
    SOCK_STREAM, passive    'server'

If we add another connection-based type, say SOCK_SEQPACKET, do we get
to pick names for

    SOCK_SEQPACKET, active  ???
    SOCK_SEQPACKET, passive ???

Encoding type and, if applicable, is-active in a single enum may or may
not be a good idea.

>> > 
>> >   { 'struct': 'NetdevSocketOptions',
>> >     'data': {
>> >       'addr':      'SocketAddress',
>> >       '*localaddr': 'SocketAddress',
>> >       '*mode':      'NetdevSocketMode' } }
>> > 
>> > 
>> >  - A TCP client
>> > 
>> >       addr.type = inet
>> >       addr.host = 192.168.1.1
>> >       mode = client
>> >
>> >  - A TCP server on all interfaces
>> > 
>> >       addr.type = inet
>> >       addr.host = 
>> >       mode = server
>> > 
>> >  - A TCP server on a specific interface
>> > 
>> >       addr.type = inet
>> >       addr.host = 192.168.1.1
>> >       mode = server
>> > 
>> >  - A TCP server on all interfaces, without IPv4
>> > 
>> >       addr.type = inet
>> >       addr.host = 
>> >       addr.has_ipv4 = true
>> >       addr.ipv4 = false
>> >       mode = server
>> 
>> ...perhaps it would be more consistent with other consolidated
>> practices to have addr.type = inet | inet6... and perhaps call it
>> addr.family.

Uh, "family": "fd" would be odd, wouldn't it?

Reminder:

    { 'union': 'SocketAddress',
      'base': { 'type': 'SocketAddressType' },
      'discriminator': 'type',
      'data': { 'inet': 'InetSocketAddress',
                'unix': 'UnixSocketAddress',
                'vsock': 'VsockSocketAddress',
                'fd': 'String' } }

>> Also, for "mode" (that could be called "type" to reflect
>> parameters for socket(2)), we should probably support SOCK_SEQPACKET as
>> well ("seq"?).

Well, 'mode' is more than just the socket type; it also includes
is-active for certain socket types.

> The naming I use here is determined by the QAPI 'SocketAddress'
> struct which has a 'type' field, and separate 'ipv4' and 'ipv6'
> flags.
>
>> 
>> >  - A UDP unicast transport
>> > 
>> >       addr.type = inet
>> >       addr.host = 192.168.1.1
>> >       mode = dgram
>> > 
>> >  - A UDP unicast transport with local addr
>> > 
>> >       addr.type = inet
>> >       addr.host = 192.168.1.1
>> >       localaddr.type = inet
>> >       localaddr.host = 192.168.1.2
>> >       mode = dgram
>> > 
>> >  - A UDP multicast transport
>> > 
>> >      addr.type = inet
>> >      addr.host = 224.0.23.166
>> >      mode = dgram
>> > 
>> >  - A UNIX stream client
>> > 
>> >       addr.type = unix
>> >       addr.path = /some/socket
>> >       mode = client
>> > 
>> >  - A UNIX stream server
>> > 
>> >       addr.type = unix
>> >       addr.path = /some/socket
>> >       mode = server
>> > 
>> >  - A UNIX dgram transport
>> > 
>> >       addr.type = unix
>> >       addr.path = /some/socket
>> >       mode = dgram

What about

           addr.type = fd
           addr.fd = some-fd
           mode = dgram

If the file descriptor has the right type, isn't "mode" redundant?  And
what if it doesn't?
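Mostly redundant, yes: the kernel will report both the socket type and whether the fd is a listener, so a configured "mode" could at most be validated against what the fd actually is. A sketch of that probing (Python for brevity, wrapping the fd the way a process would an inherited one):

```python
import socket

def probe(fd):
    # Wrap an inherited fd and ask the kernel what it actually is;
    # a configured "mode" could then only be cross-checked against this.
    s = socket.socket(fileno=fd)
    kind = s.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
    listening = bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_ACCEPTCONN))
    s.detach()  # leave the fd open for the real owner
    return kind, listening

srv = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
kind, listening = probe(srv.fileno())
print(kind == socket.SOCK_DGRAM, listening)  # True False
```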

>> > 
>> > 
>> > Now, of course you're just interested in adding UNIX socket support.
>> > 
>> > This design I've outlined above is very much "boiling the ocean".
>> > Thus I'm not requesting you implement this, unless you have a strong
>> > desire to get heavily involved in some QEMU refactoring work.
>> 
>> I don't really have a strong desire to do that, but to my naive eyes it
>> doesn't look that complicated, so I'll give it a try.
>
> The hard bit is always the backwards compatibility for existing usage....
>
>
>> > The key question is whether we try to graft UNIX socket support onto
>> > the existing args ("connect"/"listen") or try to do something more
>> > explicit.
>> > 
>> > Given the desire to have both dgram + stream support, I'd be inclined
>> > to do something more explicit, that's slightly more aligned with a
>> > possible future best practice QAPI design
>> > 
>> > 
>> > IOW, we could take a simplified variant of the above as follows:
>> > 
>> > 
>> >   { 'enum': 'NetdevSocketMode',
>> >     'data':  ['dgram', 'client', 'server'] }
>> > 
>> >   { 'struct': 'NetdevSocketOptions',
>> >     'data': {
>> >       '*fd':        'str',
>> >       '*listen':    'str',
>> >       '*connect':   'str',
>> >       '*mcast':     'str',
>> >       '*localaddr': 'str',
>> >       '*udp':       'str',
>> >       '*path':      'str',
>> >       '*localpath': 'str' } }
>> > 
>> > 
>> > CLI examples:
>> > 
>> >  * Unix stream client
>> > 
>> >   -netdev socket,path=/wibble,mode=client
>> > 
>> >  * Unix stream server
>> >  
>> >   -netdev socket,path=/wibble,mode=server
>> > 
>> >  * Unix datagram 
>> > 
>> >   -netdev socket,path=/wibble,mode=dgram
>> >   -netdev socket,path=/wibble,localpath=/fish,mode=dgram
>> 
>> I think this looks reasonable, but I'm not sure about "localpath",
>> because /wibble is also local in some sense.
>
> "local" as in local to the process, rather than "local" as in
> local to the host.

I'm confused.  Is this related to the socket's address (think
getsockname()) vs. the peer's address (think getpeername())?
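In socket-API terms the distinction can be made concrete: for a connected AF_UNIX datagram socket, the proposed "localpath" is what getsockname() reports and "path" is what getpeername() reports. A sketch with hypothetical paths:

```python
import os
import socket
import tempfile

tmp = tempfile.mkdtemp()
local = os.path.join(tmp, "local.sock")   # the proposed "localpath"
peer = os.path.join(tmp, "peer.sock")     # the proposed "path"

p = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
p.bind(peer)

s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
s.bind(local)    # our own address...
s.connect(peer)  # ...and the peer we send to

print(os.path.basename(s.getsockname()))  # local.sock
print(os.path.basename(s.getpeername()))  # peer.sock
```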

>> I don't know what would be a good implementation practice to keep the
>> current options available -- should this be done with some new code
>> that applies a translation to the new parameters?
>
> At the CLI parser side we'd usually just do a translation to the new
> QAPI style, but I'm not sure how to handle the existing QAPI stuff.
>
> Perhaps just add new fields to "NetdevSocketOptions" and deprecate
> existing ones that become obsolete.

This is always possible.  At best, it remains just as ugly as it is now.

> The only other alternative is a parallel type to completely obsolete
> NetdevSocketOptions,

Probably the only way to get a "modern" interface.  Whether it's worth
the effort is hard to tell.

>                      but I'm not sure what we'd call that.

If we can come up with a better backend name than "socket", it's easy:
NetdevBetterNameOptions.

> I've added Markus / Eric to CC to get advice on the QAPI side here.

Hope I could help at least a little.



