* Upstream Homa?
@ 2022-11-10 19:42 John Ousterhout
  2022-11-10 21:25 ` Stephen Hemminger
  2022-12-04 18:17 ` Jamal Hadi Salim
  0 siblings, 2 replies; 15+ messages in thread
From: John Ousterhout @ 2022-11-10 19:42 UTC (permalink / raw)
  To: netdev

Several people at the netdev conference asked me if I was working to
upstream the Homa transport protocol into the kernel. I have assumed
that this is premature, given that there is not yet significant usage of
Homa, but they encouraged me to start a discussion about upstreaming
with the netdev community.

So, I'm sending this message to ask for advice about (a) what state
Homa needs to reach before it would be appropriate to upstream it,
and, (b) if/when that time is reached, what is the right way to go about it.
Homa currently has about 13K lines of code, which I assume is far too
large for a single patch set; at the same time, it's hard to envision a
manageable first patch set with enough functionality to be useful by itself.

-John-

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-10 19:42 Upstream Homa? John Ousterhout
@ 2022-11-10 21:25 ` Stephen Hemminger
  2022-11-10 23:23   ` Andrew Lunn
  2022-12-04 18:17 ` Jamal Hadi Salim
  1 sibling, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2022-11-10 21:25 UTC (permalink / raw)
  To: John Ousterhout; +Cc: netdev

On Thu, 10 Nov 2022 11:42:35 -0800
John Ousterhout <ouster@cs.stanford.edu> wrote:

> Several people at the netdev conference asked me if I was working to
> upstream the Homa transport protocol into the kernel. I have assumed
> that this is premature, given that there is not yet significant usage of
> Homa, but they encouraged me to start a discussion about upstreaming
> with the netdev community.
> 
> So, I'm sending this message to ask for advice about (a) what state
> Homa needs to reach before it would be appropriate to upstream it,
> and, (b) if/when that time is reached, what is the right way to go about it.
> Homa currently has about 13K lines of code, which I assume is far too
> large for a single patch set; at the same time, it's hard to envision a
> manageable first patch set with enough functionality to be useful by itself.
> 
> -John-

There are lots of experimental protocols already in Linux.
The usual upstream problem areas are:
 - coding style

 - compatibility layers
   upstream developers don't care about code whose purpose is to support
   older kernel versions or other OSes.

 - user API
   once you define it, it is hard to change; you need to get it right

 - tests
   is there a way to make sure it works on all platforms?

Heuristics and bug fixing are fine; in fact, having a wider community
will help.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-10 21:25 ` Stephen Hemminger
@ 2022-11-10 23:23   ` Andrew Lunn
       [not found]     ` <CAGXJAmw=NY17=6TnDh0oV9WTmNkQCe9Q9F3Z=uGjG9x5NKn7TQ@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2022-11-10 23:23 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: John Ousterhout, netdev

On Thu, Nov 10, 2022 at 01:25:40PM -0800, Stephen Hemminger wrote:
> On Thu, 10 Nov 2022 11:42:35 -0800
> John Ousterhout <ouster@cs.stanford.edu> wrote:
> 
> > Several people at the netdev conference asked me if I was working to
> > upstream the Homa transport protocol into the kernel. I have assumed
> > that this is premature, given that there is not yet significant usage of
> > Homa, but they encouraged me to start a discussion about upstreaming
> > with the netdev community.
> > 
> > So, I'm sending this message to ask for advice about (a) what state
> > Homa needs to reach before it would be appropriate to upstream it,
> > and, (b) if/when that time is reached, what is the right way to go about it.
> > Homa currently has about 13K lines of code, which I assume is far too
> > large for a single patch set; at the same time, it's hard to envision a
> > manageable first patch set with enough functionality to be useful by itself.
> > 
> > -John-

Hi John

> The usual upstream problem areas are:
>  - coding style

You can get a good feeling about what sort of coding style review
comments you will get by running ./scripts/checkpatch.pl over your
files. You don't need to be completely checkpatch clean, it does get
things wrong sometimes.
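
For example (the paths and patch name below are just guesses at where
the Homa sources would live, not anything that exists today):

  ./scripts/checkpatch.pl -f --strict net/homa/*.c
  ./scripts/checkpatch.pl 0001-net-homa-add-core.patch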

Adding to Stephen's list:

- You have reinvented something which the kernel already has. You need
  to throw away your version and use the kernel version.

- You have used deprecated things, like /proc, or ioctls rather than
  netlink.

- 32-bit kernel problems. Since this is aimed at the data center, your
  code might make assumptions about running on a 64-bit machine.
  Statistics tend to be done wrong unless you use the correct kernel
  helpers to deal with 64-bit counters on 32-bit machines.
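
  As a rough illustration of that last point, the usual pattern looks
  something like this (struct and field names are made up for the
  example, not taken from Homa):

	#include <linux/u64_stats_sync.h>

	/* Hypothetical per-CPU counter block; u64_stats_init(&s->syncp)
	 * must be called when it is allocated.
	 */
	struct homa_pcpu_stats {
		u64 packets_sent;
		u64 bytes_sent;
		struct u64_stats_sync syncp;
	};

	/* Writer side (assumed per-CPU, so writers don't race each other):
	 * plain stores on 64-bit, seqcount-protected on 32-bit.
	 */
	static void homa_stats_add(struct homa_pcpu_stats *s, u64 bytes)
	{
		u64_stats_update_begin(&s->syncp);
		s->packets_sent++;
		s->bytes_sent += bytes;
		u64_stats_update_end(&s->syncp);
	}

	/* Reader side: retry if a 32-bit writer was mid-update. */
	static void homa_stats_read(struct homa_pcpu_stats *s,
				    u64 *packets, u64 *bytes)
	{
		unsigned int start;

		do {
			start = u64_stats_fetch_begin(&s->syncp);
			*packets = s->packets_sent;
			*bytes = s->bytes_sent;
		} while (u64_stats_fetch_retry(&s->syncp, start));
	}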

Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
       [not found]     ` <CAGXJAmw=NY17=6TnDh0oV9WTmNkQCe9Q9F3Z=uGjG9x5NKn7TQ@mail.gmail.com>
@ 2022-11-11 19:10       ` Stephen Hemminger
  2022-11-11 19:25       ` Andrew Lunn
  1 sibling, 0 replies; 15+ messages in thread
From: Stephen Hemminger @ 2022-11-11 19:10 UTC (permalink / raw)
  To: John Ousterhout; +Cc: Andrew Lunn, netdev

On Fri, 11 Nov 2022 10:59:58 -0800
John Ousterhout <ouster@cs.stanford.edu> wrote:

> The netlink and 32-bit kernel issues are new for me; I've done some digging
> to learn more, but still have some questions.
> 
> * Is the intent that netlink replaces *all* uses of /proc and ioctl? Homa
> currently uses ioctls on sockets for I/O (its APIs aren't
> sockets-compatible). It looks like switching to netlink would double the
> number of system calls that have to be invoked, which would be unfortunate
> given Homa's goal of getting the lowest possible latency. It also looks
> like netlink might be awkward for dumping large volumes of kernel data to
> user space (potential for buffer overflow?).
> 
> * By "32 bit kernel problems" are you referring to the lack of atomic
> 64-bit operations and using the facilities of u64_stats_sync.h, or is there
> a more general issue with 64-bit operations?
> 
> -John-

I admit I haven't looked at the Homa code. Are you using ioctl as a
generic way into the kernel for operations?

Ioctls on sockets are an awkward API and have lots of issues.
Supporting 32-bit apps on a 64-bit OS is one of them.
For that reason they are strongly discouraged.

Netlink allows multiple TLV options in a single request, and they should
be processed as a transaction.  Netlink is intended for control operations.

If you need a new normal-path operation, then either use an existing
system call (sendmsg/recvmsg) with new flags, or introduce a new system
call. Don't abuse ioctl as a way to avoid introducing a new system call.
New system calls do add additional complexity to security modules, so
SELinux etc. may need to know about them.

PS: please don't top post in replies to Linux mailing lists.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
       [not found]     ` <CAGXJAmw=NY17=6TnDh0oV9WTmNkQCe9Q9F3Z=uGjG9x5NKn7TQ@mail.gmail.com>
  2022-11-11 19:10       ` Stephen Hemminger
@ 2022-11-11 19:25       ` Andrew Lunn
  2022-11-12  7:53         ` Jiri Pirko
  2022-11-13  6:09         ` John Ousterhout
  1 sibling, 2 replies; 15+ messages in thread
From: Andrew Lunn @ 2022-11-11 19:25 UTC (permalink / raw)
  To: John Ousterhout; +Cc: Stephen Hemminger, netdev

On Fri, Nov 11, 2022 at 10:59:58AM -0800, John Ousterhout wrote:
> The netlink and 32-bit kernel issues are new for me; I've done some digging to
> learn more, but still have some questions.
> 

> * Is the intent that netlink replaces *all* uses of /proc and ioctl? Homa
> currently uses ioctls on sockets for I/O (its APIs aren't sockets-compatible).
> It looks like switching to netlink would double the number of system calls that
> have to be invoked, which would be unfortunate given Homa's goal of getting the
> lowest possible latency. It also looks like netlink might be awkward for
> dumping large volumes of kernel data to user space (potential for buffer
> overflow?).

I've not looked at the actual code; I'm making general comments.

netlink, like ioctl, is meant for the control plane, not the data
plane. Your statistics should be reported via netlink, for
example. netlink is used to configure routes, set up bonding, bridges
etc. netlink can also dump large volumes of data; it has no problem
dumping the full Internet routing table, for example.
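
As a rough userspace sketch of how such a dump looks (IPv4 routes over
rtnetlink, most error handling omitted; this is generic rtnetlink,
nothing Homa-specific):

	#include <sys/socket.h>
	#include <linux/netlink.h>
	#include <linux/rtnetlink.h>
	#include <unistd.h>

	int dump_ipv4_routes(void)
	{
		int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
		struct {
			struct nlmsghdr nlh;
			struct rtgenmsg gen;
		} req = {
			.nlh = {
				.nlmsg_len   = NLMSG_LENGTH(sizeof(struct rtgenmsg)),
				.nlmsg_type  = RTM_GETROUTE,
				.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
			},
			.gen = { .rtgen_family = AF_INET },
		};
		char buf[16384] __attribute__((aligned(4)));
		int done = 0;

		send(fd, &req, req.nlh.nlmsg_len, 0);
		while (!done) {
			int len = recv(fd, buf, sizeof(buf), 0);
			struct nlmsghdr *nh;

			if (len <= 0)
				break;
			/* One recv() returns a batch of messages; NLMSG_DONE
			 * marks the end of the multipart dump.
			 */
			for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
			     nh = NLMSG_NEXT(nh, len)) {
				if (nh->nlmsg_type == NLMSG_DONE ||
				    nh->nlmsg_type == NLMSG_ERROR) {
					done = 1;
					break;
				}
				/* each message here is one RTM_NEWROUTE entry */
			}
		}
		close(fd);
		return 0;
	}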

How you get real packet data between user space and kernel space is
a different question. You say it is not BSD socket compatible. But
maybe there is another existing kernel API which will work? Maybe post
what your ideal API looks like and why sockets don't work. Eric
Dumazet could give you some ideas about what the kernel has which
might do what you need. This is the uAPI point that Stephen raised.

> * By "32 bit kernel problems" are you referring to the lack of atomic 64-bit
> operations and using the facilities of u64_stats_sync.h, or is there a more
> general issue with 64-bit operations?

Those helpers do the real work, and should optimise to pretty much
nothing on a 64-bit kernel, but do the right thing on 32-bit kernels.

But you are right, the general point is that they are not atomic, so
you need to be careful with threads, and any access to a 64-bit value
needs to be protected somehow, hopefully in a way that is optimised
out on 64-bit systems.

      Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-11 19:25       ` Andrew Lunn
@ 2022-11-12  7:53         ` Jiri Pirko
  2022-11-13  6:25           ` John Ousterhout
  2022-11-13  6:09         ` John Ousterhout
  1 sibling, 1 reply; 15+ messages in thread
From: Jiri Pirko @ 2022-11-12  7:53 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: John Ousterhout, Stephen Hemminger, netdev

Fri, Nov 11, 2022 at 08:25:44PM CET, andrew@lunn.ch wrote:
>On Fri, Nov 11, 2022 at 10:59:58AM -0800, John Ousterhout wrote:
>> The netlink and 32-bit kernel issues are new for me; I've done some digging to
>> learn more, but still have some questions.
>> 
>
>> * Is the intent that netlink replaces *all* uses of /proc and ioctl? Homa
>> currently uses ioctls on sockets for I/O (its APIs aren't sockets-compatible).

Why exactly isn't it sockets-compatible?


>> It looks like switching to netlink would double the number of system calls that
>> have to be invoked, which would be unfortunate given Homa's goal of getting the
>> lowest possible latency. It also looks like netlink might be awkward for
>> dumping large volumes of kernel data to user space (potential for buffer
>> overflow?).

Netlink is slow; you should not use it for the fast path. It is for
configuration and stats.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-11 19:25       ` Andrew Lunn
  2022-11-12  7:53         ` Jiri Pirko
@ 2022-11-13  6:09         ` John Ousterhout
  2022-11-13  8:24           ` Jiri Pirko
  1 sibling, 1 reply; 15+ messages in thread
From: John Ousterhout @ 2022-11-13  6:09 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Stephen Hemminger, netdev

On Fri, Nov 11, 2022 at 11:25 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Fri, Nov 11, 2022 at 10:59:58AM -0800, John Ousterhout wrote:
> > The netlink and 32-bit kernel issues are new for me; I've done some digging to
> > learn more, but still have some questions.
> >
>
> > * Is the intent that netlink replaces *all* uses of /proc and ioctl? Homa
> > currently uses ioctls on sockets for I/O (its APIs aren't sockets-compatible).
> > It looks like switching to netlink would double the number of system calls that
> > have to be invoked, which would be unfortunate given Homa's goal of getting the
> > lowest possible latency. It also looks like netlink might be awkward for
> > dumping large volumes of kernel data to user space (potential for buffer
> > overflow?).
>
> I've not looked at the actually code, i'm making general comments.
>
> netlink, like ioctl, is meant for the control plain, not the data
> plain. Your statistics should be reported via netlink, for
> example. netlink is used to configure routes, setup bonding, bridges
> etc. netlink can also dump large volumes of data, it has no problems
> dumping the full Internet routing table for example.
>
> How you get real packet data between the userspace and kernel space is
> a different question. You say it is not BSD socket compatible. But
> maybe there is another existing kernel API which will work? Maybe post
> what your ideal API looks like and why sockets don't work. Eric
> Dumazet could give you some ideas about what the kernel has which
> might do what you need. This is the uAPI point that Stephen raised.

OK, will do. I'm in the middle of a major API refactor, so I'll wait
until that is resolved before pursuing this issue further.

> > * By "32 bit kernel problems" are you referring to the lack of atomic 64-bit
> > operations and using the facilities of u64_stats_sync.h, or is there a more
> > general issue with 64-bit operations?
>
> Those helpers do the real work, and should optimise to pretty much
> nothing on an 64 bit kernel, but do the right thing on 32 bit kernels.
>
> But you are right, the general point is that they are not atomic, so
> you need to be careful with threads, and any access to a 64 bit values
> needs to be protected somehow, hopefully in a way that is optimised
> out on 64bit systems.

Is it acceptable to have features that are only supported on 64-bit kernels?
This would be my first choice, since I don't think there will be much interest
in Homa on 32-bit platforms.

If that's not OK, are there any mechanisms available for helping people
test on 32-bit platforms? For example, is it possible to configure Linux to
compile in 32-bit mode so I could test that even on a 64-bit machine
(I don't have access to a 32-bit machine)?

-John-

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-12  7:53         ` Jiri Pirko
@ 2022-11-13  6:25           ` John Ousterhout
  2022-11-13 17:10             ` Andrew Lunn
  0 siblings, 1 reply; 15+ messages in thread
From: John Ousterhout @ 2022-11-13  6:25 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Andrew Lunn, Stephen Hemminger, netdev

On Fri, Nov 11, 2022 at 11:53 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Fri, Nov 11, 2022 at 08:25:44PM CET, andrew@lunn.ch wrote:
> >On Fri, Nov 11, 2022 at 10:59:58AM -0800, John Ousterhout wrote:
> >> The netlink and 32-bit kernel issues are new for me; I've done some digging to
> >> learn more, but still have some questions.
> >>
> >
> >> * Is the intent that netlink replaces *all* uses of /proc and ioctl? Homa
> >> currently uses ioctls on sockets for I/O (its APIs aren't sockets-compatible).
>
> Why exactly isn't it sockets-compatible?

Homa implements RPCs rather than streams like TCP or messages like
UDP. An RPC consists of a request message sent from client to server,
followed by a response message from server back to client. This requires
additional information in the API beyond what is provided in the arguments to
sendto and recvfrom. For example, when sending a request message, the
kernel returns an RPC identifier back to the application; when waiting for
a response, the application can specify that it wants to receive the reply for
a specific RPC identifier (or, it can specify that it will accept any
reply, or any request, or both).

> >> It looks like switching to netlink would double the number of system calls that
> >> have to be invoked, which would be unfortunate given Homa's goal of getting the
> >> lowest possible latency. It also looks like netlink might be awkward for
> >> dumping large volumes of kernel data to user space (potential for buffer
> >> overflow?).
>
> Netlink is slow; you should not use it for the fast path. It is for
> configuration and stats.
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-13  6:09         ` John Ousterhout
@ 2022-11-13  8:24           ` Jiri Pirko
  2022-11-13 18:53             ` Andrew Lunn
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Pirko @ 2022-11-13  8:24 UTC (permalink / raw)
  To: John Ousterhout; +Cc: Andrew Lunn, Stephen Hemminger, netdev

Sun, Nov 13, 2022 at 07:09:48AM CET, ouster@cs.stanford.edu wrote:
>On Fri, Nov 11, 2022 at 11:25 AM Andrew Lunn <andrew@lunn.ch> wrote:
>>
>> On Fri, Nov 11, 2022 at 10:59:58AM -0800, John Ousterhout wrote:
>> > The netlink and 32-bit kernel issues are new for me; I've done some digging to
>> > learn more, but still have some questions.
>> >
>>
>> > * Is the intent that netlink replaces *all* uses of /proc and ioctl? Homa
>> > currently uses ioctls on sockets for I/O (its APIs aren't sockets-compatible).
>> > It looks like switching to netlink would double the number of system calls that
>> > have to be invoked, which would be unfortunate given Homa's goal of getting the
>> > lowest possible latency. It also looks like netlink might be awkward for
>> > dumping large volumes of kernel data to user space (potential for buffer
>> > overflow?).
>>
>> I've not looked at the actually code, i'm making general comments.
>>
>> netlink, like ioctl, is meant for the control plain, not the data
>> plain. Your statistics should be reported via netlink, for
>> example. netlink is used to configure routes, setup bonding, bridges
>> etc. netlink can also dump large volumes of data, it has no problems
>> dumping the full Internet routing table for example.
>>
>> How you get real packet data between the userspace and kernel space is
>> a different question. You say it is not BSD socket compatible. But
>> maybe there is another existing kernel API which will work? Maybe post
>> what your ideal API looks like and why sockets don't work. Eric
>> Dumazet could give you some ideas about what the kernel has which
>> might do what you need. This is the uAPI point that Stephen raised.
>
>OK, will do. I'm in the middle of a major API refactor, so I'll wait
>until that is
>resolved before pursing this issue more.
>
>> > * By "32 bit kernel problems" are you referring to the lack of atomic 64-bit
>> > operations and using the facilities of u64_stats_sync.h, or is there a more
>> > general issue with 64-bit operations?
>>
>> Those helpers do the real work, and should optimise to pretty much
>> nothing on an 64 bit kernel, but do the right thing on 32 bit kernels.
>>
>> But you are right, the general point is that they are not atomic, so
>> you need to be careful with threads, and any access to a 64 bit values
>> needs to be protected somehow, hopefully in a way that is optimised
>> out on 64bit systems.
>
>Is it acceptable to have features that are only supported on 64-bit kernels?

I don't think so. There are plenty of 32-bit platforms supported; all
should work there.


>This would be my first choice, since I don't think there will be much interest
>in Homa on 32-bit platforms.
>
>If that's not OK, are there any mechanisms available for helping people
>test on 32-bit platforms? For example, is it possible to configure Linux to
>compile in 32-bit mode so I could test that even on a 64-bit machine
>(I don't have access to a 32-bit machine)?

You can do it easily in an emulated environment, like qemu.


>
>-John-

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-13  6:25           ` John Ousterhout
@ 2022-11-13 17:10             ` Andrew Lunn
  2022-11-13 20:10               ` John Ousterhout
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2022-11-13 17:10 UTC (permalink / raw)
  To: John Ousterhout; +Cc: Jiri Pirko, Stephen Hemminger, netdev

> Homa implements RPCs rather than streams like TCP or messages like
> UDP. An RPC consists of a request message sent from client to server,
> followed by a response message from server back to client. This requires
> additional information in the API beyond what is provided in the arguments to
> sendto and recvfrom. For example, when sending a request message, the
> kernel returns an RPC identifier back to the application; when waiting for
> a response, the application can specify that it wants to receive the reply for
> a specific RPC identifier (or, it can specify that it will accept any
> reply, or any
> request, or both).

This sounds like the ancillary data you can pass to sendmsg(). I've
not checked the code; it might be that the current plumbing only goes
into the kernel, but I don't see why you cannot extend it to also allow
data to be passed back to user space. If this is new functionality,
maybe add a new flags argument to control it.

recvmsg() also has ancillary data.
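
For illustration only, a minimal sketch of that direction on the
receive side (SOL_HOMA and HOMA_RPC_ID are made-up placeholders, not
an existing API):

	#include <sys/socket.h>
	#include <sys/uio.h>
	#include <stdint.h>
	#include <string.h>

	#define SOL_HOMA	0xFD00	/* hypothetical cmsg level */
	#define HOMA_RPC_ID	1	/* hypothetical cmsg type */

	/* Receive a message and pull a kernel-supplied RPC id out of
	 * the ancillary data, if present.
	 */
	ssize_t homa_recv_reply(int fd, void *buf, size_t len,
				uint64_t *rpc_id)
	{
		union {
			char buf[CMSG_SPACE(sizeof(uint64_t))];
			struct cmsghdr align;
		} ctrl;
		struct iovec iov = { .iov_base = buf, .iov_len = len };
		struct msghdr msg = {
			.msg_iov = &iov,
			.msg_iovlen = 1,
			.msg_control = ctrl.buf,
			.msg_controllen = sizeof(ctrl.buf),
		};
		struct cmsghdr *cmsg;
		ssize_t n = recvmsg(fd, &msg, 0);

		if (n < 0)
			return n;
		for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
		     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
			if (cmsg->cmsg_level == SOL_HOMA &&
			    cmsg->cmsg_type == HOMA_RPC_ID)
				memcpy(rpc_id, CMSG_DATA(cmsg),
				       sizeof(*rpc_id));
		}
		return n;
	}

The sendmsg() side would be the mirror image: the application fills in
a control buffer (e.g. to name the RPC it wants the reply for) before
making the call.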

	  Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-13  8:24           ` Jiri Pirko
@ 2022-11-13 18:53             ` Andrew Lunn
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Lunn @ 2022-11-13 18:53 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: John Ousterhout, Stephen Hemminger, netdev

> You can do it easily in emulated environment, like qemu.

https://translatedcode.wordpress.com/2016/11/03/installing-debian-on-qemus-32-bit-arm-virt-board/

This is a few years old, but things have not changed much. It will get
you a reasonably generic ARM system running in QEMU, on top of
whatever hardware you have. I would replace jessie with bullseye, but
the process should remain the same.

It will not be a very fast machine, since there is no KVM
acceleration. So you will probably want to cross-compile the
kernel. This is well supported; it is what embedded developers do all
the time.
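
For the cross-compile itself, assuming Debian's arm-linux-gnueabihf
toolchain is installed, something like:

    make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- multi_v7_defconfig
    make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- -j$(nproc)

should give you a 32-bit ARM kernel (you may still need to enable the
virtio drivers that QEMU's virt board uses).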

    Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-13 17:10             ` Andrew Lunn
@ 2022-11-13 20:10               ` John Ousterhout
  2022-11-13 20:37                 ` Andrew Lunn
  0 siblings, 1 reply; 15+ messages in thread
From: John Ousterhout @ 2022-11-13 20:10 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Jiri Pirko, Stephen Hemminger, netdev

On Sun, Nov 13, 2022 at 9:10 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Homa implements RPCs rather than streams like TCP or messages like
> > UDP. An RPC consists of a request message sent from client to server,
> > followed by a response message from server back to client. This requires
> > additional information in the API beyond what is provided in the arguments to
> > sendto and recvfrom. For example, when sending a request message, the
> > kernel returns an RPC identifier back to the application; when waiting for
> > a response, the application can specify that it wants to receive the reply for
> > a specific RPC identifier (or, it can specify that it will accept any
> > reply, or any
> > request, or both).
>
> This sounds like the ancillary data you can pass to sendmsg(). I've
> not checked the code, it might be the current plumbing is only into to
> the kernel, but i don't see why you cannot extend it to also allow
> data to be passed back to user space. If this is new functionality,
> maybe add a new flags argument to control it.
>
> recvmsg() also has ancillary data.

Whoah! I'd never noticed the msg_control and msg_controllen fields before.
These may be sufficient to do everything Homa needs. Thanks for pointing
this out.

-John-

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-13 20:10               ` John Ousterhout
@ 2022-11-13 20:37                 ` Andrew Lunn
  2022-11-14  5:37                   ` John Ousterhout
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2022-11-13 20:37 UTC (permalink / raw)
  To: John Ousterhout; +Cc: Jiri Pirko, Stephen Hemminger, netdev

On Sun, Nov 13, 2022 at 12:10:22PM -0800, John Ousterhout wrote:
> On Sun, Nov 13, 2022 at 9:10 AM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > Homa implements RPCs rather than streams like TCP or messages like
> > > UDP. An RPC consists of a request message sent from client to server,
> > > followed by a response message from server back to client. This requires
> > > additional information in the API beyond what is provided in the arguments to
> > > sendto and recvfrom. For example, when sending a request message, the
> > > kernel returns an RPC identifier back to the application; when waiting for
> > > a response, the application can specify that it wants to receive the reply for
> > > a specific RPC identifier (or, it can specify that it will accept any
> > > reply, or any
> > > request, or both).
> >
> > This sounds like the ancillary data you can pass to sendmsg(). I've
> > not checked the code, it might be the current plumbing is only into to
> > the kernel, but i don't see why you cannot extend it to also allow
> > data to be passed back to user space. If this is new functionality,
> > maybe add a new flags argument to control it.
> >
> > recvmsg() also has ancillary data.
> 
> Whoah! I'd never noticed the msg_control and msg_controllen fields before.
> These may be sufficient to do everything Homa needs. Thanks for pointing
> this out.

Is zero copy also required? https://lwn.net/Articles/726917/ talks
about this. But rather than doing the transmit-complete notification
via MSG_ERRQUEUE, maybe you could make it part of the ancillary data
for a later message? That could save you some system calls. Or is the
latency low enough that the RPC reply acts as an implicit indication
that the transmit buffer can be recycled?

If your aim is to offload Homa to the NIC, it seems like zero copy is
something you want, so even if you are not implementing it now, you
probably should consider what the uAPI looks like.
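
For reference, a rough sketch of the existing MSG_ZEROCOPY flow that
the LWN article describes (error handling omitted; the fallback
defines carry the values from the uapi headers for older userspace
headers):

	#include <sys/socket.h>
	#include <linux/errqueue.h>

	#ifndef SO_ZEROCOPY
	#define SO_ZEROCOPY	60
	#endif
	#ifndef MSG_ZEROCOPY
	#define MSG_ZEROCOPY	0x4000000
	#endif

	static void zerocopy_send(int fd, const void *buf, size_t len)
	{
		int one = 1;
		union {
			char buf[CMSG_SPACE(sizeof(struct sock_extended_err))];
			struct cmsghdr align;
		} ctrl;
		struct msghdr msg = {
			.msg_control = ctrl.buf,
			.msg_controllen = sizeof(ctrl.buf),
		};

		setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
		send(fd, buf, len, MSG_ZEROCOPY);

		/* The completion notification arrives on the error queue
		 * once the kernel no longer references the pages; only
		 * then may the buffer be reused.
		 */
		if (recvmsg(fd, &msg, MSG_ERRQUEUE) >= 0) {
			struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

			if (cm) {
				struct sock_extended_err *serr =
					(struct sock_extended_err *)CMSG_DATA(cm);

				if (serr->ee_origin == SO_EE_ORIGIN_ZEROCOPY) {
					/* serr->ee_info .. serr->ee_data is
					 * the range of completed sends.
					 */
				}
			}
		}
	}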

	 Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-13 20:37                 ` Andrew Lunn
@ 2022-11-14  5:37                   ` John Ousterhout
  0 siblings, 0 replies; 15+ messages in thread
From: John Ousterhout @ 2022-11-14  5:37 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Jiri Pirko, Stephen Hemminger, netdev

On Sun, Nov 13, 2022 at 12:38 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Sun, Nov 13, 2022 at 12:10:22PM -0800, John Ousterhout wrote:
> > On Sun, Nov 13, 2022 at 9:10 AM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > Homa implements RPCs rather than streams like TCP or messages like
> > > > UDP. An RPC consists of a request message sent from client to server,
> > > > followed by a response message from server back to client. This requires
> > > > additional information in the API beyond what is provided in the arguments to
> > > > sendto and recvfrom. For example, when sending a request message, the
> > > > kernel returns an RPC identifier back to the application; when waiting for
> > > > a response, the application can specify that it wants to receive the reply for
> > > > a specific RPC identifier (or, it can specify that it will accept any
> > > > reply, or any
> > > > request, or both).
> > >
> > > This sounds like the ancillary data you can pass to sendmsg(). I've
> > > not checked the code, it might be the current plumbing is only into to
> > > the kernel, but i don't see why you cannot extend it to also allow
> > > data to be passed back to user space. If this is new functionality,
> > > maybe add a new flags argument to control it.
> > >
> > > recvmsg() also has ancillary data.
> >
> > Whoah! I'd never noticed the msg_control and msg_controllen fields before.
> > These may be sufficient to do everything Homa needs. Thanks for pointing
> > this out.
>
> Is zero copy also required? https://lwn.net/Articles/726917/ talks
> about this. But rather than doing the transmit complete notification
> via MSG_ERRORQUEUE, maybe you could make it part of the ancillary data
> for a later message? That could save you some system calls? Or is the
> latency low enough that the RPC reply acts an implicitly indication
> the transmit buffer can be recycled?
>
> If your aim is to offload Homa to the NIC, it seems like zero copy is
> something you want, so even if you are not implementing it now, you
> probably should consider what the uAPI looks like.

I know that zero copy is all the rage these days, but I've become somewhat
of a skeptic. We spent quite a bit of time in the RAMCloud project
implementing zero copy (and we were using kernel-bypass NICs, which make it
about as efficient as possible); we found that it is very difficult to get
a real performance benefit. Managing the space so you know when you can
reclaim it adds a lot of complexity and overhead. My current thinking is
that zero copy only makes sense when you have really large blocks of data.
I'm inclined to let others experiment with zero-copy for a while and see if
they can achieve sustainable benefits over a meaningful range of operating
conditions.

-John-

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Upstream Homa?
  2022-11-10 19:42 Upstream Homa? John Ousterhout
  2022-11-10 21:25 ` Stephen Hemminger
@ 2022-12-04 18:17 ` Jamal Hadi Salim
  1 sibling, 0 replies; 15+ messages in thread
From: Jamal Hadi Salim @ 2022-12-04 18:17 UTC (permalink / raw)
  To: John Ousterhout; +Cc: netdev

And for folks interested in John's work, we just posted his excellent keynote
slides+video from netdevconf 0x16. See:
https://netdevconf.info/0x16/session.html?keynote-ousterhout

cheers,
jamal

On Thu, Nov 10, 2022 at 2:43 PM John Ousterhout <ouster@cs.stanford.edu> wrote:
>
> Several people at the netdev conference asked me if I was working to
> upstream the Homa transport protocol into the kernel. I have assumed
> that this is premature, given that there is not yet significant usage of
> Homa, but they encouraged me to start a discussion about upstreaming
> with the netdev community.
>
> So, I'm sending this message to ask for advice about (a) what state
> Homa needs to reach before it would be appropriate to upstream it,
> and, (b) if/when that time is reached, what is the right way to go about it.
> Homa currently has about 13K lines of code, which I assume is far too
> large for a single patch set; at the same time, it's hard to envision a
> manageable first patch set with enough functionality to be useful by itself.
>
> -John-

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread

Thread overview: 15+ messages
2022-11-10 19:42 Upstream Homa? John Ousterhout
2022-11-10 21:25 ` Stephen Hemminger
2022-11-10 23:23   ` Andrew Lunn
     [not found]     ` <CAGXJAmw=NY17=6TnDh0oV9WTmNkQCe9Q9F3Z=uGjG9x5NKn7TQ@mail.gmail.com>
2022-11-11 19:10       ` Stephen Hemminger
2022-11-11 19:25       ` Andrew Lunn
2022-11-12  7:53         ` Jiri Pirko
2022-11-13  6:25           ` John Ousterhout
2022-11-13 17:10             ` Andrew Lunn
2022-11-13 20:10               ` John Ousterhout
2022-11-13 20:37                 ` Andrew Lunn
2022-11-14  5:37                   ` John Ousterhout
2022-11-13  6:09         ` John Ousterhout
2022-11-13  8:24           ` Jiri Pirko
2022-11-13 18:53             ` Andrew Lunn
2022-12-04 18:17 ` Jamal Hadi Salim
