* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-22  0:34 Mat Martineau
From: Mat Martineau @ 2017-08-22  0:34 UTC (permalink / raw)
  To: mptcp



Lorenzo and Christoph,

On Sat, 19 Aug 2017, Christoph Paasch wrote:

> Hello Lorenzo,
>
>
> thanks for chiming in. Please see inline:
>
> On 19/08/17 - 13:43:38, Lorenzo Colitti wrote:
>> Sorry I'm late to this thread. I've been thinking about this for a while,
>> and wanted to share some of those thoughts in the hope that they are useful.

Glad that you're here, Lorenzo!

>> Reading through the 4.4 patch, many of the points made by Mat make sense to
>> me. Especially, off-by-default is important (upstream cares deeply about
>> backwards compatibility). In fact, personally I don't see a lot of
>> advantage to enabling MPTCP on unmodified applications that directly call
>> the socket API. But perhaps there are use cases there that I don't see.
>
> I agree that we should have off-by-default and opt-in through a
> socket-option (or similar mechanism).
>
>
> Wrt. using MPTCP on unmodified applications, Daniel Borkmann told me
> once that a big use-case for MPTCP would be data-center applications where
> one cannot change the application (e.g., because it's closed-source).
> If I am not mistaken, at the time he was working on SCTP and they were
> trying to do some magic with LD_PRELOAD to make these apps seamlessly use
> SCTP.
>
> But I would say bad luck for these apps then ;-)

I've also heard anecdotes of data center use cases for unmodified 
applications, but it seems like that won't be the most common usage. 
Applications (user or data-center) can make better use of MPTCP if they're 
written for it, but regular TCP applications could get some of MPTCP's 
benefits using proxies or LD_PRELOAD techniques.

>> Having as much code in userspace as possible will help.  Moving path
>> management to userspace seems like it could be an easy way to do that (and
>> thinking specifically about how Android does networking, the kernel doesn't
>> really have enough information to decide which subflows should go where,
>> anyway).
>>
>> For a mobile device, the simplest approach seems to be just to
>> have zero or one subflows per interface, with explicit addition and removal
>> of subflows from userspace, and possibly a setsockopt to add and disable
>> MP_PRIO.
>
> +1

For userspace path management and control of adding/removing subflows, do 
you prefer to have per-application control, or does systemwide control 
through a netlink socket work? I have been leaning in the netlink 
direction as a way to keep the application interface simpler, and am 
hoping to get more information to determine if that's the best approach.

>> What I think that means is that there needs to be a subflow abstraction 
>> that userspace can see. One way to do this might be to expose the subflows
>> as individual sockets on which read() and write() return EINVAL, but on
>> which connect() and setsockopt() can operate as normal - connect() to
>> establish a subflow, setsockopt() to do things like set subflows to backup,
>> ask the kernel to send the MP_PRIO option, and so on. Not sure what you'd
>> do on a server to get the subflows for accepted connections, though. For
>> blocking calls you could use accept() on the master socket, but really you
>> want an asynchronous notification when a connection comes in. Perhaps give
>> up support for urgent data and re-use POLLPRI? There is precedent for
>> reusing POLLPRI for things that aren't urgent data, but such a solution
>> might be seen as too hacky by upstream.

On the server side, is there application-level knowledge that would affect 
the decision to accept, reject, initiate, or close a subflow? Or would it 
be a set policy (say, maximum of X subflows per connection) that could be 
managed without notifying the application?

>> The alternative would be to expose only one filedescriptor and use
>> subsetsockopts to affect the individual flows, like
>> draft-hesmans-mptcp-socket-02
>> does. On Android specifically, routing traffic on a particular network
>> (e.g., wifi vs. cellular) requires that sk->sk_mark be set to an
>> appropriate value, so there must be a way to do that for each subflow.
>
> Yes, I think both approaches would be fine, although I'm not sure what
> upstream's opinion is on adding more socket-options, which is why the first
> option is probably better.

Since we could add SOL_MPTCP we wouldn't be cluttering up TCP's option 
space, so it seems reasonable to propose some new options.
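
For concreteness, here is a minimal sketch of what an option at a new
SOL_MPTCP level could look like from userspace. Everything in it is
hypothetical: the level value and the option name are placeholders for
illustration, not proposed ABI.

#include <sys/socket.h>

#define SOL_MPTCP          284  /* hypothetical new option level */
#define MPTCP_SUBFLOW_MAX    1  /* hypothetical: cap subflows per connection */

static int limit_subflows(int fd, int max_subflows)
{
        /* Would fail with ENOPROTOOPT on any kernel today. */
        return setsockopt(fd, SOL_MPTCP, MPTCP_SUBFLOW_MAX,
                          &max_subflows, sizeof(max_subflows));
}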

> Wrt. the asynchronous mechanism for accepting new subflows. What about
> using the error-queue that is used today for SO_TIMESTAMP? This could wake
> up the socket and signal that there is a new subflow. Then the app can do an
> accept() to get the fd.

MSG_ZEROCOPY is a recent addition that uses the error queue in a similar 
way: https://lwn.net/Articles/726917/
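
For reference, the userspace side of such a notification would presumably
mirror what SO_TIMESTAMPING and MSG_ZEROCOPY consumers already do. Only the
error-queue plumbing below is existing API; the MPTCP "new subflow" event
itself is hypothetical.

#include <poll.h>
#include <sys/socket.h>

static int wait_for_subflow_event(int fd)
{
        struct pollfd pfd = { .fd = fd, .events = 0 };
        char control[128];
        struct msghdr msg = { 0 };

        msg.msg_control = control;
        msg.msg_controllen = sizeof(control);

        /* Pending error-queue data is reported as POLLERR even with
         * events == 0. */
        if (poll(&pfd, 1, -1) < 0 || !(pfd.revents & POLLERR))
                return -1;

        /* Drain one notification. A real implementation would walk the
         * cmsgs for a (hypothetical) new-subflow event, then call
         * accept() on the MPTCP socket to obtain the subflow fd. */
        return (int)recvmsg(fd, &msg, MSG_ERRQUEUE);
}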

> I have another question: How do you see the risk of "malicious" apps that
> then create subflows on cell and just send plenty of data over the cellular
> interface? Because, if we expose path-management at the user-space level an
> app can take full control over the behavior of MPTCP.

I'd see this as a benefit of the netlink-based path manager, which would 
act at the system level.

>> One general thing I've heard is that having the API flow be similar to the
>> single-TCP-socket model is an advantage - for example, I hear that one of
>> the reasons for the low adoption of TFO on Linux is that the API is
>> different to standard TCP because it uses sendto() instead of connect(), and
>> that makes it hard to use lots of libraries such as openssl that want "a
>> filedescriptor of a connected socket".

Agreed - ease of use and familiarity are important. Applications without 
MPTCP already have the ability to open multiple TCP connections and make 
use of different network paths, but then they have extra complexity to 
manage. MPTCP can hide that complexity from the application.

>> The scheduler seems like it could belong in the kernel, because it has to
>> react quickly to events such as receiving packets. Perhaps this could use
>> similar mechanisms to the existing pluggable congestion control algorithms?
>
> Yes, we already have this today where schedulers are implemented as
> pluggable modules that can be selected either through sysctl or a
> socket-option (exactly the same as congestion control).
>
> These schedulers however currently don't have many configuration knobs to
> tune them.

Applications with more specific scheduling needs could give hints to
schedulers using control messages.
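
To illustrate, such a hint could ride as ancillary data on sendmsg(). The
cmsg level and type below are invented for the sketch; only the
sendmsg()/cmsg mechanics are existing API.

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define SOL_MPTCP         284  /* hypothetical */
#define MPTCP_SCHED_HINT    1  /* hypothetical: e.g. low-latency vs. bulk */

static ssize_t send_low_latency(int fd, const void *buf, size_t len)
{
        union {                 /* union guarantees cmsg alignment */
                char buf[CMSG_SPACE(sizeof(int))];
                struct cmsghdr align;
        } control;
        struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                              .msg_control = control.buf,
                              .msg_controllen = sizeof(control.buf) };
        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        int hint = 1;           /* hypothetical "prefer low latency" value */

        cm->cmsg_level = SOL_MPTCP;
        cm->cmsg_type = MPTCP_SCHED_HINT;
        cm->cmsg_len = CMSG_LEN(sizeof(hint));
        memcpy(CMSG_DATA(cm), &hint, sizeof(hint));

        return sendmsg(fd, &msg, 0);
}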

>> Another option, for advanced clients, would be to have userspace pass in a
>> scheduling algorithm written in EBPF. One thing that might be helpful is to
>> have a "send all packets on all subflows" scheduler for some cases, though I'm
>> not sure if that's feasible.
>
> I like that idea of an ebpf-style scheduler.

The recent "socket tap" patch set on netdev made me start thinking of how 
eBPF might apply to MPTCP. Scheduling would be a good fit!
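
Purely as a thought experiment (no such BPF hook exists, and the context
struct and return convention below are invented), a scheduler program might
reduce to something like:

#include <linux/types.h>

struct mptcp_sched_ctx {          /* hypothetical hook context */
        __u32 nr_subflows;
        __u32 snd_wnd[8];         /* per-subflow available send window */
        __u32 srtt_us[8];         /* per-subflow smoothed RTT */
};

/* Return the index of the subflow that should carry the next segment:
 * here, the lowest-RTT subflow that still has send window. */
static int pick_subflow(const struct mptcp_sched_ctx *ctx)
{
        int best = -1;
        unsigned int i;

        for (i = 0; i < ctx->nr_subflows; i++) {
                if (!ctx->snd_wnd[i])
                        continue;   /* no room on this subflow */
                if (best < 0 || ctx->srtt_us[i] < ctx->srtt_us[best])
                        best = i;
        }
        return best;                /* -1: nothing schedulable right now */
}

Lorenzo's redundant "send on all subflows" scheduler would simply use every
index instead of picking one.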


Regards,
Mat


>
>
> Cheers,
> Christoph
>
>>
>> Regards,
>> Lorenzo
>>
>> On Wed, Aug 2, 2017 at 2:09 PM, Christoph Paasch <cpaasch(a)apple.com> wrote:
>>
>>> Hello,
>>>
>>> I'm adding Lorenzo (in CC) from Google to this thread. Lorenzo works on the
>>> networking of Android at Google.
>>> Lorenzo, you can subscribe to the mailing-list at
>>> https://lists.01.org/mailman/listinfo/mptcp.
>>>
>>>
>>> I discussed MPTCP with him at the IETF two weeks back, and he was
>>> interested
>>> in helping make MPTCP upstreamable.
>>> I'll let him chime in on the discussion.
>>>
>>>
>>>
>>> Wrt. the below, yes I agree with the approach Mat outlined.
>>> On my side, I will be able to spend more cycles now on Linux. I will start
>>> by porting the code from multipath-tcp.org up to upstream's version (we
>>> have
>>> been lagging behind quite a bit again). That way we have a common base
>>> where
>>> we can easily see how well the RFC-patches (for example more generic
>>> capabilities in TCP) would work with MPTCP.
>>>
>>>
>>> Cheers,
>>> Christoph
>>>
>>>
>>>
>>> On 18/07/17 - 17:31:51, Mat Martineau wrote:
>>>>
>>>> Hello everyone,
>>>>
>>>> Our goal on this mailing list is to add an MPTCP implementation to the
>>>> upstream Linux kernel. There's a fair amount of work to be done to
>>> achieve
>>>> this, and a number of options for how to go about it. Some of this
>>> revisits
>>>> previous discussions on this list and elsewhere, but I want to be sure we
>>>> have some level of consensus about the direction to head in.
>>>>
>>>> A couple of us on this list have had discussions with the Linux net
>>>> maintainers, and they have some specific needs concerning
>>> modifications
>>>> to the Linux TCP stack:
>>>>
>>>>  * TCP complexity can't increase. It's already a complex,
>>>> performance-sensitive piece of software that every Linux user depends on.
>>>> Intrusive changes have a risk of creating bugs or changing operation of
>>> the
>>>> stack in unexpected ways.
>>>>
>>>>  * sk_buff structure size can't get bigger. It's already large and, if
>>>> anything, they hope to reduce its size. Changes to the data structure
>>> size
>>>> are amplified by the large number of instances in a system handling a
>>> lot of
>>>> traffic.
>>>>
>>>>  * An additional protocol like MPTCP should be opt-in, so users of
>>> regular
>>>> TCP continue to get the same type of connection and performance unless
>>> MPTCP
>>>> is requested.
>>>>
>>>> I also recommend reading "On submitting kernel patches"
>>>> (http://halobates.de/on-submitting-patches.pdf) to get an idea of the
>>>> process and hurdles involved in merging major core functionality for the
>>>> Linux kernel.
>>>>
>>>>
>>>> Various Strategies
>>>> ------------------
>>>>
>>>> One approach is to attempt to merge the multipath-tcp.org fork. This is
>>> an
>>>> implementation in which the multipath-tcp.org community has invested a
>>> lot
>>>> of time and effort, and it is in production for major applications (see
>>>> https://tools.ietf.org/html/rfc8041). This is a tremendous amount of
>>> code to
>>>> review at once (even separating out modules), and currently doesn't fit
>>> with
>>>> what the maintainers have asked for (non-intrusive, sk_buff size, MPTCP
>>> by
>>>> default). I don't think the maintainers would consider merging such an
>>>> extensive piece of git history, especially where there are a fair number of
>>>> commits without an "mptcp:" label on the subject line or without a DCO
>>>> signoff (https://www.kernel.org/doc/html/latest/process/
>>> submitting-patches.html#sign-your-work-the-developer-s-
>>> certificate-of-origin).
>>>> Today, the fork is at kernel v4.4 and current upstream development is at
>>>> v4.13-rc1, so the fork would have to catch up and stay current.
>>>>
>>>> The other extreme is to rewrite from scratch. This would allow
>>> incremental
>>>> development with maintainer review from the start, but doesn't take
>>>> advantage of existing code.
>>>>
>>>> The most realistic approach is somewhere in between, where we write new
>>> code
>>>> that fits maintainer expectations and utilize components from the fork
>>> where
>>>> licensing allows and the code fits. We'll have to find the right balance:
>>>> over-reliance on new code could take extra time, but constantly reworking
>>>> the fork and keeping it up-to-date with net-next is also a lot of
>>> overhead.
>>>>
>>>> To start with, we can create RFC patches (code that's ready for comment
>>>> rather than merge -- not "RFC" in the IETF sense) that allow us to extend
>>>> TCP in the ways that are useful for both MPTCP and other extended TCP
>>>> features. The maintainers would be able to review those standalone
>>> patches,
>>>> and there's potential to backport the patches to prove them out with the
>>>> multipath-tcp.org code. Does this sound sensible? Any other approaches
>>> to
>>>> consider, or details that we should discuss here?
>>>>
>>>>
>>>> Design for Upstream
>>>> -------------------
>>>>
>>>> As a starting point for discussion, here are some characteristics that
>>> might
>>>> make MPTCP more upstream-friendly:
>>>>
>>>>  * MPTCP is used when requested by the application, either through an
>>>> IPPROTO_MPTCP parameter to socket() or by using the new ULP (Upper Layer
>>>> Protocol) capability.
>>>>
>>>>  * Move away from meta-sockets, treating each subflow more like a regular
>>>> TCP connection. The overall MPTCP connection is coordinated by an upper
>>>> layer socket that is distinct from tcp_sock.
>>>>
>>>>  * Move functionality to userspace where possible, like tracking
>>> ADD_ADDRs
>>>> received, initiating new subflows, or accepting new subflows.
>>>>
>>>>  * Avoid adding locks to coordinate access to data that's shared between
>>>> subflows. Utilize capabilities like compare-and-swap (cmpxchg), atomics,
>>> and
>>>> RCU to deal with shared data efficiently.
>>>>
>>>>  * Add generic capabilities to the TCP stack where it looks useful to
>>> other
>>>> protocol extensions. Examples: dynamically register handlers for TCP
>>> option
>>>> headers, make it possible to pass TCP options to/from an upper layer.
>>>>
>>>> Any comment on these? Maybe each deserves a thread of its own.
>>>>
>>>>
>>>> Thanks again to Rao, Christoph, Peter, and Ossama for your help, work,
>>> and
>>>> interest. I'm looking forward to your insights.
>>>>
>>>>
>>>> --
>>>> Mat Martineau
>>>> Intel OTC
>>>> _______________________________________________
>>>> mptcp mailing list
>>>> mptcp(a)lists.01.org
>>>> https://lists.01.org/mailman/listinfo/mptcp
>>>
>

--
Mat Martineau
Intel OTC


* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-22  6:55 Christoph Paasch
From: Christoph Paasch @ 2017-08-22  6:55 UTC (permalink / raw)
  To: mptcp


Hello Mat,

On 21/08/17 - 17:34:01, Mat Martineau wrote:
> On Sat, 19 Aug 2017, Christoph Paasch wrote:
> > thanks for chiming in. Please see inline:
> > 
> > On 19/08/17 - 13:43:38, Lorenzo Colitti wrote:
> > > Sorry I'm late to this thread. I've been thinking about this for a while,
> > > and wanted to share some of those thoughts in the hope that they are useful.
> 
> Glad that you're here, Lorenzo!
> 
> > > Reading through the 4.4 patch, many of the points made by Mat make sense to
> > > me. Especially, off-by-default is important (upstream cares deeply about
> > > backwards compatibility). In fact, personally I don't see a lot of
> > > advantage to enabling MPTCP on unmodified applications that directly call
> > > the socket API. But perhaps there are use cases there that I don't see.
> > 
> > I agree that we should have off-by-default and opt-in through a
> > socket-option (or similar mechanism).
> > 
> > 
> > Wrt. using MPTCP on unmodified applications, Daniel Borkmann told me
> > once that a big use-case for MPTCP would be data-center applications where
> > one cannot change the application (e.g., because it's closed-source).
> > If I am not mistaken, at the time he was working on SCTP and they were
> > trying to do some magic with LD_PRELOAD to make these apps seamlessly use
> > SCTP.
> > 
> > But I would say bad luck for these apps then ;-)
> 
> I've also heard anecdotes of data center use cases for unmodified
> applications, but it seems like that won't be the most common usage.
> Applications (user or data-center) can make better use of MPTCP if they're
> written for it, but regular TCP applications could get some of MPTCP's
> benefits using proxies or LD_PRELOAD techniques.
> 
> > > Having as much code in userspace as possible will help.  Moving path
> > > management to userspace seems like it could be an easy way to do that (and
> > > thinking specifically about how Android does networking, the kernel doesn't
> > > really have enough information to decide which subflows should go where,
> > > anyway).
> > > 
> > > For a mobile device, the simplest approach seems to be just to
> > > have zero or one subflows per interface, with explicit addition and removal
> > > of subflows from userspace, and possibly a setsockopt to add and disable
> > > MP_PRIO.
> > 
> > +1
> 
> For userspace path management and control of adding/removing subflows, do
> you prefer to have per-application control, or does systemwide control
> through a netlink socket work? I have been leaning in the netlink direction
> as a way to keep the application interface simpler, and am hoping to get
> more information to determine if that's the best approach.
> 
> > > What I think that means is that there needs to be a subflow
> > > abstraction that userspace can see. One way to do this might be to
> > > expose the subflows
> > > as individual sockets on which read() and write() return EINVAL, but on
> > > which connect() and setsockopt() can operate as normal - connect() to
> > > establish a subflow, setsockopt() to do things like set subflows to backup,
> > > ask the kernel to send the MP_PRIO option, and so on. Not sure what you'd
> > > do on a server to get the subflows for accepted connections, though. For
> > > blocking calls you could use accept() on the master socket, but really you
> > > want an asynchronous notification when a connection comes in. Perhaps give
> > > up support for urgent data and re-use POLLPRI? There is precedent for
> > > reusing POLLPRI for things that aren't urgent data, but such a solution
> > > might be seen as too hacky by upstream.
> 
> On the server side, is there application-level knowledge that would affect
> the decision to accept, reject, initiate, or close a subflow? Or would it be
> a set policy (say, maximum of X subflows per connection) that could be
> managed without notifying the application?

One use-case for servers to steer subflows would be in peer-to-peer
communications. E.g., think of two smartphones communicating with each other
over WiFi/Cell.

> 
> > > The alternative would be to expose only one filedescriptor and use
> > > subsetsockopts to affect the individual flows, like
> > > draft-hesmans-mptcp-socket-02
> > > does. On Android specifically, routing traffic on a particular network
> > > (e.g., wifi vs. cellular) requires that sk->sk_mark be set to an
> > > appropriate value, so there must be a way to do that for each subflow.
> > 
> > Yes, I think both approaches would be fine, although I'm not sure what
> > upstream's opinion is on adding more socket-options, which is why the first
> > option is probably better.
> 
> Since we could add SOL_MPTCP we wouldn't be cluttering up TCP's option
> space, so it seems reasonable to propose some new options.

Good point, SOL_MPTCP is the way to go.

> 
> > Wrt. the asynchronous mechanism for accepting new subflows. What about
> > using the error-queue that is used today for SO_TIMESTAMP? This could wake
> > up the socket and signal that there is a new subflow. Then the app can do an
> > accept() to get the fd.
> 
> MSG_ZEROCOPY is a recent addition that uses the error queue in a similar
> way: https://lwn.net/Articles/726917/
> 
> > I have another question: How do you see the risk of "malicious" apps that
> > then create subflows on cell and just send plenty of data over the cellular
> > interface? Because, if we expose path-management at the user-space level an
> > app can take full control over the behavior of MPTCP.
> 
> I'd see this as a benefit of the netlink-based path manager, which would act
> at the system level.

Personally, I like a netlink-based path manager. It's more flexible and also
simplifies things, since not every application then needs to implement
interface monitoring to decide when to bring up a subflow on WiFi. All of
that logic would live in a single daemon that monitors the interfaces and
the MPTCP connections.
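
As a sketch of the daemon's monitoring side, using only existing rtnetlink
API: the step that would actually instruct the MPTCP path manager is a
stub here, since that netlink interface is exactly what is being discussed.

#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <sys/socket.h>

int main(void)
{
        struct sockaddr_nl sa = { .nl_family = AF_NETLINK,
                                  .nl_groups = RTMGRP_LINK };
        char buf[8192];
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

        if (fd < 0 || bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
                return 1;

        for (;;) {
                int len = (int)recv(fd, buf, sizeof(buf), 0);
                struct nlmsghdr *nh;

                if (len <= 0)
                        break;
                for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
                     nh = NLMSG_NEXT(nh, len)) {
                        if (nh->nlmsg_type != RTM_NEWLINK &&
                            nh->nlmsg_type != RTM_DELLINK)
                                continue;
                        /* An interface came or went: here the daemon
                         * would ask the (hypothetical) MPTCP netlink
                         * path manager to create or tear down subflows
                         * on the affected connections. */
                }
        }
        return 0;
}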

On the other hand, only the app really knows what kind of SLA it expects
from MPTCP. So we need to pass that info down from the app to the MPTCP
socket in the kernel, which can then pass it back up to the daemon.


Cheers,
Christoph

> > > One general thing I've heard is that having the API flow be similar to the
> > > single-TCP-socket model is an advantage - for example, I hear that one of
> > > the reasons for the low adoption of TFO on Linux is that the API is
> > > different to standard TCP because it uses sendto() instead of connect(), and
> > > that makes it hard to use lots of libraries such as openssl that want "a
> > > filedescriptor of a connected socket".
> 
> Agreed - ease of use and familiarity are important. Applications without
> MPTCP already have the ability to open multiple TCP connections and make use
> of different network paths, but then they have extra complexity to manage.
> MPTCP can hide that complexity from the application.
> 
> > > The scheduler seems like it could belong in the kernel, because it has to
> > > react quickly to events such as receiving packets. Perhaps this could use
> > > similar mechanisms to the existing pluggable congestion control algorithms?
> > 
> > Yes, we already have this today where schedulers are implemented as
> > pluggable modules that can be selected either through sysctl or a
> > socket-option (exactly the same as congestion control).
> > 
> > These schedulers however currently don't have many configuration knobs to
> > tune them.
> 
> Applications with more specific scheduling needs could give hints to
> schedulers using control messages.
> 
> > > Another option, for advanced clients, would be to have userspace pass in a
> > > scheduling algorithm written in EBPF. One thing that might be helpful is to
> > > have a "send all packets on all subflows" scheduler for some cases, though I'm
> > > not sure if that's feasible.
> > 
> > I like that idea of an ebpf-style scheduler.
> 
> The recent "socket tap" patch set on netdev made me start thinking of how
> eBPF might apply to MPTCP. Scheduling would be a good fit!
> 
> 
> Regards,
> Mat
> 
> 
> > 
> > 
> > Cheers,
> > Christoph
> > 
> > > 
> > > Regards,
> > > Lorenzo
> > > 
> > > On Wed, Aug 2, 2017 at 2:09 PM, Christoph Paasch <cpaasch(a)apple.com> wrote:
> > > 
> > > > Hello,
> > > > 
> > > > I'm adding Lorenzo (in CC) from Google to this thread. Lorenzo works on the
> > > > networking of Android at Google.
> > > > Lorenzo, you can subscribe to the mailing-list at
> > > > https://lists.01.org/mailman/listinfo/mptcp.
> > > > 
> > > > 
> > > > I discussed MPTCP with him at the IETF two weeks back, and he was
> > > > interested
> > > > in helping make MPTCP upstreamable.
> > > > I'll let him chime in on the discussion.
> > > > 
> > > > 
> > > > 
> > > > Wrt. the below, yes I agree with the approach Mat outlined.
> > > > On my side, I will be able to spend more cycles now on Linux. I will start
> > > > by porting the code from multipath-tcp.org up to upstream's version (we
> > > > have
> > > > been lagging behind quite a bit again). That way we have a common base
> > > > where
> > > > we can easily see how well the RFC-patches (for example more generic
> > > > capabilities in TCP) would work with MPTCP.
> > > > 
> > > > 
> > > > Cheers,
> > > > Christoph
> > > > 
> > > > 
> > > > 
> > > > On 18/07/17 - 17:31:51, Mat Martineau wrote:
> > > > > 
> > > > > Hello everyone,
> > > > > 
> > > > > Our goal on this mailing list is to add an MPTCP implementation to the
> > > > > upstream Linux kernel. There's a fair amount of work to be done to
> > > > achieve
> > > > > this, and a number of options for how to go about it. Some of this
> > > > revisits
> > > > > previous discussions on this list and elsewhere, but I want to be sure we
> > > > > have some level of consensus about the direction to head in.
> > > > > 
> > > > > A couple of us on this list have had discussions with the Linux net
> > > > > maintainers, and they have some specific needs concerning
> > > > modifications
> > > > > to the Linux TCP stack:
> > > > > 
> > > > >  * TCP complexity can't increase. It's already a complex,
> > > > > performance-sensitive piece of software that every Linux user depends on.
> > > > > Intrusive changes have a risk of creating bugs or changing operation of
> > > > the
> > > > > stack in unexpected ways.
> > > > > 
> > > > >  * sk_buff structure size can't get bigger. It's already large and, if
> > > > > anything, they hope to reduce its size. Changes to the data structure
> > > > size
> > > > > are amplified by the large number of instances in a system handling a
> > > > lot of
> > > > > traffic.
> > > > > 
> > > > >  * An additional protocol like MPTCP should be opt-in, so users of
> > > > regular
> > > > > TCP continue to get the same type of connection and performance unless
> > > > MPTCP
> > > > > is requested.
> > > > > 
> > > > > I also recommend reading "On submitting kernel patches"
> > > > > (http://halobates.de/on-submitting-patches.pdf) to get an idea of the
> > > > > process and hurdles involved in merging major core functionality for the
> > > > > Linux kernel.
> > > > > 
> > > > > 
> > > > > Various Strategies
> > > > > ------------------
> > > > > 
> > > > > One approach is to attempt to merge the multipath-tcp.org fork. This is
> > > > an
> > > > > implementation in which the multipath-tcp.org community has invested a
> > > > lot
> > > > > of time and effort, and it is in production for major applications (see
> > > > > https://tools.ietf.org/html/rfc8041). This is a tremendous amount of
> > > > code to
> > > > > review at once (even separating out modules), and currently doesn't fit
> > > > with
> > > > > what the maintainers have asked for (non-intrusive, sk_buff size, MPTCP
> > > > by
> > > > > default). I don't think the maintainers would consider merging such an
> > > > > extensive piece of git history, especially where there are a fair number of
> > > > > commits without an "mptcp:" label on the subject line or without a DCO
> > > > > signoff (https://www.kernel.org/doc/html/latest/process/
> > > > submitting-patches.html#sign-your-work-the-developer-s-
> > > > certificate-of-origin).
> > > > > Today, the fork is at kernel v4.4 and current upstream development is at
> > > > > v4.13-rc1, so the fork would have to catch up and stay current.
> > > > > 
> > > > > The other extreme is to rewrite from scratch. This would allow
> > > > incremental
> > > > > development with maintainer review from the start, but doesn't take
> > > > > advantage of existing code.
> > > > > 
> > > > > The most realistic approach is somewhere in between, where we write new
> > > > code
> > > > > that fits maintainer expectations and utilize components from the fork
> > > > where
> > > > > licensing allows and the code fits. We'll have to find the right balance:
> > > > > over-reliance on new code could take extra time, but constantly reworking
> > > > > the fork and keeping it up-to-date with net-next is also a lot of
> > > > overhead.
> > > > > 
> > > > > To start with, we can create RFC patches (code that's ready for comment
> > > > > rather than merge -- not "RFC" in the IETF sense) that allow us to extend
> > > > > TCP in the ways that are useful for both MPTCP and other extended TCP
> > > > > features. The maintainers would be able to review those standalone
> > > > patches,
> > > > > and there's potential to backport the patches to prove them out with the
> > > > > multipath-tcp.org code. Does this sound sensible? Any other approaches
> > > > to
> > > > > consider, or details that we should discuss here?
> > > > > 
> > > > > 
> > > > > Design for Upstream
> > > > > -------------------
> > > > > 
> > > > > As a starting point for discussion, here are some characteristics that
> > > > might
> > > > > make MPTCP more upstream-friendly:
> > > > > 
> > > > >  * MPTCP is used when requested by the application, either through an
> > > > > IPPROTO_MPTCP parameter to socket() or by using the new ULP (Upper Layer
> > > > > Protocol) capability.
> > > > > 
> > > > >  * Move away from meta-sockets, treating each subflow more like a regular
> > > > > TCP connection. The overall MPTCP connection is coordinated by an upper
> > > > > layer socket that is distinct from tcp_sock.
> > > > > 
> > > > >  * Move functionality to userspace where possible, like tracking
> > > > ADD_ADDRs
> > > > > received, initiating new subflows, or accepting new subflows.
> > > > > 
> > > > >  * Avoid adding locks to coordinate access to data that's shared between
> > > > > subflows. Utilize capabilities like compare-and-swap (cmpxchg), atomics,
> > > > and
> > > > > RCU to deal with shared data efficiently.
> > > > > 
> > > > >  * Add generic capabilities to the TCP stack where it looks useful to
> > > > other
> > > > > protocol extensions. Examples: dynamically register handlers for TCP
> > > > option
> > > > > headers, make it possible to pass TCP options to/from an upper layer.
> > > > > 
> > > > > Any comment on these? Maybe each deserves a thread of its own.
> > > > > 
> > > > > 
> > > > > Thanks again to Rao, Christoph, Peter, and Ossama for your help, work,
> > > > and
> > > > > interest. I'm looking forward to your insights.
> > > > > 
> > > > > 
> > > > > --
> > > > > Mat Martineau
> > > > > Intel OTC
> > > > > _______________________________________________
> > > > > mptcp mailing list
> > > > > mptcp(a)lists.01.org
> > > > > https://lists.01.org/mailman/listinfo/mptcp
> > > > 
> > 
> 
> --
> Mat Martineau
> Intel OTC


* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-19 23:55 Christoph Paasch
From: Christoph Paasch @ 2017-08-19 23:55 UTC (permalink / raw)
  To: mptcp


Hello Lorenzo,


thanks for chiming in. Please see inline:

On 19/08/17 - 13:43:38, Lorenzo Colitti wrote:
> Sorry I'm late to this thread. I've been thinking about this for a while,
> and wanted to share some of those thoughts in the hope that they are useful.
> 
> Reading through the 4.4 patch, many of the points made by Mat make sense to
> me. Especially, off-by-default is important (upstream cares deeply about
> backwards compatibility). In fact, personally I don't see a lot of
> advantage to enabling MPTCP on unmodified applications that directly call
> the socket API. But perhaps there are use cases there that I don't see.

I agree that we should have off-by-default and opt-in through a
socket-option (or similar mechanism).


Wrt. using MPTCP on unmodified applications, Daniel Borkmann told me
once that a big use-case for MPTCP would be data-center applications where
one cannot change the application (e.g., because it's closed-source).
If I am not mistaken, at the time he was working on SCTP and they were
trying to do some magic with LD_PRELOAD to make these apps seamlessly use
SCTP.

But I would say bad luck for these apps then ;-)

> Having as much code in userspace as possible will help.  Moving path
> management to userspace seems like it could be an easy way to do that (and
> thinking specifically about how Android does networking, the kernel doesn't
> really have enough information to decide which subflows should go where,
> anyway). For a mobile device, the simplest approach seems to be just to
> have zero or one subflows per interface, with explicit addition and removal
> of subflows from userspace, and possibly a setsockopt to add and disable
> MP_PRIO.

+1

> 
> What I think that means is that there needs to be a subflow abstraction
> that userspace can see. One way to do this might be to expose the subflows
> as individual sockets on which read() and write() return EINVAL, but on
> which connect() and setsockopt() can operate as normal - connect() to
> establish a subflow, setsockopt() to do things like set subflows to backup,
> ask the kernel to send the MP_PRIO option, and so on. Not sure what you'd
> do on a server to get the subflows for accepted connections, though. For
> blocking calls you could use accept() on the master socket, but really you
> want an asynchronous notification when a connection comes in. Perhaps give
> up support for urgent data and re-use POLLPRI? There is precedent for
> reusing POLLPRI for things that aren't urgent data, but such a solution
> might be seen as too hacky by upstream.
> 
> The alternative would be to expose only one filedescriptor and use
> subsetsockopts to affect the individual flows, like
> draft-hesmans-mptcp-socket-02
> does. On Android specifically, routing traffic on a particular network
> (e.g., wifi vs. cellular) requires that sk->sk_mark be set to an
> appropriate value, so there must be a way to do that for each subflow.

Yes, I think both approaches would be fine, although I'm not sure what
upstream's opinion is on adding more socket-options, which is why the first
option is probably better.

Wrt. the asynchronous mechanism for accepting new subflows. What about
using the error-queue that is used today for SO_TIMESTAMP? This could wake
up the socket and signal that there is a new subflow. Then the app can do an
accept() to get the fd.


I have another question: How do you see the risk of "malicious" apps that
then create subflows on cell and just send plenty of data over the cellular
interface? Because, if we expose path-management at the user-space level an
app can take full control over the behavior of MPTCP.


> One general thing I've heard is that having the API flow be similar to the
> single-TCP-socket model is an advantage - for example, I hear that one of
> the reasons for the low adoption of TFO on Linux is that the API is
> different to standard TCP because it uses sendto() instead of connect(), and
> that makes it hard to use lots of libraries such as openssl that want "a
> filedescriptor of a connected socket".
> 
> The scheduler seems like it could belong in the kernel, because it has to
> react quickly to events such as receiving packets. Perhaps this could use
> similar mechanisms to the existing pluggable congestion control algorithms?

Yes, we already have this today where schedulers are implemented as
pluggable modules that can be selected either through sysctl or a
socket-option (exactly the same as congestion control).

These schedulers however currently don't have many configuration knobs to
tune them.

> Another option, for advanced clients, would be to have userspace pass in a
> scheduling algorithm written in EBPF. One thing that might be helpful is to
> have a "send all packets on all subflows" scheduler for some cases, though I'm
> not sure if that's feasible.

I like that idea of an ebpf-style scheduler.


Cheers,
Christoph

> 
> Regards,
> Lorenzo
> 
> On Wed, Aug 2, 2017 at 2:09 PM, Christoph Paasch <cpaasch(a)apple.com> wrote:
> 
> > Hello,
> >
> > I'm adding Lorenzo (in CC) from Google to this thread. Lorenzo works on the
> > networking of Android at Google.
> > Lorenzo, you can subscribe to the mailing-list at
> > https://lists.01.org/mailman/listinfo/mptcp.
> >
> >
> > I discussed MPTCP with him at the IETF two weeks back, and he was
> > interested
> > in helping make MPTCP upstreamable.
> > I'll let him chime in on the discussion.
> >
> >
> >
> > Wrt. the below, yes I agree with the approach Mat outlined.
> > On my side, I will be able to spend more cycles now on Linux. I will start
> > by porting the code from multipath-tcp.org up to upstream's version (we
> > have
> > been lagging behind quite a bit again). That way we have a common base
> > where
> > we can easily see how well the RFC-patches (for example more generic
> > capabilities in TCP) would work with MPTCP.
> >
> >
> > Cheers,
> > Christoph
> >
> >
> >
> > On 18/07/17 - 17:31:51, Mat Martineau wrote:
> > >
> > > Hello everyone,
> > >
> > > Our goal on this mailing list is to add an MPTCP implementation to the
> > > upstream Linux kernel. There's a fair amount of work to be done to
> > achieve
> > > this, and a number of options for how to go about it. Some of this
> > revisits
> > > previous discussions on this list and elsewhere, but I want to be sure we
> > > have some level of consensus about the direction to head in.
> > >
> > > A couple of us on this list have had discussions with the Linux net
> > > maintainers, and they have some specific needs concerning
> > modifications
> > > to the Linux TCP stack:
> > >
> > >  * TCP complexity can't increase. It's already a complex,
> > > performance-sensitive piece of software that every Linux user depends on.
> > > Intrusive changes have a risk of creating bugs or changing operation of
> > the
> > > stack in unexpected ways.
> > >
> > >  * sk_buff structure size can't get bigger. It's already large and, if
> > > anything, they hope to reduce its size. Changes to the data structure
> > size
> > > are amplified by the large number of instances in a system handling a
> > lot of
> > > traffic.
> > >
> > >  * An additional protocol like MPTCP should be opt-in, so users of
> > regular
> > > TCP continue to get the same type of connection and performance unless
> > MPTCP
> > > is requested.
> > >
> > > I also recommend reading "On submitting kernel patches"
> > > (http://halobates.de/on-submitting-patches.pdf) to get an idea of the
> > > process and hurdles involved in merging major core functionality for the
> > > Linux kernel.
> > >
> > >
> > > Various Strategies
> > > ------------------
> > >
> > > One approach is to attempt to merge the multipath-tcp.org fork. This is
> > an
> > > implementation in which the multipath-tcp.org community has invested a
> > lot
> > > of time and effort, and it is in production for major applications (see
> > > https://tools.ietf.org/html/rfc8041). This is a tremendous amount of
> > code to
> > > review at once (even separating out modules), and currently doesn't fit
> > with
> > > what the maintainers have asked for (non-intrusive, sk_buff size, MPTCP
> > by
> > > default). I don't think the maintainers would consider merging such an
> > > extensive piece of git history, especially where there are a fair number of
> > > commits without an "mptcp:" label on the subject line or without a DCO
> > > signoff (https://www.kernel.org/doc/html/latest/process/
> > submitting-patches.html#sign-your-work-the-developer-s-
> > certificate-of-origin).
> > > Today, the fork is at kernel v4.4 and current upstream development is at
> > > v4.13-rc1, so the fork would have to catch up and stay current.
> > >
> > > The other extreme is to rewrite from scratch. This would allow
> > incremental
> > > development with maintainer review from the start, but doesn't take
> > > advantage of existing code.
> > >
> > > The most realistic approach is somewhere in between, where we write new
> > code
> > > that fits maintainer expectations and utilize components from the fork
> > where
> > > licensing allows and the code fits. We'll have to find the right balance:
> > > over-reliance on new code could take extra time, but constantly reworking
> > > the fork and keeping it up-to-date with net-next is also a lot of
> > overhead.
> > >
> > > To start with, we can create RFC patches (code that's ready for comment
> > > rather than merge -- not "RFC" in the IETF sense) that allow us to extend
> > > TCP in the ways that are useful for both MPTCP and other extended TCP
> > > features. The maintainers would be able to review those standalone
> > patches,
> > > and there's potential to backport the patches to prove them out with the
> > > multipath-tcp.org code. Does this sound sensible? Any other approaches
> > to
> > > consider, or details that we should discuss here?
> > >
> > >
> > > Design for Upstream
> > > -------------------
> > >
> > > As a starting point for discussion, here are some characteristics that
> > might
> > > make MPTCP more upstream-friendly:
> > >
> > >  * MPTCP is used when requested by the application, either through an
> > > IPPROTO_MPTCP parameter to socket() or by using the new ULP (Upper Layer
> > > Protocol) capability.
> > >
> > >  * Move away from meta-sockets, treating each subflow more like a regular
> > > TCP connection. The overall MPTCP connection is coordinated by an upper
> > > layer socket that is distinct from tcp_sock.
> > >
> > >  * Move functionality to userspace where possible, like tracking
> > ADD_ADDRs
> > > received, initiating new subflows, or accepting new subflows.
> > >
> > >  * Avoid adding locks to coordinate access to data that's shared between
> > > subflows. Utilize capabilities like compare-and-swap (cmpxchg), atomics,
> > and
> > > RCU to deal with shared data efficiently.
> > >
> > >  * Add generic capabilities to the TCP stack where it looks useful to
> > other
> > > protocol extensions. Examples: dynamically register handlers for TCP
> > option
> > > headers, make it possible to pass TCP options to/from an upper layer.
> > >
> > > Any comment on these? Maybe each deserves a thread of its own.
> > >
> > >
> > > Thanks again to Rao, Christoph, Peter, and Ossama for your help, work,
> > and
> > > interest. I'm looking forward to your insights.
> > >
> > >
> > > --
> > > Mat Martineau
> > > Intel OTC
> > > _______________________________________________
> > > mptcp mailing list
> > > mptcp(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/mptcp
> >


* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-19  4:43 Lorenzo Colitti
From: Lorenzo Colitti @ 2017-08-19  4:43 UTC (permalink / raw)
  To: mptcp


All,

Sorry I'm late to this thread. I've been thinking about this for a while,
and wanted to share some of those thoughts in the hope that they are useful.

Reading through the 4.4 patch, many of the points made by Mat make sense to
me. Especially, off-by-default is important (upstream cares deeply about
backwards compatibility). In fact, personally I don't see a lot of
advantage to enabling MPTCP on unmodified applications that directly call
the socket API. But perhaps there are use cases there that I don't see.

Having as much code in userspace as possible will help.  Moving path
management to userspace seems like it could be an easy way to do that (and
thinking specifically about how Android does networking, the kernel doesn't
really have enough information to decide which subflows should go where,
anyway). For a mobile device, the simplest approach seems to be just to
have zero or one subflows per interface, with explicit addition and removal
of subflows from userspace, and possibly a setsockopt to add and disable
MP_PRIO.

What I think that means is that there needs to be a subflow abstraction
that userspace can see. One way to do this might be to expose the subflows
as individual sockets on which read() and write() return EINVAL, but on
which connect() and setsockopt() can operate as normal - connect() to
establish a subflow, setsockopt() to do things like set subflows to backup,
ask the kernel to send the MP_PRIO option, and so on. Not sure what you'd
do on a server to get the subflows for accepted connections, though. For
blocking calls you could use accept() on the master socket, but really you
want an asynchronous notification when a connection comes in. Perhaps give
up support for urgent data and re-use POLLPRI? There is precedent for
reusing POLLPRI for things that aren't urgent data, but such a solution
might be seen as too hacky by upstream.

The alternative would be to expose only one filedescriptor and use
subsetsockopts to affect the individual flows, like
draft-hesmans-mptcp-socket-02
does. On Android specifically, routing traffic on a particular network
(e.g., wifi vs. cellular) requires that sk->sk_mark be set to an
appropriate value, so there must be a way to do that for each subflow.
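
A sketch of the first alternative, with the per-subflow sk_mark handled by
plain SO_MARK (existing API). How the subflow fd is obtained and the backup
option are invented for illustration:

#include <netinet/in.h>
#include <sys/socket.h>

#define SOL_MPTCP     284  /* hypothetical */
#define MPTCP_BACKUP    2  /* hypothetical: request MP_PRIO/backup */

static int add_cell_subflow(int subflow_fd, const struct sockaddr_in *peer,
                            unsigned int fwmark)
{
        int one = 1;

        /* SO_MARK exists today; Android's per-network routing rules key
         * on this mark, so setting it steers the subflow to cellular. */
        if (setsockopt(subflow_fd, SOL_SOCKET, SO_MARK,
                       &fwmark, sizeof(fwmark)) < 0)
                return -1;

        /* connect() on the subflow fd would perform the MP_JOIN... */
        if (connect(subflow_fd, (const struct sockaddr *)peer,
                    sizeof(*peer)) < 0)
                return -1;

        /* ...and a hypothetical option would flag it as backup. */
        return setsockopt(subflow_fd, SOL_MPTCP, MPTCP_BACKUP,
                          &one, sizeof(one));
}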

One general thing I've heard is that having the API flow be similar to the
single-TCP-socket model is an advantage - for example, I hear that one of
the reasons for the low adoption of TFO on Linux is that the API is
different to standard TCP because it uses sendto() instead of connect(), and
that makes it hard to use lots of libraries such as openssl that want "a
filedescriptor of a connected socket".
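
For reference, the TFO client API being alluded to is existing Linux API:
the data rides on the SYN via sendto() with MSG_FASTOPEN, replacing the
usual connect()-then-write() sequence.

#include <netinet/in.h>
#include <sys/socket.h>

#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN 0x20000000
#endif

static ssize_t tfo_connect_and_send(int fd, const struct sockaddr_in *srv,
                                    const void *data, size_t len)
{
        /* No prior connect(): the kernel places the data (and, once
         * known, the TFO cookie) in the SYN. */
        return sendto(fd, data, len, MSG_FASTOPEN,
                      (const struct sockaddr *)srv, sizeof(*srv));
}

(The TCP_FASTOPEN_CONNECT option merged in 4.11 later restored a
connect()-style flow, which rather supports the familiarity argument.)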

The scheduler seems like it could belong in the kernel, because it has to
react quickly to events such as receiving packets. Perhaps this could use
similar mechanisms to the existing pluggable congestion control algorithms?
Another option, for advanced clients, would be to have userspace pass in a
scheduling algorithm written in EBPF. One thing that might be helpful is to
have a "send all packets on all subflows" scheduler for some cases, though I'm
not sure if that's feasible.

Regards,
Lorenzo

On Wed, Aug 2, 2017 at 2:09 PM, Christoph Paasch <cpaasch(a)apple.com> wrote:

> Hello,
>
> I'm adding Lorenzo (in CC) from Google to this thread. Lorenzo works on the
> networking of Android at Google.
> Lorenzo, you can subscribe to the mailing-list at
> https://lists.01.org/mailman/listinfo/mptcp.
>
>
> I discussed MPTCP with him at the IETF two weeks back, and he was
> interested
> in helping make MPTCP upstreamable.
> I'll let him chime in on the discussion.
>
>
>
> Wrt. the below, yes I agree with the approach Mat outlined.
> On my side, I will be able to spend more cycles now on Linux. I will start
> by porting the code from multipath-tcp.org up to upstream's version (we
> have
> been lagging behind quite a bit again). That way we have a common base
> where
> we can easily see how well the RFC-patches (for example more generic
> capabilities in TCP) would work with MPTCP.
>
>
> Cheers,
> Christoph
>
>
>
> On 18/07/17 - 17:31:51, Mat Martineau wrote:
> >
> > Hello everyone,
> >
> > Our goal on this mailing list is to add an MPTCP implementation to the
> > upstream Linux kernel. There's a fair amount of work to be done to
> achieve
> > this, and a number of options for how to go about it. Some of this
> revisits
> > previous discussions on this list and elsewhere, but I want to be sure we
> > have some level of consensus about the direction to head in.
> >
> > A couple of us on this list have had discussions with the Linux net
> > maintainers, and they have some specific needs concerning
> modifications
> > to the Linux TCP stack:
> >
> >  * TCP complexity can't increase. It's already a complex,
> > performance-sensitive piece of software that every Linux user depends on.
> > Intrusive changes have a risk of creating bugs or changing operation of
> the
> > stack in unexpected ways.
> >
> >  * sk_buff structure size can't get bigger. It's already large and, if
> > anything, they hope to reduce its size. Changes to the data structure
> size
> > are amplified by the large number of instances in a system handling a
> lot of
> > traffic.
> >
> >  * An additional protocol like MPTCP should be opt-in, so users of
> regular
> > TCP continue to get the same type of connection and performance unless
> MPTCP
> > is requested.
> >
> > I also recommend reading "On submitting kernel patches"
> > (http://halobates.de/on-submitting-patches.pdf) to get an idea of the
> > process and hurdles involved in merging major core functionality for the
> > Linux kernel.
> >
> >
> > Various Strategies
> > ------------------
> >
> > One approach is to attempt to merge the multipath-tcp.org fork. This is
> an
> > implementation in which the multipath-tcp.org community has invested a
> lot
> > of time and effort, and it is in production for major applications (see
> > https://tools.ietf.org/html/rfc8041). This is a tremendous amount of
> code to
> > review at once (even separating out modules), and currently doesn't fit
> with
> > what the maintainers have asked for (non-intrusive, sk_buff size, MPTCP
> by
> > default). I don't think the maintainers would consider merging such an
> > extensive piece of git history, especially where there are a fair number of
> > commits without an "mptcp:" label on the subject line or without a DCO
> > signoff (https://www.kernel.org/doc/html/latest/process/
> submitting-patches.html#sign-your-work-the-developer-s-
> certificate-of-origin).
> > Today, the fork is at kernel v4.4 and current upstream development is at
> > v4.13-rc1, so the fork would have to catch up and stay current.
> >
> > The other extreme is to rewrite from scratch. This would allow
> incremental
> > development with maintainer review from the start, but doesn't take
> > advantage of existing code.
> >
> > The most realistic approach is somewhere in between, where we write new
> code
> > that fits maintainer expectations and utilize components from the fork
> where
> > licensing allows and the code fits. We'll have to find the right balance:
> > over-reliance on new code could take extra time, but constantly reworking
> > the fork and keeping it up-to-date with net-next is also a lot of
> overhead.
> >
> > To start with, we can create RFC patches (code that's ready for comment
> > rather than merge -- not "RFC" in the IETF sense) that allow us to extend
> > TCP in the ways that are useful for both MPTCP and other extended TCP
> > features. The maintainers would be able to review those standalone
> patches,
> > and there's potential to backport the patches to prove them out with the
> > multipath-tcp.org code. Does this sound sensible? Any other approaches
> to
> > consider, or details that we should discuss here?
> >
> >
> > Design for Upstream
> > -------------------
> >
> > As a starting point for discussion, here are some characteristics that
> might
> > make MPTCP more upstream-friendly:
> >
> >  * MPTCP is used when requested by the application, either through an
> > IPPROTO_MPTCP parameter to socket() or by using the new ULP (Upper Layer
> > Protocol) capability.
> >
> >  * Move away from meta-sockets, treating each subflow more like a regular
> > TCP connection. The overall MPTCP connection is coordinated by an upper
> > layer socket that is distinct from tcp_sock.
> >
> >  * Move functionality to userspace where possible, like tracking
> ADD_ADDRs
> > received, initiating new subflows, or accepting new subflows.
> >
> >  * Avoid adding locks to coordinate access to data that's shared between
> > subflows. Utilize capabilities like compare-and-swap (cmpxchg), atomics,
> and
> > RCU to deal with shared data efficiently.
> >
> >  * Add generic capabilities to the TCP stack where it looks useful to
> other
> > protocol extensions. Examples: dynamically register handlers for TCP
> option
> > headers, make it possible to pass TCP options to/from an upper layer.
> >
> > Any comment on these? Maybe each deserves a thread of its own.
> >
> >
> > Thanks again to Rao, Christoph, Peter, and Ossama for your help, work,
> and
> > interest. I'm looking forward to your insights.
> >
> >
> > --
> > Mat Martineau
> > Intel OTC
> > _______________________________________________
> > mptcp mailing list
> > mptcp(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/mptcp
>



* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-08 22:50 Christoph Paasch
From: Christoph Paasch @ 2017-08-08 22:50 UTC (permalink / raw)
  To: mptcp


On 08/08/17 - 15:10:48, Mat Martineau wrote:
> 
> On Tue, 1 Aug 2017, Christoph Paasch wrote:
> 
> > On my side, I will be able to spend more cycles now on Linux. I will start
> > by porting the code from multipath-tcp.org up to upstream's version (we have
> > been lagging behind quite a bit again).
> 
> Which version or branch do you think you'll merge up to? Latest stable (or
> stable LTS) release?

I will jump from LTS to LTS: first to 4.9, where it will stay for a bit to
prepare the new stable release, but then move up to 4.14 (which will be the
next one).

My goal is to track linux-stable more closely - meaning merge as soon as the
rcs come out.


Christoph



* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-08 22:49 Christoph Paasch
From: Christoph Paasch @ 2017-08-08 22:49 UTC (permalink / raw)
  To: mptcp


On 08/08/17 - 15:06:49, Mat Martineau wrote:
> On Mon, 7 Aug 2017, Rao Shoaib wrote:
> 
> > 
> > 
> > On 08/01/2017 04:39 PM, Mat Martineau wrote:
> > > 
> > > On Mon, 31 Jul 2017, Rao Shoaib wrote:
> > > 
> > > > Hi Mat,
> > > > 
> > > > Thanks for writing this up. See my comments in line
> > > > 
> > > > 
> > > > On 07/18/2017 05:31 PM, Mat Martineau wrote:
> > > > > 
> > > > > Hello everyone,
> > > > > 
> > > > > Our goal on this mailing list is to add an MPTCP
> > > > > implementation to the upstream Linux kernel. There's a fair
> > > > > amount of work to be done to achieve this, and a number of
> > > > > options for how to go about it. Some of this revisits
> > > > > previous discussions on this list and elsewhere, but I want
> > > > > to be sure we have some level of consensus about the
> > > > > direction to head in.
> > > > > 
> > > > > A couple of us on this list have had discussions with the
> > > > > Linux net maintainers, and they have some specific
> > > > > needs concerning modifications to the Linux TCP stack:
> > > > > 
> > > > >  * TCP complexity can't increase. It's already a complex,
> > > > > performance-sensitive piece of software that every Linux
> > > > > user depends on. Intrusive changes have a risk of creating
> > > > > bugs or changing operation of the stack in unexpected ways.
> > > > > 
> > > > >  * sk_buff structure size can't get bigger. It's already
> > > > > large and, if anything, they hope to reduce its size.
> > > > > Changes to the data structure size are amplified by the
> > > > > large number of instances in a system handling a lot of
> > > > > traffic.
> > > > > 
> > > > >  * An additional protocol like MPTCP should be opt-in, so
> > > > > users of regular TCP continue to get the same type of
> > > > > connection and performance unless MPTCP is requested.
> > > > > 
> > > > > I also recommend reading "On submitting kernel patches"
> > > > > (http://halobates.de/on-submitting-patches.pdf) to get an
> > > > > idea of the process and hurdles involved in merging major
> > > > > core functionality for the Linux kernel.
> > > > > 
> > > > > 
> > > > > Various Strategies
> > > > > ------------------
> > > > > 
> > > > > One approach is to attempt to merge the multipath-tcp.org
> > > > > fork. This is an implementation in which the
> > > > > multipath-tcp.org community has invested a lot of time and
> > > > > effort, and it is in production for major applications (see
> > > > > https://tools.ietf.org/html/rfc8041). This is a tremendous
> > > > > amount of code to review at once (even separating out
> > > > > modules), and currently doesn't fit with what the
> > > > > maintainers have asked for (non-intrusive, sk_buff size,
> > > > > MPTCP by default). I don't think the maintainers would
> > > > > consider merging such an extensive piece of git history,
> > > > > especially where there are a fair number of commits without
> > > > > an "mptcp:" label on the subject line or without a DCO
> > > > > signoff (https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin).
> > > > > Today, the fork is at kernel v4.4 and current upstream
> > > > > development is at v4.13-rc1, so the fork would have to catch
> > > > > up and stay current.
> > > > > 
> > > > > The other extreme is to rewrite from scratch. This would
> > > > > allow incremental development with maintainer review from
> > > > > the start, but doesn't take advantage of existing code.
> > > > > 
> > > > > The most realistic approach is somewhere in between, where
> > > > > we write new code that fits maintainer expectations and
> > > > > utilize components from the fork where licensing allows and
> > > > > the code fits. We'll have to find the right balance:
> > > > > over-reliance on new code could take extra time, but
> > > > > constantly reworking the fork and keeping it up-to-date with
> > > > > net-next is also a lot of overhead.
> > > > > 
> > > > > To start with, we can create RFC patches (code that's ready
> > > > > for comment rather than merge -- not "RFC" in the IETF
> > > > > sense) that allow us to extend TCP in the ways that are
> > > > > useful for both MPTCP and other extended TCP features. The
> > > > > maintainers would be able to review those standalone
> > > > > patches, and there's potential to backport the patches to
> > > > > prove them out with the multipath-tcp.org code. Does this
> > > > > sound sensible? Any other approaches to consider, or details
> > > > > that we should discuss here?
> > > > I agree with the above approach but want to expand on the initial goal.
> > > > 
> > > > Our initial goal must be to put a minimal (bare bones) MPTCP
> > > > implementation in mainstream Linux. That could mean no fancy
> > > > scheduling schemes, just simple round robin or just
> > > > active/standby. Implementing minimal MPTCP functionality will
> > > > pretty much expose how main TCP code will be impacted. Any
> > > > future work to add features will be confined to changes within
> > > > MPTCP and should not be a concern right now. Such an
> > > > implementation will also be fully RFC compliant.
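
[Illustrative sketch: a "no fancy scheduling" subflow pick of the
round-robin kind described above. struct mptcp_conn, MAX_SUBFLOWS, and
the field names are invented for illustration, not code from the fork.]

#define MAX_SUBFLOWS 8                  /* arbitrary illustration */

struct mptcp_conn {
        struct sock     *subflows[MAX_SUBFLOWS];
        int             nr_subflows;
        unsigned int    rr_next;        /* round-robin cursor */
};

/* Minimal round-robin pick over the currently open subflows. */
static struct sock *mptcp_rr_next_subflow(struct mptcp_conn *mc)
{
        if (!mc->nr_subflows)
                return NULL;            /* no established subflow yet */
        return mc->subflows[mc->rr_next++ % mc->nr_subflows];
}

[Active/standby would be simpler still: always return the primary
subflow unless it has been marked unavailable.]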
> > > 
> > > Reaching a bare-bones upstream MPTCP implementation is a very
> > > important milestone, and it will take several steps to get there.
> > > I'm suggesting that initial patches for TCP extensibility are
> > > the first steps toward the bare-bones upstream implementation.
> > I want to be explicit about the initial goal. As the email alluded to,
> > a lot of things need to happen much later. I am not sure we initially
> > even need an elaborate framework for adding TCP options specific to
> > MPTCP.
> 
> If we skip the generic TCP options framework, then the alternative is to
> "keep it simple" and hook in to the necessary places with hardcoded calls to
> mptcp code (with necessary ifdefs)? Just making sure that's what you're
> referring to.
> 
> We had some conversations about getting some generic hooks merged earlier
> for TCP so there was a stable interface for MPTCP while MPTCP was still
> out-of-tree. However, the use of some new extension framework doesn't change
> much about the rest of the MPTCP code, and it would be straightforward to
> splice in a framework later *if* it is requested by the maintainers. In
> other words, I agree that it makes sense to focus our energy on core MPTCP
> functionality.
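
[Illustrative sketch: what the "hardcoded calls with ifdefs" alternative
might look like inside TCP's option parsing. CONFIG_MPTCP, TCPOPT_MPTCP,
and mptcp_parse_option() are invented names here, not existing upstream
code.]

static void tcp_parse_extended_option(struct sk_buff *skb, int opcode,
                                      const unsigned char *ptr, int opsize)
{
        switch (opcode) {
#ifdef CONFIG_MPTCP
        case TCPOPT_MPTCP:
                /* Direct hook into MPTCP code, no registration layer. */
                mptcp_parse_option(skb, ptr, opsize);
                break;
#endif
        default:
                break;                  /* unknown options are skipped */
        }
}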
> 
> 
> > > 
> > > > I agree with you that upstream folks would want an opt-in
> > > > option. I am in favor of a completely different socket family as
> > > > that would leave tcp socket code untouched. However, we should
> > > > talk to upstream folks once again.
> > > 
> > > We can still use AF_INET or AF_INET6 without interfering with TCP
> > > socket code. MPTCP is layered on IP, like TCP and UDP, so it makes
> > > some sense to group it there.
> > My bad. I used family when I should have used protocol. Adding a new socket
> > type is simple enough. We can start with your code.
> > > 
> > > One technique is to define a new IPPROTO_MPTCP value to pass in to
> > > socket()'s third arg. The catch is that most IPPROTO_* definitions
> > > use the IANA-defined IP protocol numbers, and MPTCP packets use
> > > protocol 6 like TCP. IP packets have a single byte for the protocol,
> > > but socket() takes a wider int, so there's some room to use a larger
> > > integer (>255). I have some prototype code that does this.
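
[Illustrative sketch: the user-space side of that opt-in. The value 262
is only a placeholder chosen to be above the 8-bit on-the-wire range;
the fallback shows how an application could cope with kernels that do
not know the protocol value.]

#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262               /* placeholder, deliberately >255 */
#endif

/* Request MPTCP; fall back to plain TCP if the kernel rejects the
 * protocol value (e.g. with EPROTONOSUPPORT). */
static int open_stream_socket(void)
{
        int sk = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

        if (sk < 0)
                sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        return sk;
}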
> > > 
> > > Another option is to leverage the new ULP infrastructure for setting
> > > upper-layer protocols. It looks like Tom Herbert is making some
> > > changes to improve ULP; maybe he will address Christoph's safety
> > > concerns.
> > I think this is the way to go. I read Christoph's email and his
> > concerns. We have handled similar (even more complex) issues in Solaris.
> > In this particular case it could be simple: let's not allow the SO_ULP
> > socket option once a SYN has been sent or received.
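
[Illustrative sketch: the guard being proposed, kernel-side. The
function name is invented and the check is deliberately simplified.]

/* Only allow selecting a ULP while the socket is still closed or
 * listening, i.e. before any SYN has been sent or received on it. */
static int ulp_change_allowed(const struct sock *sk)
{
        if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))
                return 0;
        return -EINVAL;
}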
> > 
> 
> Just to be sure we're thinking the same thing, using ULP to create an MPTCP
> socket would look something like this:
> 
> sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
> setsockopt(sock, SOL_SOCKET, SO_ULP, "mptcp", strlen("mptcp"));

This looks good to me.

An alternative would be to do something like AF_MULTIPATH. But this makes
it ambiguous which address family is being used for the initial subflow,
which is why I prefer the "pure" ULP approach.
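
[For reference: the ULP infrastructure that landed upstream in v4.13
for kTLS exposes the opt-in as a TCP-level option (TCP_ULP) rather than
a SOL_SOCKET-level SO_ULP, so the call might end up looking like the
sketch below. The "mptcp" ULP name is still hypothetical.]

#include <sys/socket.h>
#include <netinet/in.h>

#ifndef TCP_ULP
#define TCP_ULP 31                      /* from the v4.13 uapi headers */
#endif

static int open_mptcp_via_ulp(void)
{
        int sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

        /* If no "mptcp" ULP is registered, this fails and the socket
         * simply remains plain TCP. */
        if (sk >= 0)
                setsockopt(sk, IPPROTO_TCP, TCP_ULP, "mptcp",
                           sizeof("mptcp"));
        return sk;
}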


Christoph

> 
> I was thinking that the IPPROTO_MPTCP and ULP approaches are mutually
> exclusive - is this what you have in mind too, or were you thinking of using
> ULP in a different way?
> 
> > > 
> > > The best way to engage with the maintainers on this is to post it to
> > > netdev in patch form - for example, an RFC patch with just the uapi
> > > header changes and a thorough explanation.
> > 
> > 
> 
> Thanks,
> 
> --
> Mat Martineau
> Intel OTC
> _______________________________________________
> mptcp mailing list
> mptcp(a)lists.01.org
> https://lists.01.org/mailman/listinfo/mptcp

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-08 22:10 Mat Martineau
  0 siblings, 0 replies; 15+ messages in thread
From: Mat Martineau @ 2017-08-08 22:10 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 383 bytes --]


On Tue, 1 Aug 2017, Christoph Paasch wrote:

> On my side, I will be able to spend more cycles now on Linux. I will start
> by porting the code from multipath-tcp.org up to upstream's version (we have
> been lagging behind quite a bit again).

Which version or branch do you think you'll merge up to? Latest stable 
(or stable LTS) release?

--
Mat Martineau
Intel OTC

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-08 22:06 Mat Martineau
  0 siblings, 0 replies; 15+ messages in thread
From: Mat Martineau @ 2017-08-08 22:06 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 8138 bytes --]

On Mon, 7 Aug 2017, Rao Shoaib wrote:

>
>
> On 08/01/2017 04:39 PM, Mat Martineau wrote:
>> 
>> On Mon, 31 Jul 2017, Rao Shoaib wrote:
>> 
>>> Hi Mat,
>>> 
>>> Thanks for writing this up. See my comments in line
>>> 
>>> 
>>> On 07/18/2017 05:31 PM, Mat Martineau wrote:
>>>> 
>>>> Hello everyone,
>>>> 
>>>> Our goal on this mailing list is to add an MPTCP implementation to the 
>>>> upstream Linux kernel. There's a fair amount of work to be done to 
>>>> achieve this, and a number of options for how to go about it. Some of 
>>>> this revisits previous discussions on this list and elsewhere, but I want 
>>>> to be sure we have some level of consensus about the direction to head 
>>>> in.
>>>> 
>>>> A couple of us on this list have had discussions with the Linux net 
>>>> maintainers, and they have some specific needs concerning 
>>>> modifications to the Linux TCP stack:
>>>>
>>>>  * TCP complexity can't increase. It's already a complex, 
>>>> performance-sensitive piece of software that every Linux user depends on. 
>>>> Intrusive changes have a risk of creating bugs or changing operation of 
>>>> the stack in unexpected ways.
>>>>
>>>>  * sk_buff structure size can't get bigger. It's already large and, if 
>>>> anything, they hope to reduce its size. Changes to the data structure 
>>>> size are amplified by the large number of instances in a system handling 
>>>> a lot of traffic.
>>>>
>>>>  * An additional protocol like MPTCP should be opt-in, so users of 
>>>> regular TCP continue to get the same type of connection and performance 
>>>> unless MPTCP is requested.
>>>> 
>>>> I also recommend reading "On submitting kernel patches" 
>>>> (http://halobates.de/on-submitting-patches.pdf) to get an idea of the 
>>>> process and hurdles involved in merging major core functionality for the 
>>>> Linux kernel.
>>>> 
>>>> 
>>>> Various Strategies
>>>> ------------------
>>>> 
>>>> One approach is to attempt to merge the multipath-tcp.org fork. This is 
>>>> an implementation in which the multipath-tcp.org community has invested a 
>>>> lot of time and effort, and it is in production for major applications 
>>>> (see https://tools.ietf.org/html/rfc8041). This is a tremendous amount of 
>>>> code to review at once (even separating out modules), and currently 
>>>> doesn't fit what the maintainers have asked for (it is intrusive, 
>>>> affects sk_buff size, and enables MPTCP by default). I don't think the 
>>>> maintainers would consider merging such an extensive piece of git 
>>>> history, especially where 
>>>> there are a fair number of commits without an "mptcp:" label on the 
>>>> subject line or without a DCO signoff 
>>>> (https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin). 
>>>> Today, the fork is at kernel v4.4 and current upstream development is at 
>>>> v4.13-rc1, so the fork would have to catch up and stay current.
>>>> 
>>>> The other extreme is to rewrite from scratch. This would allow 
>>>> incremental development with maintainer review from the start, but 
>>>> doesn't take advantage of existing code.
>>>> 
>>>> The most realistic approach is somewhere in between, where we write new 
>>>> code that fits maintainer expectations and utilize components from the 
>>>> fork where licensing allows and the code fits. We'll have to find the 
>>>> right balance: over-reliance on new code could take extra time, but 
>>>> constantly reworking the fork and keeping it up-to-date with net-next is 
>>>> also a lot of overhead.
>>>> 
>>>> To start with, we can create RFC patches (code that's ready for comment 
>>>> rather than merge -- not "RFC" in the IETF sense) that allow us to extend 
>>>> TCP in the ways that are useful for both MPTCP and other extended TCP 
>>>> features. The maintainers would be able to review those standalone 
>>>> patches, and there's potential to backport the patches to prove them out 
>>>> with the multipath-tcp.org code. Does this sound sensible? Any other 
>>>> approaches to consider, or details that we should discuss here?
>>> I agree with the above approach but want to expand on the initial goal.
>>> 
>>> Our initial goal must be to put a minimal (bare bones) MPTCP 
>>> implementation in mainstream Linux. That could mean no fancy scheduling 
>>> schemes, just simple round robin or just active/standby. Implementing 
>>> minimal MPTCP functionality will pretty much expose how main TCP code will 
>>> be impacted. Any future work to add features will be confined to changes 
>>> within MPTCP and should not be a concern right now. Such an implementation 
>>> will also be fully RFC compliant.
>> 
>> Reaching a bare-bones upstream MPTCP implementation is a very important 
>> milestone, and it will take several steps to get there. I'm 
>> suggesting that initial patches for TCP extensibility are the first steps 
>> toward the bare-bones upstream implementation.
> I want to be explicit about the initial goal. As the email alluded to, a 
> lot of things need to happen much later. I am not sure we initially even 
> need an elaborate framework for adding TCP options specific to MPTCP.

If we skip the generic TCP options framework, then the alternative is to 
"keep it simple" and hook in to the necessary places with hardcoded calls 
to mptcp code (with necessary ifdefs)? Just making sure that's what you're 
referring to.

We had some conversations about getting some generic hooks merged earlier 
for TCP so there was a stable interface for MPTCP while MPTCP was still 
out-of-tree. However, the use of some new extension framework doesn't 
change much about the rest of the MPTCP code, and it would be 
straightforward to splice in a framework later *if* it is requested by the 
maintainers. In other words, I agree that it makes sense to focus our 
energy on core MPTCP functionality.


>> 
>>> I agree with you that upstream folks would want an opt-in option. I am in 
>>> favor of a completely different socket family as that would leave tcp 
>>> socket code untouched. However, we should talk to upstream folks once 
>>> again.
>> 
>> We can still use AF_INET or AF_INET6 without interfering with TCP socket 
>> code. MPTCP is layered on IP, like TCP and UDP, so it makes some sense to 
>> group it there.
> My bad. I used family when I should have used protocol. Adding a new socket type 
> is simple enough. We can start with your code.
>> 
>> One technique is to define a new IPPROTO_MPTCP value to pass in to 
>> socket()'s third arg. The catch is that most IPPROTO_* definitions use the 
>> IANA-defined IP protocol numbers, and MPTCP packets use protocol 6 like 
>> TCP. IP packets have a single byte for the protocol, but socket() takes a 
>> wider int, so there's some room to use a larger integer (>255). I have some 
>> prototype code that does this.
>> 
>> Another option is to leverage the new ULP infrastructure for setting 
>> upper-layer protocols. It looks like Tom Herbert is making some changes to 
>> improve ULP; maybe he will address Christoph's safety concerns.
> I think this is the way to go. I read Christoph's email and his concerns. We 
> have handled similar (even more complex) issues in Solaris. In this 
> particular case it could be simple: let's not allow the SO_ULP socket option
> once a SYN has been sent or received.
>

Just to be sure we're thinking the same thing, using ULP to create an MPTCP 
socket would look something like this:

sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
setsockopt(sock, SOL_SOCKET, SO_ULP, "mptcp", strlen("mptcp"));

I was thinking that the IPPROTO_MPTCP and ULP approaches are mutually 
exclusive - is this what you have in mind too, or were you thinking of 
using ULP in a different way?

>> 
>> The best way to engage with the maintainers on this is to post it to netdev 
>> in patch form - for example, an RFC patch with just the uapi header changes 
>> and a thorough explanation.
>
>

Thanks,

--
Mat Martineau
Intel OTC

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-07 23:56 Rao Shoaib
  0 siblings, 0 replies; 15+ messages in thread
From: Rao Shoaib @ 2017-08-07 23:56 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 6778 bytes --]



On 08/01/2017 04:39 PM, Mat Martineau wrote:
>
> On Mon, 31 Jul 2017, Rao Shoaib wrote:
>
>> Hi Mat,
>>
>> Thanks for writing this up. See my comments in line
>>
>>
>> On 07/18/2017 05:31 PM, Mat Martineau wrote:
>>>
>>> Hello everyone,
>>>
>>> Our goal on this mailing list is to add an MPTCP implementation to 
>>> the upstream Linux kernel. There's a fair amount of work to be done 
>>> to achieve this, and a number of options for how to go about it. 
>>> Some of this revisits previous discussions on this list and 
>>> elsewhere, but I want to be sure we have some level of consensus 
>>> about the direction to head in.
>>>
>>> A couple of us on this list have had discussions with the Linux net 
>>> maintainers, and they have some specific needs concerning 
>>> modifications to the Linux TCP stack:
>>>
>>>  * TCP complexity can't increase. It's already a complex, 
>>> performance-sensitive piece of software that every Linux user 
>>> depends on. Intrusive changes have a risk of creating bugs or 
>>> changing operation of the stack in unexpected ways.
>>>
>>>  * sk_buff structure size can't get bigger. It's already large and, 
>>> if anything, they hope to reduce its size. Changes to the data 
>>> structure size are amplified by the large number of instances in a 
>>> system handling a lot of traffic.
>>>
>>>  * An additional protocol like MPTCP should be opt-in, so users of 
>>> regular TCP continue to get the same type of connection and 
>>> performance unless MPTCP is requested.
>>>
>>> I also recommend reading "On submitting kernel patches" 
>>> (http://halobates.de/on-submitting-patches.pdf) to get an idea of 
>>> the process and hurdles involved in merging major core functionality 
>>> for the Linux kernel.
>>>
>>>
>>> Various Strategies
>>> ------------------
>>>
>>> One approach is to attempt to merge the multipath-tcp.org fork. This 
>>> is an implementation in which the multipath-tcp.org community has 
>>> invested a lot of time and effort, and it is in production for major 
>>> applications (see https://tools.ietf.org/html/rfc8041). This is a 
>>> tremendous amount of code to review at once (even separating out 
>>> modules), and currently doesn't fit what the maintainers have asked 
>>> for (it is intrusive, affects sk_buff size, and enables MPTCP by 
>>> default). I don't think the maintainers would consider merging such 
>>> an extensive piece of git history, especially where there are a fair 
>>> number of commits 
>>> without an "mptcp:" label on the subject line or without a DCO 
>>> signoff 
>>> (https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin). 
>>> Today, the fork is at kernel v4.4 and current upstream development 
>>> is at v4.13-rc1, so the fork would have to catch up and stay current.
>>>
>>> The other extreme is to rewrite from scratch. This would allow 
>>> incremental development with maintainer review from the start, but 
>>> doesn't take advantage of existing code.
>>>
>>> The most realistic approach is somewhere in between, where we write 
>>> new code that fits maintainer expectations and utilize components 
>>> from the fork where licensing allows and the code fits. We'll have 
>>> to find the right balance: over-reliance on new code could take 
>>> extra time, but constantly reworking the fork and keeping it 
>>> up-to-date with net-next is also a lot of overhead.
>>>
>>> To start with, we can create RFC patches (code that's ready for 
>>> comment rather than merge -- not "RFC" in the IETF sense) that allow 
>>> us to extend TCP in the ways that are useful for both MPTCP and 
>>> other extended TCP features. The maintainers would be able to review 
>>> those standalone patches, and there's potential to backport the 
>>> patches to prove them out with the multipath-tcp.org code. Does this 
>>> sound sensible? Any other approaches to consider, or details that we 
>>> should discuss here?
>> I agree with the above approach but want to expand on the initial goal.
>>
>> Our initial goal must be to put a minimal (bare bones) MPTCP 
>> implementation in mainstream Linux. That could mean no fancy 
>> scheduling schemes, just simple round robin or just active/standby. 
>> Implementing minimal MPTCP functionality will pretty much expose how 
>> main TCP code will be impacted. Any future work to add features will 
>> be confined to changes within MPTCP and should not be a concern right 
>> now. Such an implementation will also be fully RFC compliant.
>
> Reaching a bare-bones upstream MPTCP implementation is a very 
> important milestone, and it will take several steps to get 
> there. I'm suggesting that initial patches for TCP extensibility are 
> the first steps toward the bare-bones upstream implementation.
I want to be explicit about the initial goal. As the email alluded to, 
a lot of things need to happen much later. I am not sure we initially 
even need an elaborate framework for adding TCP options specific to 
MPTCP.
>
>> I agree with you that upstream folks would want an opt-in option. I 
>> am in favor of a completely different socket family as that would 
>> leave tcp socket code untouched. However, we should talk to upstream 
>> folks once again.
>
> We can still use AF_INET or AF_INET6 without interfering with TCP 
> socket code. MPTCP is layered on IP, like TCP and UDP, so it makes 
> some sense to group it there.
My bad. I used family when I should have used protocol. Adding a new socket 
type is simple enough. We can start with your code.
>
> One technique is to define a new IPPROTO_MPTCP value to pass in to 
> socket()'s third arg. The catch is that most IPPROTO_* definitions use 
> the IANA-defined IP protocol numbers, and MPTCP packets use protocol 6 
> like TCP. IP packets have a single byte for the protocol, but socket() 
> takes a wider int, so there's some room to use a larger integer 
> (>255). I have some prototype code that does this.
>
> Another option is to leverage the new ULP infrastructure for setting 
> upper-layer protocols. It looks like Tom Herbert is making some 
> changes to improve ULP; maybe he will address Christoph's safety 
> concerns.
I think this is the way to go. I read Christoph's email and his 
concerns. We have handled similar (even more complex) issues in Solaris. 
In this particular case it could be simple: let's not allow the SO_ULP 
socket option once a SYN has been sent or received.

Shoaib.

>
> The best way to engage with the maintainers on this is to post it to 
> netdev in patch form - for example, an RFC patch with just the uapi 
> header changes and a thorough explanation.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-02 17:04 Mat Martineau
  0 siblings, 0 replies; 15+ messages in thread
From: Mat Martineau @ 2017-08-02 17:04 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 6347 bytes --]


On Tue, 1 Aug 2017, Christoph Paasch wrote:

> Hello,
>
> I'm adding Lorenzo (in CC) from Google to this thread. Lorenzo works on the
> networking of Android at Google.
> Lorenzo, you can subscribe to the mailing-list at
> https://lists.01.org/mailman/listinfo/mptcp.
>
>
> I discussed MPTCP with him at the IETF two weeks back, and he was interested
> in helping make MPTCP upstreamable.
> I'll let him chime in on the discussion.

Welcome, Lorenzo! Great to hear that you're interested in lending your 
expertise.

>
> Wrt. the below, yes I agree with the approach Mat outlined.
> On my side, I will be able to spend more cycles now on Linux. I will start
> by porting the code from multipath-tcp.org up to upstream's version (we have
> been lagging behind quite a bit again). That way we have a common base where
> we can easily see how well the RFC-patches (for example more generic
> capabilities in TCP) would work with MPTCP.
>

Thanks, Christoph. That will be a big help.


Mat


>
>
> On 18/07/17 - 17:31:51, Mat Martineau wrote:
>>
>> Hello everyone,
>>
>> Our goal on this mailing list is to add an MPTCP implementation to the
>> upstream Linux kernel. There's a fair amount of work to be done to achieve
>> this, and a number of options for how to go about it. Some of this revisits
>> previous discussions on this list and elsewhere, but I want to be sure we
>> have some level of consensus about the direction to head in.
>>
>> A couple of us on this list have had discussions with the Linux net
>> maintainers, and they have some specific needs concerning modifications
>> to the Linux TCP stack:
>>
>>  * TCP complexity can't increase. It's already a complex,
>> performance-sensitive piece of software that every Linux user depends on.
>> Intrusive changes have a risk of creating bugs or changing operation of the
>> stack in unexpected ways.
>>
>>  * sk_buff structure size can't get bigger. It's already large and, if
>> anything, they hope to reduce its size. Changes to the data structure size
>> are amplified by the large number of instances in a system handling a lot of
>> traffic.
>>
>>  * An additional protocol like MPTCP should be opt-in, so users of regular
>> TCP continue to get the same type of connection and performance unless MPTCP
>> is requested.
>>
>> I also recommend reading "On submitting kernel patches"
>> (http://halobates.de/on-submitting-patches.pdf) to get an idea of the
>> process and hurdles involved in merging major core functionality for the
>> Linux kernel.
>>
>>
>> Various Strategies
>> ------------------
>>
>> One approach is to attempt to merge the multipath-tcp.org fork. This is an
>> implementation in which the multipath-tcp.org community has invested a lot
>> of time and effort, and it is in production for major applications (see
>> https://tools.ietf.org/html/rfc8041). This is a tremendous amount of code to
>> review at once (even separating out modules), and currently doesn't fit
>> what the maintainers have asked for (it is intrusive, affects sk_buff
>> size, and enables MPTCP by default). I don't think the maintainers would
>> consider merging such an extensive piece of git history, especially where
>> there are a fair number of
>> commits without an "mptcp:" label on the subject line or without a DCO
>> signoff (https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin).
>> Today, the fork is at kernel v4.4 and current upstream development is at
>> v4.13-rc1, so the fork would have to catch up and stay current.
>>
>> The other extreme is to rewrite from scratch. This would allow incremental
>> development with maintainer review from the start, but doesn't take
>> advantage of existing code.
>>
>> The most realistic approach is somewhere in between, where we write new code
>> that fits maintainer expectations and utilize components from the fork where
>> licensing allows and the code fits. We'll have to find the right balance:
>> over-reliance on new code could take extra time, but constantly reworking
>> the fork and keeping it up-to-date with net-next is also a lot of overhead.
>>
>> To start with, we can create RFC patches (code that's ready for comment
>> rather than merge -- not "RFC" in the IETF sense) that allow us to extend
>> TCP in the ways that are useful for both MPTCP and other extended TCP
>> features. The maintainers would be able to review those standalone patches,
>> and there's potential to backport the patches to prove them out with the
>> multipath-tcp.org code. Does this sound sensible? Any other approaches to
>> consider, or details that we should discuss here?
>>
>>
>> Design for Upstream
>> -------------------
>>
>> As a starting point for discussion, here are some characteristics that might
>> make MPTCP more upstream-friendly:
>>
>>  * MPTCP is used when requested by the application, either through an
>> IPPROTO_MPTCP parameter to socket() or by using the new ULP (Upper Layer
>> Protocol) capability.
>>
>>  * Move away from meta-sockets, treating each subflow more like a regular
>> TCP connection. The overall MPTCP connection is coordinated by an upper
>> layer socket that is distinct from tcp_sock.
>>
>>  * Move functionality to userspace where possible, like tracking ADD_ADDRs
>> received, initiating new subflows, or accepting new subflows.
>>
>>  * Avoid adding locks to coordinate access to data that's shared between
>> subflows. Utilize capabilities like compare-and-swap (cmpxchg), atomics,
>> and RCU to deal with shared data efficiently (a sketch follows this
>> list).
>>
>>  * Add generic capabilities to the TCP stack where it looks useful to other
>> protocol extensions. Examples: dynamically register handlers for TCP option
>> headers, make it possible to pass TCP options to/from an upper layer.
>>
>> Any comment on these? Maybe each deserves a thread of its own.
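
[Illustrative sketch for the lock-avoidance item above (all names
invented, not code from the fork or upstream): subflows advance a
shared MPTCP-level ack with cmpxchg instead of a connection-wide lock.]

/* Only ever move the shared data-level ack forward; retry if another
 * subflow updated it concurrently. */
static void mptcp_advance_data_ack(atomic64_t *data_ack, u64 new_ack)
{
        u64 old = atomic64_read(data_ack);

        while ((s64)(new_ack - old) > 0) {      /* new_ack is newer */
                u64 prev = atomic64_cmpxchg(data_ack, old, new_ack);

                if (prev == old)
                        break;                  /* update succeeded */
                old = prev;                     /* raced: re-check */
        }
}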
>>
>>
>> Thanks again to Rao, Christoph, Peter, and Ossama for your help, work, and
>> interest. I'm looking forward to your insights.
>>
>>
>> --
>> Mat Martineau
>> Intel OTC
>> _______________________________________________
>> mptcp mailing list
>> mptcp(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/mptcp
>

--
Mat Martineau
Intel OTC

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-02  5:16 Christoph Paasch
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Paasch @ 2017-08-02  5:16 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 9781 bytes --]

(+Lorenzo)

On 01/08/17 - 16:39:53, Mat Martineau wrote:
> 
> On Mon, 31 Jul 2017, Rao Shoaib wrote:
> 
> > Hi Mat,
> > 
> > Thanks for writing this up. See my comments in line
> > 
> > 
> > On 07/18/2017 05:31 PM, Mat Martineau wrote:
> > > 
> > > Hello everyone,
> > > 
> > > Our goal on this mailing list is to add an MPTCP implementation to
> > > the upstream Linux kernel. There's a fair amount of work to be done
> > > to achieve this, and a number of options for how to go about it.
> > > Some of this revisits previous discussions on this list and
> > > elsewhere, but I want to be sure we have some level of consensus
> > > about the direction to head in.
> > > 
> > > A couple of us on this list have had discussions with the Linux net
> > > maintainers, and they have some specific needs concerning
> > > modifications to the Linux TCP stack:
> > > 
> > >  * TCP complexity can't increase. It's already a complex,
> > > performance-sensitive piece of software that every Linux user
> > > depends on. Intrusive changes have a risk of creating bugs or
> > > changing operation of the stack in unexpected ways.
> > > 
> > >  * sk_buff structure size can't get bigger. It's already large and,
> > > if anything, they hope to reduce its size. Changes to the data
> > > structure size are amplified by the large number of instances in a
> > > system handling a lot of traffic.
> > > 
> > >  * An additional protocol like MPTCP should be opt-in, so users of
> > > regular TCP continue to get the same type of connection and
> > > performance unless MPTCP is requested.
> > > 
> > > I also recommend reading "On submitting kernel patches"
> > > (http://halobates.de/on-submitting-patches.pdf) to get an idea of
> > > the process and hurdles involved in merging major core functionality
> > > for the Linux kernel.
> > > 
> > > 
> > > Various Strategies
> > > ------------------
> > > 
> > > One approach is to attempt to merge the multipath-tcp.org fork. This
> > > is an implementation in which the multipath-tcp.org community has
> > > invested a lot of time and effort, and it is in production for major
> > > applications (see https://tools.ietf.org/html/rfc8041). This is a
> > > tremendous amount of code to review at once (even separating out
> > > modules), and currently doesn't fit what the maintainers have asked
> > > for (it is intrusive, affects sk_buff size, and enables MPTCP by
> > > default). I don't think the maintainers would consider merging such an
> > > extensive piece of git history, especially where there are a fair
> > > number of commits
> > > without an "mptcp:" label on the subject line or without a DCO
> > > signoff (https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin).
> > > Today, the fork is at kernel v4.4 and current upstream development
> > > is at v4.13-rc1, so the fork would have to catch up and stay
> > > current.
> > > 
> > > The other extreme is to rewrite from scratch. This would allow
> > > incremental development with maintainer review from the start, but
> > > doesn't take advantage of existing code.
> > > 
> > > The most realistic approach is somewhere in between, where we write
> > > new code that fits maintainer expectations and utilize components
> > > from the fork where licensing allows and the code fits. We'll have
> > > to find the right balance: over-reliance on new code could take
> > > extra time, but constantly reworking the fork and keeping it
> > > up-to-date with net-next is also a lot of overhead.
> > > 
> > > To start with, we can create RFC patches (code that's ready for
> > > comment rather than merge -- not "RFC" in the IETF sense) that allow
> > > us to extend TCP in the ways that are useful for both MPTCP and
> > > other extended TCP features. The maintainers would be able to review
> > > those standalone patches, and there's potential to backport the
> > > patches to prove them out with the multipath-tcp.org code. Does this
> > > sound sensible? Any other approaches to consider, or details that we
> > > should discuss here?
> > I agree with the above approach but want to expand on the initial goal.
> > 
> > Our initial goal must be to put a minimal (bare bones) MPTCP
> > implementation in mainstream Linux. That could mean no fancy scheduling
> > schemes, just simple round robin or just active/standby. Implementing
> > minimal MPTCP functionality will pretty much expose how main TCP code
> > will be impacted. Any future work to add features will be confined to
> > changes within MPTCP and should not be a concern right now. Such an
> > implementation will also be fully RFC compliant.
> 
> Reaching a bare-bones upstream MPTCP implementation is a very important
> milestone, and it will take several steps to get there. I'm
> suggesting that initial patches for TCP extensibility are the first steps
> toward the bare-bones upstream implementation.

+1 on the minimal implementation

Things that can be "removed" are IMO:
* Path-management outsourced to user-space via an API (one possible
  shape is sketched after this list)
* All the scheduling algorithms, with a basic API exposed to user-space
* Congestion-controls
* Address management (ADD_ADDR, REMOVE_ADDR)
* subflow-priorities
* We can consider shipping without TSO-support, splice-support,... (if it
  makes the code-base simpler)
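
[Illustrative sketch for the path-management item above, with entirely
invented names: the kernel reports events over, for example, a generic
netlink channel, and a user-space path manager answers with subflow
commands.]

enum mptcp_pm_event {
        MPTCP_PM_EV_ESTABLISHED,        /* MP_CAPABLE handshake done */
        MPTCP_PM_EV_ADD_ADDR,           /* peer announced an address */
        MPTCP_PM_EV_SUBFLOW_CLOSED,     /* a subflow went away */
};

enum mptcp_pm_cmd {
        MPTCP_PM_CMD_ADD_SUBFLOW,       /* join the given address */
        MPTCP_PM_CMD_REMOVE_SUBFLOW,
        MPTCP_PM_CMD_SET_BACKUP,        /* drive MP_PRIO */
};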


Christoph

> 
> > I agree with you that upstream folks would want an opt-in option. I am
> > in favor of a completely different socket family as that would leave tcp
> > socket code untouched. However, we should talk to upstream folks once
> > again.
> 
> We can still use AF_INET or AF_INET6 without interfering with TCP socket
> code. MPTCP is layered on IP, like TCP and UDP, so it makes some sense to
> group it there.
> 
> One technique is to define a new IPPROTO_MPTCP value to pass in to
> socket()'s third arg. The catch is that most IPPROTO_* definitions use the
> IANA-defined IP protocol numbers, and MPTCP packets use protocol 6 like TCP.
> IP packets have a single byte for the protocol, but socket() takes a wider
> int, so there's some room to use a larger integer (>255). I have some
> prototype code that does this.
> 
> Another option is to leverage the new ULP infrastructure for setting
> upper-layer protocols. It looks like Tom Herbert is making some changes to
> improve ULP; maybe he will address Christoph's safety concerns.
> 
> The best way to engage with the maintainers on this is to post it to netdev
> in patch form - for example, an RFC patch with just the uapi header changes
> and a thorough explanation.
> 
> > 
> > I would like us to use as much code as possible from the current
> > implementation.
> 
> Where it saves effort and testing, we should definitely consider code reuse.
> 
> > In fact, if for some reason (I don't see one) we cannot ship a minimal
> > implementation, then I prefer that we just port the current
> > implementation and worry about architectural issues later.
> 
> I also don't see a reason a minimal implementation won't work.
> 
> > In short, I would like our focus to be on putting a minimal MPTCP
> > implementation in Linux so that we have a stake in the ground.
> > Performance and Features can come later.
> 
> Agreed.
> 
> > That does not mean that the quality of our implementation is so bad that
> > it is unusable.
> 
> I suggest we aim for high quality :)
> 
> My team has some prototype code that I'm getting in shape to post on this
> list. I'll separate out the user API so we can settle on a proposal for
> netdev.
> 
> 
> Mat
> 
> 
> > 
> > Shoaib
> > 
> > 
> > > 
> > > Design for Upstream
> > > -------------------
> > > 
> > > As a starting point for discussion, here are some characteristics
> > > that might make MPTCP more upstream-friendly:
> > > 
> > >  * MPTCP is used when requested by the application, either through
> > > an IPPROTO_MPTCP parameter to socket() or by using the new ULP
> > > (Upper Layer Protocol) capability.
> > > 
> > >  * Move away from meta-sockets, treating each subflow more like a
> > > regular TCP connection. The overall MPTCP connection is coordinated
> > > by an upper layer socket that is distinct from tcp_sock.
> > > 
> > >  * Move functionality to userspace where possible, like tracking
> > > ADD_ADDRs received, initiating new subflows, or accepting new
> > > subflows.
> > > 
> > >  * Avoid adding locks to coordinate access to data that's shared
> > > between subflows. Utilize capabilities like compare-and-swap
> > > (cmpxchg), atomics, and RCU to deal with shared data efficiently.
> > > 
> > >  * Add generic capabilities to the TCP stack where it looks useful
> > > to other protocol extensions. Examples: dynamically register
> > > handlers for TCP option headers, make it possible to pass TCP
> > > options to/from an upper layer.
> > > 
> > > Any comment on these? Maybe each deserves a thread of its own.
> > > 
> > > 
> > > Thanks again to Rao, Christoph, Peter, and Ossama for your help,
> > > work, and interest. I'm looking forward to your insights.
> > > 
> > > 
> > > -- 
> > > Mat Martineau
> > > Intel OTC
> > > _______________________________________________
> > > mptcp mailing list
> > > mptcp(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/mptcp
> > 
> > _______________________________________________
> > mptcp mailing list
> > mptcp(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/mptcp
> > 
> 
> --
> Mat Martineau
> Intel OTC
> _______________________________________________
> mptcp mailing list
> mptcp(a)lists.01.org
> https://lists.01.org/mailman/listinfo/mptcp

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-02  5:09 Christoph Paasch
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Paasch @ 2017-08-02  5:09 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 6024 bytes --]

Hello,

I'm adding Lorenzo (in CC) from Google to this thread. Lorenzo works on the
networking of Android at Google.
Lorenzo, you can subscribe to the mailing-list at
https://lists.01.org/mailman/listinfo/mptcp.


I discussed MPTCP with him at the IETF two weeks back, and he was interested
in helping make MPTCP upstreamable.
I'll let him chime in on the discussion.



Wrt. the below, yes I agree with the approach Mat outlined.
On my side, I will be able to spend more cycles now on Linux. I will start
by porting the code from multipath-tcp.org up to upstream's version (we have
been lagging behind quite a bit again). That way we have a common base where
we can easily see how well the RFC-patches (for example more generic
capabilities in TCP) would work with MPTCP.


Cheers,
Christoph



On 18/07/17 - 17:31:51, Mat Martineau wrote:
> 
> Hello everyone,
> 
> Our goal on this mailing list is to add an MPTCP implementation to the
> upstream Linux kernel. There's a fair amount of work to be done to achieve
> this, and a number of options for how to go about it. Some of this revisits
> previous discussions on this list and elsewhere, but I want to be sure we
> have some level of consensus about the direction to head in.
> 
> A couple of us on this list have had discussions with the Linux net
> maintainers, and they have some specific needs concerning modifications
> to the Linux TCP stack:
> 
>  * TCP complexity can't increase. It's already a complex,
> performance-sensitive piece of software that every Linux user depends on.
> Intrusive changes have a risk of creating bugs or changing operation of the
> stack in unexpected ways.
> 
>  * sk_buff structure size can't get bigger. It's already large and, if
> anything, they hope to reduce its size. Changes to the data structure size
> are amplified by the large number of instances in a system handling a lot of
> traffic.
> 
>  * An additional protocol like MPTCP should be opt-in, so users of regular
> TCP continue to get the same type of connection and performance unless MPTCP
> is requested.
> 
> I also recommend reading "On submitting kernel patches"
> (http://halobates.de/on-submitting-patches.pdf) to get an idea of the
> process and hurdles involved in merging major core functionality for the
> Linux kernel.
> 
> 
> Various Strategies
> ------------------
> 
> One approach is to attempt to merge the multipath-tcp.org fork. This is an
> implementation in which the multipath-tcp.org community has invested a lot
> of time and effort, and it is in production for major applications (see
> https://tools.ietf.org/html/rfc8041). This is a tremendous amount of code to
> review at once (even separating out modules), and currently doesn't fit
> what the maintainers have asked for (it is intrusive, affects sk_buff
> size, and enables MPTCP by default). I don't think the maintainers would
> consider merging such an extensive piece of git history, especially where
> there are a fair number of
> commits without an "mptcp:" label on the subject line or without a DCO
> signoff (https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin).
> Today, the fork is at kernel v4.4 and current upstream development is at
> v4.13-rc1, so the fork would have to catch up and stay current.
> 
> The other extreme is to rewrite from scratch. This would allow incremental
> development with maintainer review from the start, but doesn't take
> advantage of existing code.
> 
> The most realistic approach is somewhere in between, where we write new code
> that fits maintainer expectations and utilize components from the fork where
> licensing allows and the code fits. We'll have to find the right balance:
> over-reliance on new code could take extra time, but constantly reworking
> the fork and keeping it up-to-date with net-next is also a lot of overhead.
> 
> To start with, we can create RFC patches (code that's ready for comment
> rather than merge -- not "RFC" in the IETF sense) that allow us to extend
> TCP in the ways that are useful for both MPTCP and other extended TCP
> features. The maintainers would be able to review those standalone patches,
> and there's potential to backport the patches to prove them out with the
> multipath-tcp.org code. Does this sound sensible? Any other approaches to
> consider, or details that we should discuss here?
> 
> 
> Design for Upstream
> -------------------
> 
> As a starting point for discussion, here are some characteristics that might
> make MPTCP more upstream-friendly:
> 
>  * MPTCP is used when requested by the application, either through an
> IPPROTO_MPTCP parameter to socket() or by using the new ULP (Upper Layer
> Protocol) capability.
> 
>  * Move away from meta-sockets, treating each subflow more like a regular
> TCP connection. The overall MPTCP connection is coordinated by an upper
> layer socket that is distinct from tcp_sock.
> 
>  * Move functionality to userspace where possible, like tracking ADD_ADDRs
> received, initiating new subflows, or accepting new subflows.
> 
>  * Avoid adding locks to coordinate access to data that's shared between
> subflows. Utilize capabilities like compare-and-swap (cmpxchg), atomics, and
> RCU to deal with shared data efficiently.
> 
>  * Add generic capabilities to the TCP stack where it looks useful to other
> protocol extensions. Examples: dynamically register handlers for TCP option
> headers, make it possible to pass TCP options to/from an upper layer.
> 
> Any comment on these? Maybe each deserves a thread of its own.
> 
> 
> Thanks again to Rao, Christoph, Peter, and Ossama for your help, work, and
> interest. I'm looking forward to your insights.
> 
> 
> --
> Mat Martineau
> Intel OTC
> _______________________________________________
> mptcp mailing list
> mptcp(a)lists.01.org
> https://lists.01.org/mailman/listinfo/mptcp

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-08-01 23:39 Mat Martineau
  0 siblings, 0 replies; 15+ messages in thread
From: Mat Martineau @ 2017-08-01 23:39 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 8676 bytes --]


On Mon, 31 Jul 2017, Rao Shoaib wrote:

> Hi Mat,
>
> Thanks for writing this up. See my comments in line
>
>
> On 07/18/2017 05:31 PM, Mat Martineau wrote:
>> 
>> Hello everyone,
>> 
>> Our goal on this mailing list is to add an MPTCP implementation to the 
>> upstream Linux kernel. There's a fair amount of work to be done to achieve 
>> this, and a number of options for how to go about it. Some of this revisits 
>> previous discussions on this list and elsewhere, but I want to be sure we 
>> have some level of consensus about the direction to head in.
>> 
>> A couple of us on this list have had discussions with the Linux net 
>> maintainers, and they have some specific needs concerning 
>> modifications to the Linux TCP stack:
>>
>>  * TCP complexity can't increase. It's already a complex, 
>> performance-sensitive piece of software that every Linux user depends on. 
>> Intrusive changes have a risk of creating bugs or changing operation of the 
>> stack in unexpected ways.
>>
>>  * sk_buff structure size can't get bigger. It's already large and, if 
>> anything, they hope to reduce its size. Changes to the data structure size 
>> are amplified by the large number of instances in a system handling a lot 
>> of traffic.
>>
>>  * An additional protocol like MPTCP should be opt-in, so users of regular 
>> TCP continue to get the same type of connection and performance unless 
>> MPTCP is requested.
>> 
>> I also recommend reading "On submitting kernel patches" 
>> (http://halobates.de/on-submitting-patches.pdf) to get an idea of the 
>> process and hurdles involved in merging major core functionality for the 
>> Linux kernel.
>> 
>> 
>> Various Strategies
>> ------------------
>> 
>> One approach is to attempt to merge the multipath-tcp.org fork. This is an 
>> implementation in which the multipath-tcp.org community has invested a lot 
>> of time and effort, and it is in production for major applications (see 
>> https://tools.ietf.org/html/rfc8041). This is a tremendous amount of code 
>> to review at once (even separating out modules), and currently doesn't fit 
>> what the maintainers have asked for (it is intrusive, affects sk_buff 
>> size, and enables MPTCP by default). I don't think the maintainers would 
>> consider merging such an extensive piece of git history, especially where 
>> there are a fair 
>> number of commits without an "mptcp:" label on the subject line or without 
>> a DCO signoff 
>> (https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin). 
>> Today, the fork is at kernel v4.4 and current upstream development is at 
>> v4.13-rc1, so the fork would have to catch up and stay current.
>> 
>> The other extreme is to rewrite from scratch. This would allow incremental 
>> development with maintainer review from the start, but doesn't take 
>> advantage of existing code.
>> 
>> The most realistic approach is somewhere in between, where we write new 
>> code that fits maintainer expectations and utilize components from the fork 
>> where licensing allows and the code fits. We'll have to find the right 
>> balance: over-reliance on new code could take extra time, but constantly 
>> reworking the fork and keeping it up-to-date with net-next is also a lot of 
>> overhead.
>> 
>> To start with, we can create RFC patches (code that's ready for comment 
>> rather than merge -- not "RFC" in the IETF sense) that allow us to extend 
>> TCP in the ways that are useful for both MPTCP and other extended TCP 
>> features. The maintainers would be able to review those standalone patches, 
>> and there's potential to backport the patches to prove them out with the 
>> multipath-tcp.org code. Does this sound sensible? Any other approaches to 
>> consider, or details that we should discuss here?
> I agree with the above approach but want to expand on the initial goal.
>
> Our initial goal must be to put a minimal (bare bones) MPTCP implementation 
> in mainstream Linux. That could mean no fancy scheduling schemes, just 
> simple round robin or just active/standby. Implementing minimal MPTCP 
> functionality will pretty much expose how main TCP code will be impacted. Any 
> future work to add features will be confined to changes within MPTCP and 
> should not be a concern right now. Such an implementation will also be fully 
> RFC compliant.

Reaching a bare-bones upstream MPTCP implementation is a very important 
milestone, and it will take several steps to get there. I'm 
suggesting that initial patches for TCP extensibility are the first steps 
toward the bare-bones upstream implementation.

> I agree with you that upstream folks would want an opt-in option. I am in 
> favor of a completely different socket family as that would leave tcp socket 
> code untouched. However, we should talk to upstream folks once again.

We can still use AF_INET or AF_INET6 without interfering with TCP socket 
code. MPTCP is layered on IP, like TCP and UDP, so it makes some sense to 
group it there.

One technique is to define a new IPPROTO_MPTCP value to pass in to 
socket()'s third arg. The catch is that most IPPROTO_* definitions use the 
IANA-defined IP protocol numbers, and MPTCP packets use protocol 6 like 
TCP. IP packets have a single byte for the protocol, but socket() takes a 
wider int, so there's some room to use a larger integer (>255). I have 
some prototype code that does this.

Another option is to leverage the new ULP infrastructure for setting 
upper-layer protocols. It looks like Tom Herbert is making some changes to 
> improve ULP; maybe he will address Christoph's safety concerns.

The best way to engage with the maintainers on this is to post it to 
netdev in patch form - for example, an RFC patch with just the uapi header 
changes and a thorough explanation.

>
> I would like us to use as much code as possible from the current 
> implementation.

Where it saves effort and testing, we should definitely consider code 
reuse.

> In fact, if for some reason (I don't see one) we cannot ship 
> a minimal implementation, then I prefer that we just port the current 
> implementation and worry about architectural issues later.

I also don't see a reason a minimal implementation won't work.

> In short, I would like our focus to be on putting a minimal MPTCP 
> implementation in Linux so that we have a stake in the ground. 
> Performance and Features can come later.

Agreed.

> That does not mean that the quality of our implementation is so bad that 
> it is unusable.

I suggest we aim for high quality :)

My team has some prototype code that I'm getting in shape to post on this 
list. I'll separate out the user API so we can settle on a proposal for 
netdev.


Mat


>
> Shoaib
>
>
>> 
>> Design for Upstream
>> -------------------
>> 
>> As a starting point for discussion, here are some characteristics that 
>> might make MPTCP more upstream-friendly:
>>
>>  * MPTCP is used when requested by the application, either through an 
>> IPPROTO_MPTCP parameter to socket() or by using the new ULP (Upper Layer 
>> Protocol) capability.
>>
>>  * Move away from meta-sockets, treating each subflow more like a regular 
>> TCP connection. The overall MPTCP connection is coordinated by an upper 
>> layer socket that is distinct from tcp_sock.
>>
>>  * Move functionality to userspace where possible, like tracking ADD_ADDRs 
>> received, initiating new subflows, or accepting new subflows.
>>
>>  * Avoid adding locks to coordinate access to data that's shared between 
>> subflows. Utilize capabilities like compare-and-swap (cmpxchg), atomics, 
>> and RCU to deal with shared data efficiently.
>>
>>  * Add generic capabilities to the TCP stack where it looks useful to other 
>> protocol extensions. Examples: dynamically register handlers for TCP option 
>> headers, make it possible to pass TCP options to/from an upper layer.
>> 
>> Any comment on these? Maybe each deserves a thread of its own.
>> 
>> 
>> Thanks again to Rao, Christoph, Peter, and Ossama for your help, work, and 
>> interest. I'm looking forward to your insights.
>> 
>> 
>> -- 
>> Mat Martineau
>> Intel OTC
>> _______________________________________________
>> mptcp mailing list
>> mptcp(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/mptcp
>
> _______________________________________________
> mptcp mailing list
> mptcp(a)lists.01.org
> https://lists.01.org/mailman/listinfo/mptcp
>

--
Mat Martineau
Intel OTC

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [MPTCP] MPTCP upstreaming strategy and design
@ 2017-07-31 23:03 Rao Shoaib
  0 siblings, 0 replies; 15+ messages in thread
From: Rao Shoaib @ 2017-07-31 23:03 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 6649 bytes --]

Hi Mat,

Thanks for writing this up. See my comments in line


On 07/18/2017 05:31 PM, Mat Martineau wrote:
>
> Hello everyone,
>
> Our goal on this mailing list is to add an MPTCP implementation to the 
> upstream Linux kernel. There's a fair amount of work to be done to 
> achieve this, and a number of options for how to go about it. Some of 
> this revisits previous discussions on this list and elsewhere, but I 
> want to be sure we have some level of consensus about the direction to 
> head in.
>
> A couple of us on this list have had discussions with the Linux net 
> maintainers, and they have some specific needs concerning 
> modifications to the Linux TCP stack:
>
>  * TCP complexity can't increase. It's already a complex, 
> performance-sensitive piece of software that every Linux user depends 
> on. Intrusive changes have a risk of creating bugs or changing 
> operation of the stack in unexpected ways.
>
>  * sk_buff structure size can't get bigger. It's already large and, if 
> anything, they hope to reduce its size. Changes to the data structure 
> size are amplified by the large number of instances in a system 
> handling a lot of traffic.
>
>  * An additional protocol like MPTCP should be opt-in, so users of 
> regular TCP continue to get the same type of connection and 
> performance unless MPTCP is requested.
>
> I also recommend reading "On submitting kernel patches" 
> (http://halobates.de/on-submitting-patches.pdf) to get an idea of the 
> process and hurdles involved in merging major core functionality for 
> the Linux kernel.
>
>
> Various Strategies
> ------------------
>
> One approach is to attempt to merge the multipath-tcp.org fork. This 
> is an implementation in which the multipath-tcp.org community has 
> invested a lot of time and effort, and it is in production for major 
> applications (see https://tools.ietf.org/html/rfc8041). This is a 
> tremendous amount of code to review at once (even separating out 
> modules), and currently doesn't fit what the maintainers have asked 
> for (it is intrusive, affects sk_buff size, and enables MPTCP by 
> default). I don't think the maintainers would consider merging such 
> an extensive piece of git history, especially where there are a fair 
> number of commits 
> without an "mptcp:" label on the subject line or without a DCO signoff 
> (https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin). 
> Today, the fork is at kernel v4.4 and current upstream development is 
> at v4.13-rc1, so the fork would have to catch up and stay current.
>
> The other extreme is to rewrite from scratch. This would allow 
> incremental development with maintainer review from the start, but 
> doesn't take advantage of existing code.
>
> The most realistic approach is somewhere in between, where we write 
> new code that fits maintainer expectations and utilize components from 
> the fork where licensing allows and the code fits. We'll have to find 
> the right balance: over-reliance on new code could take extra time, 
> but constantly reworking the fork and keeping it up-to-date with 
> net-next is also a lot of overhead.
>
> To start with, we can create RFC patches (code that's ready for 
> comment rather than merge -- not "RFC" in the IETF sense) that allow 
> us to extend TCP in the ways that are useful for both MPTCP and other 
> extended TCP features. The maintainers would be able to review those 
> standalone patches, and there's potential to backport the patches to 
> prove them out with the multipath-tcp.org code. Does this sound 
> sensible? Any other approaches to consider, or details that we should 
> discuss here?
I agree with the above approach but want to expand on the initial goal.

Our initial goal must be to put a minimal (bare bones) MPTCP 
implementation in mainstream Linux. That could mean no fancy scheduling 
schemes, just simple round robin or just active/standby. Implementing 
minimal MPTCP functionality will pretty much expose how main TCP code 
will be impacted. Any future work to add features will be confined to 
changes within MPTCP and should not be a concern right now. Such an 
implementation will also be fully RFC compliant.

I agree with you that the upstream folks would want MPTCP to be opt-in. 
I am in favor of a completely different socket family, as that would 
leave the TCP socket code untouched; a minimal sketch of that idea 
follows below. However, we should talk to the upstream folks once again.
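
A minimal sketch of what that could look like from userspace, assuming 
a hypothetical AF_MPTCP address family (no such family exists in any 
kernel today; the value below is invented purely for illustration):

#include <stdio.h>
#include <sys/socket.h>

#ifndef AF_MPTCP
#define AF_MPTCP 44	/* hypothetical value, illustration only */
#endif

int main(void)
{
	/* Applications that want multipath ask for it explicitly;
	 * everything else keeps using AF_INET + IPPROTO_TCP as-is. */
	int fd = socket(AF_MPTCP, SOCK_STREAM, 0);

	if (fd < 0)
		perror("socket");	/* expected on current kernels */
	return 0;
}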

I would like us to reuse as much code as possible from the current 
implementation. In fact, if for some reason (I don't see one) we cannot 
ship a minimal implementation, then I would prefer that we just port the 
current implementation and worry about architectural issues later.

In short, I would like our focus to be on getting a minimal MPTCP 
implementation into Linux so that we have a stake in the ground. 
Performance and features can come later. That does not mean the quality 
of our implementation should be so poor that it is unusable.

Shoaib


>
> Design for Upstream
> -------------------
>
> As a starting point for discussion, here are some characteristics that 
> might make MPTCP more upstream-friendly:
>
>  * MPTCP is used when requested by the application, either through an 
> IPPROTO_MPTCP parameter to socket() or by using the new ULP (Upper 
> Layer Protocol) capability.
>
>  * Move away from meta-sockets, treating each subflow more like a 
> regular TCP connection. The overall MPTCP connection is coordinated by 
> an upper layer socket that is distinct from tcp_sock.
>
>  * Move functionality to userspace where possible, like tracking 
> ADD_ADDRs received, initiating new subflows, or accepting new subflows.
>
>  * Avoid adding locks to coordinate access to data that's shared 
> between subflows. Utilize capabilities like compare-and-swap 
> (cmpxchg), atomics, and RCU to deal with shared data efficiently.
>
>  * Add generic capabilities to the TCP stack where it looks useful to 
> other protocol extensions. Examples: dynamically register handlers for 
> TCP option headers, make it possible to pass TCP options to/from an 
> upper layer.
>
> Any comment on these? Maybe each deserves a thread of its own.
>
>
> Thanks again to Rao, Christoph, Peter, and Ossama for your help, work, 
> and interest. I'm looking forward to your insights.
>
>
> -- 
> Mat Martineau
> Intel OTC
> _______________________________________________
> mptcp mailing list
> mptcp(a)lists.01.org
> https://lists.01.org/mailman/listinfo/mptcp


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [MPTCP] MPTCP upstreaming strategy and design
@ 2017-07-19  0:31 Mat Martineau
  0 siblings, 0 replies; 15+ messages in thread
From: Mat Martineau @ 2017-07-19  0:31 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 4868 bytes --]


Hello everyone,

Our goal on this mailing list is to add an MPTCP implementation to the 
upstream Linux kernel. There's a fair amount of work to be done to achieve 
this, and a number of options for how to go about it. Some of this 
revisits previous discussions on this list and elsewhere, but I want to be 
sure we have some level of consensus about the direction to head in.

A couple of us on this list have had discussions with the Linux net 
maintainers, and they have some specific needs concerning 
modifications to the Linux TCP stack:

  * TCP complexity can't increase. It's already a complex, 
performance-sensitive piece of software that every Linux user depends on. 
Intrusive changes have a risk of creating bugs or changing operation of 
the stack in unexpected ways.

  * sk_buff structure size can't get bigger. It's already large and, if 
anything, they hope to reduce its size. Changes to the data structure 
size are amplified by the large number of instances in a system handling a 
lot of traffic.

  * An additional protocol like MPTCP should be opt-in, so users of regular 
TCP continue to get the same type of connection and performance unless 
MPTCP is requested.

I also recommend reading "On submitting kernel patches" 
(http://halobates.de/on-submitting-patches.pdf) to get an idea of the 
process and hurdles involved in merging major core functionality for the 
Linux kernel.


Various Strategies
------------------

One approach is to attempt to merge the multipath-tcp.org fork. This is an 
implementation in which the multipath-tcp.org community has invested a lot 
of time and effort, and it is in production for major applications (see 
https://tools.ietf.org/html/rfc8041). This is a tremendous amount of code 
to review at once (even separating out modules), and currently doesn't fit 
with what the maintainers have asked for (non-intrusive changes, no 
sk_buff growth, and MPTCP as opt-in rather than on by default). I don't 
think the maintainers would consider merging such an extensive piece of 
git history, especially where there are a fair 
number of commits without an "mptcp:" label on the subject line or without 
a DCO signoff 
(https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin). 
Today, the fork is at kernel v4.4 and current upstream development is at 
v4.13-rc1, so the fork would have to catch up and stay current.

The other extreme is to rewrite from scratch. This would allow incremental 
development with maintainer review from the start, but doesn't take 
advantage of existing code.

The most realistic approach is somewhere in between, where we write new 
code that fits maintainer expectations and utilize components from the 
fork where licensing allows and the code fits. We'll have to find the 
right balance: over-reliance on new code could take extra time, but 
constantly reworking the fork and keeping it up-to-date with net-next is 
also a lot of overhead.

To start with, we can create RFC patches (code that's ready for comment 
rather than merge -- not "RFC" in the IETF sense) that allow us to extend 
TCP in the ways that are useful for both MPTCP and other extended TCP 
features. The maintainers would be able to review those standalone 
patches, and there's potential to backport the patches to prove them out 
with the multipath-tcp.org code. Does this sound sensible? Any other 
approaches to consider, or details that we should discuss here?


Design for Upstream
-------------------

As a starting point for discussion, here are some characteristics that 
might make MPTCP more upstream-friendly:

  * MPTCP is used when requested by the application, either through an 
IPPROTO_MPTCP parameter to socket() or by using the new ULP (Upper Layer 
Protocol) capability. (A sketch of both opt-in styles follows this list.)

  * Move away from meta-sockets, treating each subflow more like a regular 
TCP connection. The overall MPTCP connection is coordinated by an upper 
layer socket that is distinct from tcp_sock.

  * Move functionality to userspace where possible, like tracking ADD_ADDRs 
received, initiating new subflows, or accepting new subflows.

  * Avoid adding locks to coordinate access to data that's shared between 
subflows. Utilize capabilities like compare-and-swap (cmpxchg), atomics, 
and RCU to deal with shared data efficiently. (See the second sketch 
after this list.)

  * Add generic capabilities to the TCP stack where they look useful to 
other protocol extensions. Examples: dynamically register handlers for 
TCP option headers, make it possible to pass TCP options to/from an 
upper layer. (See the third sketch after this list.)
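
As a concrete sketch of the first bullet, here are the two opt-in 
styles side by side. IPPROTO_MPTCP and the "mptcp" ULP name are both 
assumptions, since neither exists anywhere yet; TCP_ULP itself is the 
socket option that came in with the recent ULP work:

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262	/* hypothetical value, illustration only */
#endif
#ifndef TCP_ULP
#define TCP_ULP 31		/* from recent uapi headers */
#endif

int mptcp_socket(void)
{
	/* Style 1: request MPTCP at socket creation time. */
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

	if (fd >= 0)
		return fd;

	/* Style 2: create a plain TCP socket, then attach a
	 * (hypothetical) "mptcp" upper layer protocol via TCP_ULP.
	 * If the setsockopt fails, the socket simply stays plain TCP. */
	fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (fd < 0)
		return -1;
	setsockopt(fd, SOL_TCP, TCP_ULP, "mptcp", sizeof("mptcp"));
	return fd;
}

Either way the application opts in explicitly, and everything else on 
the system keeps getting stock TCP.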
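
For the locking bullet, here is a kernel-flavored sketch of advancing 
one piece of connection-level state shared by all subflows without a 
lock. The names (mptcp_conn, data_ack, and so on) are all made up for 
illustration:

#include <linux/types.h>
#include <linux/atomic.h>
#include <linux/compiler.h>

/* Hypothetical connection-level state shared by all subflows. */
struct mptcp_conn {
	u64	data_ack;	/* highest data-level cumulative ack */
};

/* 64-bit sequence comparison, same idea as after() in net/tcp.h. */
static inline bool dack_after(u64 a, u64 b)
{
	return (s64)(a - b) > 0;
}

/* Called from each subflow's receive path. The cmpxchg loop only
 * ever moves the ack forward, and subflows running on different
 * CPUs never block each other. */
static void mptcp_update_data_ack(struct mptcp_conn *conn, u64 new_ack)
{
	u64 old = READ_ONCE(conn->data_ack);

	while (dack_after(new_ack, old)) {
		u64 prev = cmpxchg64(&conn->data_ack, old, new_ack);

		if (prev == old)
			break;		/* we won the race */
		old = prev;		/* lost; re-check against winner */
	}
}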
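
And for the last bullet, the option-handler idea might have roughly 
this shape. The registration API below is purely an assumption on my 
part -- nothing like it exists in the stack today:

#include <linux/types.h>
#include <net/sock.h>

/* Hypothetical ops table an upper layer registers for one TCP option
 * kind; the stack would call these from its parse and write paths. */
struct tcp_extra_option_ops {
	u8	option_kind;
	void	(*parse)(struct sock *sk, const u8 *opt, int opsize);
	int	(*write)(struct sock *sk, u8 *ptr, int remaining);
};

int tcp_register_extra_option(struct tcp_extra_option_ops *ops);

/* An MPTCP module could then hook TCP option kind 30 (RFC 6824): */
static void mptcp_parse_option(struct sock *sk, const u8 *opt, int opsize)
{
	/* Dispatch on the MPTCP subtype in the first option byte... */
}

static struct tcp_extra_option_ops mptcp_option_ops = {
	.option_kind	= 30,	/* TCPOPT_MPTCP */
	.parse		= mptcp_parse_option,
};

/* At init: tcp_register_extra_option(&mptcp_option_ops); */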

Any comment on these? Maybe each deserves a thread of its own.


Thanks again to Rao, Christoph, Peter, and Ossama for your help, work, and 
interest. I'm looking forward to your insights.


--
Mat Martineau
Intel OTC

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-08-22  6:55 UTC | newest]

Thread overview: 15+ messages
2017-08-22  0:34 [MPTCP] MPTCP upstreaming strategy and design Mat Martineau
  -- strict thread matches above, loose matches on Subject: below --
2017-08-22  6:55 Christoph Paasch
2017-08-19 23:55 Christoph Paasch
2017-08-19  4:43 Lorenzo Colitti
2017-08-08 22:50 Christoph Paasch
2017-08-08 22:49 Christoph Paasch
2017-08-08 22:10 Mat Martineau
2017-08-08 22:06 Mat Martineau
2017-08-07 23:56 Rao Shoaib
2017-08-02 17:04 Mat Martineau
2017-08-02  5:16 Christoph Paasch
2017-08-02  5:09 Christoph Paasch
2017-08-01 23:39 Mat Martineau
2017-07-31 23:03 Rao Shoaib
2017-07-19  0:31 Mat Martineau
