* Re: [MPTCP] [RFC] MPTCP Path Management Generic Netlink API
From: Stephen Brennan @ 2018-03-08 23:46 UTC
  To: mptcp

Hi Ossama,

I'm very interested in the idea of a MPTCP path manager Netlink API, since
I actually posted a patch on mptcp-dev recently containing a path manager
which did send and receive some commands over a Netlink API. My API was not
nearly as complete as this one, but it's nice to see that path management
policy could move further into user-space. That certainly makes the sort of
research I was working on a lot easier.

I do have some inline comments drawn from my experience, but first one
question. Is the implementation strategy for the Netlink PM to simply be
another path manager (selectable among the other path managers), or would
it replace the other path managers in some way?

On Thu, Mar 08, 2018 at 12:48:40PM -0800, Othman, Ossama wrote:
> Hi,
> 
> Following up on a brief exchange between Matthieu and Mat regarding
> a MPTCP path manager netlink API, I'd like to share our own
> proposed generic netlink API developed in parallel.
> 
> Please find the high level description below.  It'll be great to
> compare the two netlink based APIs to determine if either can be
> improved by leveraging different aspects from each one.
> 
> Thanks!
> 
> --
> Ossama Othman
> Intel OTC
> 
> ==============================================
> RFC: MPTCP Path Management Generic Netlink API
> ==============================================
> 
> A generic netlink socket is used to facilitate communication between
> the kernel and a user space daemon that handles MPTCP path management
> related operations, from here on in called the path manager.  Several
> multicast groups, attributes and operations are exposed by the "mptcp"
> generic netlink family, e.g.:
> 
> $ genl ctrl list
> ...
> Name: mptcp
>        ID: 0x1d  Version: 0x1  header size: 0  max attribs: 7
>        commands supported:
>                #1:  ID-0x0
>                #2:  ID-0x1
>                #3:  ID-0x2
>                #4:  ID-0x3
>                #5:  ID-0x4
> 
> 
>        multicast groups:
>                #1:  ID-0xa  name: new_connection
>                #2:  ID-0xb  name: new_addr
>                #3:  ID-0xc  name: join_attempt
>                #4:  ID-0xd  name: new_subflow
>                #5:  ID-0xe  name: subflow_closed
>                #6:  ID-0xf  name: conn_closed
> 
> Each of the multicast groups corresponds to MPTCP path manager events
> supported by the kernel MPTCP stack.
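
Just to make the wiring concrete for myself: a daemon could attach to this
family with plain libnl-genl, along the lines of the sketch below (group
names are taken from your listing; error handling and attribute parsing
are omitted):

#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>

/* Sketch only: subscribe to the "mptcp" family's event groups with libnl. */
static int handle_event(struct nl_msg *msg, void *arg)
{
        /* genlmsg_hdr(nlmsg_hdr(msg))->cmd identifies the event; the
         * attributes carry connection ID, addresses, ports, ... */
        return NL_OK;
}

int main(void)
{
        struct nl_sock *sk = nl_socket_alloc();

        genl_connect(sk);

        /* Multicast notifications are unsolicited, so skip sequence checks. */
        nl_socket_disable_seq_check(sk);
        nl_socket_modify_cb(sk, NL_CB_VALID, NL_CB_CUSTOM, handle_event, NULL);

        nl_socket_add_membership(sk,
                genl_ctrl_resolve_grp(sk, "mptcp", "new_connection"));
        nl_socket_add_membership(sk,
                genl_ctrl_resolve_grp(sk, "mptcp", "new_addr"));
        nl_socket_add_membership(sk,
                genl_ctrl_resolve_grp(sk, "mptcp", "join_attempt"));

        for (;;)
                nl_recvmsgs_default(sk);
}
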
> 
> Kernel Initiated Events
> -----------------------
> * new_connection
>    * Called upon completion of new MPTCP-capable connection.
>      Information for initial subflow is made available to the path
>      manager.
>    * Payload
>       * Connection ID (globally unique for host)
>       * Local address
>       * Local port
>       * Remote address
>       * Remote port
>       * Priority
> * new_addr
>    * Triggered when the host receives an ADD_ADDR MPTCP option, i.e. a
>      new address is advertised by the remote side.
>    * Payload
>       * Connection ID
>       * Remote address ID
>       * Remote address
>       * Remote port
> * join_attempt
>    * Called when a MP_JOIN has been ACKed.  The path manager is
>      expected to respond with an allow_join event containing its
>      decision based on the configured policy.

How would this be implemented? I'm not sure of the specifics, but many path
manager functions are called from contexts in which sleeping is not
possible. It seems like waiting on a user-space decision for join_attempt
would require a decent amount of re-implementation. Plus, there is the
possibility that userspace may never respond.

It seems like a more pragmatic approach could be to have some strategies
implemented in the kernel, e.g. default decline, default accept, accept on a
certain interface, etc. A global (per net namespace?) default could be set,
and userspace could modify the global policy, or set the policy on a
particular connection. This would allow the kernel implementation to
respond immediately, while still giving userspace some flexibility.
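
To sketch what I mean (everything below is hypothetical, nothing here is
taken from your proposal):

#include <stdbool.h>

/* Hypothetical per-netns default join policy with an optional per-connection
 * override; userspace would configure these ahead of time, so the join path
 * never has to wait on the daemon. */
enum mptcp_join_policy {
        MPTCP_JOIN_DENY,          /* decline all MP_JOINs */
        MPTCP_JOIN_ACCEPT,        /* accept all MP_JOINs */
        MPTCP_JOIN_IF_BOUND,      /* accept only on one configured interface */
};

struct mptcp_pm_policy {
        enum mptcp_join_policy join_policy;
        int allowed_ifindex;      /* only used with MPTCP_JOIN_IF_BOUND */
};

/* Callable from the (non-sleeping) MP_JOIN handling path. */
static bool mptcp_join_allowed(const struct mptcp_pm_policy *conn,
                               const struct mptcp_pm_policy *netns_default,
                               int incoming_ifindex)
{
        const struct mptcp_pm_policy *p = conn ? conn : netns_default;

        switch (p->join_policy) {
        case MPTCP_JOIN_ACCEPT:
                return true;
        case MPTCP_JOIN_IF_BOUND:
                return incoming_ifindex == p->allowed_ifindex;
        case MPTCP_JOIN_DENY:
        default:
                return false;
        }
}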

>    * Payload
>       * Connection ID
>       * Local address ID
>       * Local address
>       * Local port
>       * Remote address ID
>       * Remote address
>       * Remote port
> * new_subflow
>    * Called when final MP_JOIN ACK has been ACKed.
>    * Payload
>       * Connection ID
>       * Subflow ID
> * subflow_closed
>    * Called when a subflow has been closed.  Allows path manager to
>      clean up subflow related resources.
>    * Payload
>       * Connection ID
>       * Subflow ID
> * conn_closed
>    * Called when an MPTCP connection as a whole, as opposed to a single
>      subflow, has been closed.  This is the case when close(2) has
>      been called on an MPTCP connection.
>    * Payload
>       * Connection ID
> 

Is there any event for "new network interface added"? Or would the
user-space daemon need to subscribe to that information elsewhere?

> Path Manager Initiated Events (Commands)
> ----------------------------------------
> * send_addr
>    * Notify the kernel of the availability of a new address for use in
>      MPTCP connections.  Triggers an ADD_ADDR to be sent to the peer.
>    * Payload
>       * Connection ID
>       * Address ID
>       * Local address
>       * Local port (optional, use same port as initial subflow if not
>         specified)
> * add_subflow
>    * Add new subflow to the MPTCP connection.  This triggers an
>      MP_JOIN to be sent to the peer.
>    * Payload
>       * Connection ID
>       * Local address ID
>       * Local address (optional, required if send_addr not previously
>         sent to establish the local address ID)
>       * Local port (optional, use same port as initial subflow if not
>         specified)
>       * Remote address ID (e.g. from a previously received new_addr or
>         join_attempt event)

It seems like a command to list the available remote address IDs would be a
useful addition to this, so a user-space daemon could learn of remote
address IDs that are currently in use for a connection that was already
established when the daemon started.

Similarly, a list_subflows command could be useful to allow an application
to query the existing subflows of a connection.
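
To make the suggestion concrete, a dump-style request from the daemon could
look roughly like this (the command and attribute IDs are made up, nothing
in the proposal assigns them yet):

#include <stdint.h>
#include <netlink/netlink.h>
#include <netlink/msg.h>
#include <netlink/attr.h>
#include <netlink/genl/genl.h>

/* Hypothetical IDs, for illustration only. */
#define MPTCP_CMD_LIST_SUBFLOWS  0x5
#define MPTCP_ATTR_CONN_ID       1

/* Ask the kernel to dump every subflow of one MPTCP connection; the reply
 * would be one message per subflow (subflow ID, addresses, ports). */
static int request_subflow_dump(struct nl_sock *sk, int family, uint32_t conn_id)
{
        struct nl_msg *msg = nlmsg_alloc();
        int err;

        genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family, 0,
                    NLM_F_REQUEST | NLM_F_DUMP, MPTCP_CMD_LIST_SUBFLOWS, 1);
        nla_put_u32(msg, MPTCP_ATTR_CONN_ID, conn_id);

        err = nl_send_auto(sk, msg);
        nlmsg_free(msg);
        return err;
}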

>       * Backup priority flag (optional, use default priority if not
>         specified)
>       * Subflow ID
> * allow_join
>    * Allow MP_JOIN attempt from peer.
>    * Payload
>       * Connection ID
>       * Remote address ID (e.g. from a previously received join_attempt
>         event).
>       * Local address
>       * Local port
>       * Allow indication (optional, do not allow join if not
>         specified)
>       * Backup priority flag (optional, use default priority if not
>         specified)
>       * Subflow ID
> * set_backup
>    * Set subflow priority to backup priority.
>    * Payload
>       * Connection ID
>       * Subflow ID
>       * Backup priority flag (optional, use default priority if not
>         specified)
> * remove_subflow
>    * Triggers a REMOVE_ADDR MPTCP option to be sent, ultimately
>      resulting in subflows routed through that invalidated address
>      being closed.
>    * Payload
>       * Connection ID
>       * Subflow ID
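
For completeness, here is roughly how I picture a daemon issuing one of these
commands over the generic netlink socket. The genl listing above only shows
raw command IDs, so the numbering and attribute names below are made up; the
optional attributes are simply left out when the defaults should apply:

#include <stdbool.h>
#include <stdint.h>
#include <netlink/netlink.h>
#include <netlink/msg.h>
#include <netlink/attr.h>
#include <netlink/genl/genl.h>

/* Hypothetical numbering, for illustration only. */
#define MPTCP_CMD_ADD_SUBFLOW      0x1
#define MPTCP_ATTR_CONN_ID         1
#define MPTCP_ATTR_LOCAL_ADDR_ID   2
#define MPTCP_ATTR_LOCAL_ADDR4     3
#define MPTCP_ATTR_REMOTE_ADDR_ID  4
#define MPTCP_ATTR_BACKUP          5

/* Request one additional subflow on an existing connection (add_subflow). */
static int pm_add_subflow(struct nl_sock *sk, int family, uint32_t conn_id,
                          uint8_t local_id, uint32_t local_addr_be,
                          uint8_t remote_id, bool backup)
{
        struct nl_msg *msg = nlmsg_alloc();
        int err;

        genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family, 0,
                    NLM_F_REQUEST | NLM_F_ACK, MPTCP_CMD_ADD_SUBFLOW, 1);

        nla_put_u32(msg, MPTCP_ATTR_CONN_ID, conn_id);
        nla_put_u8(msg, MPTCP_ATTR_LOCAL_ADDR_ID, local_id);
        nla_put_u32(msg, MPTCP_ATTR_LOCAL_ADDR4, local_addr_be);
        nla_put_u8(msg, MPTCP_ATTR_REMOTE_ADDR_ID, remote_id);
        if (backup)
                nla_put_flag(msg, MPTCP_ATTR_BACKUP);

        err = nl_send_auto(sk, msg);
        nlmsg_free(msg);
        return err;
}
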
> 
> Security
> --------
> For security reasons, path management operations may only be performed
> by privileged processes due to the GENL_ADMIN_PERM generic netlink
> flag being set.  In particular, access to the MPTCP generic netlink
> interface will require CAP_NET_ADMIN privileges.

It seems important to additionally require that a process can only act on an
MPTCP connection if it is within the same network namespace. This might
be implicit in your security model, but I wanted to confirm it.
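
For reference, both points seem to map onto existing generic netlink
machinery; a kernel-side sketch (handler name and command ID are just
placeholders):

#include <net/genetlink.h>

/* Sketch: GENL_ADMIN_PERM enforces CAP_NET_ADMIN, and looking the connection
 * up only in genl_info_net(info) confines a daemon to MPTCP connections of
 * its own network namespace. */
static int mptcp_pm_add_subflow(struct sk_buff *skb, struct genl_info *info)
{
        struct net *net = genl_info_net(info);

        /* ... look up the connection ID within 'net' and start the MP_JOIN ... */
        return 0;
}

static const struct genl_ops mptcp_pm_ops[] = {
        {
                .cmd   = 1,                 /* placeholder for add_subflow */
                .flags = GENL_ADMIN_PERM,   /* CAP_NET_ADMIN required */
                .doit  = mptcp_pm_add_subflow,
        },
};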

Regards,
Stephen



* Re: [MPTCP] [RFC] MPTCP Path Management Generic Netlink API
From: Christoph Paasch @ 2018-03-13 18:26 UTC
  To: mptcp

On 12/03/18 - 22:04:59, Alexander Frömmgen wrote:
> Hi
> 
> > How do you plan to handle schedulers? Given the good experimental results
> > of the min-RTT-first scheduler, will that be the only option? Or will it
> > remain as a kernel module due to the time-sensitive nature of scheduling?
> 
> We experimented with a programmable scheduler in the kernel, which enables
> timely scheduling decisions and a lot of flexibility. Details are available
> at https://progmp.net and you might want to directly test a scheduler at
> http://progmp.net/progmp.html 
> 
> Our current setup contains a lot of additional logic in the kernel (e.g.,
> lexer, parser). While this has some benefits, I think that the eBPF-based
> part of our work should be sufficient for most application scenarios.

I would love to see a patch-submission :-)


Cheers,
Christoph



* Re: [MPTCP] [RFC] MPTCP Path Management Generic Netlink API
From: Christoph Paasch @ 2018-03-13 18:24 UTC
  To: mptcp

Hello Stephen,

On 12/03/18 - 13:55:07, Stephen Brennan wrote:
> Thanks for the info, I think I understand the plan a lot better now!
> 
> On Sun, Mar 11, 2018 at 08:27:43PM -0700, Christoph Paasch wrote:
> > Hello Stephen,
> > 
> > On 08/03/18 - 15:46:29, Stephen Brennan wrote:
> > > Hi Ossama,
> > > 
> > > I'm very interested in the idea of a MPTCP path manager Netlink API, since
> > > I actually posted a patch on mptcp-dev recently containing a path manager
> > > which did send and receive some commands over a Netlink API. My API was not
> > > nearly as complete as this one, but it's nice to see that path management
> > > policy could move further into user-space. That certainly makes the sort of
> > > research I was working on a lot easier.
> > 
> > I saw your patches and will do a review soon.
> > 
> > > I do have some inline comments drawn from my experience, but first one
> > > question. Is the implementation strategy for the Netlink PM to simply be
> > > another path manager (selectable among the other path managers), or would
> > > it replace the other path managers in some way?
> > 
> > Long-term we definitely want to move away from having the kernel-module
> > path-managers. They were very useful when we had to write research papers,
> > but for upstreaming it's not a good design :)
> 
> How do you plan to handle schedulers? Given the good experimental results
> of the min-RTT-first scheduler, will that be the only option? Or will it
> remain as a kernel module due to the time-sensitive nature of scheduling?
> 
> I ask because I've found scheduling to be sometimes related to path
> management. For instance, the system I worked on in my thesis essentially
> routes subflows through proxies. A piece of future work we've been looking
> into has been to "explore" new proxies without risk of disrupting the
> connection due to packet loss. We would do this by only transmitting
> redundant data segments along the subflow being "explored".

I think there are two things to consider.

For the upstreaming effort, it makes sense to have the simplest and
(more importantly) cleanest solution for scheduling. At least initially,
that allows us to target an upstream submission without the complexity of the
more advanced schedulers (e.g., reinjection based on retransmission timeouts, ...).
As use-cases become clearer, an upstream scheduler could then be changed
to accommodate those. As Alexander mentions, the BPF scheduler could indeed
be an interesting approach for upstream.

On the other hand, that shouldn't prevent research and experiments from going into the
multipath-tcp.org implementation, as long as it is clean and neatly fits
into the current framework (assuming that the submission is being
maintained by the researcher).


Christoph



* Re: [MPTCP] [RFC] MPTCP Path Management Generic Netlink API
From: Stephen Brennan @ 2018-03-12 21:10 UTC
  To: mptcp

Hi Ossama,

One other thought that came up about this is that the performance of each
subflow might be a consideration in path management. I don't know what
facilities currently exist for userspace to see statistics about subflow
performance (estimated RTT, loss events, etc.). If these facilities don't
exist, they may be a worthy addition to this API.
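
For a plain TCP socket this is roughly what getsockopt(TCP_INFO) already
reports, so I imagine a per-subflow event or dump could carry a similar set
of attributes. Subflows aren't separate sockets from userspace's point of
view, so the snippet below is only meant to show the kind of fields I have
in mind:

#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* What TCP already exposes per socket; a per-subflow netlink attribute set
 * could look similar (RTT, RTT variance, losses, retransmissions, ...). */
static void print_tcp_stats(int fd)
{
        struct tcp_info ti;
        socklen_t len = sizeof(ti);

        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
                printf("rtt=%uus rttvar=%uus lost=%u retrans=%u\n",
                       ti.tcpi_rtt, ti.tcpi_rttvar,
                       ti.tcpi_lost, ti.tcpi_total_retrans);
}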

A related question is, will this API address data scheduling at all?

Finally, a couple comments inline :)

On Mon, Mar 12, 2018 at 10:27:29AM -0700, Othman, Ossama wrote:
> Hi Stephen,
> 
> On Thu, Mar 8, 2018 at 3:46 PM, Stephen Brennan <stephen(a)brennan.io> wrote:
> > I'm very interested in the idea of a MPTCP path manager Netlink API, since
> > I actually posted a patch on mptcp-dev recently containing a path manager
> > which did send and receive some commands over a Netlink API. My API was not
> > nearly as complete as this one, but it's nice to see that path management
> > policy could move further into user-space. That certainly makes the sort of
> > research I was working on a lot easier.
> 
> Thanks for pointing this out!  I'm interested in seeing your approach,
> so I'll take a look, too.

I doubt my approach will be of much use -- it is highly specialized to a
particular use case.

I wish I had implemented some sort of connection identifier as your API
describes - it would have made some of my operations much better.

> > Is there any event for "new network interface added"? Or would the
> > user-space daemon need to subscribe to that information elsewhere?
> 
> The daemon subscribes to rtnetlink events to determine when network
> interfaces have been added or removed, and propagates those events to
> the path manager plugins.  I'm not sure if this is the best approach
> since we could also propagate such network interface related events
> from the kernel MPTCP implementation, similar to how the
> multipath-tcp.org implementation propagates such events to its
> in-kernel path managers.  Ultimately I opted to listen for network
> interface changes in the user space to avoid adding more code in the
> kernel.

This makes perfect sense; don't reinvent the wheel!

Thanks for being open to suggestions. I look forward to what this
implementation will look like!

Stephen



* Re: [MPTCP] [RFC] MPTCP Path Management Generic Netlink API
From: Stephen Brennan @ 2018-03-12 20:55 UTC
  To: mptcp

Christoph,

Thanks for the info, I think I understand the plan a lot better now!

On Sun, Mar 11, 2018 at 08:27:43PM -0700, Christoph Paasch wrote:
> Hello Stephen,
> 
> On 08/03/18 - 15:46:29, Stephen Brennan wrote:
> > Hi Ossama,
> > 
> > I'm very interested in the idea of a MPTCP path manager Netlink API, since
> > I actually posted a patch on mptcp-dev recently containing a path manager
> > which did send and receive some commands over a Netlink API. My API was not
> > nearly as complete as this one, but it's nice to see that path management
> > policy could move further into user-space. That certainly makes the sort of
> > research I was working on a lot easier.
> 
> I saw your patches and will do a review soon.
> 
> > I do have some inline comments drawn from my experience, but first one
> > question. Is the implementation strategy for the Netlink PM to simply be
> > another path manager (selectable among the other path managers), or would
> > it replace the other path managers in some way?
> 
> Long-term we definitely want to move away from having the kernel-module
> path-managers. They were very useful when we had to write research papers,
> but for upstreaming it's not a good design :)

How do you plan to handle schedulers? Given the good experimental results
of the min-RTT-first scheduler, will that be the only option? Or will it
remain as a kernel module due to the time-sensitive nature of scheduling?

I ask because I've found scheduling to be sometimes related to path
management. For instance, the system I worked on in my thesis essentially
routes subflows through proxies. A piece of future work we've been looking
into has been to "explore" new proxies without risk of disrupting the
connection due to packet loss. We would do this by only transmitting
redundant data segments along the subflow being "explored".

Stephen



* Re: [MPTCP] [RFC] MPTCP Path Management Generic Netlink API
From: Othman, Ossama @ 2018-03-12 17:27 UTC
  To: mptcp

Hi Stephen,

On Thu, Mar 8, 2018 at 3:46 PM, Stephen Brennan <stephen(a)brennan.io> wrote:
> I'm very interested in the idea of a MPTCP path manager Netlink API, since
> I actually posted a patch on mptcp-dev recently containing a path manager
> which did send and receive some commands over a Netlink API. My API was not
> nearly as complete as this one, but it's nice to see that path management
> policy could move further into user-space. That certainly makes the sort of
> research I was working on a lot easier.

Thanks for pointing this out!  I'm interested in seeing your approach,
so I'll take a look, too.

> I do have some inline comments drawn from my experience, but first one
> question. Is the implementation strategy for the Netlink PM to simply be
> another path manager (selectable among the other path managers), or would
> it replace the other path managers in some way?

Our approach is to move all path management related operations to a
user space daemon since they aren't in the critical path (assuming
connection setup/teardown doesn't occur often), with the goal of
minimizing the amount of work done in the kernel.  That daemon
provides path manager plugin infrastructure.  Path manager plugins
would implement the plugin interface (callback functions)
corresponding to the generic netlink events described below.
Selection of a path manager would be done at daemon start (the
default) or at run-time through a socket option, similar to the socket
option based selection in the multipath-tcp.org implementation.  To
start, and to simplify the initial implementation, we'll only support
selecting a path manager at run-time before the MPTCP connection has
been established.  However, the below generic netlink proposal lacks a
payload field, such as the path manager name, necessary to support
run-time selection.  Adding a path manager name payload field to the
new_connection event should suffice, as far as I can tell.
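
To give a feel for the shape of that plugin interface, it is essentially one
callback per kernel-initiated event; the sketch below is illustrative only,
not our actual code:

#include <stdint.h>
#include <sys/socket.h>

/* The daemon decodes the generic netlink attributes and hands the plugins
 * plain C values; names and types here are illustrative. */
struct mptcp_conn_info {
        uint64_t conn_id;
        struct sockaddr_storage local;
        struct sockaddr_storage remote;
        int backup;                      /* priority of the initial subflow */
};

struct mptcp_pm_plugin_ops {
        const char *name;                /* matched against run-time selection */
        void (*new_connection)(const struct mptcp_conn_info *ci);
        void (*new_addr)(uint64_t conn_id, uint8_t remote_addr_id,
                         const struct sockaddr_storage *remote);
        void (*join_attempt)(uint64_t conn_id, uint8_t local_addr_id,
                             uint8_t remote_addr_id);
        void (*new_subflow)(uint64_t conn_id, uint32_t subflow_id);
        void (*subflow_closed)(uint64_t conn_id, uint32_t subflow_id);
        void (*conn_closed)(uint64_t conn_id);
};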

More comments inline below ...

> On Thu, Mar 08, 2018 at 12:48:40PM -0800, Othman, Ossama wrote:
>> * join_attempt
>>    * Called when a MP_JOIN has been ACKed.  The path manager is
>>      expected to respond with an allow_join event containing its
>>      decision based on the configured policy.
>
> How would this be implemented? I'm not sure of the specifics, but many path
> manager functions are called from contexts in which sleeping is not
> possible. It seems like waiting on a user-space decision for join_attempt
> would require a decent amount of re-implementation. Plus, there is the
> possibility that userspace may never respond.

You certainly bring up a good point.  I'm not far enough along in the
implementation, particularly for this join_attempt event, to be able
to give you a meaningful answer regarding the inability to sleep.
Regarding the issue of user space potentially not responding, wouldn't
leveraging a timeout prevent the kernel space from waiting
indefinitely?  It may not be the best solution but at least we could
bound the wait time.

> It seems like a more pragmatic approach could be to have some strategies
> implemented in the kernel, e.g. default decline, default accept, accept on a
> certain interface, etc. A global (per net namespace?) default could be set,
> and userspace could modify the global policy, or set the policy on a
> particular connection. This would allow the kernel implementation to
> respond immediately, while still giving userspace some flexibility.

Yes, that sounds reasonable.  I'll take this into account as our
implementation progresses.  Thanks!

> Is there any event for "new network interface added"? Or would the
> user-space daemon need to subscribe to that information elsewhere?

The daemon subscribes to rtnetlink events to determine when network
interfaces have been added or removed, and propagates those events to
the path manager plugins.  I'm not sure if this is the best approach
since we could also propagate such network interface related events
from the kernel MPTCP implementation, similar to how the
multipath-tcp.org implementation propagates such events to its
in-kernel path managers.  Ultimately I opted to listen for network
interface changes in the user space to avoid adding more code in the
kernel.
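
For reference, the rtnetlink side is just the standard link and address
notification groups, roughly:

#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Open a NETLINK_ROUTE socket subscribed to link and address changes;
 * RTM_NEWLINK/RTM_DELLINK and RTM_NEWADDR/RTM_DELADDR messages read from
 * the returned fd are forwarded to the path manager plugins. */
static int open_rtnl_monitor(void)
{
        struct sockaddr_nl sa = {
                .nl_family = AF_NETLINK,
                .nl_groups = RTMGRP_LINK |
                             RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR,
        };
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

        if (fd < 0)
                return -1;
        if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}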

>> Path Manager Initiated Events (Commands)
>> ----------------------------------------
...
> It seems like a command to list the available remote address IDs would be a
> useful addition to this, so a user-space daemon could learn of remote
> address IDs that are currently in use for a connection that was already
> established when the daemon started.

I was under the assumption that MPTCP connections would not be
established without the daemon up and running, meaning it would know
about all remote address IDs.  Perhaps that isn't a reasonable
assumption, especially if we implement the in-kernel default path
management strategies you mentioned above.

> Similarly, a list_subflows command could be useful to allow an application
> to query the existing subflows of a connection.

That sounds reasonable.

>> Security
>> --------
>> For security reasons, path management operations may only be performed
>> by privileged processes due to the GENL_ADMIN_PERM generic netlink
>> flag being set.  In particular, access to the MPTCP generic netlink
>> interface will require CAP_NET_ADMIN privileges.
>
> It seems important to additionally require that a process can only act on an
> MPTCP connection if it is within the same network namespace. This might
> be implicit in your security model, but I wanted to confirm it.

It is now. :)

Thanks for the excellent feedback!

-Ossama


* Re: [MPTCP] [RFC] MPTCP Path Management Generic Netlink API
From: Christoph Paasch @ 2018-03-12  3:27 UTC
  To: mptcp

Hello Stephen,

On 08/03/18 - 15:46:29, Stephen Brennan wrote:
> Hi Ossama,
> 
> I'm very interested in the idea of a MPTCP path manager Netlink API, since
> I actually posted a patch on mptcp-dev recently containing a path manager
> which did send and receive some commands over a Netlink API. My API was not
> nearly as complete as this one, but it's nice to see that path management
> policy could move further into user-space. That certainly makes the sort of
> research I was working on a lot easier.

I saw your patches and will do a review soon.

> I do have some inline comments drawn from my experience, but first one
> question. Is the implementation strategy for the Netlink PM to simply be
> another path manager (selectable among the other path managers), or would
> it replace the other path managers in some way?

Long-term we definitely want to move away from having the kernel-module
path-managers. They were very useful when we had to write research papers,
but for upstreaming it's not a good design :)


Christoph


* Re: [MPTCP] [RFC] MPTCP Path Management Generic Netlink API
From: Christoph Paasch @ 2018-03-12  3:25 UTC
  To: mptcp

Hello Ossama,

thanks for sharing this! Looks quite well thought-through.

Please find some comments inline:

On 08/03/18 - 12:48:40, Othman, Ossama wrote:
> Hi,
> 
> Following up on a brief exchange between Matthieu and Mat regarding
> a MPTCP path manager netlink API, I'd like to share our own
> proposed generic netlink API developed in parallel.
> 
> Please find the high level description below.  It'll be great to
> compare the two netlink based APIs to determine if either can be
> improved by leveraging different aspects from each one.
> 
> Thanks!
> 
> --
> Ossama Othman
> Intel OTC
> 
> ==============================================
> RFC: MPTCP Path Management Generic Netlink API
> ==============================================
> 
> A generic netlink socket is used to facilitate communication between
> the kernel and a user space daemon that handles MPTCP path management
> related operations, from here on in called the path manager.  Several
> multicast groups, attributes and operations are exposed by the "mptcp"
> generic netlink family, e.g.:
> 
> $ genl ctrl list
> ...
> Name: mptcp
>        ID: 0x1d  Version: 0x1  header size: 0  max attribs: 7
>        commands supported:
>                #1:  ID-0x0
>                #2:  ID-0x1
>                #3:  ID-0x2
>                #4:  ID-0x3
>                #5:  ID-0x4
> 
> 
>        multicast groups:
>                #1:  ID-0xa  name: new_connection
>                #2:  ID-0xb  name: new_addr
>                #3:  ID-0xc  name: join_attempt
>                #4:  ID-0xd  name: new_subflow
>                #5:  ID-0xe  name: subflow_closed
>                #6:  ID-0xf  name: conn_closed
> 
> Each of the multicast groups corresponds to MPTCP path manager events
> supported by the kernel MPTCP stack.
> 
> Kernel Initiated Events
> -----------------------
> * new_connection
>    * Called upon completion of new MPTCP-capable connection.
>      Information for initial subflow is made available to the path
>      manager.
>    * Payload
>       * Connection ID (globally unique for host)
>       * Local address
>       * Local port
>       * Remote address
>       * Remote port
>       * Priority

What is the meaning of "Priority" here? Is this whether the interface is a
backup one, based on the 'ip link set dev ... multipath backup'
configuration?

If that's the case, I suggest that this is not needed. Ultimately, the
netlink path-manager should configure the backup-flag of an interface.


Additional info that could be useful would be the pid of the process (or
does Linux have a notion of a UUID?) that owns the socket. That would allow a
client-side path manager to apply policies based on the app. E.g., think of an
Android implementation where some apps are allowed to use cell, while others
aren't.

> * new_addr
>    * Triggered when the host receives an ADD_ADDR MPTCP option, i.e. a
>      new address is advertised by the remote side.
>    * Payload
>       * Connection ID
>       * Remote address ID
>       * Remote address
>       * Remote port
> * join_attempt
>    * Called when a MP_JOIN has been ACKed.  The path manager is
>      expected to respond with an allow_join event containing its
>      decision based on the configured policy.

I'm not sure what you mean by "has been ACKed". Can you explain? Is this
the server side here, where we are receiving a SYN+MP_JOIN?

>    * Payload
>       * Connection ID
>       * Local address ID
>       * Local address
>       * Local port
>       * Remote address ID
>       * Remote address
>       * Remote port
> * new_subflow
>    * Called when final MP_JOIN ACK has been ACKed.
>    * Payload
>       * Connection ID
>       * Subflow ID

Same here as above. Maybe it would help if you clarified whether we are the active
or the passive opener in these cases.

> * subflow_closed
>    * Called when a subflow has been closed.  Allows path manager to
>      clean up subflow related resources.

For the payload, an optional "error" would be good. That allows one to see
whether the subflow got closed because of an incoming RST, a timeout, ...

As for "regular" closing, I'm not sure how this applies here, because when
subflows are managed by the path-manager, a subflow should not go away unless
the path-manager issues the remove_subflow command.

One subtlety is that the peer could decide to close a subflow (and thus send a
FIN), while we still keep this subflow alive.
So, maybe we need another event that says "subflow_read_closed", upon which
the path-manager can issue a remove_subflow command.

>    * Payload
>       * Connection ID
>       * Subflow ID
> * conn_closed
>    * Called when an MPTCP connection as a whole, as opposed to a single
>      subflow, has been closed.  This is the case when close(2) has
>      been called on an MPTCP connection.
>    * Payload
>       * Connection ID
> 
> Path Manager Initiated Events (Commands)
> ----------------------------------------
> * send_addr
>    * Notify the kernel of the availability of a new address for use in
>      MPTCP connections.  Triggers an ADD_ADDR to be sent to the peer.
>    * Payload
>       * Connection ID
>       * Address ID
>       * Local address
>       * Local port (optional, use same port as initial subflow if not
>         specified)
> * add_subflow
>    * Add new subflow to the MPTCP connection.  This triggers an
>      MP_JOIN to be sent to the peer.
>    * Payload
>       * Connection ID
>       * Local address ID
>       * Local address (optional, required if send_addr not previously
>         sent to establish the local address ID)
>       * Local port (optional, use same port as initial subflow if not
>         specified)
>       * Remote address ID (e.g. from a previously received new_addr or
>         join_attempt event)
>       * Backup priority flag (optional, use default priority if not
>         specified)
>       * Subflow ID

We need the interface-index as well here. There are use-cases where one has
multiple interfaces with the same IP address. Think of a host with multiple
interfaces, each behind a NAT, where the gateways each hand out 192.168.1.2.

> * allow_join
>    * Allow MP_JOIN attempt from peer.
>    * Payload
>       * Connection ID
>       * Remote address ID (e.g. from a previously received join_attempt
>         event).
>       * Local address
>       * Local port
>       * Allow indication (optional, do not allow join if not
>         specified)
>       * Backup priority flag (optional, use default priority if not
>         specified)
>       * Subflow ID

This means we are delaying the sending of the SYN/ACK+MP_JOIN, right?

From an implementation point-of-view, what does this look like? It seems
quite tricky to me.

Would it be easier to restrict the path-manager to active-opener only?


Cheers,
Christoph

> * set_backup
>    * Set subflow priority to backup priority.
>    * Payload
>       * Connection ID
>       * Subflow ID
>       * Backup priority flag (optional, use default priority if not
>         specified)
> * remove_subflow
>    * Triggers a REMOVE_ADDR MPTCP option to be sent, ultimately
>      resulting in subflows routed through that invalidated address
>      being closed.
>    * Payload
>       * Connection ID
>       * Subflow ID
> 
> Security
> --------
> For security reasons, path management operations may only be performed
> by privileged processes due to the GENL_ADMIN_PERM generic netlink
> flag being set.  In particular, access to the MPTCP generic netlink
> interface will require CAP_NET_ADMIN privileges.


* [MPTCP] [RFC] MPTCP Path Management Generic Netlink API
From: Othman, Ossama @ 2018-03-08 20:48 UTC
  To: mptcp

Hi,

Following up on a brief exchange between Matthieu and Mat regarding
a MPTCP path manager netlink API, I'd like to share our own
proposed generic netlink API developed in parallel.

Please find the high level description below.  It'll be great to
compare the two netlink based APIs to determine if either can be
improved by leveraging different aspects from each one.

Thanks!

--
Ossama Othman
Intel OTC

==============================================
RFC: MPTCP Path Management Generic Netlink API
==============================================

A generic netlink socket is used to facilitate communication between
the kernel and a user space daemon that handles MPTCP path management
related operations, from here on in called the path manager.  Several
multicast groups, attributes and operations are exposed by the "mptcp"
generic netlink family, e.g.:

$ genl ctrl list
...
Name: mptcp
       ID: 0x1d  Version: 0x1  header size: 0  max attribs: 7
       commands supported:
               #1:  ID-0x0
               #2:  ID-0x1
               #3:  ID-0x2
               #4:  ID-0x3
               #5:  ID-0x4


       multicast groups:
               #1:  ID-0xa  name: new_connection
               #2:  ID-0xb  name: new_addr
               #3:  ID-0xc  name: join_attempt
               #4:  ID-0xd  name: new_subflow
               #5:  ID-0xe  name: subflow_closed
               #6:  ID-0xf  name: conn_closed

Each of the multicast groups corresponds to MPTCP path manager events
supported by the kernel MPTCP stack.

Kernel Initiated Events
-----------------------
* new_connection
   * Called upon completion of new MPTCP-capable connection.
     Information for initial subflow is made available to the path
     manager.
   * Payload
      * Connection ID (globally unique for host)
      * Local address
      * Local port
      * Remote address
      * Remote port
      * Priority
* new_addr
   * Triggered when the host receives an ADD_ADDR MPTCP option, i.e. a
     new address is advertised by the remote side.
   * Payload
      * Connection ID
      * Remote address ID
      * Remote address
      * Remote port
* join_attempt
   * Called when a MP_JOIN has been ACKed.  The path manager is
     expected to respond with an allow_join event containing its
     decision based on the configured policy.
   * Payload
      * Connection ID
      * Local address ID
      * Local address
      * Local port
      * Remote address ID
      * Remote address
      * Remote port
* new_subflow
   * Called when final MP_JOIN ACK has been ACKed.
   * Payload
      * Connection ID
      * Subflow ID
* subflow_closed
   * Called when a subflow has been closed.  Allows path manager to
     clean up subflow related resources.
   * Payload
      * Connection ID
      * Subflow ID
* conn_closed
   * Called when an MPTCP connection as a whole, as opposed to a single
     subflow, has been closed.  This is the case when close(2) has
     been called on an MPTCP connection.
   * Payload
      * Connection ID

Path Manager Initiated Events (Commands)
----------------------------------------
* send_addr
   * Notify the kernel of the availability of a new address for use in
     MPTCP connections.  Triggers an ADD_ADDR to be sent to the peer.
   * Payload
      * Connection ID
      * Address ID
      * Local address
      * Local port (optional, use same port as initial subflow if not
        specified)
* add_subflow
   * Add new subflow to the MPTCP connection.  This triggers an
     MP_JOIN to be sent to the peer.
   * Payload
      * Connection ID
      * Local address ID
      * Local address (optional, required if send_addr not previously
        sent to establish the local address ID)
      * Local port (optional, use same port as initial subflow if not
        specified)
      * Remote address ID (e.g. from a previously received new_addr or
        join_attempt event)
      * Backup priority flag (optional, use default priority if not
        specified)
      * Subflow ID
* allow_join
   * Allow MP_JOIN attempt from peer.
   * Payload
      * Connection ID
      * Remote address ID (e.g. from a previously received join_attempt
        event).
      * Local address
      * Local port
      * Allow indication (optional, do not allow join if not
        specified)
      * Backup priority flag (optional, use default priority if not
        specified)
      * Subflow ID
* set_backup
   * Set subflow priority to backup priority.
   * Payload
      * Connection ID
      * Subflow ID
      * Backup priority flag (optional, use default priority if not
        specified)
* remove_subflow
   * Triggers a REMOVE_ADDR MPTCP option to be sent, ultimately
     resulting in subflows routed through that invalidated address
     being closed.
   * Payload
      * Connection ID
      * Subflow ID

Security
--------
For security reasons, path management operations may only be performed
by privileged processes due to the GENL_ADMIN_PERM generic netlink
flag being set.  In particular, access to the MPTCP generic netlink
interface will require CAP_NET_ADMIN privileges.
