* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-28 16:25 Mat Martineau
  0 siblings, 0 replies; 13+ messages in thread
From: Mat Martineau @ 2018-03-28 16:25 UTC (permalink / raw)
  To: mptcp



On Wed, 28 Mar 2018, Christoph Paasch wrote:

> On 27/03/18 - 16:29:26, Mat Martineau wrote:
>>
>> Hi Christoph,
>>
>> On Tue, 27 Mar 2018, Christoph Paasch wrote:
>>
>>> Hello,
>>>
>>>
>>> (first, sorry for the long e-mail)
>>>
>>>
>>> I now started working on cleaning up the input path to prepare it for
>>> input-processing without holding the MPTCP-level lock.
>>>
>>> To do this, I go through all the places where we access data-structures from
>>> the meta-socket and see if I can move it to mptcp_data_ready or
>>> mptcp_write_space (the callbacks that are called from
>>> sk_data_ready/sk_write_space).
>>>
>>> In tcp_check_space() we have the following:
>>>                if (mptcp(tcp_sk(sk)) ||
>>>                    (sk->sk_socket &&
>>>                     test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))) {
>>>                        tcp_new_space(sk);
>>>                        if (sk->sk_socket && !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
>>>                                tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
>>>                }
>>>
>>> We do the if (mptcp(...)) because, currently, the subflow's sk_socket points to
>>> the application's struct socket. Thus, we need to skip the check for
>>> SOCK_NOSPACE, as otherwise we might end up not calling sk_write_space.
>>>
>>>
>>> Other kernel modules that create TCP connections instead use an in-kernel
>>> struct socket. And modules like RDS even force SOCK_NOSPACE to be set, so
>>> that the TCP stack keeps making the up-call. I thought that this would be a
>>> good thing to do in MPTCP as well.
>>>
>>> So, my goal became to have a fully functional struct socket for subflows.
>>> The benefit is also that we can end up using kernel_sendmsg,
>>> kernel_recvmsg,... in the future. It also allows us to do kernel_accept() on
>>> the MPTCP-level socket to receive new subflows (a problem I mentioned in an
>>> earlier mail).
>>
>> I think we'll have a patch set illustrating this approach ready to post
>> tomorrow.
>
> Awesome! I am waiting for your patch :)
>
>> Peter and I have been working on this net-next-based code with
>> in-kernel sockets for many months now. I missed the SOCK_NOSPACE detail
>> though - I'll take a look at that.
>>
>> Rather than using kernel_sendmsg and kernel_recvmsg, I use do_tcp_sendpages
>> (to reduce copying and let the MPTCP connection layer share data across skbs
>> and subflows as it needs to) and tcp_read_sock (to read intact skbs out of
>> the rx queue). The kernel_* functions are used for most everything else
>> though.
>
> Do you also handle kernel_accept ?
> If yes, how do you do it? Because, it's quite complex to avoid races.

Yes, but we don't support joins yet so it's maybe only the trivial case. 
The connection level socket owns a subflow listening socket (which it 
created at init time), and when the application calls accept() it's 
immediately passed through to kernel_accept() for the subflow.

>
>>> It also would allow us to expose subflows as file-descriptors to the
>>> user-space. That way the user-space can do setsockopt, getsockopt,... on the
>>> subflows. An idea that came up in the past when we were thinking on how to
>>> expose an MPTCP API that allows apps to control certain things on the
>>> subflows.
>>
>> I like that idea too. We hadn't come up with a good design idea for
>> propagating many of the getsockopt/setsockopt operations from the
>> connection-level socket to all of the subflows.
>>
>>>
>>> To get there, there are a few places where things would need to change:
>>>
>>> * mptcp_init4_subsockets - Here, this works perfectly. It also allows us to
>>>  avoid "faking" the struct socket, as we are currently doing.
>>> * mptcp_alloc_mpcb for the active opener - This is the first problem. mptcp_alloc_mpcb() can be
>>>  called with bh disabled. But sock_create_lite() assumes that bh is enabled
>>>  as it ends up doing an alloc with GFP_KERNEL.
>>>  A few ways this could be solved:
>>>  - Schedule a work-queue item in mptcp_alloc_mpcb that creates the struct
>>>    socket. This looks a bit racy to me. Not sure what side-effects this
>>>    might have.
>>>  - Change things entirely, such that the master-sock is being allocated
>>>    when the connection is created. That way, we allocate all the necessary
>>>    struct socket's right away.
>>>    In the past, we decided to allocate the master-sk only when receiving
>>>    the SYN/ACK. We did that so as to minimize the impact on regular TCP
>>>    when the server does not support MPTCP. But, as we are moving towards
>>>    explicitly exposing MPTCP at the socket-layer, we can rethink that
>>>    decision.
>>>    Any thoughts? Is it ok to pay the cost of allocating a master-sk before
>>>    we know whether the server supports MPTCP?
>>>    I think, we should do this, and transition to that model.
>>
>> I prefer the latter. If MPTCP is exposed at the socket layer, it's only
>> MPTCP sockets that experience the overhead (relative to a regular TCP
>> socket) when regular TCP is used.
>
> Are you doing the latter in your implementation?

Yes. There's always a connection level socket and a subflow socket if an 
IPPROTO_MPTCP socket was requested, even if only regular TCP ends up being 
used.


Mat

>
>>
>>> * mptcp_alloc_mpcb for the passive opener - same problem as above but on the
>>>  other side. We could allocate the master's struct socket upon the accept()
>>>  call from the application. This again sounds a bit racy to me. The struct
>>>  socket would potentially only exist long after the subflow has been
>>>  established. What happens if, in the meantime, the peer sends data or an
>>>  MP_FASTCLOSE,... ?
>>> * New subflows on the passive opener side - again, we are receiving those
>>>  subflows while bh is disabled. So, we have to schedule a work-queue item to
>>>  do a kernel_accept() on the MPTCP-socket.
>>>  Again, something that can potentially be racy.
>>>
>>> In general, subflow-establishment on the passive-opener side again seems to
>>> be a major pain-point. I think, we really need to redesign that.
>>>
>>>
>>> Any thoughts, feedback, suggestions?
>>> Or maybe, using real struct socket for subflows is not worth it? :)
>>
>> Using the struct socket for subflows will take some work, but I think it's
>> an important part of shaping the code for upstream. If the subflows look
>> more like regular TCP sockets I think that will help keep the MPTCP code
>> well partitioned from TCP.
>>
>>
>> --
>> Mat Martineau
>> Intel OTC
>

--
Mat Martineau
Intel OTC


* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-29 20:01 Rao Shoaib
  0 siblings, 0 replies; 13+ messages in thread
From: Rao Shoaib @ 2018-03-29 20:01 UTC (permalink / raw)
  To: mptcp




On 03/29/2018 12:19 PM, Rao Shoaib wrote:
> Hi Steve,
>
> Sorry I did not respond to your email, too many issues to worry about :-)
>
>
> On 03/28/2018 02:33 PM, Stephen Brennan wrote:
>> Hi Rao et al.,
>>
>> On Tue, Mar 27, 2018 at 06:53:46PM -0700, Rao Shoaib wrote:
>>>
>>> On 03/27/2018 05:04 AM, Lorenzo Colitti wrote:
>>>> On Tue, Mar 27, 2018 at 11:18 AM, Christoph Paasch <cpaasch(a)apple.com
>>>> <mailto:cpaasch(a)apple.com>> wrote:
>>>>
>>>>      It also would allow us to expose subflows as file-descriptors 
>>>> to the
>>>>      user-space. That way the user-space can do setsockopt,
>>>>      getsockopt,... on the
>>>>      subflows. An idea that came up in the past when we were thinking
>>>>      on how to
>>>>      expose an MPTCP API that allows apps to control certain things 
>>>> on the
>>>>      subflows.
>>>>
>>>>
>>>> +1 to this since it will allow client platforms to use setsockopt to,
>>>> for example, bind subflows to interfaces (e.g., with 
>>>> SO_BINDTODEVICE, or
>>>> with SO_MARK as used by Android).
>>> May I ask why it cannot be done -- only a flow id should be needed.
>> It sounds like you're referring to the proposed MPTCP_SUB_GETSOCKOPT and
>> MPTCP_SUB_SETSOCKOPT operations in this IETF draft [1], correct?
>>
>> [1]: https://tools.ietf.org/html/draft-hesmans-mptcp-socket-00
> Actually I have not read the draft; my comment was based on a general 
> understanding of the kernel. I am coming from Solaris, and we know how to 
> create custom sockets, special setsockopt/getsockopt, special fds to expose 
> things, and so on. It is not rocket science. If someone has already 
> thought of it, then we have less work to do.
>
> Any change of this kind has to be discussed in a standards body so 
> vendors and users can comment.
>>
>>> Plus for
>>> policy enforcement the meta socket would have to be consulted. I am 
>>> new to
>>> Linux but based on what I have read SO_BINDTODEVICE does not 
>>> guarantee much
>>> for Tx if the routing says otherwise, quick check of ip_queue_xmit() 
>>> shows
>>> it first consults the routing table. I understand policy routing can 
>>> be used
>>> to steer the packets. Perhaps my understanding is incorrect.
>>>
>>> IMHO this whole idea of exposing individual flows to the application 
>>> seems
>>> to go against the basic design of MPTCP. I am not sure what the use 
>>> case is
>>> as TCP is streams based and MPTCP will spread out data on multiple 
>>> flows. I
>>> am not involved in the cellular space so I don't know, feel free to
>>> enlighten me.
>> I'm also a bit confused at this. I've read the Socket API drafts from 
>> IETF,
>> which seem to be motivated by the idea that mobile apps will want to
>> directly control which subflows and paths are created (similar to the 
>> iOS
>> APIs released recently). It sounds like we have two approaches: (1) 
>> MPTCP
>> as a drop-in replacement for TCP, so unmodified applications may use it,
>> and (2) MPTCP as an explicitly requested protocol which user-space
>> applications request, and then manipulate via socket options. Given the
>> amount of drafts and other work put in so far, it seems that use case 
>> (2)
>> has enough demand that it ought to be supported.
> Can you point me to the drafts or requests? As I said, with the current 
> design everything is possible. No change is needed.
>
> Shoaib

Steve, take a look at the following -- the possibilities are endless.

https://arxiv.org/pdf/1707.03585.pdf

The best approach, I think, is to enhance netlink. I have a pie-in-the-sky 
idea: the meta/master socket could register with netlink, identified by 
source and destination addresses, for direct communication.

Shoaib


* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-29 19:19 Rao Shoaib
  0 siblings, 0 replies; 13+ messages in thread
From: Rao Shoaib @ 2018-03-29 19:19 UTC (permalink / raw)
  To: mptcp


Hi Steve,

Sorry I did not respond to your email, too many issues to worry about :-)


On 03/28/2018 02:33 PM, Stephen Brennan wrote:
> Hi Rao et al.,
>
> On Tue, Mar 27, 2018 at 06:53:46PM -0700, Rao Shoaib wrote:
>>
>> On 03/27/2018 05:04 AM, Lorenzo Colitti wrote:
>>> On Tue, Mar 27, 2018 at 11:18 AM, Christoph Paasch <cpaasch(a)apple.com
>>> <mailto:cpaasch(a)apple.com>> wrote:
>>>
>>>      It also would allow us to expose subflows as file-descriptors to the
>>>      user-space. That way the user-space can do setsockopt,
>>>      getsockopt,... on the
>>>      subflows. An idea that came up in the past when we were thinking
>>>      on how to
>>>      expose an MPTCP API that allows apps to control certain things on the
>>>      subflows.
>>>
>>>
>>> +1 to this since it will allow client platforms to use setsockopt to,
>>> for example, bind subflows to interfaces (e.g., with SO_BINDTODEVICE, or
>>> with SO_MARK as used by Android).
>> May I ask why it cannot be done -- only a flow id should be needed.
> It sounds like you're referring to the proposed MPTCP_SUB_GETSOCKOPT and
> MPTCP_SUB_SETSOCKOPT operations in this IETF draft [1], correct?
>
> [1]: https://tools.ietf.org/html/draft-hesmans-mptcp-socket-00
Actually I have not read the draft; my comment was based on a general 
understanding of the kernel. I am coming from Solaris, and we know how to 
create custom sockets, special setsockopt/getsockopt, special fds to expose 
things, and so on. It is not rocket science. If someone has already 
thought of it, then we have less work to do.

Any change of this kind has to be discussed in a standards body so 
vendors and users can comment.
>
>> Plus for
>> policy enforcement the meta socket would have to be consulted. I am new to
>> Linux but based on what I have read SO_BINDTODEVICE does not guarantee much
>> for Tx if the routing says otherwise, quick check of ip_queue_xmit() shows
>> it first consults the routing table. I understand policy routing can be used
>> to steer the packets. Perhaps my understanding is incorrect.
>>
>> IMHO this whole idea of exposing individual flows to the application seems
>> to go against the basic design of MPTCP. I am not sure what the use case is
>> as TCP is streams based and MPTCP will spread out data on multiple flows. I
>> am not involved in the cellular space so I don't know, feel free to
>> enlighten me.
> I'm also a bit confused at this. I've read the Socket API drafts from IETF,
> which seem to be motivated by the idea that mobile apps will want to
> directly control which subflows and paths are created (similar to the iOS
> APIs released recently). It sounds like we have two approaches: (1) MPTCP
> as a drop-in replacement for TCP, so unmodified applications may use it,
> and (2) MPTCP as an explicitly requested protocol which user-space
> applications request, and then manipulate via socket options. Given the
> amount of drafts and other work put in so far, it seems that use case (2)
> has enough demand that it ought to be supported.
Can you point me to the drafts or requests? As I said, with the current 
design everything is possible. No change is needed.

Shoaib
>
> Going slightly off topic from this discussion, it seems that a lot of what
> is desired in use case (2) is allowing the application to manage its own
> paths. We now have a Netlink API for userspace path managers. Have we
> thought about the possibility of a userspace application creating an MPTCP
> socket, and also a netlink socket, so that it could manage its own paths?
> Just an off-the-wall idea. If such a thing were possible, it could give the
> userspace app much more control over its own path management, while
> reducing duplication of functionality in-kernel (e.g. wouldn't have both
> socket option for opening a subflow, plus a netlink command for opening a
> subflow). A userspace library could hide some of this complexity for users
> of use-case (2), while still allowing use-case (1) to exist through the
> normal socket API.
>
> Stephen
>
>



* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-29 19:11 Rao Shoaib
  0 siblings, 0 replies; 13+ messages in thread
From: Rao Shoaib @ 2018-03-29 19:11 UTC (permalink / raw)
  To: mptcp




On 03/29/2018 11:27 AM, Christoph Paasch wrote:
> Hello,
>
> On 28/03/18 - 14:33:03, Stephen Brennan wrote:
>> Hi Rao et al.,
>>
>> On Tue, Mar 27, 2018 at 06:53:46PM -0700, Rao Shoaib wrote:
>>>
>>> On 03/27/2018 05:04 AM, Lorenzo Colitti wrote:
>>>> On Tue, Mar 27, 2018 at 11:18 AM, Christoph Paasch <cpaasch(a)apple.com
>>>> <mailto:cpaasch(a)apple.com>> wrote:
>>>>
>>>>      It also would allow us to expose subflows as file-descriptors to the
>>>>      user-space. That way the user-space can do setsockopt,
>>>>      getsockopt,... on the
>>>>      subflows. An idea that came up in the past when we were thinking
>>>>      on how to
>>>>      expose an MPTCP API that allows apps to control certain things on the
>>>>      subflows.
>>>>
>>>>
>>>> +1 to this since it will allow client platforms to use setsockopt to,
>>>> for example, bind subflows to interfaces (e.g., with SO_BINDTODEVICE, or
>>>> with SO_MARK as used by Android).
>>> May I ask why it cannot be done -- only a flow id should be needed.
>> It sounds like you're referring to the proposed MPTCP_SUB_GETSOCKOPT and
>> MPTCP_SUB_SETSOCKOPT operations in this IETF draft [1], correct?
>>
>> [1]: https://tools.ietf.org/html/draft-hesmans-mptcp-socket-00
>>
>>> Plus for
>>> policy enforcement the meta socket would have to be consulted. I am new to
>>> Linux but based on what I have read SO_BINDTODEVICE does not guarantee much
>>> for Tx if the routing says otherwise, quick check of ip_queue_xmit() shows
>>> it first consults the routing table. I understand policy routing can be used
>>> to steer the packets. Perhaps my understanding is incorrect.
>>>
>>> IMHO this whole idea of exposing individual flows to the application seems
>>> to go against the basic design of MPTCP. I am not sure what the use case is
>>> as TCP is streams based and MPTCP will spread out data on multiple flows. I
>>> am not involved in the cellular space so I don't know, feel free to
>>> enlighten me.
>> I'm also a bit confused at this. I've read the Socket API drafts from IETF,
>> which seem to be motivated by the idea that mobile apps will want to
>> directly control which subflows and paths are created (similar to the iOS
>> APIs released recently).
> The iOS APIs only allow an app to communicate its intention (handover,
> interactive, aggregate). At the end of the day, the app is not able to
> trigger the creation of a subflow on cellular.
>
> Just nit-picking ;-)
Creating subflows and managing them can be done today.
>
>> It sounds like we have two approaches: (1) MPTCP
>> as a drop-in replacement for TCP, so unmodified applications may use it,
>> and (2) MPTCP as an explicitly requested protocol which user-space
>> applications request, and then manipulate via socket options. Given the
>> amount of drafts and other work put in so far, it seems that use case (2)
>> has enough demand that it ought to be supported.
> Yes, the drop-in replacement does not work IMO, because the MPTCP-stack
> needs more context to make decisions.
#1 does not exclude #2. You can have both options, system-wide or just 
on demand. The current implementation hands over the full packet to 
MPTCP to do whatever it wants.

Provide an example that cannot be handled by the current design that I 
have posted.

>
>> Going slightly off topic from this discussion, it seems that a lot of what
>> is desired in use case (2) is allowing the application to manage its own
>> paths. We now have a Netlink API for userspace path managers. Have we
>> thought about the possibility of a userspace application creating an MPTCP
>> socket, and also a netlink socket, so that it could manage its own paths?
>> Just an off-the-wall idea. If such a thing were possible, it could give the
>> userspace app much more control over its own path management, while
>> reducing duplication of functionality in-kernel (e.g. wouldn't have both
>> socket option for opening a subflow, plus a netlink command for opening a
>> subflow).
> Yes, this is definitely a viable solution. The app could use a library that
> is able to speak netlink-API and manage subflows.
>
> Your library? ;-)
Yes, that is what I have in mind as well, but this does not require any 
change to the current design. I have not read the netlink design, but I 
thought it would already support that.
>
>> A userspace library could hide some of this complexity for users
>> of use-case (2), while still allowing use-case (1) to exist through the
>> normal socket API.
> The idea behind exposing subflows as FDs came from apps needing to control
> socket options on a per-subflow basis, e.g. TCP keepalive,
> SO_MARK (for Android),...
> The app would also want to get stats like TCP_INFO,...
>
> We don't have a good way at the moment to do this. And, using
> file-descriptors seemed like a viable solution.
No, it does not. The application needs to handle only one fd. All of the 
above can be done but is not in scope for the initial implementation.
>
>
> That doesn't mean that we will do this right away :)
> But - as from my original mail - having fully functional struct socket
> attached to the subflows allows us to move that way in the future.
I do not see any technical argument. In fact, I argue that if, for example, 
fds are needed in the future, it would be simple to expose the current 
subflows without requiring any change to MPTCP or TCP. The current 
design does not exclude anything.

Shoaib



* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-29 18:27 Christoph Paasch
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Paasch @ 2018-03-29 18:27 UTC (permalink / raw)
  To: mptcp


Hello,

On 28/03/18 - 14:33:03, Stephen Brennan wrote:
> Hi Rao et al.,
> 
> On Tue, Mar 27, 2018 at 06:53:46PM -0700, Rao Shoaib wrote:
> > 
> > 
> > On 03/27/2018 05:04 AM, Lorenzo Colitti wrote:
> > > On Tue, Mar 27, 2018 at 11:18 AM, Christoph Paasch <cpaasch(a)apple.com
> > > <mailto:cpaasch(a)apple.com>> wrote:
> > > 
> > >     It also would allow us to expose subflows as file-descriptors to the
> > >     user-space. That way the user-space can do setsockopt,
> > >     getsockopt,... on the
> > >     subflows. An idea that came up in the past when we were thinking
> > >     on how to
> > >     expose an MPTCP API that allows apps to control certain things on the
> > >     subflows.
> > > 
> > > 
> > > +1 to this since it will allow client platforms to use setsockopt to,
> > > for example, bind subflows to interfaces (e.g., with SO_BINDTODEVICE, or
> > > with SO_MARK as used by Android).
> > May I ask why it cannot be done -- only a flow id should be needed.
> 
> It sounds like you're referring to the proposed MPTCP_SUB_GETSOCKOPT and
> MPTCP_SUB_SETSOCKOPT operations in this IETF draft [1], correct?
> 
> [1]: https://tools.ietf.org/html/draft-hesmans-mptcp-socket-00
> 
> > Plus for
> > policy enforcement the meta socket would have to be consulted. I am new to
> > Linux but based on what I have read SO_BINDTODEVICE does not guarantee much
> > for Tx if the routing says otherwise, quick check of ip_queue_xmit() shows
> > it first consults the routing table. I understand policy routing can be used
> > to steer the packets. Perhaps my understanding is incorrect.
> >
> > IMHO this whole idea of exposing individual flows to the application seems
> > to go against the basic design of MPTCP. I am not sure what the use case is
> > as TCP is streams based and MPTCP will spread out data on multiple flows. I
> > am not involved in the cellular space so I don't know, feel free to
> > enlighten me.
> 
> I'm also a bit confused at this. I've read the Socket API drafts from IETF,
> which seem to be motivated by the idea that mobile apps will want to
> directly control which subflows and paths are created (similar to the iOS
> APIs released recently).

The iOS APIs only allow an app to communicate its intention (handover,
interactive, aggregate). At the end of the day, the app is not able to
trigger the creation of a subflow on cellular.

Just nit-picking ;-)

> It sounds like we have two approaches: (1) MPTCP
> as a drop-in replacement for TCP, so unmodified applications may use it,
> and (2) MPTCP as an explicitly requested protocol which user-space
> applications request, and then manipulate via socket options. Given the
> amount of drafts and other work put in so far, it seems that use case (2)
> has enough demand that it ought to be supported.

Yes, the drop-in replacement does not work IMO, because the MPTCP-stack
needs more context to make decisions.

> Going slightly off topic from this discussion, it seems that a lot of what
> is desired in use case (2) is allowing the application to manage its own
> paths. We now have a Netlink API for userspace path managers. Have we
> thought about the possibility of a userspace application creating an MPTCP
> socket, and also a netlink socket, so that it could manage its own paths?
> Just an off-the-wall idea. If such a thing were possible, it could give the
> userspace app much more control over its own path management, while
> reducing duplication of functionality in-kernel (e.g. wouldn't have both
> socket option for opening a subflow, plus a netlink command for opening a
> subflow).

Yes, this is definitely a viable solution. The app could use a library that
is able to speak netlink-API and manage subflows.

Your library? ;-)

> A userspace library could hide some of this complexity for users
> of use-case (2), while still allowing use-case (1) to exist through the
> normal socket API.

The idea behind exposing subflows as FDs came from apps needing to control
socket options on a per-subflow basis, e.g. TCP keepalive,
SO_MARK (for Android),...
The app would also want to get stats like TCP_INFO,...

We don't have a good way at the moment to do this. And, using
file-descriptors seemed like a viable solution.


That doesn't mean that we will do this right away :)
But - as from my original mail - having fully functional struct socket
attached to the subflows allows us to move that way in the future.


Christoph



* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-29  8:05 Christoph Paasch
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Paasch @ 2018-03-29  8:05 UTC (permalink / raw)
  To: mptcp


On 28/03/18 - 09:25:46, Mat Martineau wrote:
> 
> On Wed, 28 Mar 2018, Christoph Paasch wrote:
> 
> > On 27/03/18 - 16:29:26, Mat Martineau wrote:
> > > 
> > > Hi Christoph,
> > > 
> > > On Tue, 27 Mar 2018, Christoph Paasch wrote:
> > > 
> > > > Hello,
> > > > 
> > > > 
> > > > (first, sorry for the long e-mail)
> > > > 
> > > > 
> > > > I now started working on cleaning up the input path to prepare it for
> > > > input-processing without holding the MPTCP-level lock.
> > > > 
> > > > To do this, I go through all the places where we access data-structures from
> > > > the meta-socket and see if I can move it to mptcp_data_ready or
> > > > mptcp_write_space (the callbacks that are called from
> > > > sk_data_ready/sk_write_space).
> > > > 
> > > > In tcp_check_space() we have the following:
> > > >                if (mptcp(tcp_sk(sk)) ||
> > > >                    (sk->sk_socket &&
> > > >                     test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))) {
> > > >                        tcp_new_space(sk);
> > > >                        if (sk->sk_socket && !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
> > > >                                tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
> > > >                }
> > > > 
> > > > We do the if (mptcp(...)) because, currently, the subflow's sk_socket points to
> > > > the application's struct socket. Thus, we need to skip the check for
> > > > SOCK_NOSPACE, as otherwise we might end up not calling sk_write_space.
> > > > 
> > > > 
> > > > Other kernel modules that create TCP connections instead use an in-kernel
> > > > struct socket. And modules like RDS even force SOCK_NOSPACE to be set, so
> > > > that the TCP stack keeps making the up-call. I thought that this would be a
> > > > good thing to do in MPTCP as well.
> > > > 
> > > > So, my goal became to have a fully functional struct socket for subflows.
> > > > The benefit is also that we can end up using kernel_sendmsg,
> > kernel_recvmsg,... in the future. It also allows us to do kernel_accept() on
> > > > the MPTCP-level socket to receive new subflows (a problem I mentioned in an
> > > > earlier mail).
> > > 
> > > I think we'll have a patch set illustrating this approach ready to post
> > > tomorrow.
> > 
> > Awesome! I am waiting for your patch :)
> > 
> > > Peter and I have been working on this net-next-based code with
> > > in-kernel sockets for many months now. I missed the SOCK_NOSPACE detail
> > > though - I'll take a look at that.
> > > 
> > > Rather than using kernel_sendmsg and kernel_recvmsg, I use do_tcp_sendpages
> > > (to reduce copying and let the MPTCP connection layer share data across skbs
> > > and subflows as it needs to) and tcp_read_sock (to read intact skbs out of
> > > the rx queue). The kernel_* functions are used for most everything else
> > > though.
> > 
> > Do you also handle kernel_accept ?
> > If yes, how do you do it? Because, it's quite complex to avoid races.
> 
> Yes, but we don't support joins yet so it's maybe only the trivial case. The
> connection level socket owns a subflow listening socket (which it created at
> init time), and when the application calls accept() it's immediately passed
> through to kernel_accept() for the subflow.

I see you posted your patches. I will try to go over them before our
meeting tonight.


Christoph



* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-28 21:33 Stephen Brennan
  0 siblings, 0 replies; 13+ messages in thread
From: Stephen Brennan @ 2018-03-28 21:33 UTC (permalink / raw)
  To: mptcp


Hi Rao et al.,

On Tue, Mar 27, 2018 at 06:53:46PM -0700, Rao Shoaib wrote:
> 
> 
> On 03/27/2018 05:04 AM, Lorenzo Colitti wrote:
> > On Tue, Mar 27, 2018 at 11:18 AM, Christoph Paasch <cpaasch(a)apple.com
> > <mailto:cpaasch(a)apple.com>> wrote:
> > 
> >     It also would allow us to expose subflows as file-descriptors to the
> >     user-space. That way the user-space can do setsockopt,
> >     getsockopt,... on the
> >     subflows. An idea that came up in the past when we were thinking
> >     on how to
> >     expose an MPTCP API that allows apps to control certain things on the
> >     subflows.
> > 
> > 
> > +1 to this since it will allow client platforms to use setsockopt to,
> > for example, bind subflows to interfaces (e.g., with SO_BINDTODEVICE, or
> > with SO_MARK as used by Android).
> May I ask why it cannot be done -- only a flow id should be needed.

It sounds like you're referring to the proposed MPTCP_SUB_GETSOCKOPT and
MPTCP_SUB_SETSOCKOPT operations in this IETF draft [1], correct?

[1]: https://tools.ietf.org/html/draft-hesmans-mptcp-socket-00

> Plus for
> policy enforcement the meta socket would have to be consulted. I am new to
> Linux but based on what I have read SO_BINDTODEVICE does not guarantee much
> for Tx if the routing says otherwise, quick check of ip_queue_xmit() shows
> it first consults the routing table. I understand policy routing can be used
> to steer the packets. Perhaps my understanding is incorrect.
>
> IMHO this whole idea of exposing individual flows to the application seems
> to go against the basic design of MPTCP. I am not sure what the use case is
> as TCP is streams based and MPTCP will spread out data on multiple flows. I
> am not involved in the cellular space so I don't know, feel free to
> enlighten me.

I'm also a bit confused at this. I've read the Socket API drafts from IETF,
which seem to be motivated by the idea that mobile apps will want to
directly control which subflows and paths are created (similar to the iOS
APIs released recently). It sounds like we have two approaches: (1) MPTCP
as a drop-in replacement for TCP, so unmodified applications may use it,
and (2) MPTCP as an explicitly requested protocol which user-space
applications request, and then manipulate via socket options. Given the
amount of drafts and other work put in so far, it seems that use case (2)
has enough demand that it ought to be supported.

Going slightly off topic from this discussion, it seems that a lot of what
is desired in use case (2) is allowing the application to manage its own
paths. We now have a Netlink API for userspace path managers. Have we
thought about the possibility of a userspace application creating an MPTCP
socket, and also a netlink socket, so that it could manage its own paths?
Just an off-the-wall idea. If such a thing were possible, it could give the
userspace app much more control over its own path management, while
reducing duplication of functionality in-kernel (e.g. we wouldn't need both
a socket option for opening a subflow and a netlink command for opening a
subflow). A userspace library could hide some of this complexity for users
of use case (2), while still allowing use case (1) to exist through the
normal socket API.

Stephen



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-28  9:34 Christoph Paasch
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Paasch @ 2018-03-28  9:34 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 6104 bytes --]

On 27/03/18 - 16:29:26, Mat Martineau wrote:
> 
> Hi Christoph,
> 
> On Tue, 27 Mar 2018, Christoph Paasch wrote:
> 
> > Hello,
> > 
> > 
> > (first, sorry for the long e-mail)
> > 
> > 
> > I now started working on cleaning up the input path to prepare it for
> > input-processing without holding the MPTCP-level lock.
> > 
> > To do this, I go through all the places where we access data-structures from
> > the meta-socket and see if I can move it to mptcp_data_ready or
> > mptcp_write_space (the callbacks that are called from
> > sk_data_ready/sk_write_space).
> > 
> > In tcp_check_space() we have the following:
> >                if (mptcp(tcp_sk(sk)) ||
> >                    (sk->sk_socket &&
> >                     test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))) {
> >                        tcp_new_space(sk);
> >                        if (sk->sk_socket && !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
> >                                tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
> >                }
> > 
> > We do the if (mptcp(...)), because currently subflow's sk_socket is pointing to
> > application's struct socket. Thus, we need to avoid the check for
> > SOCK_NOSPACE, as otherwise we might end up not calling sk_write_space.
> > 
> > 
> > Other kernel-modules that create TCP-connections rather have an in-kernel
> > struct socket. And modules like RDS even force SOCK_NOSPACE to be set, such
> > that the TCP-stack keeps on up-calling. I thought that this was a good thing
> > to do in MPTCP as well.
> > 
> > So, my goal became to have a fully functional struct socket for subflows.
> > The benefit is also that we can end up using kernel_sendmsg,
> > kernel_recvmsg,... in the future. It also allows to do kernel_accept() on
> > the MPTCP-level socket to receive new subflows (a problem I mentioned in an
> > earlier mail).
> 
> I think we'll have a patch set illustrating this approach ready to post
> tomorrow.

Awesome! I am waiting for your patch :)

> Peter and I have been working on this net-next-based code with
> in-kernel sockets for many months now. I missed the SOCK_NOSPACE detail
> though - I'll take a look at that.
> 
> Rather than using kernel_sendmsg and kernel_recvmsg, I use do_tcp_sendpages
> (to reduce copying and let the MPTCP connection layer share data across skbs
> and subflows as it needs to) and tcp_read_sock (to read intact skbs out of
> the rx queue). The kernel_* functions are used for most everything else
> though.

Do you also handle kernel_accept ?
If so, how do you do it? It's quite complex to avoid races there.

> > It also would allow us to expose subflows as file-descriptors to the
> > user-space. That way the user-space can do setsockopt, getsockopt,... on the
> > subflows. An idea that came up in the past when we were thinking on how to
> > expose an MPTCP API that allows apps to control certain things on the
> > subflows.
> 
> I like that idea too. We hadn't come up with a good design idea for
> propagating many of the getsockopt/setsockopt operations from the
> connection-level socket to all of the subflows.
> 
> > 
> > To get there, there are a few places where things would need to change:
> > 
> > * mptcp_init4_subsockets - Here, this works perfectly. It also allows to
> >  avoid "faking" the struct socket, as we are currently doing.
> > * mptcp_alloc_mpcb for the active opener - This is the first problem. mptcp_alloc_mpcb() can be
> >  called with bh disabled. But sock_create_lite() assumes that bh is enabled
> >  as it ends up doing an alloc with GFP_KERNEL.
> >  A few ways this could be solved:
> >  - Schedule a work-queue item in mptcp_alloc_mpcb that creates the struct
> >    socket. This looks a bit racy to me. Not sure what side-effects this
> >    might have.
> >  - Change things entirely, such that the master-sock is being allocated
> >    when the connection is created. That way, we allocate all the necessary
> >    struct socket's right away.
> >    In the past, we decided to allocate the master-sk only when receiving
> >    the SYN/ACK. We did that so as to minimize the impact on regular TCP
> >    when the server does not support MPTCP. But, as we are moving towards
> >    explicitly exposing MPTCP at the socket-layer, we can rethink that
> >    decision.
> >    Any thoughts? Is it ok to pay the cost of allocating a master-sk before
> >    we know whether the server supports MPTCP?
> >    I think, we should do this, and transition to that model.
> 
> I prefer the latter. If MPTCP is exposed at the socket layer, it's only
> MPTCP sockets that experience the overhead (relative to a regular TCP
> socket) when regular TCP is used.

Are you doing the latter in your implementation?


Christoph

> 
> > * mptcp_alloc_mpcb for the passive opener - same problem as above but on the
> >  other side. We could allocate the master's struct socket upon the accept()
> >  call from the application. This again sounds a bit racy to me. The struct
> >  socket will be there for the subflow potentially much later than it has
> >  been established. What happens if the peer sends data or an
> >  MP_FASTCLOSE,... ?
> > * New subflows on the passive opener side - again, we are receiving those
> >  subflows while bh is disabled. So, we have to schedule a work-queue to
> >  do a kernel_accept() on the MPTCP-socket.
> >  Again something that can potentially be racy.
> > 
> > In general, subflow-establishment on the passive-opener side again seems to
> > be a major pain-point. I think, we really need to redesign that.
> > 
> > 
> > Any thoughts, feedback, suggestions?
> > Or maybe, using real struct socket for subflows is not worth it? :)
> 
> Using the struct socket for subflows will take some work, but I think it's
> an important part of shaping the code for upstream. If the subflows look
> more like regular TCP sockets I think that will help keep the MPTCP code
> well partitioned from TCP.
> 
> 
> --
> Mat Martineau
> Intel OTC


* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-28  1:53 Rao Shoaib
  0 siblings, 0 replies; 13+ messages in thread
From: Rao Shoaib @ 2018-03-28  1:53 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 1655 bytes --]



On 03/27/2018 05:04 AM, Lorenzo Colitti wrote:
> On Tue, Mar 27, 2018 at 11:18 AM, Christoph Paasch <cpaasch(a)apple.com 
> <mailto:cpaasch(a)apple.com>> wrote:
>
>     It also would allow us to expose subflows as file-descriptors to the
>     user-space. That way the user-space can do setsockopt,
>     getsockopt,... on the
>     subflows. An idea that came up in the past when we were thinking
>     on how to
>     expose an MPTCP API that allows apps to control certain things on the
>     subflows.
>
>
> +1 to this since it will allow client platforms to use setsockopt to, 
> for example, bind subflows to interfaces (e.g., with SO_BINDTODEVICE, 
> or with SO_MARK as used by Android).
May I ask why it cannot be done -- only the flow id should be needed. Plus, 
for policy enforcement the meta socket would have to be consulted. I am 
new to Linux, but based on what I have read SO_BINDTODEVICE does not 
guarantee much for Tx if the routing says otherwise; a quick check of 
ip_queue_xmit() shows it first consults the routing table. I understand 
policy routing can be used to steer the packets. Perhaps my 
understanding is incorrect.

IMHO this whole idea of exposing individual flows to the application 
seems to go against the basic design of MPTCP. I am not sure what the 
use case is, as TCP is stream-based and MPTCP will spread out data on 
multiple flows. I am not involved in the cellular space so I don't 
know; feel free to enlighten me.

Shoaib

>
>
> _______________________________________________
> mptcp mailing list
> mptcp(a)lists.01.org
> https://lists.01.org/mailman/listinfo/mptcp




* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-27 23:29 Mat Martineau
  0 siblings, 0 replies; 13+ messages in thread
From: Mat Martineau @ 2018-03-27 23:29 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 5593 bytes --]


Hi Christoph,

On Tue, 27 Mar 2018, Christoph Paasch wrote:

> Hello,
>
>
> (first, sorry for the long e-mail)
>
>
> I now started working on cleaning up the input path to prepare it for
> input-processing without holding the MPTCP-level lock.
>
> To do this, I go through all the places where we access data-structures from
> the meta-socket and see if I can move it to mptcp_data_ready or
> mptcp_write_space (the callbacks that are called from
> sk_data_ready/sk_write_space).
>
> In tcp_check_space() we have the following:
>                if (mptcp(tcp_sk(sk)) ||
>                    (sk->sk_socket &&
>                     test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))) {
>                        tcp_new_space(sk);
>                        if (sk->sk_socket && !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
>                                tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
>                }
>
> We do the if (mptcp(...)), because currently subflow's sk_socket is pointing to
> application's struct socket. Thus, we need to avoid the check for
> SOCK_NOSPACE, as otherwise we might end up not calling sk_write_space.
>
>
> Other kernel-modules that create TCP-connections rather have an in-kernel
> struct socket. And modules like RDS even force SOCK_NOSPACE to be set, such
> that the TCP-stack keeps on up-calling. I thought that this was a good thing
> to do in MPTCP as well.
>
> So, my goal became to have a fully functional struct socket for subflows.
> The benefit is also that we can end up using kernel_sendmsg,
> kernel_recvmsg,... in the future. It also allows to do kernel_accept() on
> the MPTCP-level socket to receive new subflows (a problem I mentioned in an
> earlier mail).

I think we'll have a patch set illustrating this approach ready to post 
tomorrow. Peter and I have been working on this net-next-based code with 
in-kernel sockets for many months now. I missed the SOCK_NOSPACE detail 
though - I'll take a look at that.

Rather than using kernel_sendmsg and kernel_recvmsg, I use 
do_tcp_sendpages (to reduce copying and let the MPTCP connection layer 
share data across skbs and subflows as it needs to) and tcp_read_sock (to 
read intact skbs out of the rx queue). The kernel_* functions are used for 
most everything else though.
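
For reference, a kernel-side sketch of the tcp_read_sock() pattern described
above (not runnable on its own, and all mptcp_* names are invented for
illustration — they are not the actual functions from the patch set):

```c
/* Kernel-side sketch: the subflow's data-ready callback drains the
 * receive queue via tcp_read_sock(), which hands each skb region to
 * our actor without copying the payload out of the skb. */
static int mptcp_subflow_recv_actor(read_descriptor_t *desc,
				    struct sk_buff *skb,
				    unsigned int offset, size_t len)
{
	struct mptcp_cb *mpcb = desc->arg.data;	/* hypothetical MPTCP state */

	/* Hand the skb (starting at offset, len bytes) to the MPTCP
	 * reassembly logic; return how many bytes were consumed. */
	return mptcp_queue_skb(mpcb, skb, offset, len);
}

static void mptcp_subflow_data_ready(struct sock *sk)
{
	read_descriptor_t desc = {
		.arg.data = sk->sk_user_data,	/* assumed to point at the mpcb */
		.count = 1,			/* keep reading while data remains */
	};

	tcp_read_sock(sk, &desc, mptcp_subflow_recv_actor);
}
```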

> It also would allow us to expose subflows as file-descriptors to the
> user-space. That way the user-space can do setsockopt, getsockopt,... on the
> subflows. An idea that came up in the past when we were thinking on how to
> expose an MPTCP API that allows apps to control certain things on the
> subflows.

I like that idea too. We hadn't come up with a good design idea for 
propagating many of the getsockopt/setsockopt operations from the 
connection-level socket to all of the subflows.

>
> To get there, there are a few places where things would need to change:
>
> * mptcp_init4_subsockets - Here, this works perfectly. It also allows to
>  avoid "faking" the struct socket, as we are currently doing.
> * mptcp_alloc_mpcb for the active opener - This is the first problem. mptcp_alloc_mpcb() can be
>  called with bh disabled. But sock_create_lite() assumes that bh is enabled
>  as it ends up doing an alloc with GFP_KERNEL.
>  A few ways this could be solved:
>  - Schedule a work-queue item in mptcp_alloc_mpcb that creates the struct
>    socket. This looks a bit racy to me. Not sure what side-effects this
>    might have.
>  - Change things entirely, such that the master-sock is being allocated
>    when the connection is created. That way, we allocate all the necessary
>    struct socket's right away.
>    In the past, we decided to allocate the master-sk only when receiving
>    the SYN/ACK. We did that so as to minimize the impact on regular TCP
>    when the server does not support MPTCP. But, as we are moving towards
>    explicitly exposing MPTCP at the socket-layer, we can rethink that
>    decision.
>    Any thoughts? Is it ok to pay the cost of allocating a master-sk before
>    we know whether the server supports MPTCP?
>    I think, we should do this, and transition to that model.

I prefer the latter. If MPTCP is exposed at the socket layer, it's only 
MPTCP sockets that experience the overhead (relative to a regular TCP 
socket) when regular TCP is used.

> * mptcp_alloc_mpcb for the passive opener - same problem as above but on the
>  other side. We could allocate the master's struct socket upon the accept()
>  call from the application. This again sounds a bit racy to me. The struct
>  socket will be there for the subflow potentially much later than it has
>  been established. What happens if the peer sends data or an
>  MP_FASTCLOSE,... ?
> * New subflows on the passive opener side - again, we are receiving those
>  subflows while bh is disabled. So, we have to schedule a work-queue to
>  do a kernel_accept() on the MPTCP-socket.
>  Again something that can potentially be racy.
>
> In general, subflow-establishment on the passive-opener side again seems to
> be a major pain-point. I think, we really need to redesign that.
>
>
> Any thoughts, feedback, suggestions?
> Or maybe, using real struct socket for subflows is not worth it? :)

Using the struct socket for subflows will take some work, but I think it's 
an important part of shaping the code for upstream. If the subflows look 
more like regular TCP sockets I think that will help keep the MPTCP code 
well partitioned from TCP.


--
Mat Martineau
Intel OTC


* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-27 12:42 Christoph Paasch
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Paasch @ 2018-03-27 12:42 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 1033 bytes --]

On 27/03/18 - 14:04:28, Lorenzo Colitti wrote:
> On Tue, Mar 27, 2018 at 11:18 AM, Christoph Paasch <cpaasch(a)apple.com>
> wrote:
> 
> > It also would allow us to expose subflows as file-descriptors to the
> > user-space. That way the user-space can do setsockopt, getsockopt,... on
> > the
> > subflows. An idea that came up in the past when we were thinking on how to
> > expose an MPTCP API that allows apps to control certain things on the
> > subflows.
> >
> 
> +1 to this since it will allow client platforms to use setsockopt to, for
> example, bind subflows to interfaces (e.g., with SO_BINDTODEVICE, or with
> SO_MARK as used by Android).

Thanks, Lorenzo!

Have you seen the netlink path-manager patches that have been sent to the
mptcp-dev list?
(https://sympa-2.sipr.ucl.ac.be/sympa/arc/mptcp-dev/2018-03/msg00057.html)

The goal of them is to allow a daemon to control the creation and
destruction of subflows through netlink.


Would that be something that fits your model?


Christoph



* Re: [MPTCP] Using struct socket for subflows
@ 2018-03-27 12:04 Lorenzo Colitti
  0 siblings, 0 replies; 13+ messages in thread
From: Lorenzo Colitti @ 2018-03-27 12:04 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 588 bytes --]

On Tue, Mar 27, 2018 at 11:18 AM, Christoph Paasch <cpaasch(a)apple.com>
wrote:

> It also would allow us to expose subflows as file-descriptors to the
> user-space. That way the user-space can do setsockopt, getsockopt,... on
> the
> subflows. An idea that came up in the past when we were thinking on how to
> expose an MPTCP API that allows apps to control certain things on the
> subflows.
>

+1 to this since it will allow client platforms to use setsockopt to, for
example, bind subflows to interfaces (e.g., with SO_BINDTODEVICE, or with
SO_MARK as used by Android).



* [MPTCP] Using struct socket for subflows
@ 2018-03-27  9:18 Christoph Paasch
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Paasch @ 2018-03-27  9:18 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 4185 bytes --]

Hello,


(first, sorry for the long e-mail)


I now started working on cleaning up the input path to prepare it for
input-processing without holding the MPTCP-level lock.

To do this, I go through all the places where we access data-structures from
the meta-socket and see if I can move it to mptcp_data_ready or
mptcp_write_space (the callbacks that are called from
sk_data_ready/sk_write_space).

In tcp_check_space() we have the following:
                if (mptcp(tcp_sk(sk)) ||
                    (sk->sk_socket &&
                     test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))) {
                        tcp_new_space(sk);
                        if (sk->sk_socket && !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
                                tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
                }

We do the if (mptcp(...)) because currently the subflow's sk_socket is
pointing to the application's struct socket. Thus, we need to avoid the check
for SOCK_NOSPACE, as otherwise we might end up not calling sk_write_space.


Other kernel modules that create TCP connections instead have an in-kernel
struct socket. And modules like RDS even force SOCK_NOSPACE to be set, so
that the TCP stack keeps on up-calling. I thought that this was a good thing
to do in MPTCP as well.
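
A minimal kernel-side sketch of that trick, as described above (not runnable
standalone; mptcp_write_space and the setup function are assumed names):
forcing SOCK_NOSPACE on the in-kernel socket keeps tcp_check_space()
invoking sk_write_space():

```c
static void mptcp_write_space(struct sock *sk);	/* MPTCP-level handler (assumed) */

static void mptcp_subflow_setup(struct socket *sock)
{
	/* Force SOCK_NOSPACE so that tcp_check_space() keeps calling
	 * sk_write_space() even though no userspace poller exists
	 * (the pattern attributed to RDS above). */
	set_bit(SOCK_NOSPACE, &sock->flags);

	/* Route the write-space upcall into MPTCP. */
	sock->sk->sk_write_space = mptcp_write_space;
}
```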

So, my goal became to have a fully functional struct socket for subflows.
The benefit is also that we can end up using kernel_sendmsg,
kernel_recvmsg,... in the future. It also allows us to do kernel_accept() on
the MPTCP-level socket to receive new subflows (a problem I mentioned in an
earlier mail).
It also would allow us to expose subflows as file-descriptors to the
user-space. That way the user-space can do setsockopt, getsockopt,... on the
subflows. An idea that came up in the past when we were thinking on how to
expose an MPTCP API that allows apps to control certain things on the
subflows.
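
For context, the in-kernel helpers mentioned above operate on a struct
socket directly. A hedged sketch of sending on a subflow with
kernel_sendmsg() (the mptcp_subflow_send wrapper is invented for
illustration; it is not from the actual tree):

```c
/* Kernel-side sketch: with a real struct socket per subflow we can use
 * kernel_sendmsg() rather than faking userspace iovecs. */
static int mptcp_subflow_send(struct socket *sock, void *data, size_t len)
{
	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
	struct kvec vec = { .iov_base = data, .iov_len = len };

	/* kernel_sendmsg(sock, msg, vec, num_vecs, total_size) */
	return kernel_sendmsg(sock, &msg, &vec, 1, len);
}
```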

To get there, there are a few places where things would need to change:

* mptcp_init4_subsockets - Here, this works perfectly. It also allows us to
  avoid "faking" the struct socket, as we are currently doing.
* mptcp_alloc_mpcb for the active opener - This is the first problem. mptcp_alloc_mpcb() can be
  called with bh disabled. But sock_create_lite() assumes that bh is enabled
  as it ends up doing an alloc with GFP_KERNEL.
  A few ways this could be solved:
  - Schedule a work-queue item in mptcp_alloc_mpcb that creates the struct
    socket. This looks a bit racy to me. Not sure what side-effects this
    might have.
  - Change things entirely, such that the master-sock is being allocated
    when the connection is created. That way, we allocate all the necessary
    struct sockets right away.
    In the past, we decided to allocate the master-sk only when receiving
    the SYN/ACK. We did that so as to minimize the impact on regular TCP
    when the server does not support MPTCP. But, as we are moving towards
    explicitly exposing MPTCP at the socket-layer, we can rethink that
    decision.
    Any thoughts? Is it ok to pay the cost of allocating a master-sk before
    we know whether the server supports MPTCP?
    I think, we should do this, and transition to that model.
* mptcp_alloc_mpcb for the passive opener - same problem as above but on the
  other side. We could allocate the master's struct socket upon the accept()
  call from the application. This again sounds a bit racy to me. The struct
  socket will be there for the subflow potentially much later than it has
  been established. What happens if the peer sends data or an
  MP_FASTCLOSE,... ?
* New subflows on the passive opener side - again, we are receiving those
  subflows while bh is disabled. So, we have to schedule a work-queue to
  do a kernel_accept() on the MPTCP-socket.
  Again something that can potentially be racy.
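
The work-queue option mentioned in the points above could look roughly like
this (kernel-side sketch only; all mptcp_* names are invented, and the races
discussed above are deliberately not addressed here):

```c
/* Kernel-side sketch: defer the GFP_KERNEL allocation done by
 * sock_create_lite() to process context, since mptcp_alloc_mpcb()
 * may run with bh disabled. */
struct mptcp_sock_work {
	struct work_struct work;
	struct mptcp_cb *mpcb;		/* hypothetical MPTCP control block */
};

static void mptcp_create_master_socket(struct work_struct *work)
{
	struct mptcp_sock_work *w =
		container_of(work, struct mptcp_sock_work, work);
	struct socket *sock;

	/* Safe here: process context, GFP_KERNEL is allowed. */
	if (sock_create_lite(PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock) == 0)
		mptcp_attach_master_socket(w->mpcb, sock);	/* assumed helper */

	kfree(w);
}

/* Called from mptcp_alloc_mpcb(), possibly with bh disabled: */
static void mptcp_schedule_master_socket(struct mptcp_cb *mpcb)
{
	struct mptcp_sock_work *w = kzalloc(sizeof(*w), GFP_ATOMIC);

	if (!w)
		return;
	w->mpcb = mpcb;
	INIT_WORK(&w->work, mptcp_create_master_socket);
	schedule_work(&w->work);
}
```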

In general, subflow-establishment on the passive-opener side again seems to
be a major pain-point. I think, we really need to redesign that.


Any thoughts, feedback, suggestions?
Or maybe, using real struct socket for subflows is not worth it? :)


Thanks,
Christoph



end of thread, other threads:[~2018-03-29 20:01 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-28 16:25 [MPTCP] Using struct socket for subflows Mat Martineau
  -- strict thread matches above, loose matches on Subject: below --
2018-03-29 20:01 Rao Shoaib
2018-03-29 19:19 Rao Shoaib
2018-03-29 19:11 Rao Shoaib
2018-03-29 18:27 Christoph Paasch
2018-03-29  8:05 Christoph Paasch
2018-03-28 21:33 Stephen Brennan
2018-03-28  9:34 Christoph Paasch
2018-03-28  1:53 Rao Shoaib
2018-03-27 23:29 Mat Martineau
2018-03-27 12:42 Christoph Paasch
2018-03-27 12:04 Lorenzo Colitti
2018-03-27  9:18 Christoph Paasch
