All of lore.kernel.org
 help / color / mirror / Atom feed
* [MPTCP] roadmap proposal/discussion
@ 2019-07-17 17:00 Paolo Abeni
  0 siblings, 0 replies; 2+ messages in thread
From: Paolo Abeni @ 2019-07-17 17:00 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 2153 bytes --]

Hi all,

I spent some time reviewing the current code-base looking for missing
pieces required for multiple subflow support, and I think there are a
bunch of things that would be needed no matter what.

I'd like to share the list, with some random implementation details, to
possibly avoid duplicate effort and/or large misunderstanding on my
side. Here it goes:

* Update msk data_ack:
  - hook in mptcp_attach_dss() - possibly rename the function to    
      mptcp_data_ready()
  - cope with 32bit ack 
  - possibly some additional msk-level [spin-]lock would be required to
    protect from concurrent updates when multiple sub-flows are running
    - while at this, eventually re-consider (again!) conn_list
      locking ?!?

* Store in msk pending data, to allow retransmissions
  - in sendmsg(), queue pending (unacked) xmit msg on msk rtx_queue
    - does it need to be a rb tree?
      - sort by mptcp seq
        - cope with 32bit seq
  - allocate the node element from the msk page frag's allocator
    - each node must contains: all the data frag, mptcp seq, 
      [mptcp seq_end?]
  - NO need to check vs mptcp window before tcp_push()
        unless we want to update the window size as per RFC
          (max of subflow's window size)
        because msk ws is greater equal than subflows ws
  - free acked nodes (and data) still in sendmsg(),
    just after acquiring msk lock, before doing any real
    sendmsg related work

* Implement msk-level retransmission
  - add msk-level [hr-]timer
  - schedule it in sendmsg, after tcp_push()
    - e.g. if msk ack != msk seq 
    - which timeout? subflow's rtt based ?
  - when the timer expires, trigger retransmit if msk ack is not
    changed (increased) since the timer's scheduling
    - reuse/factor-out part of current sendmsg_frag() retransmit
      a bunch of already-in-kernel page frags.

I hope the all above can be quite independent from e.g. path management
implementation.

If there is agreement on that, my plan would be to work on the listed
items, in order. As usual, any feedback more than welcome!

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [MPTCP] roadmap proposal/discussion
@ 2019-07-26 22:46 Mat Martineau
  0 siblings, 0 replies; 2+ messages in thread
From: Mat Martineau @ 2019-07-26 22:46 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 4606 bytes --]

On Wed, 17 Jul 2019, Paolo Abeni wrote:

> Hi all,
>
> I spent some time reviewing the current code-base looking for missing
> pieces required for multiple subflow support, and I think there are a
> bunch of things that would be needed no matter what.
>
> I'd like to share the list, with some random implementation details, to
> possibly avoid duplicate effort and/or large misunderstanding on my
> side. Here it goes:
>
> * Update msk data_ack:
>  - hook in mptcp_attach_dss() - possibly rename the function to
>      mptcp_data_ready()

That's a good place for a hook. Peter's current MP_JOIN patch set has a 
rename already.

>  - cope with 32bit ack

The current prototype cheats by ignoring the incoming acks. With the 
single-subflow assumption, the MPTCP-level socket isn't yet keeping a copy 
of the transmitted data and there's no action to take with incoming acks. 
The peer may send 32-bit acks so we do need to handle both ack lengths 
when we add ack handling.

>  - possibly some additional msk-level [spin-]lock would be required to
>    protect from concurrent updates when multiple sub-flows are running

Possibly. cmpxchg (with appropriate barriers) might work well for updating 
the ack since it keeps moving forward. If there was a concurrent update it 
would get detected and it would be clear whether the update should be 
attempted again. The mptcp-level transmit buffer could be trimmed later 
when the mptcp socket can be properly locked.

>    - while at this, eventually re-consider (again!) conn_list
>      locking ?!?
>
> * Store in msk pending data, to allow retransmissions
>  - in sendmsg(), queue pending (unacked) xmit msg on msk rtx_queue
>    - does it need to be a rb tree?
>      - sort by mptcp seq
>        - cope with 32bit seq

I don't think we need an rb tree. Most of the usage will be appending new 
data and trimming the head of the list as acks arrive. If we need to 
retransmit, we start from the first non-acked byte, which is the head of 
the list. Since we don't have SACK functionality, it doesn't seem like we 
will attempt to retransmit from the middle of the unacked data unless 
there's a subflow failure we're trying to recover from - and that's a rare 
enough event we can just go through the pending data linearly. Maybe the 
multipath-tcp.org kernel has some lessons for us here :)

>  - allocate the node element from the msk page frag's allocator
>    - each node must contains: all the data frag, mptcp seq,
>      [mptcp seq_end?]

Do we need much beyond a list of page frags, the mptcp seq for the 
beginning byte, and the length?

If we get fancy about handling retransmissions we could keep track of 
which subflow the data was already sent on, but it looks like we don't 
need that to get basic functionality up and running.

>  - NO need to check vs mptcp window before tcp_push()
>        unless we want to update the window size as per RFC
>          (max of subflow's window size)
>        because msk ws is greater equal than subflows ws
>  - free acked nodes (and data) still in sendmsg(),
>    just after acquiring msk lock, before doing any real
>    sendmsg related work

We can also check on the recvmsg side and in a workqueue (maybe shared 
with the retransmit check?) so memory gets freed up even when the socket 
isn't sending all the time.

>
> * Implement msk-level retransmission
>  - add msk-level [hr-]timer
>  - schedule it in sendmsg, after tcp_push()
>    - e.g. if msk ack != msk seq
>    - which timeout? subflow's rtt based ?

The RFC mentions "a few subflow-level retransmission timeouts" as a 
conservative policy (section 3.3.6).

>  - when the timer expires, trigger retransmit if msk ack is not
>    changed (increased) since the timer's scheduling
>    - reuse/factor-out part of current sendmsg_frag() retransmit
>      a bunch of already-in-kernel page frags.

That's what I've been thinking. Whether a first-time transmission or a 
retransmission, we'll take the data from a list of page frags and 
do_tcp_sendpages().

>
> I hope the all above can be quite independent from e.g. path management
> implementation.

Yes, it should be.

>
> If there is agreement on that, my plan would be to work on the listed
> items, in order. As usual, any feedback more than welcome!

Ok. I've been planning to work on these tasks too, but am not ready to 
start on them today. You should dive in. After I get a couple of other 
tasks out of the way I can coordinate with you.


--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-07-26 22:46 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-17 17:00 [MPTCP] roadmap proposal/discussion Paolo Abeni
2019-07-26 22:46 Mat Martineau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.