All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [MPTCP] [PATCH] mptcp: sendmsg: fix stream corrution
@ 2019-09-23  7:46 Paolo Abeni
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Abeni @ 2019-09-23  7:46 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 2387 bytes --]

On Fri, 2019-09-20 at 15:45 -0700, Mat Martineau wrote:
> On Fri, 20 Sep 2019, Paolo Abeni wrote:
> 
> > Currently we can hit a stream's corruption with the following sequence:
> > - sendmsg_frag is invoked while the write_queue tail skb's len is equal to
> >  the current size_goal.
> >  sendmsg_frag sets can_collapse to false, even if mptcp data are in sequence.
> >  Anyway it does not set eor in the write_queue skb.
> > - sendmsg_frag invokes do_tcp_sendmsg(), which tries to allocate a new skb -
> >  as the 'size_goal - skb->len <= 0' condition is true.
> >  But it hits memory limit and/or such allocation fails, so do_tcp_sendmsg()
> >  jumps to the 'wait_for_memory:' label and invokes tcp_push()
> > - tcp_push() tries to transmit the write queue tail skb, but due to window size
> >  limit, split it, send the first half and leave in the write queue a smaller
> >  skb - without any ext attached
> > - do_tcp_sendmsg() invokes sk_stream_wait_memory and than checks again for skb
> >  allocation, this time skb->len is less than size_goal and do_tcp_sendmsg()
> >  collapse the data on the existing skb
> > - sendmsg_frag adds a new ext to the tail skb in write queue, but due to the above
> >  it is associated with a wrong/unexpected subflow sequence number: it maps
> >  a future ssn -> stream corruption
> > 
> > This change addresses the above issue explicitly setting the 'eor' when hitting
> > the initial condition so that tcp will not collapse the data after tcp_push().
> 
> Thanks for the detailed description. Code looks good to me and tested fine 
> (although failures remain, but I don't think this was intended to fix 
> everything).

I haven't seen any failure on top of this patch (plus recvmsg
refactor), could you share debug enabled dmsg and/or pcap trace for
failure you see?

There is still at least a possible known bad scenario: it's quite alike
the above, but additionally the window size shrinks at
sk_stream_wait_memory() invocation time so that on next iterations
do_tcp_sendpages() ends-up creating multiple skbs. All except the last
one are sent without any DSS attached and that will corrupt the stream.

AFAICS there is no way of fixing the above, except replacing
do_tcp_sendpages() call with a chunk of almost duplicate code in mptcp,
that handles such corner case explicitly.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [MPTCP] [PATCH] mptcp: sendmsg: fix stream corrution
@ 2019-09-24  9:12 Matthieu Baerts
  0 siblings, 0 replies; 8+ messages in thread
From: Matthieu Baerts @ 2019-09-24  9:12 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 1742 bytes --]

Hi Paolo,

On 24/09/2019 08:55, Paolo Abeni wrote:
> On Mon, 2019-09-23 at 16:35 -0700, Mat Martineau wrote:
>> On Mon, 23 Sep 2019, Matthieu Baerts wrote:
>> On 23/09/2019 09:46, Paolo Abeni wrote:
>>>> On Fri, 2019-09-20 at 15:45 -0700, Mat Martineau wrote:
>>>>> Thanks for the detailed description. Code looks good to me and tested fine
>>>>> (although failures remain, but I don't think this was intended to fix
>>>>> everything).
>>>>
>>>> I haven't seen any failure on top of this patch (plus recvmsg
>>>> refactor), could you share debug enabled dmsg and/or pcap trace for
>>>> failure you see?
>>>
>>> I also saw failed tests but everything was fixed once "recvmsg refactor"
>>> was applied too!
>>
>> I re-ran the tests with both patch sets, and all pass. Sorry for the 
>> false alarm - I had simply grabbed the export branch and applied the 
>> series.
> 
> Nice! thank you for the confirm! 
> 
> Note: I think/I'm pretty sure there is still at least another possible
> stream/data corruption pending into the current code base - see my
> previous email for the details - but so far I was unable to
> programmatically trigger it - likely pktdrill should help there.
> 
> Anyhow, given the difficulty to trigger it and amount of pending tasks,
> I don't plan to try to address such issue in the short term.

I think that's a good idea not to address such issue now but should we
log the analyse you did and the possible bug somewhere? e.g.

  https://github.com/multipath-tcp/mptcp_net-next/issues

Cheers,
Matt
-- 
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [MPTCP] [PATCH] mptcp: sendmsg: fix stream corrution
@ 2019-09-24  6:55 Paolo Abeni
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Abeni @ 2019-09-24  6:55 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 1303 bytes --]

On Mon, 2019-09-23 at 16:35 -0700, Mat Martineau wrote:
> On Mon, 23 Sep 2019, Matthieu Baerts wrote:
> On 23/09/2019 09:46, Paolo Abeni wrote:
> > > On Fri, 2019-09-20 at 15:45 -0700, Mat Martineau wrote:
> > > > Thanks for the detailed description. Code looks good to me and tested fine
> > > > (although failures remain, but I don't think this was intended to fix
> > > > everything).
> > > 
> > > I haven't seen any failure on top of this patch (plus recvmsg
> > > refactor), could you share debug enabled dmsg and/or pcap trace for
> > > failure you see?
> > 
> > I also saw failed tests but everything was fixed once "recvmsg refactor"
> > was applied too!
> 
> I re-ran the tests with both patch sets, and all pass. Sorry for the 
> false alarm - I had simply grabbed the export branch and applied the 
> series.

Nice! thank you for the confirm! 

Note: I think/I'm pretty sure there is still at least another possible
stream/data corruption pending into the current code base - see my
previous email for the details - but so far I was unable to
programmatically trigger it - likely pktdrill should help there.

Anyhow, given the difficulty to trigger it and amount of pending tasks,
I don't plan to try to address such issue in the short term.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [MPTCP] [PATCH] mptcp: sendmsg: fix stream corrution
@ 2019-09-23 23:35 Mat Martineau
  0 siblings, 0 replies; 8+ messages in thread
From: Mat Martineau @ 2019-09-23 23:35 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 2269 bytes --]

On Mon, 23 Sep 2019, Matthieu Baerts wrote:

> Hi Paolo,
>
> On 23/09/2019 09:46, Paolo Abeni wrote:
>> On Fri, 2019-09-20 at 15:45 -0700, Mat Martineau wrote:
>>> On Fri, 20 Sep 2019, Paolo Abeni wrote:
>>>
>>>> Currently we can hit a stream's corruption with the following sequence:
>>>> - sendmsg_frag is invoked while the write_queue tail skb's len is equal to
>>>>  the current size_goal.
>>>>  sendmsg_frag sets can_collapse to false, even if mptcp data are in sequence.
>>>>  Anyway it does not set eor in the write_queue skb.
>>>> - sendmsg_frag invokes do_tcp_sendmsg(), which tries to allocate a new skb -
>>>>  as the 'size_goal - skb->len <= 0' condition is true.
>>>>  But it hits memory limit and/or such allocation fails, so do_tcp_sendmsg()
>>>>  jumps to the 'wait_for_memory:' label and invokes tcp_push()
>>>> - tcp_push() tries to transmit the write queue tail skb, but due to window size
>>>>  limit, split it, send the first half and leave in the write queue a smaller
>>>>  skb - without any ext attached
>>>> - do_tcp_sendmsg() invokes sk_stream_wait_memory and than checks again for skb
>>>>  allocation, this time skb->len is less than size_goal and do_tcp_sendmsg()
>>>>  collapse the data on the existing skb
>>>> - sendmsg_frag adds a new ext to the tail skb in write queue, but due to the above
>>>>  it is associated with a wrong/unexpected subflow sequence number: it maps
>>>>  a future ssn -> stream corruption
>>>>
>>>> This change addresses the above issue explicitly setting the 'eor' when hitting
>>>> the initial condition so that tcp will not collapse the data after tcp_push().
>>>
>>> Thanks for the detailed description. Code looks good to me and tested fine
>>> (although failures remain, but I don't think this was intended to fix
>>> everything).
>>
>> I haven't seen any failure on top of this patch (plus recvmsg
>> refactor), could you share debug enabled dmsg and/or pcap trace for
>> failure you see?
>
> I also saw failed tests but everything was fixed once "recvmsg refactor"
> was applied too!

I re-ran the tests with both patch sets, and all pass. Sorry for the 
false alarm - I had simply grabbed the export branch and applied the 
series.

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [MPTCP] [PATCH] mptcp: sendmsg: fix stream corrution
@ 2019-09-23  8:17 Matthieu Baerts
  0 siblings, 0 replies; 8+ messages in thread
From: Matthieu Baerts @ 2019-09-23  8:17 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 2199 bytes --]

Hi Paolo,

On 23/09/2019 09:46, Paolo Abeni wrote:
> On Fri, 2019-09-20 at 15:45 -0700, Mat Martineau wrote:
>> On Fri, 20 Sep 2019, Paolo Abeni wrote:
>>
>>> Currently we can hit a stream's corruption with the following sequence:
>>> - sendmsg_frag is invoked while the write_queue tail skb's len is equal to
>>>  the current size_goal.
>>>  sendmsg_frag sets can_collapse to false, even if mptcp data are in sequence.
>>>  Anyway it does not set eor in the write_queue skb.
>>> - sendmsg_frag invokes do_tcp_sendmsg(), which tries to allocate a new skb -
>>>  as the 'size_goal - skb->len <= 0' condition is true.
>>>  But it hits memory limit and/or such allocation fails, so do_tcp_sendmsg()
>>>  jumps to the 'wait_for_memory:' label and invokes tcp_push()
>>> - tcp_push() tries to transmit the write queue tail skb, but due to window size
>>>  limit, split it, send the first half and leave in the write queue a smaller
>>>  skb - without any ext attached
>>> - do_tcp_sendmsg() invokes sk_stream_wait_memory and than checks again for skb
>>>  allocation, this time skb->len is less than size_goal and do_tcp_sendmsg()
>>>  collapse the data on the existing skb
>>> - sendmsg_frag adds a new ext to the tail skb in write queue, but due to the above
>>>  it is associated with a wrong/unexpected subflow sequence number: it maps
>>>  a future ssn -> stream corruption
>>>
>>> This change addresses the above issue explicitly setting the 'eor' when hitting
>>> the initial condition so that tcp will not collapse the data after tcp_push().
>>
>> Thanks for the detailed description. Code looks good to me and tested fine 
>> (although failures remain, but I don't think this was intended to fix 
>> everything).
> 
> I haven't seen any failure on top of this patch (plus recvmsg
> refactor), could you share debug enabled dmsg and/or pcap trace for
> failure you see?

I also saw failed tests but everything was fixed once "recvmsg refactor"
was applied too!

Cheers,
Matt
-- 
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [MPTCP] [PATCH] mptcp: sendmsg: fix stream corrution
@ 2019-09-21 10:32 Matthieu Baerts
  0 siblings, 0 replies; 8+ messages in thread
From: Matthieu Baerts @ 2019-09-21 10:32 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 7282 bytes --]

Hi Paolo, Mat,

On 21/09/2019 00:45, Mat Martineau wrote:
> 
> On Fri, 20 Sep 2019, Paolo Abeni wrote:
> 
>> Currently we can hit a stream's corruption with the following sequence:
>> - sendmsg_frag is invoked while the write_queue tail skb's len is
>> equal to
>>  the current size_goal.
>>  sendmsg_frag sets can_collapse to false, even if mptcp data are in
>> sequence.
>>  Anyway it does not set eor in the write_queue skb.
>> - sendmsg_frag invokes do_tcp_sendmsg(), which tries to allocate a new
>> skb -
>>  as the 'size_goal - skb->len <= 0' condition is true.
>>  But it hits memory limit and/or such allocation fails, so
>> do_tcp_sendmsg()
>>  jumps to the 'wait_for_memory:' label and invokes tcp_push()
>> - tcp_push() tries to transmit the write queue tail skb, but due to
>> window size
>>  limit, split it, send the first half and leave in the write queue a
>> smaller
>>  skb - without any ext attached
>> - do_tcp_sendmsg() invokes sk_stream_wait_memory and than checks again
>> for skb
>>  allocation, this time skb->len is less than size_goal and
>> do_tcp_sendmsg()
>>  collapse the data on the existing skb
>> - sendmsg_frag adds a new ext to the tail skb in write queue, but due
>> to the above
>>  it is associated with a wrong/unexpected subflow sequence number: it
>> maps
>>  a future ssn -> stream corruption
>>
>> This change addresses the above issue explicitly setting the 'eor'
>> when hitting
>> the initial condition so that tcp will not collapse the data after
>> tcp_push().
> 
> Thanks for the detailed description. Code looks good to me and tested
> fine (although failures remain, but I don't think this was intended to
> fix everything).

Thank you for this patch, the nice description and the review!

- ca356e76c1fe: "squashed" in "mptcp: allow collapsing consecutive
sendpages on the same substream"
- signed-off was already there
- 68f9e3f86b09: conflict in
t/mptcp-rework-mptcp_sendmsg_frag-to-accept-optional-dfrag
- 1b7998a9a0aa..782be479bf43: result

Here is the output of kselftest exec:

# MPTCP is disabled by default as expected	[ OK ]
# ns1  MPTCP -> ns1  (10.0.1.1:10000) MPTCP	[ OK ]
# ns1  MPTCP -> ns1  (10.0.1.1:10001) TCP  	[ OK ]
# ns1  TCP   -> ns1  (10.0.1.1:10002) MPTCP	[ OK ]
# ns1  MPTCP -> ns2  (10.0.1.2:10003) MPTCP	[ OK ]
# ns1  MPTCP -> ns2  (10.0.1.2:10004) TCP  	[ OK ]
# ns1  TCP   -> ns2  (10.0.1.2:10005) MPTCP	[ OK ]
# ns1  MPTCP -> ns2  (10.0.2.1:10006) MPTCP	[ OK ]
# ns1  MPTCP -> ns2  (10.0.2.1:10007) TCP  	[ OK ]
# ns1  TCP   -> ns2  (10.0.2.1:10008) MPTCP	[ OK ]
# ns1  MPTCP -> ns3  (10.0.2.2:10009) MPTCP	[ OK ]
# ns1  MPTCP -> ns3  (10.0.2.2:10010) TCP  	[ OK ]
# ns1  TCP   -> ns3  (10.0.2.2:10011) MPTCP	[ OK ]
# ns1  MPTCP -> ns3  (10.0.3.2:10012) MPTCP	[ OK ]
# ns1  MPTCP -> ns3  (10.0.3.2:10013) TCP  	[ OK ]
# ns1  TCP   -> ns3  (10.0.3.2:10014) MPTCP	[ OK ]
# ns1  MPTCP -> ns4  (10.0.3.1:10015) MPTCP	[ FAIL ] file received by
server does not match (in, out):
# -rw------- 1 root root 5095452 Sep 21 10:23 /tmp/tmp.zY3i7pjyW3
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# -rw------- 1 root root 4900057 Sep 21 10:23 /tmp/tmp.6H8H1PP7hu
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# ns2  MPTCP -> ns1  (10.0.1.1:10016) MPTCP	[ OK ]
# ns2  MPTCP -> ns1  (10.0.1.1:10017) TCP  	[ OK ]
# ns2  TCP   -> ns1  (10.0.1.1:10018) MPTCP	[ OK ]
# ns2  MPTCP -> ns2  (10.0.1.2:10019) MPTCP	[ OK ]
# ns2  MPTCP -> ns2  (10.0.1.2:10020) TCP  	[ OK ]
# ns2  TCP   -> ns2  (10.0.1.2:10021) MPTCP	[ OK ]
# ns2  MPTCP -> ns2  (10.0.2.1:10022) MPTCP	[ OK ]
# ns2  MPTCP -> ns2  (10.0.2.1:10023) TCP  	[ OK ]
# ns2  TCP   -> ns2  (10.0.2.1:10024) MPTCP	[ OK ]
# ns2  MPTCP -> ns3  (10.0.2.2:10025) MPTCP	[ OK ]
# ns2  MPTCP -> ns3  (10.0.2.2:10026) TCP  	[ OK ]
# ns2  TCP   -> ns3  (10.0.2.2:10027) MPTCP	[ OK ]
# ns2  MPTCP -> ns3  (10.0.3.2:10028) MPTCP	[ OK ]
# ns2  MPTCP -> ns3  (10.0.3.2:10029) TCP  	[ OK ]
# ns2  TCP   -> ns3  (10.0.3.2:10030) MPTCP	[ OK ]
# ns2  MPTCP -> ns4  (10.0.3.1:10031) MPTCP	[ FAIL ] file received by
server does not match (in, out):
# -rw------- 1 root root 5095452 Sep 21 10:23 /tmp/tmp.zY3i7pjyW3
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# -rw------- 1 root root 5049520 Sep 21 10:24 /tmp/tmp.6H8H1PP7hu
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# ns3  MPTCP -> ns1  (10.0.1.1:10032) MPTCP	[ OK ]
# ns3  MPTCP -> ns1  (10.0.1.1:10033) TCP  	[ OK ]
# ns3  TCP   -> ns1  (10.0.1.1:10034) MPTCP	[ OK ]
# ns3  MPTCP -> ns2  (10.0.1.2:10035) MPTCP	[ OK ]
# ns3  MPTCP -> ns2  (10.0.1.2:10036) TCP  	[ OK ]
# ns3  TCP   -> ns2  (10.0.1.2:10037) MPTCP	[ OK ]
# ns3  MPTCP -> ns2  (10.0.2.1:10038) MPTCP	[ OK ]
# ns3  MPTCP -> ns2  (10.0.2.1:10039) TCP  	[ OK ]
# ns3  TCP   -> ns2  (10.0.2.1:10040) MPTCP	[ OK ]
# ns3  MPTCP -> ns3  (10.0.2.2:10041) MPTCP	[ OK ]
# ns3  MPTCP -> ns3  (10.0.2.2:10042) TCP  	[ OK ]
# ns3  TCP   -> ns3  (10.0.2.2:10043) MPTCP	[ OK ]
# ns3  MPTCP -> ns3  (10.0.3.2:10044) MPTCP	[ OK ]
# ns3  MPTCP -> ns3  (10.0.3.2:10045) TCP  	[ OK ]
# ns3  TCP   -> ns3  (10.0.3.2:10046) MPTCP	[ OK ]
# ns3  MPTCP -> ns4  (10.0.3.1:10047) MPTCP	[ OK ]
# ns3  MPTCP -> ns4  (10.0.3.1:10048) TCP  	[ OK ]
# ns3  TCP   -> ns4  (10.0.3.1:10049) MPTCP	[ OK ]
# ns4  MPTCP -> ns1  (10.0.1.1:10050) MPTCP	[ FAIL ] file received by
client does not match (in, out):
# -rw------- 1 root root 2296860 Sep 21 10:23 /tmp/tmp.eiXCYMBq69
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# -rw------- 1 root root 2241152 Sep 21 10:24 /tmp/tmp.ypdqO6UziE
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# ns4  MPTCP -> ns2  (10.0.1.2:10051) MPTCP	[ OK ]
# ns4  MPTCP -> ns2  (10.0.1.2:10052) TCP  	[ OK ]
# ns4  TCP   -> ns2  (10.0.1.2:10053) MPTCP	[ OK ]
# ns4  MPTCP -> ns2  (10.0.2.1:10054) MPTCP	[ FAIL ] file received by
client does not match (in, out):
# -rw------- 1 root root 2296860 Sep 21 10:23 /tmp/tmp.eiXCYMBq69
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# -rw------- 1 root root 2254482 Sep 21 10:24 /tmp/tmp.ypdqO6UziE
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# ns4  MPTCP -> ns3  (10.0.2.2:10055) MPTCP	[ FAIL ] file received by
client does not match (in, out):
# -rw------- 1 root root 2296860 Sep 21 10:23 /tmp/tmp.eiXCYMBq69
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# -rw------- 1 root root 2280476 Sep 21 10:25 /tmp/tmp.ypdqO6UziE
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# ns4  MPTCP -> ns3  (10.0.3.2:10056) MPTCP	[ FAIL ] file received by
client does not match (in, out):
# -rw------- 1 root root 2296860 Sep 21 10:23 /tmp/tmp.eiXCYMBq69
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# -rw------- 1 root root 2212876 Sep 21 10:25 /tmp/tmp.ypdqO6UziE
# Trailing bytes are:
# MPTCP_TEST_FILE_END_MARKER
# ns4  MPTCP -> ns4  (10.0.3.1:10057) MPTCP	[ OK ]
# ns4  MPTCP -> ns4  (10.0.3.1:10058) TCP  	[ OK ]
# ns4  TCP   -> ns4  (10.0.3.1:10059) MPTCP	[ OK ]
not ok 1 selftests: mptcp: mptcp_connect.sh


Cheers,
Matt


-- 
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [MPTCP] [PATCH] mptcp: sendmsg: fix stream corrution
@ 2019-09-20 22:45 Mat Martineau
  0 siblings, 0 replies; 8+ messages in thread
From: Mat Martineau @ 2019-09-20 22:45 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 3515 bytes --]


On Fri, 20 Sep 2019, Paolo Abeni wrote:

> Currently we can hit a stream's corruption with the following sequence:
> - sendmsg_frag is invoked while the write_queue tail skb's len is equal to
>  the current size_goal.
>  sendmsg_frag sets can_collapse to false, even if mptcp data are in sequence.
>  Anyway it does not set eor in the write_queue skb.
> - sendmsg_frag invokes do_tcp_sendmsg(), which tries to allocate a new skb -
>  as the 'size_goal - skb->len <= 0' condition is true.
>  But it hits memory limit and/or such allocation fails, so do_tcp_sendmsg()
>  jumps to the 'wait_for_memory:' label and invokes tcp_push()
> - tcp_push() tries to transmit the write queue tail skb, but due to window size
>  limit, split it, send the first half and leave in the write queue a smaller
>  skb - without any ext attached
> - do_tcp_sendmsg() invokes sk_stream_wait_memory and than checks again for skb
>  allocation, this time skb->len is less than size_goal and do_tcp_sendmsg()
>  collapse the data on the existing skb
> - sendmsg_frag adds a new ext to the tail skb in write queue, but due to the above
>  it is associated with a wrong/unexpected subflow sequence number: it maps
>  a future ssn -> stream corruption
>
> This change addresses the above issue explicitly setting the 'eor' when hitting
> the initial condition so that tcp will not collapse the data after tcp_push().

Thanks for the detailed description. Code looks good to me and tested fine 
(although failures remain, but I don't think this was intended to fix 
everything).

Mat


>
> Fixes: 6c80a73a33c2 ("mptcp: allow collapsing consecutive sendpages on the same substream")
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> --
> Notes:
> - based on top of "mptcp: implement retransmit infrastructure",
> - the issue is present even without that series, but it become somewhat
>  reproducible with both the above triggering MP_JOIN.
> - I guess squashing to the specific commit may cause a bit of rebase conflict,
>  otherwise it may be squashed to "mptcp: rework mptcp_sendmsg_frag to accept optional dfrag"
> ---
> net/mptcp/protocol.c | 15 ++++++---------
> 1 file changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 308ced6955a8..e91f761d0da6 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -217,19 +217,16 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
>
> 		/* Limit the write to the size available in the
> 		 * current skb, if any, so that we create at most a new skb.
> -		 * If we run out of space in the current skb (e.g. the window
> -		 * size shrunk from last sent) a new skb will be allocated even
> -		 * is collapsing was allowed: collapsing is effectively
> -		 * disabled.
> +		 * Explicitly tells TCP internals to avoid collapsing on later
> +		 * queue management operation, to avoid breaking the ext <->
> +		 * SSN association set here
> 		 */
> -		can_collapse = mptcp_skb_can_collapse_to(*write_seq, skb,
> -							 mpext);
> +		can_collapse = (size_goal - skb->len > 0) &&
> +			      mptcp_skb_can_collapse_to(*write_seq, skb, mpext);
> 		if (!can_collapse)
> 			TCP_SKB_CB(skb)->eor = 1;
> -		else if (size_goal - skb->len > 0)
> -			avail_size = size_goal - skb->len;
> 		else
> -			can_collapse = false;
> +			avail_size = size_goal - skb->len;
> 	}
>
> 	if (!retransmission) {
> -- 
> 2.21.0
>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [MPTCP] [PATCH] mptcp: sendmsg: fix stream corrution
@ 2019-09-20 16:58 Paolo Abeni
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Abeni @ 2019-09-20 16:58 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 3163 bytes --]

Currently we can hit a stream's corruption with the following sequence:
- sendmsg_frag is invoked while the write_queue tail skb's len is equal to
  the current size_goal.
  sendmsg_frag sets can_collapse to false, even if mptcp data are in sequence.
  Anyway it does not set eor in the write_queue skb.
- sendmsg_frag invokes do_tcp_sendmsg(), which tries to allocate a new skb -
  as the 'size_goal - skb->len <= 0' condition is true.
  But it hits memory limit and/or such allocation fails, so do_tcp_sendmsg()
  jumps to the 'wait_for_memory:' label and invokes tcp_push()
- tcp_push() tries to transmit the write queue tail skb, but due to window size
  limit, split it, send the first half and leave in the write queue a smaller
  skb - without any ext attached
- do_tcp_sendmsg() invokes sk_stream_wait_memory and than checks again for skb
  allocation, this time skb->len is less than size_goal and do_tcp_sendmsg()
  collapse the data on the existing skb
- sendmsg_frag adds a new ext to the tail skb in write queue, but due to the above
  it is associated with a wrong/unexpected subflow sequence number: it maps
  a future ssn -> stream corruption

This change addresses the above issue explicitly setting the 'eor' when hitting
the initial condition so that tcp will not collapse the data after tcp_push().

Fixes: 6c80a73a33c2 ("mptcp: allow collapsing consecutive sendpages on the same substream")
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
--
Notes:
- based on top of "mptcp: implement retransmit infrastructure",
- the issue is present even without that series, but it become somewhat
  reproducible with both the above triggering MP_JOIN.
- I guess squashing to the specific commit may cause a bit of rebase conflict, 
  otherwise it may be squashed to "mptcp: rework mptcp_sendmsg_frag to accept optional dfrag"
---
 net/mptcp/protocol.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 308ced6955a8..e91f761d0da6 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -217,19 +217,16 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 
 		/* Limit the write to the size available in the
 		 * current skb, if any, so that we create at most a new skb.
-		 * If we run out of space in the current skb (e.g. the window
-		 * size shrunk from last sent) a new skb will be allocated even
-		 * is collapsing was allowed: collapsing is effectively
-		 * disabled.
+		 * Explicitly tells TCP internals to avoid collapsing on later
+		 * queue management operation, to avoid breaking the ext <->
+		 * SSN association set here
 		 */
-		can_collapse = mptcp_skb_can_collapse_to(*write_seq, skb,
-							 mpext);
+		can_collapse = (size_goal - skb->len > 0) &&
+			      mptcp_skb_can_collapse_to(*write_seq, skb, mpext);
 		if (!can_collapse)
 			TCP_SKB_CB(skb)->eor = 1;
-		else if (size_goal - skb->len > 0)
-			avail_size = size_goal - skb->len;
 		else
-			can_collapse = false;
+			avail_size = size_goal - skb->len;
 	}
 
 	if (!retransmission) {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-09-24  9:12 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-23  7:46 [MPTCP] [PATCH] mptcp: sendmsg: fix stream corrution Paolo Abeni
  -- strict thread matches above, loose matches on Subject: below --
2019-09-24  9:12 Matthieu Baerts
2019-09-24  6:55 Paolo Abeni
2019-09-23 23:35 Mat Martineau
2019-09-23  8:17 Matthieu Baerts
2019-09-21 10:32 Matthieu Baerts
2019-09-20 22:45 Mat Martineau
2019-09-20 16:58 Paolo Abeni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.