* [PATCH] tcp: ack when we get an OOO/lost packet
@ 2015-08-12 15:16 Josef Bacik
  2015-08-13  8:19 ` Andrei Borzenkov
  2015-12-07 17:59 ` Andrei Borzenkov
  0 siblings, 2 replies; 9+ messages in thread
From: Josef Bacik @ 2015-08-12 15:16 UTC (permalink / raw)
  To: kernel-team, grub-devel; +Cc: Josef Bacik

While adding TCP window scaling support I found that I'd get some packet
loss or reordering when transferring over long distances, and GRUB would just
time out.  This is because we weren't ACKing when we got an out-of-order (OOO)
packet, so the sender didn't know it needed to retransmit anything; eventually
it would fill the window and stop transmitting, and we'd time out.  Fix this by
ACKing when we don't find the packet with our next expected sequence number.
With this fix I no longer time out.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 grub-core/net/tcp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/grub-core/net/tcp.c b/grub-core/net/tcp.c
index 25720b1..6b411dd 100644
--- a/grub-core/net/tcp.c
+++ b/grub-core/net/tcp.c
@@ -902,7 +902,10 @@ grub_net_recv_tcp_packet (struct grub_net_buff *nb,
 	  grub_priority_queue_pop (sock->pq);
 	}
       if (grub_be_to_cpu32 (tcph->seqnr) != sock->their_cur_seq)
-	return GRUB_ERR_NONE;
+	{
+	  ack (sock);
+	  return GRUB_ERR_NONE;
+	}
       while (1)
 	{
 	  nb_top_p = grub_priority_queue_top (sock->pq);
-- 
1.8.1




* Re: [PATCH] tcp: ack when we get an OOO/lost packet
  2015-08-12 15:16 [PATCH] tcp: ack when we get an OOO/lost packet Josef Bacik
@ 2015-08-13  8:19 ` Andrei Borzenkov
  2015-08-13 13:59   ` Josef Bacik
  2015-12-07 17:59 ` Andrei Borzenkov
  1 sibling, 1 reply; 9+ messages in thread
From: Andrei Borzenkov @ 2015-08-13  8:19 UTC (permalink / raw)
  To: The development of GNU GRUB; +Cc: Josef Bacik, kernel-team

On Wed, Aug 12, 2015 at 6:16 PM, Josef Bacik <jbacik@fb.com> wrote:
> While adding tcp window scaling support I was finding that I'd get some packet
> loss or reordering when transferring from large distances and grub would just
> timeout.  This is because we weren't ack'ing when we got our OOO packet, so the
> sender didn't know it needed to retransmit anything, so eventually it would fill
> the window and stop transmitting, and we'd time out.  Fix this by ACK'ing when
> we don't find our next sequence numbered packet.  With this fix I no longer time
> out.  Thanks,

I have a feeling that your description is misleading. The patch simply
sends a duplicate ACK, but the partner does not know what has been
received and what has not, so it must wait for the ACK timeout anyway
before retransmitting. What this patch may fix is a lost ACK packet
*from* GRUB, by increasing the rate of ACK packets it sends. Do you have
a packet trace for the timeout case, ideally from both sides simultaneously?

Did you consider implementing receive side SACK BTW? You have the
right environment to test it :)

>
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  grub-core/net/tcp.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/grub-core/net/tcp.c b/grub-core/net/tcp.c
> index 25720b1..6b411dd 100644
> --- a/grub-core/net/tcp.c
> +++ b/grub-core/net/tcp.c
> @@ -902,7 +902,10 @@ grub_net_recv_tcp_packet (struct grub_net_buff *nb,
>           grub_priority_queue_pop (sock->pq);
>         }
>        if (grub_be_to_cpu32 (tcph->seqnr) != sock->their_cur_seq)
> -       return GRUB_ERR_NONE;
> +       {
> +         ack (sock);
> +         return GRUB_ERR_NONE;
> +       }
>        while (1)
>         {
>           nb_top_p = grub_priority_queue_top (sock->pq);
> --
> 1.8.1
>
>



* Re: [PATCH] tcp: ack when we get an OOO/lost packet
  2015-08-13  8:19 ` Andrei Borzenkov
@ 2015-08-13 13:59   ` Josef Bacik
  2015-08-13 17:13     ` Andrei Borzenkov
  2015-08-17 12:38     ` Andrei Borzenkov
  0 siblings, 2 replies; 9+ messages in thread
From: Josef Bacik @ 2015-08-13 13:59 UTC (permalink / raw)
  To: Andrei Borzenkov, The development of GNU GRUB; +Cc: kernel-team

On 08/13/2015 04:19 AM, Andrei Borzenkov wrote:
> On Wed, Aug 12, 2015 at 6:16 PM, Josef Bacik <jbacik@fb.com> wrote:
>> While adding tcp window scaling support I was finding that I'd get some packet
>> loss or reordering when transferring from large distances and grub would just
>> timeout.  This is because we weren't ack'ing when we got our OOO packet, so the
>> sender didn't know it needed to retransmit anything, so eventually it would fill
>> the window and stop transmitting, and we'd time out.  Fix this by ACK'ing when
>> we don't find our next sequence numbered packet.  With this fix I no longer time
>> out.  Thanks,
>
> I have a feeling that your description is misleading. Patch simply
> sends duplicated ACK, but partner does not know what has been received
> and what has not, so it must wait for ACK timeout anyway before
> retransmitting. What this patch may fix would be lost ACK packet
> *from* GRUB, by increasing rate of ACK packets it sends. Do you have
> packet trace for timeout case, ideally from both sides simultaneously?
>

The way Linux works is that if you get <configurable amount> of DUP
ACKs it triggers a retransmit.  I only have traces from the server,
since tcpdump doesn't work in GRUB (or if it does I don't know how to
do it).  The server is definitely getting all of the ACKs, and from my
printf()'ing in GRUB we are either getting reordered packets (most
likely) or simply losing a packet here or there.  This is a pretty long
distance and there is a lot of networking between Sweden and
California, so reordering or packet loss isn't out of the question.

Regardless, we definitely need to ACK incoming packets with the last
seq we had, as the spec says, so the sender knows the last bit we had;
otherwise we see timeouts once the window is full.
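
The receive-side behaviour described above can be sketched roughly as
follows. This is a minimal illustration, not GRUB's code: struct rx_state,
send_ack() and receive_segment() are hypothetical names, the sketch does
not queue out-of-order segments the way the real receive path does, and
sequence-number wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receiver state; not GRUB's socket structure. */
struct rx_state
{
  uint32_t next_expected_seq;	/* first byte not yet received in order */
  uint32_t last_ack;		/* value carried by the most recent ACK */
  unsigned acks_sent;		/* number of ACKs emitted so far */
};

/* Stand-in for putting an ACK on the wire: record what would be sent. */
static void
send_ack (struct rx_state *rx)
{
  rx->last_ack = rx->next_expected_seq;
  rx->acks_sent++;
}

/* Handle one incoming segment.  Returns 1 if it was the in-order segment,
   0 if it was out of order.  Either way an ACK is sent.  */
static int
receive_segment (struct rx_state *rx, uint32_t seq, uint32_t len)
{
  assert (rx != NULL);
  if (seq != rx->next_expected_seq)
    {
      /* Out of order: repeat the ACK for the byte we are still waiting
	 for, so the sender sees a duplicate ACK.  */
      send_ack (rx);
      return 0;
    }
  /* In order: advance and acknowledge the new position.  */
  rx->next_expected_seq += len;
  send_ack (rx);
  return 1;
}
```

Calling receive_segment() with a sequence number other than
next_expected_seq leaves the state unchanged but still emits an ACK
carrying the old value, which is the duplicate ACK the sender counts.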

> Did you consider implementing receive side SACK BTW? You have the
> right environment to test it :)
>

So I found this bug while implementing SACK, and decided it was faster
to just do this rather than add SACK.  This method is still exceedingly
slow: I only get around 800kb/s over the entire transfer, whereas I can
sustain around 5.5 mb/s before we start losing stuff.  So I'm definitely
going to go back and try the timestamp echo stuff, since the timeout
stuff takes like 6 seconds, and then if that doesn't work, bite the
bullet and add SACK.

But first I want to get my ipv6 patches right ;).  Thanks,

Josef




* Re: [PATCH] tcp: ack when we get an OOO/lost packet
  2015-08-13 13:59   ` Josef Bacik
@ 2015-08-13 17:13     ` Andrei Borzenkov
  2015-08-13 17:40       ` Josef Bacik
  2015-08-17 12:38     ` Andrei Borzenkov
  1 sibling, 1 reply; 9+ messages in thread
From: Andrei Borzenkov @ 2015-08-13 17:13 UTC (permalink / raw)
  To: Josef Bacik, The development of GNU GRUB; +Cc: kernel-team



On 13.08.2015 16:59, Josef Bacik wrote:
> On 08/13/2015 04:19 AM, Andrei Borzenkov wrote:
>> On Wed, Aug 12, 2015 at 6:16 PM, Josef Bacik <jbacik@fb.com> wrote:
>>> While adding tcp window scaling support I was finding that I'd get
>>> some packet
>>> loss or reordering when transferring from large distances and grub
>>> would just
>>> timeout.  This is because we weren't ack'ing when we got our OOO
>>> packet, so the
>>> sender didn't know it needed to retransmit anything, so eventually it
>>> would fill
>>> the window and stop transmitting, and we'd time out.  Fix this by
>>> ACK'ing when
>>> we don't find our next sequence numbered packet.  With this fix I no
>>> longer time
>>> out.  Thanks,
>>
>> I have a feeling that your description is misleading. Patch simply
>> sends duplicated ACK, but partner does not know what has been received
>> and what has not, so it must wait for ACK timeout anyway before
>> retransmitting. What this patch may fix would be lost ACK packet
>> *from* GRUB, by increasing rate of ACK packets it sends. Do you have
>> packet trace for timeout case, ideally from both sides simultaneously?
>>
>
> The way linux works is that if you get <configurable amount> of DUP
> ack's it triggers a retransmit.

Do you have pointers to documentation and code?

>                                I only have traces from the server
> since tcpdump doesn't work in grub (or if it does I don't know how to do
> it).

GRUB does not have tcpdump, but your switch quite likely has port mirroring.

>     The server is definitely getting all of the ACK's, and from my
> printf()'ing in grub we are either getting re-ordered packets (the most
> likely) or we are simply losing a packet here or there.  This is a
> pretty long distance and we have a lot of networking between Sweden and
> California so reordering or packet loss isn't out of the question.
>
> Regardless we definitely need to be ACK'ing packets that come in with
> the last seq we had as the spec says so the sender knows the last bit we
> had, otherwise we see timeouts once the window is full.
>

I'm fine with it, I just want to understand why it fixes anything and
get the commit message right.

>> Did you consider implementing receive side SACK BTW? You have the
>> right environment to test it :)
>>
>
> So I found this bug while implementing SACK, and decided it was faster
> to just do this rather than add SACK.  This method is still exceedingly
> slow, I only get around 800kb/s over the entire transfer whereas I can
> sustain around 5.5 mb/s before we start losing stuff, so I'm definitely
> going to go back and try the timestamp echo stuff since the timeout
> stuff takes like 6 seconds, and then if that doesn't work bite the
> bullet and add SACK.
>
> But first I want to get my ipv6 patches right ;).  Thanks,
>
> Josef
>



* Re: [PATCH] tcp: ack when we get an OOO/lost packet
  2015-08-13 17:13     ` Andrei Borzenkov
@ 2015-08-13 17:40       ` Josef Bacik
  0 siblings, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2015-08-13 17:40 UTC (permalink / raw)
  To: Andrei Borzenkov, The development of GNU GRUB; +Cc: kernel-team

On 08/13/2015 01:13 PM, Andrei Borzenkov wrote:
>
>
> On 13.08.2015 16:59, Josef Bacik wrote:
>> On 08/13/2015 04:19 AM, Andrei Borzenkov wrote:
>>> On Wed, Aug 12, 2015 at 6:16 PM, Josef Bacik <jbacik@fb.com> wrote:
>>>> While adding tcp window scaling support I was finding that I'd get
>>>> some packet
>>>> loss or reordering when transferring from large distances and grub
>>>> would just
>>>> timeout.  This is because we weren't ack'ing when we got our OOO
>>>> packet, so the
>>>> sender didn't know it needed to retransmit anything, so eventually it
>>>> would fill
>>>> the window and stop transmitting, and we'd time out.  Fix this by
>>>> ACK'ing when
>>>> we don't find our next sequence numbered packet.  With this fix I no
>>>> longer time
>>>> out.  Thanks,
>>>
>>> I have a feeling that your description is misleading. Patch simply
>>> sends duplicated ACK, but partner does not know what has been received
>>> and what has not, so it must wait for ACK timeout anyway before
>>> retransmitting. What this patch may fix would be lost ACK packet
>>> *from* GRUB, by increasing rate of ACK packets it sends. Do you have
>>> packet trace for timeout case, ideally from both sides simultaneously?
>>>
>>
>> The way linux works is that if you get <configurable amount> of DUP
>> ack's it triggers a retransmit.
>
> Do you have pointers to documentation and code?

The tcp_reordering sysctl allows you to set how many DUP ACKs you get
before retransmitting; see the comment above the function
tcp_time_to_recover in the kernel.  With no SACK support we rely on
getting a certain number of DUP ACKs before retransmitting, since the
out-of-order packets we want could still arrive in time, in which case
we would not have to retransmit.
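
As a rough illustration of that sender-side rule, here is a minimal
sketch of duplicate-ACK counting. It is not the Linux kernel's code:
struct tx_state and on_ack() are hypothetical, dup_threshold only plays
the role of the tcp_reordering setting, and old ACKs and sequence
wraparound are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sender-side state; not the kernel's data structures. */
struct tx_state
{
  uint32_t highest_acked;	/* latest cumulative ACK value seen */
  unsigned dup_acks;		/* consecutive duplicates of that value */
  unsigned dup_threshold;	/* duplicates needed before retransmitting */
  unsigned fast_retransmits;	/* retransmits triggered by duplicates */
};

/* Process one incoming ACK.  Returns 1 when this ACK triggers a
   retransmit of the segment starting at highest_acked, else 0.  */
static int
on_ack (struct tx_state *tx, uint32_t ack)
{
  assert (tx != NULL);
  if (ack != tx->highest_acked)
    {
      /* A different ACK value: treat it as progress and start over.  */
      tx->highest_acked = ack;
      tx->dup_acks = 0;
      return 0;
    }
  /* Same ACK value again: the receiver is still missing that byte.  */
  tx->dup_acks++;
  if (tx->dup_acks == tx->dup_threshold)
    {
      tx->fast_retransmits++;
      return 1;
    }
  return 0;
}
```

With a threshold of 3, the third consecutive duplicate triggers the
retransmit; with a larger threshold the sender tolerates more reordering
before it resends anything.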

>
>>                                I only have traces from the server
>> since tcpdump doesn't work in grub (or if it does I don't know how to do
>> it).
>
> GRUB does not have tcpdump, but your switch quite likely has port
> mirroring.
>

Big company, big datacenters, etc.  I'm a file system developer; you
are lucky I know how to spell tcpdump to begin with ;).  The tcpdump on
the server side supports my hypothesis.  We send lots and lots of stuff,
and the GRUB box starts falling behind in its ACK responses because it's
waiting for the next SEQ packet to come in.  It ACKs with the new next
expected SEQ when that packet finally does come in.  This degrades to
the point where the sender has maxed out its send window, the GRUB box
has either lost or not yet received the next packet it is waiting for,
and it times out.  From my instrumentation on the GRUB side I can say
for sure that we aren't getting the next packet we are looking for
while getting a bunch of others; I _can't_ say for sure whether it is
simple reordering or packet loss somewhere.  With this patch we're
definitely getting all of the DUP ACKs; at least there don't appear to
be any missing in the range (I see DUP ACK #1-#300 all in a row, with
none missing).

If you want I can change the commit log to say something like

"If we get an out of order packet we still need to ACK with the expected 
SEQ number so the sender knows we haven't received that packet yet and 
may need a retransmission."

To clear up any ambiguity.  Thanks,

Josef



* Re: [PATCH] tcp: ack when we get an OOO/lost packet
  2015-08-13 13:59   ` Josef Bacik
  2015-08-13 17:13     ` Andrei Borzenkov
@ 2015-08-17 12:38     ` Andrei Borzenkov
  2015-08-18 17:58       ` Josef Bacik
  1 sibling, 1 reply; 9+ messages in thread
From: Andrei Borzenkov @ 2015-08-17 12:38 UTC (permalink / raw)
  To: Josef Bacik; +Cc: The development of GNU GRUB, kernel-team

On Thu, Aug 13, 2015 at 4:59 PM, Josef Bacik <jbacik@fb.com> wrote:
> On 08/13/2015 04:19 AM, Andrei Borzenkov wrote:
>>
>> On Wed, Aug 12, 2015 at 6:16 PM, Josef Bacik <jbacik@fb.com> wrote:
>>>
>>> While adding tcp window scaling support I was finding that I'd get some
>>> packet
>>> loss or reordering when transferring from large distances and grub would
>>> just
>>> timeout.  This is because we weren't ack'ing when we got our OOO packet,
>>> so the
>>> sender didn't know it needed to retransmit anything, so eventually it
>>> would fill
>>> the window and stop transmitting, and we'd time out.  Fix this by ACK'ing
>>> when
>>> we don't find our next sequence numbered packet.  With this fix I no
>>> longer time
>>> out.  Thanks,
>>
>>
>> I have a feeling that your description is misleading. Patch simply
>> sends duplicated ACK, but partner does not know what has been received
>> and what has not, so it must wait for ACK timeout anyway before
>> retransmitting. What this patch may fix would be lost ACK packet
>> *from* GRUB, by increasing rate of ACK packets it sends. Do you have
>> packet trace for timeout case, ideally from both sides simultaneously?
>>
>
> The way linux works is that if you get <configurable amount> of DUP ack's it
> triggers a retransmit.  I only have traces from the server since tcpdump
> doesn't work in grub (or if it does I don't know how to do it).  The server
> is definitely getting all of the ACK's,

In the packet trace you sent me there was almost certainly an ACK loss
for the segment 20801001-20805881 (frame 19244). Note that recovery here
was rather fast: the server started retransmission after ~0.5 sec. It is
unlikely to be a lost packet from the server: the next ACK from GRUB
received by the server was for 20803441, which means GRUB actually got
at least the initial half of this segment. Unfortunately some packets
are missing from the capture (even packets *from* the server), which
makes it harder to interpret. After this the server went down to a 512
segment size and everything went more or less well until frame 19949.
Here the server behavior is rather interesting. It starts
retransmission with an initial timeout of ~6 sec, even though it
received quite a lot of DUP ACKs, and doubles it every time until it
hits the GRUB timeout (~34 seconds).

Note the difference in behavior between the former and the latter. Did
you try asking on the Linux networking list why they are so different?

OTOH GRUB probably times out too early. The initial TCP RFC suggests a
general timeout of 5 minutes, and RFC 1122 at least 100 seconds. It
would be interesting to increase the connection timeout to see if it
recovers. You could try bumping GRUB_NET_TRIES to 82, which results in
a timeout slightly over 101 sec.
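
The arithmetic behind those numbers can be sketched as below. The sketch
assumes the receive loop waits a base interval plus a per-try increment
on each of tries 0 through GRUB_NET_TRIES inclusive, and it assumes a
400 ms base and a 20 ms increment; both values and the loop shape are
assumptions to check against include/grub/net.h and grub-core/net/net.c,
and the SKETCH_* names and total_wait_ms() are made up for illustration.

```c
#include <assert.h>

/* Assumed values, not read from the GRUB headers. */
#define SKETCH_INTERVAL_MS		400
#define SKETCH_INTERVAL_ADDITION_MS	20

/* Total time spent waiting, in milliseconds, if every try from 0 to
   `tries' inclusive runs to its full interval.  */
static unsigned long
total_wait_ms (unsigned tries)
{
  unsigned long total = 0;
  unsigned attempt;

  assert (tries < 100000);	/* keep the sum well inside unsigned long */
  for (attempt = 0; attempt <= tries; attempt++)
    total += SKETCH_INTERVAL_MS
	     + (unsigned long) attempt * SKETCH_INTERVAL_ADDITION_MS;
  return total;
}
```

Under these assumptions total_wait_ms (40) is 32800 ms and
total_wait_ms (82) is 101260 ms, which lines up with the roughly 33
seconds and "slightly over 101 sec" quoted in this thread.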

Also it seems that a huge window may aggravate the issue. According to
the trace, 10K is enough to fill the pipe and you set it to 1M. It would
be interesting to see the same with the default window size.


>                                                     and from my printf()'ing in grub we
> are either getting re-ordered packets (the most likely) or we are simply
> losing a packet here or there.  This is a pretty long distance and we have a
> lot of networking between Sweden and California so reordering or packet loss
> isn't out of the question.
>
> Regardless we definitely need to be ACK'ing packets that come in with the
> last seq we had as the spec says so the sender knows the last bit we had,
> otherwise we see timeouts once the window is full.
>
>> Did you consider implementing receive side SACK BTW? You have the
>> right environment to test it :)
>>
>
> So I found this bug while implementing SACK, and decided it was faster to
> just do this rather than add SACK.  This method is still exceedingly slow, I
> only get around 800kb/s over the entire transfer whereas I can sustain
> around 5.5 mb/s before we start losing stuff, so I'm definitely going to go
> back and try the timestamp echo stuff since the timeout stuff takes like 6
> seconds, and then if that doesn't work bite the bullet and add SACK.
>
> But first I want to get my ipv6 patches right ;).  Thanks,
>
> Josef
>



* Re: [PATCH] tcp: ack when we get an OOO/lost packet
  2015-08-17 12:38     ` Andrei Borzenkov
@ 2015-08-18 17:58       ` Josef Bacik
  0 siblings, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2015-08-18 17:58 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: The development of GNU GRUB, kernel-team

On 08/17/2015 05:38 AM, Andrei Borzenkov wrote:
> On Thu, Aug 13, 2015 at 4:59 PM, Josef Bacik <jbacik@fb.com> wrote:
>> On 08/13/2015 04:19 AM, Andrei Borzenkov wrote:
>>>
>>> On Wed, Aug 12, 2015 at 6:16 PM, Josef Bacik <jbacik@fb.com> wrote:
>>>>
>>>> While adding tcp window scaling support I was finding that I'd get some
>>>> packet
>>>> loss or reordering when transferring from large distances and grub would
>>>> just
>>>> timeout.  This is because we weren't ack'ing when we got our OOO packet,
>>>> so the
>>>> sender didn't know it needed to retransmit anything, so eventually it
>>>> would fill
>>>> the window and stop transmitting, and we'd time out.  Fix this by ACK'ing
>>>> when
>>>> we don't find our next sequence numbered packet.  With this fix I no
>>>> longer time
>>>> out.  Thanks,
>>>
>>>
>>> I have a feeling that your description is misleading. Patch simply
>>> sends duplicated ACK, but partner does not know what has been received
>>> and what has not, so it must wait for ACK timeout anyway before
>>> retransmitting. What this patch may fix would be lost ACK packet
>>> *from* GRUB, by increasing rate of ACK packets it sends. Do you have
>>> packet trace for timeout case, ideally from both sides simultaneously?
>>>
>>
>> The way linux works is that if you get <configurable amount> of DUP ack's it
>> triggers a retransmit.  I only have traces from the server since tcpdump
>> doesn't work in grub (or if it does I don't know how to do it).  The server
>> is definitely getting all of the ACK's,
>

(Sorry was traveling for Linux Plumbers.)

> In packet trace you sent me there was almost certain ACK loss for the
> segment 20801001- 20805881 (frame 19244). Note that here recovery was
> rather fast - server started retransmission after ~0.5sec. It is
> unlikely lost packet from server - next ACK from GRUB received by
> server was for 20803441, which means it actually got at least initial
> half of this segment. Unfortunately some packets are missing in
> capture (even packets *from* server), which makes it harder to
> interpret. After this server went down to 512 segment size and
> everything went more or less well, until frame 19949. Here the server
> behavior is rather interesting. It starts retransmission with initial
> timeout ~6sec, even though it received quite a lot of DUP ACKs; and
> doubling it every time until it hits GRUB timeout (~34 seconds).
>

Yeah, that's the normal retransmission timeout.  This tcpdump was on a
non-patched GRUB.  We only sent 3 DUP ACKs, and we have the DUP ACK
counter set to 13 or something like that, so the server has to get a
lot before it triggers the DUP ACK retransmit logic.

> Note the difference in behavior between the former and the latter. Did
> you try to ask on Linux networking list why they are so different?
>

I'll run it by our networking guys when they show up.

> OTOH GRUB probably times out too early. Initial TCP RFC suggests 5
> minutes general timeout and RFC1122 - at least 100 seconds. It would
> be interesting to increase connection timeout to see if it recovers.
> You could try bumping GRUB_NET_TRIES to 82  which result in timeout
> slightly over 101 sec.
>
> Also it seems that huge window may aggravate the issue. According to
> trace, 10K is enough to fill pipe and you set it to 1M. It would be
> interesting to see the same with default windows size.
>

Oh yeah, the problem doesn't happen with a normal window size; it's only
with the giant window size.  I'm not sure where you are getting the 10k
number.  Believe me, if I could have gotten around this by just jacking
up the normal window size I would have done it.  When I set it to the
max (64k I think?) I get a transfer rate of around 200 kb/s, which is
not fast enough to pull down our 250mb image.  With the 1mb window I get
5.5 mb/s, so there is a real benefit to the giant window.  Thanks,
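
Why the window size matters this much can be sketched with the usual
window-limited bound: a sender can have at most one window of data in
flight per round trip. The helper below is hypothetical, and the 200 ms
round-trip time used in the note after it is an assumed figure, not one
measured in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on throughput, in bytes per second, when at most
   window_bytes can be in flight per round trip of rtt_ms.  */
static uint64_t
window_limited_bytes_per_sec (uint64_t window_bytes, uint64_t rtt_ms)
{
  assert (rtt_ms > 0);
  return window_bytes * 1000 / rtt_ms;
}
```

With the assumed 200 ms round trip, a 65535-byte window caps out at
327675 bytes per second, while a 1048576-byte window allows up to
5242880 bytes per second, so the bound alone predicts a large gap
between the two settings.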

Josef




* Re: [PATCH] tcp: ack when we get an OOO/lost packet
  2015-08-12 15:16 [PATCH] tcp: ack when we get an OOO/lost packet Josef Bacik
  2015-08-13  8:19 ` Andrei Borzenkov
@ 2015-12-07 17:59 ` Andrei Borzenkov
  2015-12-07 18:28   ` Josef Bacik
  1 sibling, 1 reply; 9+ messages in thread
From: Andrei Borzenkov @ 2015-12-07 17:59 UTC (permalink / raw)
  To: The development of GNU GRUB, kernel-team; +Cc: Josef Bacik

12.08.2015 18:16, Josef Bacik wrote:
> While adding tcp window scaling support I was finding that I'd get some packet
> loss or reordering when transferring from large distances and grub would just
> timeout.  This is because we weren't ack'ing when we got our OOO packet, so the
> sender didn't know it needed to retransmit anything, so eventually it would fill
> the window and stop transmitting, and we'd time out.  Fix this by ACK'ing when
> we don't find our next sequence numbered packet.  With this fix I no longer time
> out.  Thanks,
> 

Applied. Sorry, it somehow slipped through.

More ideas in the same direction.

1. GRUB's receive timeout is currently ~33 seconds. That is too small
compared with anything else. I am pretty sure that in the situation from
the tcpdump you sent me we could recover if the timeout were on the
order of several minutes :)

2. We may consider additionally sending an ACK in
grub_net_tcp_retransmit(), although it probably needs proper
rate-limiting based on RTT.

3. Using the timestamp option may improve RTT detection for the partner
and is pretty cheap to implement.

> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  grub-core/net/tcp.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/grub-core/net/tcp.c b/grub-core/net/tcp.c
> index 25720b1..6b411dd 100644
> --- a/grub-core/net/tcp.c
> +++ b/grub-core/net/tcp.c
> @@ -902,7 +902,10 @@ grub_net_recv_tcp_packet (struct grub_net_buff *nb,
>  	  grub_priority_queue_pop (sock->pq);
>  	}
>        if (grub_be_to_cpu32 (tcph->seqnr) != sock->their_cur_seq)
> -	return GRUB_ERR_NONE;
> +	{
> +	  ack (sock);
> +	  return GRUB_ERR_NONE;
> +	}
>        while (1)
>  	{
>  	  nb_top_p = grub_priority_queue_top (sock->pq);
> 




* Re: [PATCH] tcp: ack when we get an OOO/lost packet
  2015-12-07 17:59 ` Andrei Borzenkov
@ 2015-12-07 18:28   ` Josef Bacik
  0 siblings, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2015-12-07 18:28 UTC (permalink / raw)
  To: Andrei Borzenkov, The development of GNU GRUB, kernel-team

On 12/07/2015 12:59 PM, Andrei Borzenkov wrote:
> 12.08.2015 18:16, Josef Bacik wrote:
>> While adding tcp window scaling support I was finding that I'd get some packet
>> loss or reordering when transferring from large distances and grub would just
>> timeout.  This is because we weren't ack'ing when we got our OOO packet, so the
>> sender didn't know it needed to retransmit anything, so eventually it would fill
>> the window and stop transmitting, and we'd time out.  Fix this by ACK'ing when
>> we don't find our next sequence numbered packet.  With this fix I no longer time
>> out.  Thanks,
>>
>
> Applied. Sorry, it somehow slipped through.
>
> More ideas in the same direction.
>
> 1. GRUB timeout for receiving currently is ~33 seconds. It is too small
> comparing with anything else. I am pretty sure in situation from tcpdump
> you sent me we could recover if timeout was in order of several minutes :)
>

Yeah I jacked up the receive timeout in one of my iterations and that 
helped as well.  Could probably make it configurable.

> 2. We may consider sending ACK in grub_net_tcp_retransmit()
> additionally, although it probably needs proper rate-limiting based on RTT.
>
> 3. Using timestamp option may improve RTT detection for partner and is
> pretty cheap to implement.
>

I'm trying to get some standard testing set up internally so I can test 
all of our hardware types whenever I make changes.  Once I get that 
stuff set up I'll look at adding this and some other features.  Thanks,

Josef




Thread overview: 9+ messages
2015-08-12 15:16 [PATCH] tcp: ack when we get an OOO/lost packet Josef Bacik
2015-08-13  8:19 ` Andrei Borzenkov
2015-08-13 13:59   ` Josef Bacik
2015-08-13 17:13     ` Andrei Borzenkov
2015-08-13 17:40       ` Josef Bacik
2015-08-17 12:38     ` Andrei Borzenkov
2015-08-18 17:58       ` Josef Bacik
2015-12-07 17:59 ` Andrei Borzenkov
2015-12-07 18:28   ` Josef Bacik
