All of lore.kernel.org
 help / color / mirror / Atom feed
* [U-Boot] [PATCH] net: BOOTP retry timeout improvements
@ 2014-07-25 23:30 Stephen Warren
  2014-08-06 15:58 ` Stephen Warren
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Stephen Warren @ 2014-07-25 23:30 UTC (permalink / raw)
  To: u-boot

From: Stephen Warren <swarren@nvidia.com>

Currently, the BOOTP code sends out its initial request as soon as the
Ethernet driver indicates "link up". If this packet is lost or not
replied to for some reason, the code waits for a 1s timeout before
retrying. For some reason, such early packets are often lost on my
system, so this causes an annoying delay.

To optimize this, modify the BOOTP code to have very short timeouts for
the first packet transmitted, but gradually increase the timeout each
time a timeout occurs. This way, if the first packet is lost, the second
packet is transmitted quite quickly and hence the overall delay is low.
However, if there's still no response, we don't keep spewing out packets
at an insane speed.

It's arguably more correct to try and find out why the first packet is
lost. However, it seems to disappear inside my Ethenet chip; the TX chip
indicates no error during TX (not that it has much in the way of
reporting...), yet wireshark on the RX side doesn't see any packet.
FWIW, I'm using an ASIX USB Ethernet adapter. Perhaps "link up" is
reported too early or based on the wrong condition in HW, and we should
add some fixed extra delay into the driver. However, this would slow down
every link up event even if it ends up not being needed in some cases.
Having BOOTP retry quickly applies the fix/WAR to every possible
Ethernet device, and is quite simple to implement, so seems a better
solution.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
---
 net/bootp.c | 38 +++++++++++++++++++++++++++++++-------
 net/bootp.h |  3 +--
 net/net.c   |  4 ++--
 3 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/net/bootp.c b/net/bootp.c
index 189a00383543..697184f22d73 100644
--- a/net/bootp.c
+++ b/net/bootp.c
@@ -23,12 +23,22 @@
 
 #define BOOTP_VENDOR_MAGIC	0x63825363	/* RFC1048 Magic Cookie */
 
-#define TIMEOUT		5000UL	/* Milliseconds before trying BOOTP again */
+/*
+ * The timeout for the initial BOOTP/DHCP request used to be described by a
+ * counter of fixed-length timeout periods. TIMEOUT_COUNT represents
+ * that counter
+ *
+ * Now that the timeout periods are variable (exponential backoff and retry)
+ * we convert the timeout count to the absolute time it would have take to
+ * execute that many retries, and keep sending retry packets until that time
+ * is reached.
+ */
 #ifndef CONFIG_NET_RETRY_COUNT
 # define TIMEOUT_COUNT	5		/* # of timeouts before giving up */
 #else
 # define TIMEOUT_COUNT	(CONFIG_NET_RETRY_COUNT)
 #endif
+#define TIMEOUT_MS	((3 + (TIMEOUT_COUNT * 5)) * 1000)
 
 #define PORT_BOOTPS	67		/* BOOTP server UDP port */
 #define PORT_BOOTPC	68		/* BOOTP client UDP port */
@@ -39,6 +49,8 @@
 
 ulong		BootpID;
 int		BootpTry;
+ulong		bootp_start;
+ulong		bootp_timeout;
 
 #if defined(CONFIG_CMD_DHCP)
 static dhcp_state_t dhcp_state = INIT;
@@ -327,16 +339,21 @@ BootpHandler(uchar *pkt, unsigned dest, IPaddr_t sip, unsigned src,
 static void
 BootpTimeout(void)
 {
-	if (BootpTry >= TIMEOUT_COUNT) {
+	ulong time_taken = get_timer(bootp_start);
+
+	if (time_taken >= TIMEOUT_MS) {
 #ifdef CONFIG_BOOTP_MAY_FAIL
-		puts("\nRetry count exceeded\n");
+		puts("\nRetry time exceeded\n");
 		net_set_state(NETLOOP_FAIL);
 #else
-		puts("\nRetry count exceeded; starting again\n");
+		puts("\nRetry time exceeded; starting again\n");
 		NetStartAgain();
 #endif
 	} else {
-		NetSetTimeout(TIMEOUT, BootpTimeout);
+		bootp_timeout *= 2;
+		if (bootp_timeout > 1000)
+			bootp_timeout = 1000;
+		NetSetTimeout(bootp_timeout, BootpTimeout);
 		BootpRequest();
 	}
 }
@@ -597,6 +614,13 @@ static int BootpExtended(u8 *e)
 }
 #endif
 
+void BootpReset(void)
+{
+	BootpTry = 0;
+	bootp_start = get_timer(0);
+	bootp_timeout = 10;
+}
+
 void
 BootpRequest(void)
 {
@@ -686,7 +710,7 @@ BootpRequest(void)
 	iplen = BOOTP_HDR_SIZE - OPT_FIELD_SIZE + extlen;
 	pktlen = eth_hdr_size + IP_UDP_HDR_SIZE + iplen;
 	net_set_udp_header(iphdr, 0xFFFFFFFFL, PORT_BOOTPS, PORT_BOOTPC, iplen);
-	NetSetTimeout(SELECT_TIMEOUT, BootpTimeout);
+	NetSetTimeout(bootp_timeout, BootpTimeout);
 
 #if defined(CONFIG_CMD_DHCP)
 	dhcp_state = SELECTING;
@@ -919,7 +943,7 @@ DhcpHandler(uchar *pkt, unsigned dest, IPaddr_t sip, unsigned src,
 						htonl(BOOTP_VENDOR_MAGIC))
 				DhcpOptionsProcess((u8 *)&bp->bp_vend[4], bp);
 
-			NetSetTimeout(TIMEOUT, BootpTimeout);
+			NetSetTimeout(5000, BootpTimeout);
 			DhcpSendRequestPkt(bp);
 #ifdef CONFIG_SYS_BOOTFILE_PREFIX
 		}
diff --git a/net/bootp.h b/net/bootp.h
index ecbcc4d5093c..3b95a0a2ded8 100644
--- a/net/bootp.h
+++ b/net/bootp.h
@@ -65,6 +65,7 @@ extern int	BootpTry;
 
 
 /* Send a BOOTP request */
+extern void BootpReset(void);
 extern void BootpRequest(void);
 
 /****************** DHCP Support *********************/
@@ -88,8 +89,6 @@ typedef enum { INIT,
 #define DHCP_NAK      6
 #define DHCP_RELEASE  7
 
-#define SELECT_TIMEOUT 3000UL	/* Milliseconds to wait for offers */
-
 /**********************************************************************/
 
 #endif /* __BOOTP_H__ */
diff --git a/net/net.c b/net/net.c
index 0f7625fde1dc..722089f3b931 100644
--- a/net/net.c
+++ b/net/net.c
@@ -385,14 +385,14 @@ restart:
 #endif
 #if defined(CONFIG_CMD_DHCP)
 		case DHCP:
-			BootpTry = 0;
+			BootpReset();
 			NetOurIP = 0;
 			DhcpRequest();		/* Basically same as BOOTP */
 			break;
 #endif
 
 		case BOOTP:
-			BootpTry = 0;
+			BootpReset();
 			NetOurIP = 0;
 			BootpRequest();
 			break;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [U-Boot] [PATCH] net: BOOTP retry timeout improvements
  2014-07-25 23:30 [U-Boot] [PATCH] net: BOOTP retry timeout improvements Stephen Warren
@ 2014-08-06 15:58 ` Stephen Warren
  2014-08-06 18:10   ` Tom Rini
  2014-08-06 20:03 ` Joe Hershberger
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Stephen Warren @ 2014-08-06 15:58 UTC (permalink / raw)
  To: u-boot

On 07/25/2014 05:30 PM, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
>
> Currently, the BOOTP code sends out its initial request as soon as the
> Ethernet driver indicates "link up". If this packet is lost or not
> replied to for some reason, the code waits for a 1s timeout before
> retrying. For some reason, such early packets are often lost on my
> system, so this causes an annoying delay.
>
> To optimize this, modify the BOOTP code to have very short timeouts for
> the first packet transmitted, but gradually increase the timeout each
> time a timeout occurs. This way, if the first packet is lost, the second
> packet is transmitted quite quickly and hence the overall delay is low.
> However, if there's still no response, we don't keep spewing out packets
> at an insane speed.
>
> It's arguably more correct to try and find out why the first packet is
> lost. However, it seems to disappear inside my Ethenet chip; the TX chip
> indicates no error during TX (not that it has much in the way of
> reporting...), yet wireshark on the RX side doesn't see any packet.
> FWIW, I'm using an ASIX USB Ethernet adapter. Perhaps "link up" is
> reported too early or based on the wrong condition in HW, and we should
> add some fixed extra delay into the driver. However, this would slow down
> every link up event even if it ends up not being needed in some cases.
> Having BOOTP retry quickly applies the fix/WAR to every possible
> Ethernet device, and is quite simple to implement, so seems a better
> solution.

Joe, Tom,

Does this patch look OK?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] [PATCH] net: BOOTP retry timeout improvements
  2014-08-06 15:58 ` Stephen Warren
@ 2014-08-06 18:10   ` Tom Rini
  0 siblings, 0 replies; 12+ messages in thread
From: Tom Rini @ 2014-08-06 18:10 UTC (permalink / raw)
  To: u-boot

On Wed, Aug 06, 2014 at 09:58:12AM -0600, Stephen Warren wrote:
> On 07/25/2014 05:30 PM, Stephen Warren wrote:
> >From: Stephen Warren <swarren@nvidia.com>
> >
> >Currently, the BOOTP code sends out its initial request as soon as the
> >Ethernet driver indicates "link up". If this packet is lost or not
> >replied to for some reason, the code waits for a 1s timeout before
> >retrying. For some reason, such early packets are often lost on my
> >system, so this causes an annoying delay.
> >
> >To optimize this, modify the BOOTP code to have very short timeouts for
> >the first packet transmitted, but gradually increase the timeout each
> >time a timeout occurs. This way, if the first packet is lost, the second
> >packet is transmitted quite quickly and hence the overall delay is low.
> >However, if there's still no response, we don't keep spewing out packets
> >at an insane speed.
> >
> >It's arguably more correct to try and find out why the first packet is
> >lost. However, it seems to disappear inside my Ethenet chip; the TX chip
> >indicates no error during TX (not that it has much in the way of
> >reporting...), yet wireshark on the RX side doesn't see any packet.
> >FWIW, I'm using an ASIX USB Ethernet adapter. Perhaps "link up" is
> >reported too early or based on the wrong condition in HW, and we should
> >add some fixed extra delay into the driver. However, this would slow down
> >every link up event even if it ends up not being needed in some cases.
> >Having BOOTP retry quickly applies the fix/WAR to every possible
> >Ethernet device, and is quite simple to implement, so seems a better
> >solution.
> 
> Joe, Tom,
> 
> Does this patch look OK?

I was a bit worried about this impacting boards that don't have a
problem today but a quick test shows they're still getting the first
reply, so I'm OK with it.  I'll be grabbing things soon.

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140806/41fa218a/attachment.pgp>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] [PATCH] net: BOOTP retry timeout improvements
  2014-07-25 23:30 [U-Boot] [PATCH] net: BOOTP retry timeout improvements Stephen Warren
  2014-08-06 15:58 ` Stephen Warren
@ 2014-08-06 20:03 ` Joe Hershberger
  2014-08-10 22:22 ` [U-Boot] " Tom Rini
  2014-08-15 12:39 ` [U-Boot] [PATCH] " Thierry Reding
  3 siblings, 0 replies; 12+ messages in thread
From: Joe Hershberger @ 2014-08-06 20:03 UTC (permalink / raw)
  To: u-boot

On Fri, Jul 25, 2014 at 6:30 PM, Stephen Warren <swarren@wwwdotorg.org>
wrote:
>
> From: Stephen Warren <swarren@nvidia.com>
>
> Currently, the BOOTP code sends out its initial request as soon as the
> Ethernet driver indicates "link up". If this packet is lost or not
> replied to for some reason, the code waits for a 1s timeout before
> retrying. For some reason, such early packets are often lost on my
> system, so this causes an annoying delay.
>
> To optimize this, modify the BOOTP code to have very short timeouts for
> the first packet transmitted, but gradually increase the timeout each
> time a timeout occurs. This way, if the first packet is lost, the second
> packet is transmitted quite quickly and hence the overall delay is low.
> However, if there's still no response, we don't keep spewing out packets
> at an insane speed.
>
> It's arguably more correct to try and find out why the first packet is
> lost. However, it seems to disappear inside my Ethenet chip; the TX chip
> indicates no error during TX (not that it has much in the way of
> reporting...), yet wireshark on the RX side doesn't see any packet.
> FWIW, I'm using an ASIX USB Ethernet adapter. Perhaps "link up" is
> reported too early or based on the wrong condition in HW, and we should
> add some fixed extra delay into the driver. However, this would slow down
> every link up event even if it ends up not being needed in some cases.
> Having BOOTP retry quickly applies the fix/WAR to every possible
> Ethernet device, and is quite simple to implement, so seems a better
> solution.
>
> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> ---

Acked-by: Joe Hershberger <joe.hershberger@ni.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] net: BOOTP retry timeout improvements
  2014-07-25 23:30 [U-Boot] [PATCH] net: BOOTP retry timeout improvements Stephen Warren
  2014-08-06 15:58 ` Stephen Warren
  2014-08-06 20:03 ` Joe Hershberger
@ 2014-08-10 22:22 ` Tom Rini
  2014-08-15 12:39 ` [U-Boot] [PATCH] " Thierry Reding
  3 siblings, 0 replies; 12+ messages in thread
From: Tom Rini @ 2014-08-10 22:22 UTC (permalink / raw)
  To: u-boot

On Fri, Jul 25, 2014 at 05:30:48PM -0600, Stephen Warren wrote:

> From: Stephen Warren <swarren@nvidia.com>
> 
> Currently, the BOOTP code sends out its initial request as soon as the
> Ethernet driver indicates "link up". If this packet is lost or not
> replied to for some reason, the code waits for a 1s timeout before
> retrying. For some reason, such early packets are often lost on my
> system, so this causes an annoying delay.
> 
> To optimize this, modify the BOOTP code to have very short timeouts for
> the first packet transmitted, but gradually increase the timeout each
> time a timeout occurs. This way, if the first packet is lost, the second
> packet is transmitted quite quickly and hence the overall delay is low.
> However, if there's still no response, we don't keep spewing out packets
> at an insane speed.
> 
> It's arguably more correct to try and find out why the first packet is
> lost. However, it seems to disappear inside my Ethenet chip; the TX chip
> indicates no error during TX (not that it has much in the way of
> reporting...), yet wireshark on the RX side doesn't see any packet.
> FWIW, I'm using an ASIX USB Ethernet adapter. Perhaps "link up" is
> reported too early or based on the wrong condition in HW, and we should
> add some fixed extra delay into the driver. However, this would slow down
> every link up event even if it ends up not being needed in some cases.
> Having BOOTP retry quickly applies the fix/WAR to every possible
> Ethernet device, and is quite simple to implement, so seems a better
> solution.
> 
> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> Acked-by: Joe Hershberger <joe.hershberger@ni.com>

Applied to u-boot/master, thanks!

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140810/eb0e8f98/attachment.pgp>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] [PATCH] net: BOOTP retry timeout improvements
  2014-07-25 23:30 [U-Boot] [PATCH] net: BOOTP retry timeout improvements Stephen Warren
                   ` (2 preceding siblings ...)
  2014-08-10 22:22 ` [U-Boot] " Tom Rini
@ 2014-08-15 12:39 ` Thierry Reding
  2014-08-15 12:49   ` Thierry Reding
  3 siblings, 1 reply; 12+ messages in thread
From: Thierry Reding @ 2014-08-15 12:39 UTC (permalink / raw)
  To: u-boot

On Fri, Jul 25, 2014 at 05:30:48PM -0600, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> Currently, the BOOTP code sends out its initial request as soon as the
> Ethernet driver indicates "link up". If this packet is lost or not
> replied to for some reason, the code waits for a 1s timeout before
> retrying. For some reason, such early packets are often lost on my
> system, so this causes an annoying delay.
> 
> To optimize this, modify the BOOTP code to have very short timeouts for
> the first packet transmitted, but gradually increase the timeout each
> time a timeout occurs. This way, if the first packet is lost, the second
> packet is transmitted quite quickly and hence the overall delay is low.
> However, if there's still no response, we don't keep spewing out packets
> at an insane speed.
> 
> It's arguably more correct to try and find out why the first packet is
> lost. However, it seems to disappear inside my Ethenet chip; the TX chip
> indicates no error during TX (not that it has much in the way of
> reporting...), yet wireshark on the RX side doesn't see any packet.
> FWIW, I'm using an ASIX USB Ethernet adapter. Perhaps "link up" is
> reported too early or based on the wrong condition in HW, and we should
> add some fixed extra delay into the driver. However, this would slow down
> every link up event even if it ends up not being needed in some cases.
> Having BOOTP retry quickly applies the fix/WAR to every possible
> Ethernet device, and is quite simple to implement, so seems a better
> solution.
> 
> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> ---
>  net/bootp.c | 38 +++++++++++++++++++++++++++++++-------
>  net/bootp.h |  3 +--
>  net/net.c   |  4 ++--
>  3 files changed, 34 insertions(+), 11 deletions(-)

Sorry for jumping in a little late, but I only encountered this after
rebasing on latest origin/master today.

With this patch applied, it takes about 8 seconds until U-Boot manages
to get a DHCP address (it needs to broadcast 14 times) whereas with the
patch reverted I get a reply almost instantly (with only a single
broadcast).

I'm testing this on a Jetson TK1 (with local patches for PCIe and
ethernet support).

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140815/f9fb13a8/attachment.pgp>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] [PATCH] net: BOOTP retry timeout improvements
  2014-08-15 12:39 ` [U-Boot] [PATCH] " Thierry Reding
@ 2014-08-15 12:49   ` Thierry Reding
  2014-08-15 16:02     ` Stephen Warren
  0 siblings, 1 reply; 12+ messages in thread
From: Thierry Reding @ 2014-08-15 12:49 UTC (permalink / raw)
  To: u-boot

On Fri, Aug 15, 2014 at 02:39:45PM +0200, Thierry Reding wrote:
> On Fri, Jul 25, 2014 at 05:30:48PM -0600, Stephen Warren wrote:
> > From: Stephen Warren <swarren@nvidia.com>
> > 
> > Currently, the BOOTP code sends out its initial request as soon as the
> > Ethernet driver indicates "link up". If this packet is lost or not
> > replied to for some reason, the code waits for a 1s timeout before
> > retrying. For some reason, such early packets are often lost on my
> > system, so this causes an annoying delay.
> > 
> > To optimize this, modify the BOOTP code to have very short timeouts for
> > the first packet transmitted, but gradually increase the timeout each
> > time a timeout occurs. This way, if the first packet is lost, the second
> > packet is transmitted quite quickly and hence the overall delay is low.
> > However, if there's still no response, we don't keep spewing out packets
> > at an insane speed.
> > 
> > It's arguably more correct to try and find out why the first packet is
> > lost. However, it seems to disappear inside my Ethenet chip; the TX chip
> > indicates no error during TX (not that it has much in the way of
> > reporting...), yet wireshark on the RX side doesn't see any packet.
> > FWIW, I'm using an ASIX USB Ethernet adapter. Perhaps "link up" is
> > reported too early or based on the wrong condition in HW, and we should
> > add some fixed extra delay into the driver. However, this would slow down
> > every link up event even if it ends up not being needed in some cases.
> > Having BOOTP retry quickly applies the fix/WAR to every possible
> > Ethernet device, and is quite simple to implement, so seems a better
> > solution.
> > 
> > Signed-off-by: Stephen Warren <swarren@nvidia.com>
> > ---
> >  net/bootp.c | 38 +++++++++++++++++++++++++++++++-------
> >  net/bootp.h |  3 +--
> >  net/net.c   |  4 ++--
> >  3 files changed, 34 insertions(+), 11 deletions(-)
> 
> Sorry for jumping in a little late, but I only encountered this after
> rebasing on latest origin/master today.
> 
> With this patch applied, it takes about 8 seconds until U-Boot manages
> to get a DHCP address (it needs to broadcast 14 times) whereas with the
> patch reverted I get a reply almost instantly (with only a single
> broadcast).
> 
> I'm testing this on a Jetson TK1 (with local patches for PCIe and
> ethernet support).

Further tests show that if I increase the initial bootp_timeout to
anywhere beyond 750 ms things don't regress for me. If I go down to 500
ms it will send out a second broadcast (presumably because the DHCP
server in my network hasn't replied yet) but still get a reasonably
quick reply (~ 1 second).

So it would seem that this parameter depends to a large degree on the
network infrastructure and needs to be carefully tweaked to work well in
most circumstances.

Could you try if 500, 750 and 1000 ms as initial bootp_timeout value
give you acceptable results?

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140815/f1931ab2/attachment.pgp>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] [PATCH] net: BOOTP retry timeout improvements
  2014-08-15 12:49   ` Thierry Reding
@ 2014-08-15 16:02     ` Stephen Warren
  2014-08-15 21:39       ` Thierry Reding
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Warren @ 2014-08-15 16:02 UTC (permalink / raw)
  To: u-boot

On 08/15/2014 06:49 AM, Thierry Reding wrote:
> On Fri, Aug 15, 2014 at 02:39:45PM +0200, Thierry Reding wrote:
>> On Fri, Jul 25, 2014 at 05:30:48PM -0600, Stephen Warren wrote:
>>> From: Stephen Warren <swarren@nvidia.com>
>>>
>>> Currently, the BOOTP code sends out its initial request as soon as the
>>> Ethernet driver indicates "link up". If this packet is lost or not
>>> replied to for some reason, the code waits for a 1s timeout before
>>> retrying. For some reason, such early packets are often lost on my
>>> system, so this causes an annoying delay.
>>>
>>> To optimize this, modify the BOOTP code to have very short timeouts for
>>> the first packet transmitted, but gradually increase the timeout each
>>> time a timeout occurs. This way, if the first packet is lost, the second
>>> packet is transmitted quite quickly and hence the overall delay is low.
>>> However, if there's still no response, we don't keep spewing out packets
>>> at an insane speed.
>>>
>>> It's arguably more correct to try and find out why the first packet is
>>> lost. However, it seems to disappear inside my Ethenet chip; the TX chip
>>> indicates no error during TX (not that it has much in the way of
>>> reporting...), yet wireshark on the RX side doesn't see any packet.
>>> FWIW, I'm using an ASIX USB Ethernet adapter. Perhaps "link up" is
>>> reported too early or based on the wrong condition in HW, and we should
>>> add some fixed extra delay into the driver. However, this would slow down
>>> every link up event even if it ends up not being needed in some cases.
>>> Having BOOTP retry quickly applies the fix/WAR to every possible
>>> Ethernet device, and is quite simple to implement, so seems a better
>>> solution.
>>>
>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>> ---
>>>   net/bootp.c | 38 +++++++++++++++++++++++++++++++-------
>>>   net/bootp.h |  3 +--
>>>   net/net.c   |  4 ++--
>>>   3 files changed, 34 insertions(+), 11 deletions(-)
>>
>> Sorry for jumping in a little late, but I only encountered this after
>> rebasing on latest origin/master today.
>>
>> With this patch applied, it takes about 8 seconds until U-Boot manages
>> to get a DHCP address (it needs to broadcast 14 times) whereas with the
>> patch reverted I get a reply almost instantly (with only a single
>> broadcast).
>>
>> I'm testing this on a Jetson TK1 (with local patches for PCIe and
>> ethernet support).
>
> Further tests show that if I increase the initial bootp_timeout to
> anywhere beyond 750 ms things don't regress for me. If I go down to 500
> ms it will send out a second broadcast (presumably because the DHCP
> server in my network hasn't replied yet) but still get a reasonably
> quick reply (~ 1 second).
>
> So it would seem that this parameter depends to a large degree on the
> network infrastructure and needs to be carefully tweaked to work well in
> most circumstances.
>
> Could you try if 500, 750 and 1000 ms as initial bootp_timeout value
> give you acceptable results?

That will still introduce extremely large delays into the boot process, 
which is exactly what this patch was trying to avoid. 500..1000ms are 
the same order of magnitude as the initial delay without this patch applied.

Re-transmitting a DHCP request shouldn't prevent a response to the 
previous request from being processed - AFAIK each request is 
idempotent. Can you debug what is causing the 8s delay? Are earlier 
responses received and rejected for some reason, or is your DHCP server 
getting confused by the multiple requests and not responding, or ...?

(and as an aside, how on earth is your DHCP server taking >500ms to 
respond, yet still actually responding?)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] [PATCH] net: BOOTP retry timeout improvements
  2014-08-15 16:02     ` Stephen Warren
@ 2014-08-15 21:39       ` Thierry Reding
  2014-08-15 22:09         ` Thierry Reding
  2014-08-18 16:04         ` Stephen Warren
  0 siblings, 2 replies; 12+ messages in thread
From: Thierry Reding @ 2014-08-15 21:39 UTC (permalink / raw)
  To: u-boot

On Fri, Aug 15, 2014 at 10:02:40AM -0600, Stephen Warren wrote:
> On 08/15/2014 06:49 AM, Thierry Reding wrote:
> >On Fri, Aug 15, 2014 at 02:39:45PM +0200, Thierry Reding wrote:
> >>On Fri, Jul 25, 2014 at 05:30:48PM -0600, Stephen Warren wrote:
> >>>From: Stephen Warren <swarren@nvidia.com>
> >>>
> >>>Currently, the BOOTP code sends out its initial request as soon as the
> >>>Ethernet driver indicates "link up". If this packet is lost or not
> >>>replied to for some reason, the code waits for a 1s timeout before
> >>>retrying. For some reason, such early packets are often lost on my
> >>>system, so this causes an annoying delay.
> >>>
> >>>To optimize this, modify the BOOTP code to have very short timeouts for
> >>>the first packet transmitted, but gradually increase the timeout each
> >>>time a timeout occurs. This way, if the first packet is lost, the second
> >>>packet is transmitted quite quickly and hence the overall delay is low.
> >>>However, if there's still no response, we don't keep spewing out packets
> >>>at an insane speed.
> >>>
> >>>It's arguably more correct to try and find out why the first packet is
> >>>lost. However, it seems to disappear inside my Ethenet chip; the TX chip
> >>>indicates no error during TX (not that it has much in the way of
> >>>reporting...), yet wireshark on the RX side doesn't see any packet.
> >>>FWIW, I'm using an ASIX USB Ethernet adapter. Perhaps "link up" is
> >>>reported too early or based on the wrong condition in HW, and we should
> >>>add some fixed extra delay into the driver. However, this would slow down
> >>>every link up event even if it ends up not being needed in some cases.
> >>>Having BOOTP retry quickly applies the fix/WAR to every possible
> >>>Ethernet device, and is quite simple to implement, so seems a better
> >>>solution.
> >>>
> >>>Signed-off-by: Stephen Warren <swarren@nvidia.com>
> >>>---
> >>>  net/bootp.c | 38 +++++++++++++++++++++++++++++++-------
> >>>  net/bootp.h |  3 +--
> >>>  net/net.c   |  4 ++--
> >>>  3 files changed, 34 insertions(+), 11 deletions(-)
> >>
> >>Sorry for jumping in a little late, but I only encountered this after
> >>rebasing on latest origin/master today.
> >>
> >>With this patch applied, it takes about 8 seconds until U-Boot manages
> >>to get a DHCP address (it needs to broadcast 14 times) whereas with the
> >>patch reverted I get a reply almost instantly (with only a single
> >>broadcast).
> >>
> >>I'm testing this on a Jetson TK1 (with local patches for PCIe and
> >>ethernet support).
> >
> >Further tests show that if I increase the initial bootp_timeout to
> >anywhere beyond 750 ms things don't regress for me. If I go down to 500
> >ms it will send out a second broadcast (presumably because the DHCP
> >server in my network hasn't replied yet) but still get a reasonably
> >quick reply (~ 1 second).
> >
> >So it would seem that this parameter depends to a large degree on the
> >network infrastructure and needs to be carefully tweaked to work well in
> >most circumstances.
> >
> >Could you try if 500, 750 and 1000 ms as initial bootp_timeout value
> >give you acceptable results?
> 
> That will still introduce extremely large delays into the boot process,
> which is exactly what this patch was trying to avoid. 500..1000ms are the
> same order of magnitude as the initial delay without this patch applied.

It will introduce a 1000 ms delay /if/ you have broken hardware/drivers
where the first packet is lost. If the first packet isn't lost there's
no additional delay except for the DHCP server's latency in replying. So
in effect the patch trades long latency on one setup for longer latency
on another set up.

Also the timeout was previously 5000 ms, so by adjusting the timeout to
1000 ms we'd still be gaining quite a few seconds.

> Re-transmitting a DHCP request shouldn't prevent a response to the previous
> request from being processed - AFAIK each request is idempotent. Can you
> debug what is causing the 8s delay? Are earlier responses received and
> rejected for some reason, or is your DHCP server getting confused by the
> multiple requests and not responding, or ...?

I haven't really tested this, but by looking at the code in net/bootp.c
(function BootpCheckPkt()) the U-Boot implementation will reject all
packets that don't match the BootpID (which is composed of the lower 4
bytes of the ethernet address plus the time in milliseconds when the
discover packet was sent, see BootpRequest()).

So indeed earlier responses will be rejected, which would also explain
the 8 second delay since that's about the time it takes for the backoff
to reach the timeout that the server requires to reply before the next
retransmission.

> (and as an aside, how on earth is your DHCP server taking >500ms to respond,
> yet still actually responding?)

It's a black box mostly but it seems to be doing the same for any of the
other devices (tcpdump on the machine that I'm typing this from gives me
roughly 430 ms in one session and 580 ms in another between the DISCOVER
message and the server's reply) on the network.

This seems absurdly long, but it's consumer network equipment (I'd even
say high-end by the price-tag) and there's not a lot I can configure to
make it faster. I also suspect that other people may have hardware with
similar latency.

I've also looked briefly at RFC 2131 (DHCP) which recommends an initial
retransmission timeout of 4 (+- 1) seconds for 10 Mbps ethernet. The RFC
goes on to say that this should be chosen based on the characteristics
of the network, so unless we really do choose the initial timeout based
on the link speed (even then you could still have many layers of bridges
and WiFi between client and server) the default initial timeout should
probably be larger than 10 ms. On 100 Mbps 10 ms is about the round-trip
time for a ping, so it's cutting it rather close as timeout for
retransmission.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140815/abf2bf80/attachment.pgp>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] [PATCH] net: BOOTP retry timeout improvements
  2014-08-15 21:39       ` Thierry Reding
@ 2014-08-15 22:09         ` Thierry Reding
  2014-08-18 16:04         ` Stephen Warren
  1 sibling, 0 replies; 12+ messages in thread
From: Thierry Reding @ 2014-08-15 22:09 UTC (permalink / raw)
  To: u-boot

On Fri, Aug 15, 2014 at 11:39:08PM +0200, Thierry Reding wrote:
> On Fri, Aug 15, 2014 at 10:02:40AM -0600, Stephen Warren wrote:
[...]
> > (and as an aside, how on earth is your DHCP server taking >500ms to respond,
> > yet still actually responding?)
> 
> It's a black box mostly but it seems to be doing the same for any of the
> other devices (tcpdump on the machine that I'm typing this from gives me
> roughly 430 ms in one session and 580 ms in another between the DISCOVER
> message and the server's reply) on the network.
> 
> This seems absurdly long, but it's consumer network equipment (I'd even
> say high-end by the price-tag) and there's not a lot I can configure to
> make it faster. I also suspect that other people may have hardware with
> similar latency.

Oh, looking further at tcpdump output it seems like the DHCP server is
sending out an ARP request for the IP address that it's about to OFFER
to make sure nobody is using it yet. Waiting for an ARP reply probably
accounts for most of the delay.

Looking at various websites this seems to be a common practice in DHCP
servers.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140816/3ac4922b/attachment.pgp>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] [PATCH] net: BOOTP retry timeout improvements
  2014-08-15 21:39       ` Thierry Reding
  2014-08-15 22:09         ` Thierry Reding
@ 2014-08-18 16:04         ` Stephen Warren
  2014-08-18 16:15           ` Tom Rini
  1 sibling, 1 reply; 12+ messages in thread
From: Stephen Warren @ 2014-08-18 16:04 UTC (permalink / raw)
  To: u-boot

On 08/15/2014 03:39 PM, Thierry Reding wrote:
> On Fri, Aug 15, 2014 at 10:02:40AM -0600, Stephen Warren wrote:
...
>> Re-transmitting a DHCP request shouldn't prevent a response to the previous
>> request from being processed - AFAIK each request is idempotent. Can you
>> debug what is causing the 8s delay? Are earlier responses received and
>> rejected for some reason, or is your DHCP server getting confused by the
>> multiple requests and not responding, or ...?
>
> I haven't really tested this, but by looking at the code in net/bootp.c
> (function BootpCheckPkt()) the U-Boot implementation will reject all
> packets that don't match the BootpID (which is composed of the lower 4
> bytes of the ethernet address plus the time in milliseconds when the
> discover packet was sent, see BootpRequest()).
>
> So indeed earlier responses will be rejected, which would also explain
> the 8 second delay since that's about the time it takes for the backoff
> to reach the timeout that the server requires to reply before the next
> retransmission.

Ah, that's a problem then, especially given that Thierry mentioned that 
(some) DHCP servers send out an ARP address and wait for any responses, 
to avoid address conflicts.

I suppose we have two options here:

a) Make each of U-Boot's DHCP requests identical, so that responses to 
an earlier request are accepted even if a later request has been sent. I 
don't know what the implications are here; perhaps some dumb DHCP server 
might give out different results for the different responses, so we 
actively don't want to accept the stale data?

b) Revert my patch. Perhaps think about tweaking the initial retry 
time-out later.

(b) seems simplest. Tom, it's f59be6e850b3 "net: BOOTP retry timeout 
improvements". It looks like nothing has touched the same files since it 
was applied, so it should be a simple revert.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] [PATCH] net: BOOTP retry timeout improvements
  2014-08-18 16:04         ` Stephen Warren
@ 2014-08-18 16:15           ` Tom Rini
  0 siblings, 0 replies; 12+ messages in thread
From: Tom Rini @ 2014-08-18 16:15 UTC (permalink / raw)
  To: u-boot

On Mon, Aug 18, 2014 at 10:04:58AM -0600, Stephen Warren wrote:
> On 08/15/2014 03:39 PM, Thierry Reding wrote:
> >On Fri, Aug 15, 2014 at 10:02:40AM -0600, Stephen Warren wrote:
> ...
> >>Re-transmitting a DHCP request shouldn't prevent a response to the previous
> >>request from being processed - AFAIK each request is idempotent. Can you
> >>debug what is causing the 8s delay? Are earlier responses received and
> >>rejected for some reason, or is your DHCP server getting confused by the
> >>multiple requests and not responding, or ...?
> >
> >I haven't really tested this, but by looking at the code in net/bootp.c
> >(function BootpCheckPkt()) the U-Boot implementation will reject all
> >packets that don't match the BootpID (which is composed of the lower 4
> >bytes of the ethernet address plus the time in milliseconds when the
> >discover packet was sent, see BootpRequest()).
> >
> >So indeed earlier responses will be rejected, which would also explain
> >the 8 second delay since that's about the time it takes for the backoff
> >to reach the timeout that the server requires to reply before the next
> >retransmission.
> 
> Ah, that's a problem then, especially given that Thierry mentioned
> that (some) DHCP servers send out an ARP address and wait for any
> responses, to avoid address conflicts.
> 
> I suppose we have two options here:
> 
> a) Make each of U-Boot's DHCP requests identical, so that responses
> to an earlier request are accepted even if a later request has been
> sent. I don't know what the implications are here; perhaps some dumb
> DHCP server might give out different results for the different
> responses, so we actively don't want to accept the stale data?
> 
> b) Revert my patch. Perhaps think about tweaking the initial retry
> time-out later.
> 
> (b) seems simplest. Tom, it's f59be6e850b3 "net: BOOTP retry timeout
> improvements". It looks like nothing has touched the same files
> since it was applied, so it should be a simple revert.

I'm leaning this way as well for now and I see some "funny" behavior
sometimes in my setup, but not consistent (but I bet it's this same
thing) and I haven't had time to tcpdump it.

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20140818/fc44326e/attachment.pgp>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-08-18 16:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-25 23:30 [U-Boot] [PATCH] net: BOOTP retry timeout improvements Stephen Warren
2014-08-06 15:58 ` Stephen Warren
2014-08-06 18:10   ` Tom Rini
2014-08-06 20:03 ` Joe Hershberger
2014-08-10 22:22 ` [U-Boot] " Tom Rini
2014-08-15 12:39 ` [U-Boot] [PATCH] " Thierry Reding
2014-08-15 12:49   ` Thierry Reding
2014-08-15 16:02     ` Stephen Warren
2014-08-15 21:39       ` Thierry Reding
2014-08-15 22:09         ` Thierry Reding
2014-08-18 16:04         ` Stephen Warren
2014-08-18 16:15           ` Tom Rini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.