linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net: tcp: Fix a PTO timing granularity issue
@ 2015-05-26 14:25 Ido Yariv
  2015-05-26 16:23 ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Ido Yariv @ 2015-05-26 14:25 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel
  Cc: Ido Yariv, Ido Yariv

The Tail Loss Probe RFC specifies that the PTO value should be set to
max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.

The PTO value is converted to jiffies, so the timer might expire
prematurely. This is especially problematic on systems in which HZ=100.

To work around this issue, increase the number of jiffies by one,
ensuring that the timeout won't expire in less than 10ms.

Signed-off-by: Ido Yariv <idox.yariv@intel.com>
---
 net/ipv4/tcp_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 534e5fd..6f57d3d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2207,7 +2207,7 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 	if (tp->packets_out == 1)
 		timeout = max_t(u32, timeout,
 				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
-	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
+	timeout = max_t(u32, timeout, msecs_to_jiffies(10) + 1);
 
 	/* If RTO is shorter, just schedule TLP in its place. */
 	tlp_time_stamp = tcp_time_stamp + timeout;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-26 14:25 [PATCH] net: tcp: Fix a PTO timing granularity issue Ido Yariv
@ 2015-05-26 16:23 ` Eric Dumazet
  2015-05-26 17:02   ` Ido Yariv
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2015-05-26 16:23 UTC (permalink / raw)
  To: Ido Yariv
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

On Tue, 2015-05-26 at 10:25 -0400, Ido Yariv wrote:
> The Tail Loss Probe RFC specifies that the PTO value should be set to
> max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.
> 
> The PTO value is converted to jiffies, so the timer might expire
> prematurely. This is especially problematic on systems in which HZ=100.
> 
> To work around this issue, increase the number of jiffies by one,
> ensuring that the timeout won't expire in less than 10ms.
> 
> Signed-off-by: Ido Yariv <idox.yariv@intel.com>
> ---
>  net/ipv4/tcp_output.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 534e5fd..6f57d3d 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2207,7 +2207,7 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>  	if (tp->packets_out == 1)
>  		timeout = max_t(u32, timeout,
>  				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
> -	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
> +	timeout = max_t(u32, timeout, msecs_to_jiffies(10) + 1);
>  
>  	/* If RTO is shorter, just schedule TLP in its place. */
>  	tlp_time_stamp = tcp_time_stamp + timeout;

Have you really hit an issue, or did you send this patch after all these
msecs_to_jiffies() discussions on lkml/netdev ?

Not sure this is the right fix.

TLP was really tested with an effective min delay of 10ms.

Adding 10% for the sake of crazy HZ=100 builds seems a high price.
(All recent TCP changes were tested with HZ=1000 BTW ...)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 534e5fdb04c11152bae36f47a786e8b10b823cd3..5321df89af9b59c6727395c489e6f9b2770dcd5e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2208,6 +2208,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 		timeout = max_t(u32, timeout,
 				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
 	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
+#if HZ <= 100
+	timeout = max_t(u32, timeout, 2);
+#endif
 
 	/* If RTO is shorter, just schedule TLP in its place. */
 	tlp_time_stamp = tcp_time_stamp + timeout;



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-26 16:23 ` Eric Dumazet
@ 2015-05-26 17:02   ` Ido Yariv
  2015-05-26 17:13     ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Ido Yariv @ 2015-05-26 17:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

Hi Eric,

On Tue, May 26, 2015 at 09:23:55AM -0700, Eric Dumazet wrote:
> On Tue, 2015-05-26 at 10:25 -0400, Ido Yariv wrote:
> > The Tail Loss Probe RFC specifies that the PTO value should be set to
> > max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.
> > 
> > The PTO value is converted to jiffies, so the timer might expire
> > prematurely. This is especially problematic on systems in which HZ=100.
> > 
> > To work around this issue, increase the number of jiffies by one,
> > ensuring that the timeout won't expire in less than 10ms.
> > 
> > Signed-off-by: Ido Yariv <idox.yariv@intel.com>
> > ---
> >  net/ipv4/tcp_output.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 534e5fd..6f57d3d 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2207,7 +2207,7 @@ bool tcp_schedule_loss_probe(struct sock *sk)
> >  	if (tp->packets_out == 1)
> >  		timeout = max_t(u32, timeout,
> >  				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
> > -	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
> > +	timeout = max_t(u32, timeout, msecs_to_jiffies(10) + 1);
> >  
> >  	/* If RTO is shorter, just schedule TLP in its place. */
> >  	tlp_time_stamp = tcp_time_stamp + timeout;
> 
> Have you really hit an issue, or did you send this patch after all these
> msecs_to_jiffies() discussions on lkml/netdev ?

This actually fixed a specific issue I ran into. This issue caused a
degradation in throughput in a benchmark which sent relatively small
chunks of data (100KB) in a loop. The impact was quite substantial -
with this patch, throughput increased by up to 50% on the platform this
was tested on.

> 
> Not sure this is the right fix.
> 
> TLP was really tested with an effective min delay of 10ms.
> 
> Adding 10% for the sake of crazy HZ=100 builds seems a high price.
> (All recent TCP changes were tested with HZ=1000 BTW ...)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 534e5fdb04c11152bae36f47a786e8b10b823cd3..5321df89af9b59c6727395c489e6f9b2770dcd5e 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2208,6 +2208,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>  		timeout = max_t(u32, timeout,
>  				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
>  	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
> +#if HZ <= 100
> +	timeout = max_t(u32, timeout, 2);
> +#endif
>  
>  	/* If RTO is shorter, just schedule TLP in its place. */
>  	tlp_time_stamp = tcp_time_stamp + timeout;

This was actually the first incarnation of this patch. However, while
the impact of this issue when HZ=100 is the greatest, it can also impact
other settings as well. For instance, if HZ=250, the timer could expire
after a bit over 8ms instead of 10ms, and 9ms for HZ=1000.

By increasing the number of jiffies, we ensure that we'll wait at least
10ms but never less than that, so for HZ=1000, it'll be anywhere between
10ms and 11ms instead of 9ms and 10ms.

Thanks,
Ido.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-26 17:02   ` Ido Yariv
@ 2015-05-26 17:13     ` Eric Dumazet
  2015-05-26 17:55       ` Ido Yariv
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2015-05-26 17:13 UTC (permalink / raw)
  To: Ido Yariv
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

On Tue, 2015-05-26 at 13:02 -0400, Ido Yariv wrote:
> Hi Eric,
> 
> On Tue, May 26, 2015 at 09:23:55AM -0700, Eric Dumazet wrote:
> > On Tue, 2015-05-26 at 10:25 -0400, Ido Yariv wrote:
> > > The Tail Loss Probe RFC specifies that the PTO value should be set to
> > > max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.
> > > 
> > > The PTO value is converted to jiffies, so the timer might expire
> > > prematurely. This is especially problematic on systems in which HZ=100.
> > > 
> > > To work around this issue, increase the number of jiffies by one,
> > > ensuring that the timeout won't expire in less than 10ms.
> > > 
> > > Signed-off-by: Ido Yariv <idox.yariv@intel.com>
> > > ---
> > >  net/ipv4/tcp_output.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index 534e5fd..6f57d3d 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -2207,7 +2207,7 @@ bool tcp_schedule_loss_probe(struct sock *sk)
> > >  	if (tp->packets_out == 1)
> > >  		timeout = max_t(u32, timeout,
> > >  				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
> > > -	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
> > > +	timeout = max_t(u32, timeout, msecs_to_jiffies(10) + 1);
> > >  
> > >  	/* If RTO is shorter, just schedule TLP in its place. */
> > >  	tlp_time_stamp = tcp_time_stamp + timeout;
> > 
> > Have you really hit an issue, or did you send this patch after all these
> > msecs_to_jiffies() discussions on lkml/netdev ?
> 
> This actually fixed a specific issue I ran into. This issue caused a
> degradation in throughput in a benchmark which sent relatively small
> chunks of data (100KB) in a loop. The impact was quite substantial -
> with this patch, throughput increased by up to 50% on the platform this
> was tested on.


Really ? You have more problems if your benchmark relies on TLP.

Please share your setup, because I suspect you hit other more serious
bugs.



> This was actually the first incarnation of this patch. However, while
> the impact of this issue when HZ=100 is the greatest, it can also impact
> other settings as well. For instance, if HZ=250, the timer could expire
> after a bit over 8ms instead of 10ms, and 9ms for HZ=1000.
> 
> By increasing the number of jiffies, we ensure that we'll wait at least
> 10ms but never less than that, so for HZ=1000, it'll be anywhere between
> 10ms and 11ms instead of 9ms and 10ms.

Yes, but we do not want to blindly increase this timeout, tested few
years ago with this exact value : between 9 and 10 ms. Not between 10
and 11 ms, with an added 10% in max latencies.






^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-26 17:13     ` Eric Dumazet
@ 2015-05-26 17:55       ` Ido Yariv
  2015-05-26 18:13         ` Eric Dumazet
  2015-05-26 18:25         ` [PATCH] " Eric Dumazet
  0 siblings, 2 replies; 21+ messages in thread
From: Ido Yariv @ 2015-05-26 17:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

Hi Eric,

On Tue, May 26, 2015 at 10:13:40AM -0700, Eric Dumazet wrote:
> On Tue, 2015-05-26 at 13:02 -0400, Ido Yariv wrote:
> > Hi Eric,
> > 
> > On Tue, May 26, 2015 at 09:23:55AM -0700, Eric Dumazet wrote:
> > > Have you really hit an issue, or did you send this patch after all these
> > > msecs_to_jiffies() discussions on lkml/netdev ?
> > 
> > This actually fixed a specific issue I ran into. This issue caused a
> > degradation in throughput in a benchmark which sent relatively small
> > chunks of data (100KB) in a loop. The impact was quite substantial -
> > with this patch, throughput increased by up to 50% on the platform this
> > was tested on.
> 
> 
> Really ? You have more problems if your benchmark relies on TLP.
> 
> Please share your setup, because I suspect you hit other more serious
> bugs.

The platform this was tested on was an embedded platform with a wifi
module (11n, 20MHZ). The other end was a computer running Windows, and
the benchmarking software was IxChariot.
The whole setup was running in a shielded box with minimal
interferences.

As it seems, the throughput was limited by the congestion window.
Further analysis led to TLP - the fact that its timer was expiring
prematurely impacted cwnd, which in turn prevented the wireless driver
from having enough skbs to buffer and send.

Increasing the size of the chunks being sent had a similar impact on
throughput, presumably because the congestion window had enough time to
increase.

Changing the congestion window to Westwood from cubic/reno also had a
similar impact on throughput.

> > This was actually the first incarnation of this patch. However, while
> > the impact of this issue when HZ=100 is the greatest, it can also impact
> > other settings as well. For instance, if HZ=250, the timer could expire
> > after a bit over 8ms instead of 10ms, and 9ms for HZ=1000.
> > 
> > By increasing the number of jiffies, we ensure that we'll wait at least
> > 10ms but never less than that, so for HZ=1000, it'll be anywhere between
> > 10ms and 11ms instead of 9ms and 10ms.
> 
> Yes, but we do not want to blindly increase this timeout, tested few
> years ago with this exact value : between 9 and 10 ms. Not between 10
> and 11 ms, with an added 10% in max latencies.

I understand, and I also suspect that having it expire after 9ms will
have very little impact, if at all.

Since it mainly affects HZ=100 systems, we can simply go with having at
least 2 jiffies on these systems, and leave everything else as is.

However, if the 10ms has a special meaning (couldn't find reasoning for
it in the RFC), making sure this timer doesn't expire prematurely could
be beneficial. I'm afraid this was not tested on the setup mentioned
above though.

Thanks,
Ido.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-26 17:55       ` Ido Yariv
@ 2015-05-26 18:13         ` Eric Dumazet
  2015-05-26 20:17           ` Ido Yariv
  2015-05-26 18:25         ` [PATCH] " Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2015-05-26 18:13 UTC (permalink / raw)
  To: Ido Yariv
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

On Tue, 2015-05-26 at 13:55 -0400, Ido Yariv wrote:
> Hi Eric,
> 

> 
> I understand, and I also suspect that having it expire after 9ms will
> have very little impact, if at all.
> 
> Since it mainly affects HZ=100 systems, we can simply go with having at
> least 2 jiffies on these systems, and leave everything else as is.
> 
> However, if the 10ms has a special meaning (couldn't find reasoning for
> it in the RFC), making sure this timer doesn't expire prematurely could
> be beneficial. I'm afraid this was not tested on the setup mentioned
> above though.
> 


RFC did not explain how 10ms delay was implemented. This is the kind of
dark side. RFC are full of 'unsaid things', like OS bugs.

What is not said in TLP paper is : linux timers have a 'jiffie'
granularity that might be 1/100, 1/250, 1/1000, or even 1/64 on Alpha
processors...

Fact is : We did TLP implementation and experimentations and paper at
the same time, and we do not want to change the current behavior on
HZ=1000 hosts. This is the kind of change that would require lot of
tests for Google. 

Please resend your patch so that only HZ <= 100 is changed, we will
happily acknowledge it.

Thanks



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-26 17:55       ` Ido Yariv
  2015-05-26 18:13         ` Eric Dumazet
@ 2015-05-26 18:25         ` Eric Dumazet
  2015-05-26 19:39           ` Ido Yariv
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2015-05-26 18:25 UTC (permalink / raw)
  To: Ido Yariv
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

On Tue, 2015-05-26 at 13:55 -0400, Ido Yariv wrote:

> 
> The platform this was tested on was an embedded platform with a wifi
> module (11n, 20MHZ). The other end was a computer running Windows, and
> the benchmarking software was IxChariot.
> The whole setup was running in a shielded box with minimal
> interferences.
> 
> As it seems, the throughput was limited by the congestion window.
> Further analysis led to TLP - the fact that its timer was expiring
> prematurely impacted cwnd, which in turn prevented the wireless driver
> from having enough skbs to buffer and send.
> 
> Increasing the size of the chunks being sent had a similar impact on
> throughput, presumably because the congestion window had enough time to
> increase.
> 
> Changing the congestion window to Westwood from cubic/reno also had a
> similar impact on throughput.
> 

Have you tested what results you had by completely disabling TLP ?

Maybe a timer of 10 to 20ms is too short anyway in your testbed.




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-26 18:25         ` [PATCH] " Eric Dumazet
@ 2015-05-26 19:39           ` Ido Yariv
  0 siblings, 0 replies; 21+ messages in thread
From: Ido Yariv @ 2015-05-26 19:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

Hi Eric,

On Tue, May 26, 2015 at 11:25:21AM -0700, Eric Dumazet wrote:
> On Tue, 2015-05-26 at 13:55 -0400, Ido Yariv wrote:
> 
> > 
> > The platform this was tested on was an embedded platform with a wifi
> > module (11n, 20MHZ). The other end was a computer running Windows, and
> > the benchmarking software was IxChariot.
> > The whole setup was running in a shielded box with minimal
> > interferences.
> > 
> > As it seems, the throughput was limited by the congestion window.
> > Further analysis led to TLP - the fact that its timer was expiring
> > prematurely impacted cwnd, which in turn prevented the wireless driver
> > from having enough skbs to buffer and send.
> > 
> > Increasing the size of the chunks being sent had a similar impact on
> > throughput, presumably because the congestion window had enough time to
> > increase.
> > 
> > Changing the congestion window to Westwood from cubic/reno also had a
> > similar impact on throughput.
> > 
> 
> Have you tested what results you had by completely disabling TLP ?
> 
> Maybe a timer of 10 to 20ms is too short anyway in your testbed.

Yes, I have (by writing 2 to /proc/sys/net/ipv4/tcp_early_retrans), and
it also had a similar effect. That was actually the first workaround for
this issue, before the issue in TLP was traced.

With 10ms to 20ms timers the throughput was just the same as disabling
TLP altogether, so it seems it was just enough to handle.

Cheers,
Ido.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-26 18:13         ` Eric Dumazet
@ 2015-05-26 20:17           ` Ido Yariv
  2015-05-27 11:36             ` David Laight
  0 siblings, 1 reply; 21+ messages in thread
From: Ido Yariv @ 2015-05-26 20:17 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati,
	Eric Dumazet, netdev, linux-kernel
  Cc: Ido Yariv, Ido Yariv

The Tail Loss Probe RFC specifies that the PTO value should be set to
max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.

The PTO value is converted to jiffies, so the timer may expire
prematurely.

This is especially problematic on systems in which HZ <= 100, so work
around this by setting the timeout to at least 2 jiffies on such
systems.

The 10ms figure was originally selected based on tests performed with
the current implementation and HZ = 1000. Thus, leave the behavior on
systems with HZ > 100 unchanged.

Signed-off-by: Ido Yariv <idox.yariv@intel.com>
---
 net/ipv4/tcp_output.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 534e5fd..5321df8 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2208,6 +2208,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 		timeout = max_t(u32, timeout,
 				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
 	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
+#if HZ <= 100
+	timeout = max_t(u32, timeout, 2);
+#endif
 
 	/* If RTO is shorter, just schedule TLP in its place. */
 	tlp_time_stamp = tcp_time_stamp + timeout;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* RE: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-26 20:17           ` Ido Yariv
@ 2015-05-27 11:36             ` David Laight
  2015-05-27 13:41               ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: David Laight @ 2015-05-27 11:36 UTC (permalink / raw)
  To: 'Ido Yariv',
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati,
	Eric Dumazet, netdev, linux-kernel
  Cc: Ido Yariv

From: Of Ido Yariv
> Sent: 26 May 2015 21:17
> The Tail Loss Probe RFC specifies that the PTO value should be set to
> max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.
> 
> The PTO value is converted to jiffies, so the timer may expire
> prematurely.
> 
> This is especially problematic on systems in which HZ <= 100, so work
> around this by setting the timeout to at least 2 jiffies on such
> systems.
> 
> The 10ms figure was originally selected based on tests performed with
> the current implementation and HZ = 1000. Thus, leave the behavior on
> systems with HZ > 100 unchanged.
> 
> Signed-off-by: Ido Yariv <idox.yariv@intel.com>
> ---
>  net/ipv4/tcp_output.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 534e5fd..5321df8 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2208,6 +2208,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>  		timeout = max_t(u32, timeout,
>  				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
>  	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
> +#if HZ <= 100
> +	timeout = max_t(u32, timeout, 2);
> +#endif

Why not:
	timeout = max_t(u32, timeout, max_t(u32, 2, msecs_to_jiffies(10)));
I think the RH max_t() is a compile time constant.

You need 2 jiffies to guarantee a non-zero timeout.
Even if HZ=199 with a 'rounding down' msecs_to_jiffies() you get 1 jiffy
and a possible immediate timeout.

There are probably other places where msecs_to_jiffies() better not return 1.

	David


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-27 11:36             ` David Laight
@ 2015-05-27 13:41               ` Eric Dumazet
  2015-05-27 14:40                 ` Ido Yariv
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2015-05-27 13:41 UTC (permalink / raw)
  To: David Laight
  Cc: 'Ido Yariv',
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

On Wed, 2015-05-27 at 11:36 +0000, David Laight wrote:
> From: Of Ido Yariv
> > Sent: 26 May 2015 21:17
> > The Tail Loss Probe RFC specifies that the PTO value should be set to
> > max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.
> > 
> > The PTO value is converted to jiffies, so the timer may expire
> > prematurely.
> > 
> > This is especially problematic on systems in which HZ <= 100, so work
> > around this by setting the timeout to at least 2 jiffies on such
> > systems.
> > 
> > The 10ms figure was originally selected based on tests performed with
> > the current implementation and HZ = 1000. Thus, leave the behavior on
> > systems with HZ > 100 unchanged.
> > 
> > Signed-off-by: Ido Yariv <idox.yariv@intel.com>
> > ---
> >  net/ipv4/tcp_output.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 534e5fd..5321df8 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2208,6 +2208,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
> >  		timeout = max_t(u32, timeout,
> >  				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
> >  	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
> > +#if HZ <= 100
> > +	timeout = max_t(u32, timeout, 2);
> > +#endif
> 
> Why not:
> 	timeout = max_t(u32, timeout, max_t(u32, 2, msecs_to_jiffies(10)));
> I think the RH max_t() is a compile time constant.
> 
> You need 2 jiffies to guarantee a non-zero timeout.
> Even if HZ=199 with a 'rounding down' msecs_to_jiffies() you get 1 jiffy
> and a possible immediate timeout.
> 

Have you followed previous discussions ?

I guess we can have a helper macro, but for the moment only one spot was
found.

Its kind of depressing having to deal with HZ=100 issues, with modern
NO_HZ configurations.

TCP rtts have now usec resolution, so HZ=100 is pushing TCP to very
imprecise behavior.




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-27 13:41               ` Eric Dumazet
@ 2015-05-27 14:40                 ` Ido Yariv
  2015-05-27 14:56                   ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Ido Yariv @ 2015-05-27 14:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Laight, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

Hi Eric,

On Wed, May 27, 2015 at 06:41:17AM -0700, Eric Dumazet wrote:
> On Wed, 2015-05-27 at 11:36 +0000, David Laight wrote:
> > From: Of Ido Yariv
> > > Sent: 26 May 2015 21:17
> > > The Tail Loss Probe RFC specifies that the PTO value should be set to
> > > max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.
> > > 
> > > The PTO value is converted to jiffies, so the timer may expire
> > > prematurely.
> > > 
> > > This is especially problematic on systems in which HZ <= 100, so work
> > > around this by setting the timeout to at least 2 jiffies on such
> > > systems.
> > > 
> > > The 10ms figure was originally selected based on tests performed with
> > > the current implementation and HZ = 1000. Thus, leave the behavior on
> > > systems with HZ > 100 unchanged.
> > > 
> > > Signed-off-by: Ido Yariv <idox.yariv@intel.com>
> > > ---
> > >  net/ipv4/tcp_output.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index 534e5fd..5321df8 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -2208,6 +2208,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
> > >  		timeout = max_t(u32, timeout,
> > >  				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
> > >  	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
> > > +#if HZ <= 100
> > > +	timeout = max_t(u32, timeout, 2);
> > > +#endif
> > 
> > Why not:
> > 	timeout = max_t(u32, timeout, max_t(u32, 2, msecs_to_jiffies(10)));
> > I think the RH max_t() is a compile time constant.
> > 
> > You need 2 jiffies to guarantee a non-zero timeout.
> > Even if HZ=199 with a 'rounding down' msecs_to_jiffies() you get 1 jiffy
> > and a possible immediate timeout.
> > 
> 
> Have you followed previous discussions ?
> 
> I guess we can have a helper macro, but for the moment only one spot was
> found.
> 
> Its kind of depressing having to deal with HZ=100 issues, with modern
> NO_HZ configurations.
> 
> TCP rtts have now usec resolution, so HZ=100 is pushing TCP to very
> imprecise behavior.

HZ=100 is used on some embedded platforms, so it's still something we
have to deal with unfortunately..

Since the '2' here is a lower bound, and msecs_to_jiffies(10) will
return values greater than 2 for HZ>100 anyway, always ensuring the
2 jiffies lower bound shouldn't impact the behavior when HZ=1000.

However, as far as I can tell, comparing msecs_to_jiffies(10) to 2, or
comparing the whole timeout to 2 doesn't make much difference, since
msecs_to_jiffies isn't inlined.

In other words, keeping the #if shouldn't make much difference in behavior,
but will save the small comparison.

Cheers,
Ido.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-27 14:40                 ` Ido Yariv
@ 2015-05-27 14:56                   ` Eric Dumazet
  2015-05-27 15:23                     ` Ido Yariv
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2015-05-27 14:56 UTC (permalink / raw)
  To: Ido Yariv
  Cc: David Laight, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

On Wed, 2015-05-27 at 10:40 -0400, Ido Yariv wrote:

> HZ=100 is used on some embedded platforms, so it's still something we
> have to deal with unfortunately..
> 
> Since the '2' here is a lower bound, and msecs_to_jiffies(10) will
> return values greater than 2 for HZ>100 anyway, always ensuring the
> 2 jiffies lower bound shouldn't impact the behavior when HZ=1000.
> 
> However, as far as I can tell, comparing msecs_to_jiffies(10) to 2, or
> comparing the whole timeout to 2 doesn't make much difference, since
> msecs_to_jiffies isn't inlined.
> 
> In other words, keeping the #if shouldn't make much difference in behavior,
> but will save the small comparison.

Yes, I guess David point is to have a macro in include/linux/tcp.h so
that we can have a nice comment, and not having #if ... in a C file.

Maybe other timers in TCP need the same care (I am not asking you to
find them, but having a macro would ease things perhaps)




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-27 14:56                   ` Eric Dumazet
@ 2015-05-27 15:23                     ` Ido Yariv
  2015-05-27 16:23                       ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Ido Yariv @ 2015-05-27 15:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Laight, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

Hi Eric,

On Wed, May 27, 2015 at 07:56:25AM -0700, Eric Dumazet wrote:
> On Wed, 2015-05-27 at 10:40 -0400, Ido Yariv wrote:
> 
> > HZ=100 is used on some embedded platforms, so it's still something we
> > have to deal with unfortunately..
> > 
> > Since the '2' here is a lower bound, and msecs_to_jiffies(10) will
> > return values greater than 2 for HZ>100 anyway, always ensuring the
> > 2 jiffies lower bound shouldn't impact the behavior when HZ=1000.
> > 
> > However, as far as I can tell, comparing msecs_to_jiffies(10) to 2, or
> > comparing the whole timeout to 2 doesn't make much difference, since
> > msecs_to_jiffies isn't inlined.
> > 
> > In other words, keeping the #if shouldn't make much difference in behavior,
> > but will save the small comparison.
> 
> Yes, I guess David point is to have a macro in include/linux/tcp.h so
> that we can have a nice comment, and not having #if ... in a C file.
> 
> Maybe other timers in TCP need the same care (I am not asking you to
> find them, but having a macro would ease things perhaps)

Something along the lines of the patch below?

Thanks,
Ido.

>From 562019884d1b2c7619ce3f49ecb595147d28bbdd Mon Sep 17 00:00:00 2001
From: Ido Yariv <ido@wizery.com>
Date: Thu, 21 May 2015 08:23:13 +0200
Subject: [PATCH v3] net: tcp: Fix a PTO timing granularity issue

The Tail Loss Probe RFC specifies that the PTO value should be set to
max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.

The PTO value is converted to jiffies, so the timer may expire
prematurely.

This is especially problematic on systems in which HZ <= 100, so work
around this by setting the timeout to at least 2 jiffies on such
systems.

The 10ms figure was originally selected based on tests performed with
the current implementation and HZ = 1000. Thus, leave the behavior on
systems with HZ > 100 unchanged.

Signed-off-by: Ido Yariv <idox.yariv@intel.com>
---
 include/net/tcp.h     | 9 +++++++++
 net/ipv4/tcp_output.c | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2bb2bad..86090b6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1751,4 +1751,13 @@ static inline void skb_set_tcp_pure_ack(struct sk_buff *skb)
 	skb->truesize = 2;
 }
 
+/* Convert msecs to jiffies, ensuring that the return value is always at least
+ * 2. This can be used when setting tick-based timers to guarantee that they
+ * won't expire right away.
+ */
+static inline unsigned long tcp_safe_msecs_to_jiffies(const unsigned int m)
+{
+	return max_t(u32, 2, msecs_to_jiffies(m));
+}
+
 #endif	/* _TCP_H */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 534e5fd..83021c5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2207,7 +2207,7 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 	if (tp->packets_out == 1)
 		timeout = max_t(u32, timeout,
 				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
-	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
+	timeout = max_t(u32, timeout, tcp_safe_msecs_to_jiffies(10));
 
 	/* If RTO is shorter, just schedule TLP in its place. */
 	tlp_time_stamp = tcp_time_stamp + timeout;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-27 15:23                     ` Ido Yariv
@ 2015-05-27 16:23                       ` Eric Dumazet
  2015-05-27 16:54                         ` Ido Yariv
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2015-05-27 16:23 UTC (permalink / raw)
  To: Ido Yariv
  Cc: David Laight, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

On Wed, 2015-05-27 at 11:23 -0400, Ido Yariv wrote:

> Signed-off-by: Ido Yariv <idox.yariv@intel.com>
> ---
>  include/net/tcp.h     | 9 +++++++++
>  net/ipv4/tcp_output.c | 2 +-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 2bb2bad..86090b6 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1751,4 +1751,13 @@ static inline void skb_set_tcp_pure_ack(struct sk_buff *skb)
>  	skb->truesize = 2;
>  }
>  
> +/* Convert msecs to jiffies, ensuring that the return value is always at least
> + * 2. This can be used when setting tick-based timers to guarantee that they
> + * won't expire right away.
> + */
> +static inline unsigned long tcp_safe_msecs_to_jiffies(const unsigned int m)
> +{
> +	return max_t(u32, 2, msecs_to_jiffies(m));
> +}

Note that you can do slightly better if m is a constant at compile
time ;)

if (__builtin_constant_p(m) && m * 1000 >= 2 * HZ)
	return msecs_to_jiffies(m);
return max_t(u32, 2, msecs_to_jiffies(m));

Or something like that ?



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-27 16:23                       ` Eric Dumazet
@ 2015-05-27 16:54                         ` Ido Yariv
  2015-05-27 17:24                           ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Ido Yariv @ 2015-05-27 16:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Laight, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

Hi Eric,

On Wed, May 27, 2015 at 09:23:36AM -0700, Eric Dumazet wrote:
> On Wed, 2015-05-27 at 11:23 -0400, Ido Yariv wrote:
> 
> > Signed-off-by: Ido Yariv <idox.yariv@intel.com>
> > ---
> >  include/net/tcp.h     | 9 +++++++++
> >  net/ipv4/tcp_output.c | 2 +-
> >  2 files changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 2bb2bad..86090b6 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -1751,4 +1751,13 @@ static inline void skb_set_tcp_pure_ack(struct sk_buff *skb)
> >  	skb->truesize = 2;
> >  }
> >  
> > +/* Convert msecs to jiffies, ensuring that the return value is always at least
> > + * 2. This can be used when setting tick-based timers to guarantee that they
> > + * won't expire right away.
> > + */
> > +static inline unsigned long tcp_safe_msecs_to_jiffies(const unsigned int m)
> > +{
> > +	return max_t(u32, 2, msecs_to_jiffies(m));
> > +}
> 
> Note that you can do slightly better if m is a constant at compile
> time ;)
> 
> if (__builtin_constant_p(m) && m * 1000 >= 2 * HZ)
> 	return msecs_to_jiffies(m);
> return max_t(u32, 2, msecs_to_jiffies(m));
> 
> Or something like that ?

That's a nice optimization ;)

However, I think that with Nicholas Mc Guire's recent changes to
msecs_to_jiffies (http://marc.info/?l=linux-kernel&m=143195210010666),
we should get this for free, no?

Cheers,
Ido.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-27 16:54                         ` Ido Yariv
@ 2015-05-27 17:24                           ` Eric Dumazet
  2015-05-27 19:15                             ` Ido Yariv
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2015-05-27 17:24 UTC (permalink / raw)
  To: Ido Yariv
  Cc: David Laight, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

On Wed, 2015-05-27 at 12:54 -0400, Ido Yariv wrote:
> Hi Eric,

> That's a nice optimization ;)
> 
> However, I think that with Nicholas Mc Guire's recent changes to
> msecs_to_jiffies (http://marc.info/?l=linux-kernel&m=143195210010666),
> we should get this for free, no?

Well, on net and net-next tree we currently have :

$ grep msecs_to_jiffies include/linux/jiffies.h
extern unsigned long msecs_to_jiffies(const unsigned int m);

Given your patch is for stable, I would not mind having this done
anyway.




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-27 17:24                           ` Eric Dumazet
@ 2015-05-27 19:15                             ` Ido Yariv
  2015-05-28  4:37                               ` Ido Yariv
  0 siblings, 1 reply; 21+ messages in thread
From: Ido Yariv @ 2015-05-27 19:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Laight, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

Hi Eric,

On Wed, May 27, 2015 at 10:24:16AM -0700, Eric Dumazet wrote:
> On Wed, 2015-05-27 at 12:54 -0400, Ido Yariv wrote:
> > Hi Eric,
> 
> > That's a nice optimization ;)
> > 
> > However, I think that with Nicholas Mc Guire's recent changes to
> > msecs_to_jiffies (http://marc.info/?l=linux-kernel&m=143195210010666),
> > we should get this for free, no?
> 
> Well, on net and net-next tree we currently have :
> 
> $ grep msecs_to_jiffies include/linux/jiffies.h
> extern unsigned long msecs_to_jiffies(const unsigned int m);
> 
> Given your patch is for stable, I would not mind having this done
> anyway.

I believe these changes are in tip, but not in net/net-next just yet.

I actually didn't think this patch is for stable, but we can certainly
do that.

Would you be fine with the patch below? Please note that I modified your
optimization a bit.

Cheers,
Ido.

>From 6b406d4746fe5523f29863c10e43ab8bba913307 Mon Sep 17 00:00:00 2001
From: Ido Yariv <ido@wizery.com>
Date: Thu, 21 May 2015 08:23:13 +0200
Subject: [PATCH v4] net: tcp: Fix a PTO timing granularity issue

The Tail Loss Probe RFC specifies that the PTO value should be set to
max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.

The PTO value is converted to jiffies, so the timer may expire
prematurely.

This is especially problematic on systems in which HZ <= 100, so work
around this by setting the timeout to at least 2 jiffies on such
systems.

The 10ms figure was originally selected based on tests performed with
the current implementation and HZ = 1000. Thus, leave the behavior on
systems with HZ > 100 unchanged.

Signed-off-by: Ido Yariv <idox.yariv@intel.com>
---
 include/net/tcp.h     | 17 +++++++++++++++++
 net/ipv4/tcp_output.c |  2 +-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2bb2bad..d22c93c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1751,4 +1751,21 @@ static inline void skb_set_tcp_pure_ack(struct sk_buff *skb)
 	skb->truesize = 2;
 }
 
+/* Convert msecs to jiffies, ensuring that the return value is always at least
+ * 2 jiffies.
+ * This can be used when setting tick-based timers to guarantee that they won't
+ * expire right away.
+ */
+static inline unsigned long tcp_safe_msecs_to_jiffies(const unsigned int m)
+{
+	if (__builtin_constant_p(m)) {
+		if (m * HZ > MSEC_PER_SEC)
+			return msecs_to_jiffies(m);
+
+		return 2;
+	}
+
+	return max_t(u32, 2, msecs_to_jiffies(m));
+}
+
 #endif	/* _TCP_H */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 534e5fd..83021c5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2207,7 +2207,7 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 	if (tp->packets_out == 1)
 		timeout = max_t(u32, timeout,
 				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
-	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
+	timeout = max_t(u32, timeout, tcp_safe_msecs_to_jiffies(10));
 
 	/* If RTO is shorter, just schedule TLP in its place. */
 	tlp_time_stamp = tcp_time_stamp + timeout;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-27 19:15                             ` Ido Yariv
@ 2015-05-28  4:37                               ` Ido Yariv
  2015-05-28  8:55                                 ` David Laight
  0 siblings, 1 reply; 21+ messages in thread
From: Ido Yariv @ 2015-05-28  4:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Laight, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

Hi Eric,

On Wed, May 27, 2015 at 03:15:26PM -0400, Ido Yariv wrote:
> Hi Eric,
> 
> On Wed, May 27, 2015 at 10:24:16AM -0700, Eric Dumazet wrote:
> > On Wed, 2015-05-27 at 12:54 -0400, Ido Yariv wrote:
> > > Hi Eric,
> > 
> > > That's a nice optimization ;)
> > > 
> > > However, I think that with Nicholas Mc Guire's recent changes to
> > > msecs_to_jiffies (http://marc.info/?l=linux-kernel&m=143195210010666),
> > > we should get this for free, no?
> > 
> > Well, on net and net-next tree we currently have :
> > 
> > $ grep msecs_to_jiffies include/linux/jiffies.h
> > extern unsigned long msecs_to_jiffies(const unsigned int m);
> > 
> > Given your patch is for stable, I would not mind having this done
> > anyway.
> 
> I believe these changes are in tip, but not in net/net-next just yet.
> 
> I actually didn't think this patch is for stable, but we can certainly
> do that.
> 
> Would you be fine with the patch below? Please note that I modified your
> optimization a bit.

We'd probably like to avoid any potential integer overflows as well, so
I modified the if statement accordingly.

Not sure this optimization is really beneficial (and only on stable
kernels), but here's the updated patch.

Cheers,
Ido.

>From 8d78f75cde8ef523a312e6cfab8e1b7a89f97c9f Mon Sep 17 00:00:00 2001
From: Ido Yariv <ido@wizery.com>
Date: Thu, 21 May 2015 08:23:13 +0200
Subject: [PATCH v5] net: tcp: Fix a PTO timing granularity issue

The Tail Loss Probe RFC specifies that the PTO value should be set to
max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.

The PTO value is converted to jiffies, so the timer may expire
prematurely.

This is especially problematic on systems in which HZ <= 100, so work
around this by setting the timeout to at least 2 jiffies on such
systems.

The 10ms figure was originally selected based on tests performed with
the current implementation and HZ = 1000. Thus, leave the behavior on
systems with HZ > 100 unchanged.

Signed-off-by: Ido Yariv <idox.yariv@intel.com>
---
 include/net/tcp.h     | 20 ++++++++++++++++++++
 net/ipv4/tcp_output.c |  2 +-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2bb2bad..19ed4c0 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1751,4 +1751,24 @@ static inline void skb_set_tcp_pure_ack(struct sk_buff *skb)
 	skb->truesize = 2;
 }
 
+/* Convert msecs to jiffies, ensuring that the return value is at least 2
+ * jiffies.
+ * This can be used when setting tick-based timers to guarantee that they won't
+ * expire right away.
+ */
+static inline unsigned long tcp_safe_msecs_to_jiffies(const unsigned int m)
+{
+	if (__builtin_constant_p(m)) {
+		/* The theoretical upper bound of m for 2 jiffies is 2 seconds,
+		 * so compare m with that to avoid potential integer overflows.
+		 */
+		if ((m > 2 * MSEC_PER_SEC) || (m * HZ > 2 * MSEC_PER_SEC))
+			return msecs_to_jiffies(m);
+
+		return 2;
+	}
+
+	return max_t(u32, 2, msecs_to_jiffies(m));
+}
+
 #endif	/* _TCP_H */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 190538a..26cc5a6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2207,7 +2207,7 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 	if (tp->packets_out == 1)
 		timeout = max_t(u32, timeout,
 				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
-	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
+	timeout = max_t(u32, timeout, tcp_safe_msecs_to_jiffies(10));
 
 	/* If RTO is shorter, just schedule TLP in its place. */
 	tlp_time_stamp = tcp_time_stamp + timeout;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* RE: [PATCH] net: tcp: Fix a PTO timing granularity issue
  2015-05-28  4:37                               ` Ido Yariv
@ 2015-05-28  8:55                                 ` David Laight
  2015-05-28 12:33                                   ` [PATCH v6] " Ido Yariv
  0 siblings, 1 reply; 21+ messages in thread
From: David Laight @ 2015-05-28  8:55 UTC (permalink / raw)
  To: 'Ido Yariv', Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati, netdev,
	linux-kernel, Ido Yariv

From: Ido Yariv
> Sent: 28 May 2015 05:37
...
> +/* Convert msecs to jiffies, ensuring that the return value is at least 2
> + * jiffies.
> + * This can be used when setting tick-based timers to guarantee that they won't
> + * expire right away.
> + */
> +static inline unsigned long tcp_safe_msecs_to_jiffies(const unsigned int m)

I don't like using 'safe' in function names, being 'safe; depends on what
the caller wants.
Maybe tcp_msecs_to_jiffies_min_2() would be better.

	David


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v6] net: tcp: Fix a PTO timing granularity issue
  2015-05-28  8:55                                 ` David Laight
@ 2015-05-28 12:33                                   ` Ido Yariv
  0 siblings, 0 replies; 21+ messages in thread
From: Ido Yariv @ 2015-05-28 12:33 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Nandita Dukkipati,
	Eric Dumazet, David Laight, netdev, linux-kernel
  Cc: Ido Yariv, Ido Yariv

The Tail Loss Probe RFC specifies that the PTO value should be set to
max(2 * SRTT, 10ms), where SRTT is the smoothed round-trip time.

The PTO value is converted to jiffies, so the timer may expire
prematurely.

This is especially problematic on systems in which HZ <= 100, so work
around this by setting the timeout to at least 2 jiffies on such
systems.

The 10ms figure was originally selected based on tests performed with
the current implementation and HZ = 1000. Thus, leave the behavior on
systems with HZ > 100 unchanged.

Signed-off-by: Ido Yariv <idox.yariv@intel.com>
---
 include/net/tcp.h     | 20 ++++++++++++++++++++
 net/ipv4/tcp_output.c |  2 +-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2bb2bad..2ff0181 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1751,4 +1751,24 @@ static inline void skb_set_tcp_pure_ack(struct sk_buff *skb)
 	skb->truesize = 2;
 }
 
+/* Convert msecs to jiffies, ensuring that the return value is at least 2
+ * jiffies.
+ * This can be used when setting tick-based timers to guarantee that they won't
+ * expire right away.
+ */
+static inline unsigned long tcp_msecs_to_jiffies_min_2(const unsigned int m)
+{
+	if (__builtin_constant_p(m)) {
+		/* The theoretical upper bound of m for 2 jiffies is 2 seconds,
+		 * so compare m with that to avoid potential integer overflows.
+		 */
+		if ((m > 2 * MSEC_PER_SEC) || (m * HZ > 2 * MSEC_PER_SEC))
+			return msecs_to_jiffies(m);
+
+		return 2;
+	}
+
+	return max_t(u32, 2, msecs_to_jiffies(m));
+}
+
 #endif	/* _TCP_H */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 190538a..37694d2 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2207,7 +2207,7 @@ bool tcp_schedule_loss_probe(struct sock *sk)
 	if (tp->packets_out == 1)
 		timeout = max_t(u32, timeout,
 				(rtt + (rtt >> 1) + TCP_DELACK_MAX));
-	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
+	timeout = max_t(u32, timeout, tcp_msecs_to_jiffies_min_2(10));
 
 	/* If RTO is shorter, just schedule TLP in its place. */
 	tlp_time_stamp = tcp_time_stamp + timeout;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-05-28 12:33 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-26 14:25 [PATCH] net: tcp: Fix a PTO timing granularity issue Ido Yariv
2015-05-26 16:23 ` Eric Dumazet
2015-05-26 17:02   ` Ido Yariv
2015-05-26 17:13     ` Eric Dumazet
2015-05-26 17:55       ` Ido Yariv
2015-05-26 18:13         ` Eric Dumazet
2015-05-26 20:17           ` Ido Yariv
2015-05-27 11:36             ` David Laight
2015-05-27 13:41               ` Eric Dumazet
2015-05-27 14:40                 ` Ido Yariv
2015-05-27 14:56                   ` Eric Dumazet
2015-05-27 15:23                     ` Ido Yariv
2015-05-27 16:23                       ` Eric Dumazet
2015-05-27 16:54                         ` Ido Yariv
2015-05-27 17:24                           ` Eric Dumazet
2015-05-27 19:15                             ` Ido Yariv
2015-05-28  4:37                               ` Ido Yariv
2015-05-28  8:55                                 ` David Laight
2015-05-28 12:33                                   ` [PATCH v6] " Ido Yariv
2015-05-26 18:25         ` [PATCH] " Eric Dumazet
2015-05-26 19:39           ` Ido Yariv

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).