All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
@ 2016-09-01 11:14 Liav Rehana
  2016-09-02  8:49 ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Liav Rehana @ 2016-09-01 11:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, john.stultz, noamca, eladkan, Liav Rehana

From: Liav Rehana <liavr@mellanox.com>

During the calculation of the nsec variable, "delta * tkr->mult" may cause
overflow to the msb, if the suspended time is too long.
In that case, we need to guarantee that the variable will not go through a
sign extension during its shift, and thus it will result in a much higher
value - close to the larget value of 64 bits.
The following commit fixes this problem, which causes the following bug:
Trying to connect through ftp to the os after a long enough suspended time
will cause the nsec variable to get a much higher value after its shift
because of sign extension, and thus the loop that follows some instructions
afterwards, implemented in the inline function __iter_div_u64_rem, will
take too long.

Signed-off-by: Liav Rehana <liavr@mellanox.com>
---
 kernel/time/timekeeping.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 479d25c..ddf56a5 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -305,7 +305,7 @@ static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
 	s64 nsec;
 
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
-	nsec >>= tkr->shift;
+	nsec = ((u64) nsec) >> tkr->shift;
 
 	/* If arch requires, add in get_arch_timeoffset() */
 	return nsec + arch_gettimeoffset();
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
  2016-09-01 11:14 [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation Liav Rehana
@ 2016-09-02  8:49 ` Thomas Gleixner
  2016-09-02 18:30   ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2016-09-02  8:49 UTC (permalink / raw)
  To: Liav Rehana; +Cc: linux-kernel, john.stultz, noamca, eladkan

On Thu, 1 Sep 2016, Liav Rehana wrote:
> From: Liav Rehana <liavr@mellanox.com>
> 
> During the calculation of the nsec variable, "delta * tkr->mult" may cause
> overflow to the msb, if the suspended time is too long.
> In that case, we need to guarantee that the variable will not go through a
> sign extension during its shift, and thus it will result in a much higher
> value - close to the larget value of 64 bits.
> The following commit fixes this problem, which causes the following bug:
> Trying to connect through ftp to the os after a long enough suspended time
> will cause the nsec variable to get a much higher value after its shift
> because of sign extension, and thus the loop that follows some instructions
> afterwards, implemented in the inline function __iter_div_u64_rem, will
> take too long.
> 
> Signed-off-by: Liav Rehana <liavr@mellanox.com>
> ---
>  kernel/time/timekeeping.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 479d25c..ddf56a5 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -305,7 +305,7 @@ static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
>  	s64 nsec;
>  
>  	nsec = delta * tkr->mult + tkr->xtime_nsec;
> -	nsec >>= tkr->shift;
> +	nsec = ((u64) nsec) >> tkr->shift;

This typecast is just a baindaid. What happens if you double the suspend time?
The multiplication will simply overflow. So the proper fix is to sanity check
delta and do multiple conversions if delta is big enough. Preferrably this
happens somewhere at the call site and not in this hotpath function.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
  2016-09-02  8:49 ` Thomas Gleixner
@ 2016-09-02 18:30   ` Thomas Gleixner
  2016-09-02 18:47     ` Thomas Gleixner
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Thomas Gleixner @ 2016-09-02 18:30 UTC (permalink / raw)
  To: Liav Rehana; +Cc: linux-kernel, john.stultz, noamca, eladkan

On Fri, 2 Sep 2016, Thomas Gleixner wrote:
> On Thu, 1 Sep 2016, Liav Rehana wrote:
> > From: Liav Rehana <liavr@mellanox.com>
> > 
> > During the calculation of the nsec variable, "delta * tkr->mult" may cause
> > overflow to the msb, if the suspended time is too long.
> > In that case, we need to guarantee that the variable will not go through a
> > sign extension during its shift, and thus it will result in a much higher
> > value - close to the larget value of 64 bits.
> > The following commit fixes this problem, which causes the following bug:
> > Trying to connect through ftp to the os after a long enough suspended time
> > will cause the nsec variable to get a much higher value after its shift
> > because of sign extension, and thus the loop that follows some instructions
> > afterwards, implemented in the inline function __iter_div_u64_rem, will
> > take too long.
> > 
> > Signed-off-by: Liav Rehana <liavr@mellanox.com>
> > ---
> >  kernel/time/timekeeping.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 479d25c..ddf56a5 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -305,7 +305,7 @@ static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
> >  	s64 nsec;
> >  
> >  	nsec = delta * tkr->mult + tkr->xtime_nsec;
> > -	nsec >>= tkr->shift;
> > +	nsec = ((u64) nsec) >> tkr->shift;
> 
> This typecast is just a baindaid. What happens if you double the suspend time?
> The multiplication will simply overflow. So the proper fix is to sanity check
> delta and do multiple conversions if delta is big enough. Preferrably this
> happens somewhere at the call site and not in this hotpath function.

As a side note. John, why is that stuff unsigned at all? Shouldn't we use
u64 for all of this?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
  2016-09-02 18:30   ` Thomas Gleixner
@ 2016-09-02 18:47     ` Thomas Gleixner
  2016-09-04  6:40     ` Liav Rehana
  2016-09-07  3:22     ` John Stultz
  2 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2016-09-02 18:47 UTC (permalink / raw)
  To: Liav Rehana; +Cc: linux-kernel, john.stultz, noamca, eladkan

On Fri, 2 Sep 2016, Thomas Gleixner wrote:
> 
> As a side note. John, why is that stuff unsigned at all? Shouldn't we use

s/unsigned/signed/

> u64 for all of this?
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
  2016-09-02 18:30   ` Thomas Gleixner
  2016-09-02 18:47     ` Thomas Gleixner
@ 2016-09-04  6:40     ` Liav Rehana
  2016-09-04  9:01       ` Thomas Gleixner
  2016-09-07  3:22     ` John Stultz
  2 siblings, 1 reply; 12+ messages in thread
From: Liav Rehana @ 2016-09-04  6:40 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, john.stultz, Noam Camus, Elad Kanfi

Hi Thomas,

The root of the problem is that in case the multiplication of delta and tkr->mult in the line that I've changed
is too big that the MSB of the result is set, then the shift will cause an unwanted sign extension.

That sign extension will be avoided completely if the variable nsec was unsigned (u64 instead of s64), so I
think the correct solution for this is to change the type of nsec to u64.

Do you agree ?
If not, what is the reason for it being of s64 type in the first place ?
Is there any place where we want the nsec variable to contain a negative value ?
Moreover, how can we cope with the sign extension problems in this case ?

Thanks,
Liav.

-----Original Message-----
From: Thomas Gleixner [mailto:tglx@linutronix.de] 
Sent: Friday, September 02, 2016 9:31 PM
To: Liav Rehana <liavr@mellanox.com>
Cc: linux-kernel@vger.kernel.org; john.stultz@linaro.org; Noam Camus <noamca@mellanox.com>; Elad Kanfi <eladkan@mellanox.com>
Subject: Re: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.

On Fri, 2 Sep 2016, Thomas Gleixner wrote:
> On Thu, 1 Sep 2016, Liav Rehana wrote:
> > From: Liav Rehana <liavr@mellanox.com>
> > 
> > During the calculation of the nsec variable, "delta * tkr->mult" may 
> > cause overflow to the msb, if the suspended time is too long.
> > In that case, we need to guarantee that the variable will not go 
> > through a sign extension during its shift, and thus it will result 
> > in a much higher value - close to the larget value of 64 bits.
> > The following commit fixes this problem, which causes the following bug:
> > Trying to connect through ftp to the os after a long enough 
> > suspended time will cause the nsec variable to get a much higher 
> > value after its shift because of sign extension, and thus the loop 
> > that follows some instructions afterwards, implemented in the inline 
> > function __iter_div_u64_rem, will take too long.
> > 
> > Signed-off-by: Liav Rehana <liavr@mellanox.com>
> > ---
> >  kernel/time/timekeeping.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c 
> > index 479d25c..ddf56a5 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -305,7 +305,7 @@ static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
> >  	s64 nsec;
> >  
> >  	nsec = delta * tkr->mult + tkr->xtime_nsec;
> > -	nsec >>= tkr->shift;
> > +	nsec = ((u64) nsec) >> tkr->shift;
> 
> This typecast is just a baindaid. What happens if you double the suspend time?
> The multiplication will simply overflow. So the proper fix is to 
> sanity check delta and do multiple conversions if delta is big enough. 
> Preferrably this happens somewhere at the call site and not in this hotpath function.

As a side note. John, why is that stuff unsigned at all? Shouldn't we use
u64 for all of this?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
  2016-09-04  6:40     ` Liav Rehana
@ 2016-09-04  9:01       ` Thomas Gleixner
  2016-09-04 10:03         ` Liav Rehana
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2016-09-04  9:01 UTC (permalink / raw)
  To: Liav Rehana; +Cc: linux-kernel, john.stultz, Noam Camus, Elad Kanfi

On Sun, 4 Sep 2016, Liav Rehana wrote:

Please do not top post and trim your reply.

> The root of the problem is that in case the multiplication of delta and
> tkr->mult in the line that I've changed is too big that the MSB of the
> result is set, then the shift will cause an unwanted sign extension.

I completely understand that, but as I said before:

> > This typecast is just a baindaid. What happens if you double the
> > suspend time?  The multiplication will simply overflow. So the proper
> > fix is to sanity check delta and do multiple conversions if delta is
> > big enough.  Preferrably this happens somewhere at the call site and
> > not in this hotpath function.

> That sign extension will be avoided completely if the variable nsec was
> unsigned (u64 instead of s64), so I think the correct solution for this
> is to change the type of nsec to u64.

That's a different story and its not a solution for the general problem of

       delta * mult >= (1 << 31) or delta * mult >= (1 << 32)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
  2016-09-04  9:01       ` Thomas Gleixner
@ 2016-09-04 10:03         ` Liav Rehana
  2016-09-04 10:39           ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Liav Rehana @ 2016-09-04 10:03 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, john.stultz, Noam Camus, Elad Kanfi

>> The root of the problem is that in case the multiplication of delta 
>> and
>> tkr->mult in the line that I've changed is too big that the MSB of the
>> result is set, then the shift will cause an unwanted sign extension.

> I completely understand that, but as I said before:

> > > This typecast is just a baindaid. What happens if you double the 
> > > suspend time?  The multiplication will simply overflow. So the 
> > > proper fix is to sanity check delta and do multiple conversions if 
> > > delta is big enough.  Preferrably this happens somewhere at the call 
> > > site and not in this hotpath function.

> > That sign extension will be avoided completely if the variable nsec 
> > was unsigned (u64 instead of s64), so I think the correct solution for 
> > this is to change the type of nsec to u64.

> That's a different story and its not a solution for the general problem of

>        delta * mult >= (1 << 31) or delta * mult >= (1 << 32)

The case that delta * mult >= 1 << 31 is not a problem by itself, but it causes
an unwanted sign extension since the type of nsec is signed. That sign
extension is what causes the loop to take too long, and not the overflow.
I understand that the typecast is not a general solution, so as I've said, I
think that changing the type of nsec to u64 instead of s64 will be a good and
general solution, as it will indeed solve the problem of the unwanted sign
extension.

To summarize: a sign extension occurs if the nsec variable is signed, and so
I ask if you think it will be a good solution to change its type to unsigned.

Thanks,
	Liav.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
  2016-09-04 10:03         ` Liav Rehana
@ 2016-09-04 10:39           ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2016-09-04 10:39 UTC (permalink / raw)
  To: Liav Rehana; +Cc: linux-kernel, john.stultz, Noam Camus, Elad Kanfi

On Sun, 4 Sep 2016, Liav Rehana wrote:
> >> The root of the problem is that in case the multiplication of delta 
> >> and
> >> tkr->mult in the line that I've changed is too big that the MSB of the
> >> result is set, then the shift will cause an unwanted sign extension.
> 
> > I completely understand that, but as I said before:
> 
> > > > This typecast is just a baindaid. What happens if you double the 
> > > > suspend time?  The multiplication will simply overflow. So the 
> > > > proper fix is to sanity check delta and do multiple conversions if 
> > > > delta is big enough.  Preferrably this happens somewhere at the call 
> > > > site and not in this hotpath function.
> 
> > > That sign extension will be avoided completely if the variable nsec 
> > > was unsigned (u64 instead of s64), so I think the correct solution for 
> > > this is to change the type of nsec to u64.
> 
> > That's a different story and its not a solution for the general problem of
> 
> >        delta * mult >= (1 << 31) or delta * mult >= (1 << 32)
> 
> The case that delta * mult >= 1 << 31 is not a problem by itself, but it causes
> an unwanted sign extension since the type of nsec is signed. That sign
> extension is what causes the loop to take too long, and not the overflow.
> I understand that the typecast is not a general solution, so as I've said, I
> think that changing the type of nsec to u64 instead of s64 will be a good and
> general solution, as it will indeed solve the problem of the unwanted sign
> extension.
> 
> To summarize: a sign extension occurs if the nsec variable is signed, and so
> I ask if you think it will be a good solution to change its type to unsigned.

Do you actually read what I write? I asked John before:

> John, why is that stuff signed at all? Shouldn't we use u64 for all of this?

So to summarize: 

   - Yes, we can use u64 if there is nothing which I missed, but John will
     have the last word on this

   - No, making it u64 does not solve the general problem. It just papers
     over the problem you observe. And we don't add 'paper over' fixes,
     period.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
  2016-09-02 18:30   ` Thomas Gleixner
  2016-09-02 18:47     ` Thomas Gleixner
  2016-09-04  6:40     ` Liav Rehana
@ 2016-09-07  3:22     ` John Stultz
  2016-09-11  6:31       ` Liav Rehana
  2016-09-21 12:52       ` Liav Rehana
  2 siblings, 2 replies; 12+ messages in thread
From: John Stultz @ 2016-09-07  3:22 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Liav Rehana, lkml, noamca, eladkan

On Fri, Sep 2, 2016 at 11:30 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 2 Sep 2016, Thomas Gleixner wrote:
>> On Thu, 1 Sep 2016, Liav Rehana wrote:
>> > From: Liav Rehana <liavr@mellanox.com>
>> >
>> > During the calculation of the nsec variable, "delta * tkr->mult" may cause
>> > overflow to the msb, if the suspended time is too long.
>> > In that case, we need to guarantee that the variable will not go through a
>> > sign extension during its shift, and thus it will result in a much higher
>> > value - close to the larget value of 64 bits.
>> > The following commit fixes this problem, which causes the following bug:
>> > Trying to connect through ftp to the os after a long enough suspended time
>> > will cause the nsec variable to get a much higher value after its shift
>> > because of sign extension, and thus the loop that follows some instructions
>> > afterwards, implemented in the inline function __iter_div_u64_rem, will
>> > take too long.
>> >
>> > Signed-off-by: Liav Rehana <liavr@mellanox.com>
>> > ---
>> >  kernel/time/timekeeping.c |    2 +-
>> >  1 files changed, 1 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>> > index 479d25c..ddf56a5 100644
>> > --- a/kernel/time/timekeeping.c
>> > +++ b/kernel/time/timekeeping.c
>> > @@ -305,7 +305,7 @@ static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
>> >     s64 nsec;
>> >
>> >     nsec = delta * tkr->mult + tkr->xtime_nsec;
>> > -   nsec >>= tkr->shift;
>> > +   nsec = ((u64) nsec) >> tkr->shift;
>>
>> This typecast is just a baindaid. What happens if you double the suspend time?
>> The multiplication will simply overflow. So the proper fix is to sanity check
>> delta and do multiple conversions if delta is big enough. Preferrably this
>> happens somewhere at the call site and not in this hotpath function.
>
> As a side note. John, why is that stuff unsigned at all? Shouldn't we use
> u64 for all of this?

Errrr... My memory is quite foggy here. I think we just started way
back when with s64 to catch cases where there were negative nsec
intervals. Looking through the git logs it seems its been that way
since the beginning of the generic timekeeping logic.

For most cases here u64 is fine. There may be a few cases where we do
have valid negative nanosecond intervals, but I can't think of any off
the top of my head, and those can probably be special cased.

thanks
-john

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
  2016-09-07  3:22     ` John Stultz
@ 2016-09-11  6:31       ` Liav Rehana
  2016-09-21 21:12         ` John Stultz
  2016-09-21 12:52       ` Liav Rehana
  1 sibling, 1 reply; 12+ messages in thread
From: Liav Rehana @ 2016-09-11  6:31 UTC (permalink / raw)
  To: John Stultz, Thomas Gleixner; +Cc: lkml, Noam Camus, Elad Kanfi

>> > > During the calculation of the nsec variable, "delta * tkr->mult" 
> >> > may cause overflow to the msb, if the suspended time is too long.
> >> > In that case, we need to guarantee that the variable will not go 
> >> > through a sign extension during its shift, and thus it will result 
> >> > in a much higher value - close to the larget value of 64 bits.
> >> > The following commit fixes this problem, which causes the following bug:
> >> > Trying to connect through ftp to the os after a long enough 
> >> > suspended time will cause the nsec variable to get a much higher 
> >> > value after its shift because of sign extension, and thus the loop 
> >> > that follows some instructions afterwards, implemented in the 
> >> > inline function __iter_div_u64_rem, will take too long.
> >> >
> >> > Signed-off-by: Liav Rehana <liavr@mellanox.com>
> >> > ---
> >> >  kernel/time/timekeeping.c |    2 +-
> >> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >> >
> >> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c 
> >> > index 479d25c..ddf56a5 100644
> >> > --- a/kernel/time/timekeeping.c
> >> > +++ b/kernel/time/timekeeping.c
> >> > @@ -305,7 +305,7 @@ static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
> >> >     s64 nsec;
> >> >
> >> >     nsec = delta * tkr->mult + tkr->xtime_nsec;
> >> > -   nsec >>= tkr->shift;
> >> > +   nsec = ((u64) nsec) >> tkr->shift;
> >>
> >> This typecast is just a baindaid. What happens if you double the suspend time?
> >> The multiplication will simply overflow. So the proper fix is to 
> >> sanity check delta and do multiple conversions if delta is big 
> >> enough. Preferrably this happens somewhere at the call site and not in this hotpath function.
> >
> > As a side note. John, why is that stuff unsigned at all? Shouldn't we 
> > use
> > u64 for all of this?
>
>Errrr... My memory is quite foggy here. I think we just started way back when with s64 to catch cases where there were negative nsec intervals. Looking through the git logs it seems its > been that way since the beginning of the generic timekeeping logic.
>
>For most cases here u64 is fine. There may be a few cases where we do have valid negative nanosecond intervals, but I can't think of any off the top of my head, and those can >probably be special cased.

In light of your comment for that issue, I would like to change the type of the nsec variable to u64, as it will solve the sign extension problem. For that, I intend to change the type
of that variable in the functions that define it, and in the struct that uses it in kernel/time/timekeeping.c.
Do you think there are other references I should change. Or do you think there is a better solution ? Please provide your opinion on this matter.

Thanks,
	Liav.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
  2016-09-07  3:22     ` John Stultz
  2016-09-11  6:31       ` Liav Rehana
@ 2016-09-21 12:52       ` Liav Rehana
  1 sibling, 0 replies; 12+ messages in thread
From: Liav Rehana @ 2016-09-21 12:52 UTC (permalink / raw)
  To: John Stultz, Thomas Gleixner; +Cc: lkml, Noam Camus, Elad Kanfi

PING

-----Original Message-----
From: Liav Rehana 
Sent: Sunday, September 11, 2016 9:32 AM
To: 'John Stultz' <john.stultz@linaro.org>; Thomas Gleixner <tglx@linutronix.de>
Cc: lkml <linux-kernel@vger.kernel.org>; Noam Camus <noamca@mellanox.com>; Elad Kanfi <eladkan@mellanox.com>
Subject: RE: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.

>> > > During the calculation of the nsec variable, "delta * tkr->mult" 
> >> > may cause overflow to the msb, if the suspended time is too long.
> >> > In that case, we need to guarantee that the variable will not go 
> >> > through a sign extension during its shift, and thus it will 
> >> > result in a much higher value - close to the larget value of 64 bits.
> >> > The following commit fixes this problem, which causes the following bug:
> >> > Trying to connect through ftp to the os after a long enough 
> >> > suspended time will cause the nsec variable to get a much higher 
> >> > value after its shift because of sign extension, and thus the 
> >> > loop that follows some instructions afterwards, implemented in 
> >> > the inline function __iter_div_u64_rem, will take too long.
> >> >
> >> > Signed-off-by: Liav Rehana <liavr@mellanox.com>
> >> > ---
> >> >  kernel/time/timekeeping.c |    2 +-
> >> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >> >
> >> > diff --git a/kernel/time/timekeeping.c 
> >> > b/kernel/time/timekeeping.c index 479d25c..ddf56a5 100644
> >> > --- a/kernel/time/timekeeping.c
> >> > +++ b/kernel/time/timekeeping.c
> >> > @@ -305,7 +305,7 @@ static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
> >> >     s64 nsec;
> >> >
> >> >     nsec = delta * tkr->mult + tkr->xtime_nsec;
> >> > -   nsec >>= tkr->shift;
> >> > +   nsec = ((u64) nsec) >> tkr->shift;
> >>
> >> This typecast is just a baindaid. What happens if you double the suspend time?
> >> The multiplication will simply overflow. So the proper fix is to 
> >> sanity check delta and do multiple conversions if delta is big 
> >> enough. Preferrably this happens somewhere at the call site and not in this hotpath function.
> >
> > As a side note. John, why is that stuff unsigned at all? Shouldn't 
> > we use
> > u64 for all of this?
>
>Errrr... My memory is quite foggy here. I think we just started way back when with s64 to catch cases where there were negative nsec intervals. Looking through the git logs it seems its > been that way since the beginning of the generic timekeeping logic.
>
>For most cases here u64 is fine. There may be a few cases where we do have valid negative nanosecond intervals, but I can't think of any off the top of my head, and those can >probably be special cased.

In light of your comment for that issue, I would like to change the type of the nsec variable to u64, as it will solve the sign extension problem. For that, I intend to change the type of that variable in the functions that define it, and in the struct that uses it in kernel/time/timekeeping.c.
Do you think there are other references I should change. Or do you think there is a better solution ? Please provide your opinion on this matter.

Thanks,
	Liav.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation.
  2016-09-11  6:31       ` Liav Rehana
@ 2016-09-21 21:12         ` John Stultz
  0 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2016-09-21 21:12 UTC (permalink / raw)
  To: Liav Rehana; +Cc: Thomas Gleixner, lkml, Noam Camus, Elad Kanfi

On Sat, Sep 10, 2016 at 11:31 PM, Liav Rehana <liavr@mellanox.com> wrote:
>>> > > During the calculation of the nsec variable, "delta * tkr->mult"
>> >> > may cause overflow to the msb, if the suspended time is too long.
>> >> > In that case, we need to guarantee that the variable will not go
>> >> > through a sign extension during its shift, and thus it will result
>> >> > in a much higher value - close to the larget value of 64 bits.
>> >> > The following commit fixes this problem, which causes the following bug:
>> >> > Trying to connect through ftp to the os after a long enough
>> >> > suspended time will cause the nsec variable to get a much higher
>> >> > value after its shift because of sign extension, and thus the loop
>> >> > that follows some instructions afterwards, implemented in the
>> >> > inline function __iter_div_u64_rem, will take too long.
>> >> >
>> >> > Signed-off-by: Liav Rehana <liavr@mellanox.com>
>> >> > ---
>> >> >  kernel/time/timekeeping.c |    2 +-
>> >> >  1 files changed, 1 insertions(+), 1 deletions(-)
>> >> >
>> >> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>> >> > index 479d25c..ddf56a5 100644
>> >> > --- a/kernel/time/timekeeping.c
>> >> > +++ b/kernel/time/timekeeping.c
>> >> > @@ -305,7 +305,7 @@ static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
>> >> >     s64 nsec;
>> >> >
>> >> >     nsec = delta * tkr->mult + tkr->xtime_nsec;
>> >> > -   nsec >>= tkr->shift;
>> >> > +   nsec = ((u64) nsec) >> tkr->shift;
>> >>
>> >> This typecast is just a baindaid. What happens if you double the suspend time?
>> >> The multiplication will simply overflow. So the proper fix is to
>> >> sanity check delta and do multiple conversions if delta is big
>> >> enough. Preferrably this happens somewhere at the call site and not in this hotpath function.
>> >
>> > As a side note. John, why is that stuff unsigned at all? Shouldn't we
>> > use
>> > u64 for all of this?
>>
>>Errrr... My memory is quite foggy here. I think we just started way back when with s64 to catch cases where there were negative nsec intervals. Looking through the git logs it seems its > been that way since the beginning of the generic timekeeping logic.
>>
>>For most cases here u64 is fine. There may be a few cases where we do have valid negative nanosecond intervals, but I can't think of any off the top of my head, and those can >probably be special cased.
>
> In light of your comment for that issue, I would like to change the type of the nsec variable to u64, as it will solve the sign extension problem. For that, I intend to change the type
> of that variable in the functions that define it, and in the struct that uses it in kernel/time/timekeeping.c.
> Do you think there are other references I should change. Or do you think there is a better solution ? Please provide your opinion on this matter.

So no major objections, but when making the changes do keep an eye for
potential valid negative intervals. And if possible break it up into
smallish patches so its easier to review/audit.

But Thomas' point holds still, there are cases where if we get
intervals that are far too big, those need to be caught and called out
rather then just allowing unsigned overflow.

thanks
-john

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-09-21 21:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-01 11:14 [PATCH] Fix chance of sign extension to nsec after its msb is set during calculation Liav Rehana
2016-09-02  8:49 ` Thomas Gleixner
2016-09-02 18:30   ` Thomas Gleixner
2016-09-02 18:47     ` Thomas Gleixner
2016-09-04  6:40     ` Liav Rehana
2016-09-04  9:01       ` Thomas Gleixner
2016-09-04 10:03         ` Liav Rehana
2016-09-04 10:39           ` Thomas Gleixner
2016-09-07  3:22     ` John Stultz
2016-09-11  6:31       ` Liav Rehana
2016-09-21 21:12         ` John Stultz
2016-09-21 12:52       ` Liav Rehana

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.