All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] timers: Fix usleep_range() in the context of wake_up_process()
@ 2016-10-10 18:47 ` Douglas Anderson
  0 siblings, 0 replies; 9+ messages in thread
From: Douglas Anderson @ 2016-10-10 18:47 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz
  Cc: briannorris, huangtao, tony.xie, linux-rockchip,
	Douglas Anderson, linux-kernel

Users of usleep_range() expect that it will _never_ return in less time
than the minimum passed parameter.  However, nothing in any of the code
ensures this.  Specifically:

usleep_range() => do_usleep_range() => schedule_hrtimeout_range() =>
schedule_hrtimeout_range_clock() just ends up calling schedule() with an
appropriate timeout set using the hrtimer.  If someone else happens to
wake up our task then we'll happily return from usleep_range() early.

msleep() already has code to handle this case since it will loop as long
as there was still time left.  usleep_range() had no such loop.

The problem is is easily demonstrated with a small bit of test code:

  static int usleep_test_task(void *data)
  {
    atomic_t *done = data;
    ktime_t start, end;

    start = ktime_get();
    usleep_range(50000, 100000);
    end = ktime_get();
    pr_info("Requested 50000 - 100000 us.  Actually slept for %llu us\n",
      (unsigned long long)ktime_to_us(ktime_sub(end, start)));
    atomic_set(done, 1);

    return 0;
  }

  static void run_usleep_test(void)
  {
    struct task_struct *t;
    atomic_t done;

    atomic_set(&done, 0);

    t = kthread_run(usleep_test_task, &done, "usleep_test_task");
    while (!atomic_read(&done)) {
      wake_up_process(t);
      udelay(1000);
    }
    kthread_stop(t);
  }

If you run the above code without this patch you get things like:
  Requested 50000 - 100000 us.  Actually slept for 967 us

If you run the above code _with_ this patch, you get:
  Requested 50000 - 100000 us.  Actually slept for 50001 us

Presumably this problem was not detected before because:
- It's not terribly common to use wake_up_process() directly.
- Other ways for processes to wake up are not typically mixed with
  usleep_range().
- There aren't lotsof places that use usleep_range(), since many people
  call either msleep() or udelay().

Reported-by: Tao Huang <huangtao@rock-chips.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
 kernel/time/timer.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 32bf6f75a8fe..ab03c7e403a4 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1898,12 +1898,29 @@ EXPORT_SYMBOL(msleep_interruptible);
 
 static void __sched do_usleep_range(unsigned long min, unsigned long max)
 {
+	ktime_t start, end;
 	ktime_t kmin;
 	u64 delta;
+	unsigned long elapsed = 0;
+	int ret;
+
+	start = ktime_get();
+	do {
+		kmin = ktime_set(0, min * NSEC_PER_USEC);
+		delta = (u64)(max - min) * NSEC_PER_USEC;
+		ret = schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
+
+		/*
+		 * If schedule_hrtimeout_range() returns 0 then we actually
+		 * hit the timeout. If not then we need to re-calculate the
+		 * new timeout ourselves.
+		 */
+		if (ret == 0)
+			break;
 
-	kmin = ktime_set(0, min * NSEC_PER_USEC);
-	delta = (u64)(max - min) * NSEC_PER_USEC;
-	schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
+		end = ktime_get();
+		elapsed = ktime_to_us(ktime_sub(end, start));
+	} while (elapsed < min);
 }
 
 /**
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] timers: Fix usleep_range() in the context of wake_up_process()
@ 2016-10-10 18:47 ` Douglas Anderson
  0 siblings, 0 replies; 9+ messages in thread
From: Douglas Anderson @ 2016-10-10 18:47 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz
  Cc: huangtao-TNX95d0MmH7DzftRWevZcw,
	briannorris-F7+t8E8rja9g9hUCZPvPmw, Douglas Anderson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tony.xie-TNX95d0MmH7DzftRWevZcw

Users of usleep_range() expect that it will _never_ return in less time
than the minimum passed parameter.  However, nothing in any of the code
ensures this.  Specifically:

usleep_range() => do_usleep_range() => schedule_hrtimeout_range() =>
schedule_hrtimeout_range_clock() just ends up calling schedule() with an
appropriate timeout set using the hrtimer.  If someone else happens to
wake up our task then we'll happily return from usleep_range() early.

msleep() already has code to handle this case since it will loop as long
as there was still time left.  usleep_range() had no such loop.

The problem is is easily demonstrated with a small bit of test code:

  static int usleep_test_task(void *data)
  {
    atomic_t *done = data;
    ktime_t start, end;

    start = ktime_get();
    usleep_range(50000, 100000);
    end = ktime_get();
    pr_info("Requested 50000 - 100000 us.  Actually slept for %llu us\n",
      (unsigned long long)ktime_to_us(ktime_sub(end, start)));
    atomic_set(done, 1);

    return 0;
  }

  static void run_usleep_test(void)
  {
    struct task_struct *t;
    atomic_t done;

    atomic_set(&done, 0);

    t = kthread_run(usleep_test_task, &done, "usleep_test_task");
    while (!atomic_read(&done)) {
      wake_up_process(t);
      udelay(1000);
    }
    kthread_stop(t);
  }

If you run the above code without this patch you get things like:
  Requested 50000 - 100000 us.  Actually slept for 967 us

If you run the above code _with_ this patch, you get:
  Requested 50000 - 100000 us.  Actually slept for 50001 us

Presumably this problem was not detected before because:
- It's not terribly common to use wake_up_process() directly.
- Other ways for processes to wake up are not typically mixed with
  usleep_range().
- There aren't lotsof places that use usleep_range(), since many people
  call either msleep() or udelay().

Reported-by: Tao Huang <huangtao-TNX95d0MmH7DzftRWevZcw@public.gmane.org>
Signed-off-by: Douglas Anderson <dianders-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
---
 kernel/time/timer.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 32bf6f75a8fe..ab03c7e403a4 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1898,12 +1898,29 @@ EXPORT_SYMBOL(msleep_interruptible);
 
 static void __sched do_usleep_range(unsigned long min, unsigned long max)
 {
+	ktime_t start, end;
 	ktime_t kmin;
 	u64 delta;
+	unsigned long elapsed = 0;
+	int ret;
+
+	start = ktime_get();
+	do {
+		kmin = ktime_set(0, min * NSEC_PER_USEC);
+		delta = (u64)(max - min) * NSEC_PER_USEC;
+		ret = schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
+
+		/*
+		 * If schedule_hrtimeout_range() returns 0 then we actually
+		 * hit the timeout. If not then we need to re-calculate the
+		 * new timeout ourselves.
+		 */
+		if (ret == 0)
+			break;
 
-	kmin = ktime_set(0, min * NSEC_PER_USEC);
-	delta = (u64)(max - min) * NSEC_PER_USEC;
-	schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
+		end = ktime_get();
+		elapsed = ktime_to_us(ktime_sub(end, start));
+	} while (elapsed < min);
 }
 
 /**
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] timers: Fix usleep_range() in the context of wake_up_process()
  2016-10-10 18:47 ` Douglas Anderson
  (?)
@ 2016-10-10 19:53 ` Andreas Mohr
  2016-10-10 21:16   ` Doug Anderson
  -1 siblings, 1 reply; 9+ messages in thread
From: Andreas Mohr @ 2016-10-10 19:53 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Thomas Gleixner, John Stultz, briannorris, huangtao, tony.xie,
	linux-rockchip, linux-kernel

Hi,

On Mon, Oct 10, 2016 at 11:47:57AM -0700, Douglas Anderson wrote:
> Users of usleep_range() expect that it will _never_ return in less time
> than the minimum passed parameter.  However, nothing in any of the code
> ensures this.  Specifically:

Ouch (aka: good catch! - by the reporter, though... ;)


> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 32bf6f75a8fe..ab03c7e403a4 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1898,12 +1898,29 @@ EXPORT_SYMBOL(msleep_interruptible);
>  
>  static void __sched do_usleep_range(unsigned long min, unsigned long max)
>  {
> +	ktime_t start, end;
>  	ktime_t kmin;
>  	u64 delta;
> +	unsigned long elapsed = 0;
> +	int ret;
> +
> +	start = ktime_get();
> +	do {
> +		kmin = ktime_set(0, min * NSEC_PER_USEC);
> +		delta = (u64)(max - min) * NSEC_PER_USEC;
> +		ret = schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
> +
> +		/*
> +		 * If schedule_hrtimeout_range() returns 0 then we actually
> +		 * hit the timeout. If not then we need to re-calculate the
> +		 * new timeout ourselves.
> +		 */
> +		if (ret == 0)
> +			break;
>  
> -	kmin = ktime_set(0, min * NSEC_PER_USEC);
> -	delta = (u64)(max - min) * NSEC_PER_USEC;
> -	schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
> +		end = ktime_get();
> +		elapsed = ktime_to_us(ktime_sub(end, start));
> +	} while (elapsed < min);

Some thoughts:
- max >= min pre-condition is validated somewhere, I'd hope?
  (somewhere in outer API frame?)
> +		delta = (u64)(max - min) * NSEC_PER_USEC;

- there is a domain transition in there, **within** the repeated loop:
> +		elapsed = ktime_to_us(ktime_sub(end, start));
  -->
> +	} while (elapsed < min);
  should likely be made to be able to
  directly compare a *non-converted* "elapsed" value
  (IIRC there's an API such as ktime_before()).
  One could argue that the "repeat" case is the non-likely case
  (thus the conversion should possibly not be done before
  actually being required),
  however I guess it's exactly hitting the non-likely cases
  which will contribute towards
  non-predictable, non-deterministic latency behaviour of
  a code path
  (think RT requirements)

Thanks,

Andreas Mohr

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] timers: Fix usleep_range() in the context of wake_up_process()
  2016-10-10 18:47 ` Douglas Anderson
  (?)
  (?)
@ 2016-10-10 20:04 ` Brian Norris
  2016-10-10 20:12     ` Doug Anderson
  2016-10-10 20:47   ` Andreas Mohr
  -1 siblings, 2 replies; 9+ messages in thread
From: Brian Norris @ 2016-10-10 20:04 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Thomas Gleixner, John Stultz, huangtao, tony.xie, linux-rockchip,
	linux-kernel, Andreas Mohr

Hi Doug,

On Mon, Oct 10, 2016 at 11:47:57AM -0700, Doug Anderson wrote:
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 32bf6f75a8fe..ab03c7e403a4 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1898,12 +1898,29 @@ EXPORT_SYMBOL(msleep_interruptible);
>  
>  static void __sched do_usleep_range(unsigned long min, unsigned long max)
>  {
> +	ktime_t start, end;
>  	ktime_t kmin;
>  	u64 delta;
> +	unsigned long elapsed = 0;
> +	int ret;
> +
> +	start = ktime_get();
> +	do {
> +		kmin = ktime_set(0, min * NSEC_PER_USEC);

I believe 'min' is unmodified throughout, and therefore 'kmin' is
computed to be the same minimum timeout in each loop. Shouldn't this be
decreasing on each iteration of the loop? (i.e., either your compute
'kmin' differently here, or you recompute 'min' based on the elapsed
time?)

> +		delta = (u64)(max - min) * NSEC_PER_USEC;
> +		ret = schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
> +
> +		/*
> +		 * If schedule_hrtimeout_range() returns 0 then we actually
> +		 * hit the timeout. If not then we need to re-calculate the
> +		 * new timeout ourselves.
> +		 */
> +		if (ret == 0)
> +			break;
>  
> -	kmin = ktime_set(0, min * NSEC_PER_USEC);
> -	delta = (u64)(max - min) * NSEC_PER_USEC;
> -	schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
> +		end = ktime_get();
> +		elapsed = ktime_to_us(ktime_sub(end, start));
> +	} while (elapsed < min);

I think Andreas had similar comments, but it seemed to me like
ktime_before() might be nicer. But (as Andreas did) you might get into
(premature?) micro-optimizations down that path, and I'm certainly not
an expert there.

>  }
>  
>  /**

Brian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] timers: Fix usleep_range() in the context of wake_up_process()
  2016-10-10 20:04 ` Brian Norris
@ 2016-10-10 20:12     ` Doug Anderson
  2016-10-10 20:47   ` Andreas Mohr
  1 sibling, 0 replies; 9+ messages in thread
From: Doug Anderson @ 2016-10-10 20:12 UTC (permalink / raw)
  To: Brian Norris
  Cc: Thomas Gleixner, John Stultz, Tao Huang, Tony Xie,
	open list:ARM/Rockchip SoC...,
	linux-kernel, Andreas Mohr

Hi,

On Mon, Oct 10, 2016 at 1:04 PM, Brian Norris <briannorris@chromium.org> wrote:
> Hi Doug,
>
> On Mon, Oct 10, 2016 at 11:47:57AM -0700, Doug Anderson wrote:
>> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
>> index 32bf6f75a8fe..ab03c7e403a4 100644
>> --- a/kernel/time/timer.c
>> +++ b/kernel/time/timer.c
>> @@ -1898,12 +1898,29 @@ EXPORT_SYMBOL(msleep_interruptible);
>>
>>  static void __sched do_usleep_range(unsigned long min, unsigned long max)
>>  {
>> +     ktime_t start, end;
>>       ktime_t kmin;
>>       u64 delta;
>> +     unsigned long elapsed = 0;
>> +     int ret;
>> +
>> +     start = ktime_get();
>> +     do {
>> +             kmin = ktime_set(0, min * NSEC_PER_USEC);
>
> I believe 'min' is unmodified throughout, and therefore 'kmin' is
> computed to be the same minimum timeout in each loop. Shouldn't this be
> decreasing on each iteration of the loop? (i.e., either your compute
> 'kmin' differently here, or you recompute 'min' based on the elapsed
> time?)

Yes, I stupidly changed something at the last second and then didn't
test again after my stupid change.  Fix coming soon with all comments
addressed.  Sorry for posting broken code.  :( :( :(

-Doug

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] timers: Fix usleep_range() in the context of wake_up_process()
@ 2016-10-10 20:12     ` Doug Anderson
  0 siblings, 0 replies; 9+ messages in thread
From: Doug Anderson @ 2016-10-10 20:12 UTC (permalink / raw)
  To: Brian Norris
  Cc: Tao Huang, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andreas Mohr,
	open list:ARM/Rockchip SoC...,
	Tony Xie, John Stultz, Thomas Gleixner

Hi,

On Mon, Oct 10, 2016 at 1:04 PM, Brian Norris <briannorris-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> wrote:
> Hi Doug,
>
> On Mon, Oct 10, 2016 at 11:47:57AM -0700, Doug Anderson wrote:
>> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
>> index 32bf6f75a8fe..ab03c7e403a4 100644
>> --- a/kernel/time/timer.c
>> +++ b/kernel/time/timer.c
>> @@ -1898,12 +1898,29 @@ EXPORT_SYMBOL(msleep_interruptible);
>>
>>  static void __sched do_usleep_range(unsigned long min, unsigned long max)
>>  {
>> +     ktime_t start, end;
>>       ktime_t kmin;
>>       u64 delta;
>> +     unsigned long elapsed = 0;
>> +     int ret;
>> +
>> +     start = ktime_get();
>> +     do {
>> +             kmin = ktime_set(0, min * NSEC_PER_USEC);
>
> I believe 'min' is unmodified throughout, and therefore 'kmin' is
> computed to be the same minimum timeout in each loop. Shouldn't this be
> decreasing on each iteration of the loop? (i.e., either your compute
> 'kmin' differently here, or you recompute 'min' based on the elapsed
> time?)

Yes, I stupidly changed something at the last second and then didn't
test again after my stupid change.  Fix coming soon with all comments
addressed.  Sorry for posting broken code.  :( :( :(

-Doug

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] timers: Fix usleep_range() in the context of wake_up_process()
  2016-10-10 20:12     ` Doug Anderson
  (?)
@ 2016-10-10 20:42     ` Andreas Mohr
  -1 siblings, 0 replies; 9+ messages in thread
From: Andreas Mohr @ 2016-10-10 20:42 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Brian Norris, Thomas Gleixner, John Stultz, Tao Huang, Tony Xie,
	open list:ARM/Rockchip SoC...,
	linux-kernel, Andreas Mohr

On Mon, Oct 10, 2016 at 01:12:39PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Mon, Oct 10, 2016 at 1:04 PM, Brian Norris <briannorris@chromium.org> wrote:
> > I believe 'min' is unmodified throughout, and therefore 'kmin' is
> > computed to be the same minimum timeout in each loop. Shouldn't this be
> > decreasing on each iteration of the loop? (i.e., either your compute
> > 'kmin' differently here, or you recompute 'min' based on the elapsed
> > time?)
> 
> Yes, I stupidly changed something at the last second and then didn't
> test again after my stupid change.  Fix coming soon with all comments
> addressed.  Sorry for posting broken code.  :( :( :(

With a loop style that is actively re-calculating things,
such implementations should then not fall into the trap of
basing the "next" value on "current" time,
thereby bogusly accumulating scheduling-based delays
with each new loop iteration etc.
(i.e., things should still be based on hard, precise termination according to
an *initially* calculated, *absolute*, *minimum* expiry time).

Andreas Mohr

-- 
GNU/Linux. It's not the software that's free, it's you.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] timers: Fix usleep_range() in the context of wake_up_process()
  2016-10-10 20:04 ` Brian Norris
  2016-10-10 20:12     ` Doug Anderson
@ 2016-10-10 20:47   ` Andreas Mohr
  1 sibling, 0 replies; 9+ messages in thread
From: Andreas Mohr @ 2016-10-10 20:47 UTC (permalink / raw)
  To: Brian Norris
  Cc: Douglas Anderson, Thomas Gleixner, John Stultz, huangtao,
	tony.xie, linux-rockchip, linux-kernel, Andreas Mohr

Hi,

On Mon, Oct 10, 2016 at 01:04:01PM -0700, Brian Norris wrote:
> Hi Doug,
> 
> On Mon, Oct 10, 2016 at 11:47:57AM -0700, Doug Anderson wrote:
> > +		end = ktime_get();
> > +		elapsed = ktime_to_us(ktime_sub(end, start));
> > +	} while (elapsed < min);
> 
> I think Andreas had similar comments, but it seemed to me like
> ktime_before() might be nicer. But (as Andreas did) you might get into
> (premature?) micro-optimizations down that path, and I'm certainly not
> an expert there.

I'd justify it this way:
orientation/type of input data is rather irrelevant -
all that matters is how you want to treat things in your *inner handling* -
and if that happens to be ktime-based
(as is usually - if not pretty much always - the case within kernel code),
then it ought to be that way,
*consistently* (to avoid conversion transition overhead).

Andreas Mohr

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] timers: Fix usleep_range() in the context of wake_up_process()
  2016-10-10 19:53 ` Andreas Mohr
@ 2016-10-10 21:16   ` Doug Anderson
  0 siblings, 0 replies; 9+ messages in thread
From: Doug Anderson @ 2016-10-10 21:16 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Thomas Gleixner, John Stultz, Brian Norris, Tao Huang, Tony Xie,
	open list:ARM/Rockchip SoC...,
	linux-kernel

Hi,

On Mon, Oct 10, 2016 at 12:53 PM, Andreas Mohr <andi@lisas.de> wrote:
> Some thoughts:
> - max >= min pre-condition is validated somewhere, I'd hope?
>   (somewhere in outer API frame?)

I don't think this is validated, but it's not a new problem.  My patch
doesn't attempt to solve this.  A future patch could, though.  You can
speculate on whether a WARN_ON would be better or some type of
friendly warning.

-Doug

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-10-10 21:16 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-10 18:47 [PATCH] timers: Fix usleep_range() in the context of wake_up_process() Douglas Anderson
2016-10-10 18:47 ` Douglas Anderson
2016-10-10 19:53 ` Andreas Mohr
2016-10-10 21:16   ` Doug Anderson
2016-10-10 20:04 ` Brian Norris
2016-10-10 20:12   ` Doug Anderson
2016-10-10 20:12     ` Doug Anderson
2016-10-10 20:42     ` Andreas Mohr
2016-10-10 20:47   ` Andreas Mohr

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.