linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 1/2] timers: Fix usleep_range() in the context of wake_up_process()
@ 2016-10-21 15:58 Douglas Anderson
  2016-10-21 15:58 ` [PATCH v5 2/2] timers: Fix documentation for schedule_timeout() and similar Douglas Anderson
  2016-10-26 11:22 ` [tip:timers/core] timers: Fix usleep_range() in the context of wake_up_process() tip-bot for Douglas Anderson
  0 siblings, 2 replies; 4+ messages in thread
From: Douglas Anderson @ 2016-10-21 15:58 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz
  Cc: Andreas Mohr, briannorris, huangtao, tony.xie, linux-rockchip,
	linux, heiko, broonie, djkurtz, tskd08, Douglas Anderson,
	linux-kernel

Users of usleep_range() expect that it will _never_ return in less time
than the minimum passed parameter.  However, nothing in any of the code
ensures this.  Specifically:

usleep_range() => do_usleep_range() => schedule_hrtimeout_range() =>
schedule_hrtimeout_range_clock() just ends up calling schedule() with an
appropriate timeout set using the hrtimer.  If someone else happens to
wake up our task then we'll happily return from usleep_range() early.

msleep() already has code to handle this case since it will loop as long
as there was still time left.  usleep_range() had no such loop.

The problem is is easily demonstrated with a small bit of test code:

  static int usleep_test_task(void *data)
  {
    atomic_t *done = data;
    ktime_t start, end;

    start = ktime_get();
    usleep_range(50000, 100000);
    end = ktime_get();
    pr_info("Requested 50000 - 100000 us.  Actually slept for %llu us\n",
      (unsigned long long)ktime_to_us(ktime_sub(end, start)));
    atomic_set(done, 1);

    return 0;
  }

  static void run_usleep_test(void)
  {
    struct task_struct *t;
    atomic_t done;

    atomic_set(&done, 0);

    t = kthread_run(usleep_test_task, &done, "usleep_test_task");
    while (!atomic_read(&done)) {
      wake_up_process(t);
      udelay(1000);
    }
    kthread_stop(t);
  }

If you run the above code without this patch you get things like:
  Requested 50000 - 100000 us.  Actually slept for 967 us

If you run the above code _with_ this patch, you get:
  Requested 50000 - 100000 us.  Actually slept for 98896 us

Presumably this problem was not detected before because:
- It's not terribly common to use wake_up_process() directly.
- Other ways for processes to wake up are not typically mixed with
  usleep_range().
- There aren't lots of places that use usleep_range(), since many people
  call either msleep() or udelay().

NOTES:
- An effort was made to look for users relying on the old behavior by
  looking for usleep_range() in the same file as wake_up_process().
  No problems was found by this search, though it is conceivable that
  someone could have put the sleep and wakeup in two different files.
- An effort was made to ask several upstream maintainers if they were
  aware of people relying on wake_up_process() to wake up
  usleep_range().  No maintainers were aware of that but they were aware
  of many people relying on usleep_range() never returning before the
  minimum.

Reported-by: Tao Huang <huangtao@rock-chips.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
Changes in v5:
- Don't accidentally busy wait after first wakeup (Thomas Gleixner)
- Removed Reviewed-by tags

Changes in v4: None
Changes in v3:
- Add Reviewed-by tags
- Add notes about validation

Changes in v2:
- Fixed stupid bug that snuck in before posting
- Use ktime_before
- Remove delta from the loop

 kernel/time/timer.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 32bf6f75a8fe..ab65e7bcc2c2 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1896,16 +1896,6 @@ unsigned long msleep_interruptible(unsigned int msecs)
 
 EXPORT_SYMBOL(msleep_interruptible);
 
-static void __sched do_usleep_range(unsigned long min, unsigned long max)
-{
-	ktime_t kmin;
-	u64 delta;
-
-	kmin = ktime_set(0, min * NSEC_PER_USEC);
-	delta = (u64)(max - min) * NSEC_PER_USEC;
-	schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
-}
-
 /**
  * usleep_range - Sleep for an approximate time
  * @min: Minimum time in usecs to sleep
@@ -1919,7 +1909,15 @@ static void __sched do_usleep_range(unsigned long min, unsigned long max)
  */
 void __sched usleep_range(unsigned long min, unsigned long max)
 {
-	__set_current_state(TASK_UNINTERRUPTIBLE);
-	do_usleep_range(min, max);
+	ktime_t expires = ktime_add_us(ktime_get(), min);
+	u64 delta = (u64)(max - min) * NSEC_PER_USEC;
+
+	for (;;) {
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+		/* Do not return before the requested sleep time has elapsed */
+		if (!schedule_hrtimeout_range(&expires, delta,
+					      HRTIMER_MODE_ABS))
+			break;
+	}
 }
 EXPORT_SYMBOL(usleep_range);
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v5 2/2] timers: Fix documentation for schedule_timeout() and similar
  2016-10-21 15:58 [PATCH v5 1/2] timers: Fix usleep_range() in the context of wake_up_process() Douglas Anderson
@ 2016-10-21 15:58 ` Douglas Anderson
  2016-10-26 11:22   ` [tip:timers/core] " tip-bot for Douglas Anderson
  2016-10-26 11:22 ` [tip:timers/core] timers: Fix usleep_range() in the context of wake_up_process() tip-bot for Douglas Anderson
  1 sibling, 1 reply; 4+ messages in thread
From: Douglas Anderson @ 2016-10-21 15:58 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz
  Cc: Andreas Mohr, briannorris, huangtao, tony.xie, linux-rockchip,
	linux, heiko, broonie, djkurtz, tskd08, Douglas Anderson,
	linux-kernel

The documentatoin for schedule_timeout(), schedule_hrtimeout(), and
schedule_hrtimeout_range() all claimed that the routines couldn't
possibly return early if the task state was TASK_UNINTERRUPTIBLE.  This
was simply not true since anyone calling wake_up_process() would cause
those routines to exit early.

As some evidence that the documentation was broken (not the code):
- If we changed the code to match the documentation, msleep() would be
  identical to schedule_timeout_uninterruptible() and
  msleep_interruptible() would be identical to
  schedule_timeout_interruptible().  That doesn't seem likely to have
  been the intention.
- The schedule() function sleeps until a task is woken up.  Logically,
  one would expect that the schedule_timeout() function would sleep
  until a task is woken up or a timeout occurrs.

As part of the above observation, it can be seen that
schedule_hrtimeout() and schedule_hrtimeout_range() might return -EINTR
even if the task state was TASK_UNINTERRUPTIBLE.  This isn't terrible
behavior so we'll document it and keep it as-is.  After all, trying to
match schedule_timeout() and return the time left would incure a bunch
of extra calculation cost that isn't needed in all cases.

Suggested-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
Changes in v5: None
Changes in v4:
- Fixed stray double quotes.
- Updated wording as per Thomas Gleixner.

Changes in v3:
- Documentation fix new for v3.

 kernel/time/hrtimer.c | 20 ++++++++++++++------
 kernel/time/timer.c   | 11 +++++++----
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index bb5ec425dfe0..08be5c99d26b 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1742,15 +1742,19 @@ schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta,
  * You can set the task state as follows -
  *
  * %TASK_UNINTERRUPTIBLE - at least @timeout time is guaranteed to
- * pass before the routine returns.
+ * pass before the routine returns unless the current task is explicitly
+ * woken up, (e.g. by wake_up_process()).
  *
  * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
- * delivered to the current task.
+ * delivered to the current task or the current task is explicitly woken
+ * up.
  *
  * The current task state is guaranteed to be TASK_RUNNING when this
  * routine returns.
  *
- * Returns 0 when the timer has expired otherwise -EINTR
+ * Returns 0 when the timer has expired. If the task was woken before the
+ * timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or
+ * by an explicit wakeup, it returns -EINTR.
  */
 int __sched schedule_hrtimeout_range(ktime_t *expires, u64 delta,
 				     const enum hrtimer_mode mode)
@@ -1772,15 +1776,19 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout_range);
  * You can set the task state as follows -
  *
  * %TASK_UNINTERRUPTIBLE - at least @timeout time is guaranteed to
- * pass before the routine returns.
+ * pass before the routine returns unless the current task is explicitly
+ * woken up, (e.g. by wake_up_process()).
  *
  * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
- * delivered to the current task.
+ * delivered to the current task or the current task is explicitly woken
+ * up.
  *
  * The current task state is guaranteed to be TASK_RUNNING when this
  * routine returns.
  *
- * Returns 0 when the timer has expired otherwise -EINTR
+ * Returns 0 when the timer has expired. If the task was woken before the
+ * timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or
+ * by an explicit wakeup, it returns -EINTR.
  */
 int __sched schedule_hrtimeout(ktime_t *expires,
 			       const enum hrtimer_mode mode)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ab65e7bcc2c2..330214a19beb 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1691,11 +1691,12 @@ static void process_timeout(unsigned long __data)
  * You can set the task state as follows -
  *
  * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
- * pass before the routine returns. The routine will return 0
+ * pass before the routine returns unless the current task is explicitly
+ * woken up, (e.g. by wake_up_process())".
  *
  * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
- * delivered to the current task. In this case the remaining time
- * in jiffies will be returned, or 0 if the timer expired in time
+ * delivered to the current task or the current task is explicitly woken
+ * up.
  *
  * The current task state is guaranteed to be TASK_RUNNING when this
  * routine returns.
@@ -1704,7 +1705,9 @@ static void process_timeout(unsigned long __data)
  * the CPU away without a bound on the timeout. In this case the return
  * value will be %MAX_SCHEDULE_TIMEOUT.
  *
- * In all cases the return value is guaranteed to be non-negative.
+ * Returns 0 when the timer has expired otherwise the remaining time in
+ * jiffies will be returned.  In all cases the return value is guaranteed
+ * to be non-negative.
  */
 signed long __sched schedule_timeout(signed long timeout)
 {
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [tip:timers/core] timers: Fix usleep_range() in the context of wake_up_process()
  2016-10-21 15:58 [PATCH v5 1/2] timers: Fix usleep_range() in the context of wake_up_process() Douglas Anderson
  2016-10-21 15:58 ` [PATCH v5 2/2] timers: Fix documentation for schedule_timeout() and similar Douglas Anderson
@ 2016-10-26 11:22 ` tip-bot for Douglas Anderson
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot for Douglas Anderson @ 2016-10-26 11:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, hpa, mingo, john.stultz, linux-kernel, dianders, andi, huangtao

Commit-ID:  6c5e9059692567740a4ee51530dffe51a4b9584d
Gitweb:     http://git.kernel.org/tip/6c5e9059692567740a4ee51530dffe51a4b9584d
Author:     Douglas Anderson <dianders@chromium.org>
AuthorDate: Fri, 21 Oct 2016 08:58:50 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 26 Oct 2016 13:14:46 +0200

timers: Fix usleep_range() in the context of wake_up_process()

Users of usleep_range() expect that it will _never_ return in less time
than the minimum passed parameter. However, nothing in the code ensures
this, when the sleeping task is woken by wake_up_process() or any other
mechanism which can wake a task from uninterruptible state.

Neither usleep_range() nor schedule_hrtimeout_range*() have any protection
against wakeups. schedule_hrtimeout_range*() is designed this way despite
the fact that the API documentation does not mention it.

msleep() already has code to handle this case since it will loop as long
as there was still time left.  usleep_range() has no such loop, add it.

Presumably this problem was not detected before because usleep_range() is
only used in a few places and the function is mostly used in contexts which
are not exposed to wakeups of any form.

An effort was made to look for users relying on the old behavior by
looking for usleep_range() in the same file as wake_up_process().
No problems were found by this search, though it is conceivable that
someone could have put the sleep and wakeup in two different files.

An effort was made to ask several upstream maintainers if they were aware
of people relying on wake_up_process() to wake up usleep_range(). No
maintainers were aware of that but they were aware of many people relying
on usleep_range() never returning before the minimum.

Reported-by: Tao Huang <huangtao@rock-chips.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: heiko@sntech.de
Cc: broonie@kernel.org
Cc: briannorris@chromium.org
Cc: Andreas Mohr <andi@lisas.de>
Cc: linux-rockchip@lists.infradead.org
Cc: tony.xie@rock-chips.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: djkurtz@chromium.org
Cc: linux@roeck-us.net
Cc: tskd08@gmail.com
Link: http://lkml.kernel.org/r/1477065531-30342-1-git-send-email-dianders@chromium.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/timer.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d47980..12681c9 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1896,16 +1896,6 @@ unsigned long msleep_interruptible(unsigned int msecs)
 
 EXPORT_SYMBOL(msleep_interruptible);
 
-static void __sched do_usleep_range(unsigned long min, unsigned long max)
-{
-	ktime_t kmin;
-	u64 delta;
-
-	kmin = ktime_set(0, min * NSEC_PER_USEC);
-	delta = (u64)(max - min) * NSEC_PER_USEC;
-	schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
-}
-
 /**
  * usleep_range - Sleep for an approximate time
  * @min: Minimum time in usecs to sleep
@@ -1919,7 +1909,14 @@ static void __sched do_usleep_range(unsigned long min, unsigned long max)
  */
 void __sched usleep_range(unsigned long min, unsigned long max)
 {
-	__set_current_state(TASK_UNINTERRUPTIBLE);
-	do_usleep_range(min, max);
+	ktime_t exp = ktime_add_us(ktime_get(), min);
+	u64 delta = (u64)(max - min) * NSEC_PER_USEC;
+
+	for (;;) {
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+		/* Do not return before the requested sleep time has elapsed */
+		if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS))
+			break;
+	}
 }
 EXPORT_SYMBOL(usleep_range);

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [tip:timers/core] timers: Fix documentation for schedule_timeout() and similar
  2016-10-21 15:58 ` [PATCH v5 2/2] timers: Fix documentation for schedule_timeout() and similar Douglas Anderson
@ 2016-10-26 11:22   ` tip-bot for Douglas Anderson
  0 siblings, 0 replies; 4+ messages in thread
From: tip-bot for Douglas Anderson @ 2016-10-26 11:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: djkurtz, mingo, dianders, linux-kernel, john.stultz, tglx, hpa, andi

Commit-ID:  4b7e9cf9c84b09adc428e0433cd376b91f9c52a7
Gitweb:     http://git.kernel.org/tip/4b7e9cf9c84b09adc428e0433cd376b91f9c52a7
Author:     Douglas Anderson <dianders@chromium.org>
AuthorDate: Fri, 21 Oct 2016 08:58:51 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 26 Oct 2016 13:14:47 +0200

timers: Fix documentation for schedule_timeout() and similar

The documentation for schedule_timeout(), schedule_hrtimeout(), and
schedule_hrtimeout_range() all claim that the routines couldn't possibly
return early if the task state was TASK_UNINTERRUPTIBLE. This is simply
not true since wake_up_process() will cause those routines to exit early.

We cannot make schedule_[hr]timeout() loop until the timeout expires if the
task state is uninterruptible because we have users which rely on the
existing and designed behaviour.

Make the documentation match the (correct) implementation.

schedule_hrtimeout() returns -EINTR even when a uninterruptible task was
woken up. This might look strange, but making the return code depend on the
state is too much of an effort as it would affect all the call sites. There
is no value in doing so, but we spell it out clearly in the documentation.

Suggested-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: huangtao@rock-chips.com
Cc: heiko@sntech.de
Cc: broonie@kernel.org
Cc: briannorris@chromium.org
Cc: Andreas Mohr <andi@lisas.de>
Cc: linux-rockchip@lists.infradead.org
Cc: tony.xie@rock-chips.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux@roeck-us.net
Cc: tskd08@gmail.com
Link: http://lkml.kernel.org/r/1477065531-30342-2-git-send-email-dianders@chromium.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/hrtimer.c | 20 ++++++++++++++------
 kernel/time/timer.c   | 11 +++++++----
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index bb5ec42..08be5c9 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1742,15 +1742,19 @@ schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta,
  * You can set the task state as follows -
  *
  * %TASK_UNINTERRUPTIBLE - at least @timeout time is guaranteed to
- * pass before the routine returns.
+ * pass before the routine returns unless the current task is explicitly
+ * woken up, (e.g. by wake_up_process()).
  *
  * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
- * delivered to the current task.
+ * delivered to the current task or the current task is explicitly woken
+ * up.
  *
  * The current task state is guaranteed to be TASK_RUNNING when this
  * routine returns.
  *
- * Returns 0 when the timer has expired otherwise -EINTR
+ * Returns 0 when the timer has expired. If the task was woken before the
+ * timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or
+ * by an explicit wakeup, it returns -EINTR.
  */
 int __sched schedule_hrtimeout_range(ktime_t *expires, u64 delta,
 				     const enum hrtimer_mode mode)
@@ -1772,15 +1776,19 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout_range);
  * You can set the task state as follows -
  *
  * %TASK_UNINTERRUPTIBLE - at least @timeout time is guaranteed to
- * pass before the routine returns.
+ * pass before the routine returns unless the current task is explicitly
+ * woken up, (e.g. by wake_up_process()).
  *
  * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
- * delivered to the current task.
+ * delivered to the current task or the current task is explicitly woken
+ * up.
  *
  * The current task state is guaranteed to be TASK_RUNNING when this
  * routine returns.
  *
- * Returns 0 when the timer has expired otherwise -EINTR
+ * Returns 0 when the timer has expired. If the task was woken before the
+ * timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or
+ * by an explicit wakeup, it returns -EINTR.
  */
 int __sched schedule_hrtimeout(ktime_t *expires,
 			       const enum hrtimer_mode mode)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 12681c9..88aab86 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1691,11 +1691,12 @@ static void process_timeout(unsigned long __data)
  * You can set the task state as follows -
  *
  * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
- * pass before the routine returns. The routine will return 0
+ * pass before the routine returns unless the current task is explicitly
+ * woken up, (e.g. by wake_up_process())".
  *
  * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
- * delivered to the current task. In this case the remaining time
- * in jiffies will be returned, or 0 if the timer expired in time
+ * delivered to the current task or the current task is explicitly woken
+ * up.
  *
  * The current task state is guaranteed to be TASK_RUNNING when this
  * routine returns.
@@ -1704,7 +1705,9 @@ static void process_timeout(unsigned long __data)
  * the CPU away without a bound on the timeout. In this case the return
  * value will be %MAX_SCHEDULE_TIMEOUT.
  *
- * In all cases the return value is guaranteed to be non-negative.
+ * Returns 0 when the timer has expired otherwise the remaining time in
+ * jiffies will be returned.  In all cases the return value is guaranteed
+ * to be non-negative.
  */
 signed long __sched schedule_timeout(signed long timeout)
 {

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-10-26 11:23 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-21 15:58 [PATCH v5 1/2] timers: Fix usleep_range() in the context of wake_up_process() Douglas Anderson
2016-10-21 15:58 ` [PATCH v5 2/2] timers: Fix documentation for schedule_timeout() and similar Douglas Anderson
2016-10-26 11:22   ` [tip:timers/core] " tip-bot for Douglas Anderson
2016-10-26 11:22 ` [tip:timers/core] timers: Fix usleep_range() in the context of wake_up_process() tip-bot for Douglas Anderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).