linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH/RFC] wait_for_completion_timeout() considered harmful.
@ 2013-11-16 21:06 NeilBrown
  2013-11-18 23:27 ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2013-11-16 21:06 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Andrew Morton
  Cc: lkml, Dr. H. Nikolaus Schaller, Marek Belisko, Mark Brown

[-- Attachment #1: Type: text/plain, Size: 7084 bytes --]


It would be reasonable to assume that

      wait_for_completion_timeout(&wm8350->auxadc_done, msecs_to_jiffies(5));

would wait at least 5 msecs for the auxadc_done to complete.  But it does not.
With a HZ of 200 or less, msecs_to_jiffies(5) has value '1', and so this
will only wait until the next "timer tick", which could happen immediately.

This can lead to incorrect results - and has done so in out-of-tree patches
for drivers/misc/bmp085.c which uses a very similar construct to enable interrupt
based result collection.

The documentation for several *_timeout* functions claim they will wait for
"timeout jiffies" to have elapsed where this is not the case.  They will
actually wait for "timeout" jiffies to have started implying an elapsed time
between (timeout-1) and (timeout).

This patch corrects some of this documentation, and adds a collection of
  wait_for_completion*_msecs()
interfaces which wait at least the given number of milliseconds for the
completion (or a signal).

If accepted, we can the see which of the current:
   wait_for_completion_timeout-*(..., msecs_to_jiffies(XXX))
could be converted.

Reported-by: "Dr. H. Nikolaus Schaller" <hns@goldelico.com>
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 5d5aaae3af43..efe92eaf1c45 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,6 +9,7 @@
  */
 
 #include <linux/wait.h>
+#include <linux/jiffies.h>
 
 /*
  * struct completion - structure used to maintain state for a "completion"
@@ -106,4 +107,33 @@ extern bool completion_done(struct completion *x);
 extern void complete(struct completion *);
 extern void complete_all(struct completion *);
 
+/* Following wrappers will wait at least 'msecs' milliseconds
+ * unless completion (or signal) happens before then.
+ */
+static inline unsigned long wait_for_completion_msecs(
+	struct completion *x, unsigned long msecs)
+{
+	return wait_for_completion_timeout(x, 1 + msecs_to_jiffies(msecs));
+}
+
+static inline unsigned long wait_for_completion_io_msecs(
+	struct completion *x, unsigned long msecs)
+{
+	return wait_for_completion_io_timeout(x, 1 + msecs_to_jiffies(msecs));
+}
+
+static inline long wait_for_completion_interruptible_msecs(
+	struct completion *x, unsigned long msecs)
+{
+	return wait_for_completion_interruptible_timeout(
+		x, 1 + msecs_to_jiffies(msecs));
+}
+
+static inline long wait_for_completion_killable_msecs(
+	struct completion *x, unsigned long msecs)
+{
+	return wait_for_completion_killable_timeout(
+		x, 1 + msecs_to_jiffies(msecs));
+}
+
 #endif
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index a63f4dc27909..b51902373cc0 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -126,12 +126,15 @@ EXPORT_SYMBOL(wait_for_completion);
 /**
  * wait_for_completion_timeout: - waits for completion of a task (w/timeout)
  * @x:  holds the state of this particular completion
- * @timeout:  timeout value in jiffies
+ * @timeout:  timeout value in "jiffie starts"
  *
  * This waits for either a completion of a specific task to be signaled or for a
  * specified timeout to expire. The timeout is in jiffies. It is not
  * interruptible.
  *
+ * The timeout is in "jiffie starts".  It could take between (timeout-1)
+ * and (timeout) jiffies for that many to start.
+ *
  * Return: 0 if timed out, and positive (at least 1, or number of jiffies left
  * till timeout) if completed.
  */
@@ -159,10 +162,11 @@ EXPORT_SYMBOL(wait_for_completion_io);
 /**
  * wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
  * @x:  holds the state of this particular completion
- * @timeout:  timeout value in jiffies
+ * @timeout:  timeout value in "jiffie starts"
  *
  * This waits for either a completion of a specific task to be signaled or for a
- * specified timeout to expire. The timeout is in jiffies. It is not
+ * specified timeout to expire. The timeout is in "jiffie starts" which implies
+ * a duration between (timeout-1) and (timeout) jiffies. It is not
  * interruptible. The caller is accounted as waiting for IO.
  *
  * Return: 0 if timed out, and positive (at least 1, or number of jiffies left
@@ -196,10 +200,12 @@ EXPORT_SYMBOL(wait_for_completion_interruptible);
 /**
  * wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
  * @x:  holds the state of this particular completion
- * @timeout:  timeout value in jiffies
+ * @timeout:  timeout value in "jiffie starts"
  *
  * This waits for either a completion of a specific task to be signaled or for a
- * specified timeout to expire. It is interruptible. The timeout is in jiffies.
+ * specified timeout to expire. It is interruptible. The timeout is in
+ * "jiffie starts" which implies a duration between (timeout-1) and
+ * (timeout) jiffies.
  *
  * Return: -ERESTARTSYS if interrupted, 0 if timed out, positive (at least 1,
  * or number of jiffies left till timeout) if completed.
@@ -233,11 +239,12 @@ EXPORT_SYMBOL(wait_for_completion_killable);
 /**
  * wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
  * @x:  holds the state of this particular completion
- * @timeout:  timeout value in jiffies
+ * @timeout:  timeout value in "jiffie starts"
  *
  * This waits for either a completion of a specific task to be
  * signaled or for a specified timeout to expire. It can be
- * interrupted by a kill signal. The timeout is in jiffies.
+ * interrupted by a kill signal. The timeout is in "jiffie starts"
+ * which implies a duration between (timeout-1) and (timeout) jiffies.
  *
  * Return: -ERESTARTSYS if interrupted, 0 if timed out, positive (at least 1,
  * or number of jiffies left till timeout) if completed.
diff --git a/kernel/timer.c b/kernel/timer.c
index 6582b82fa966..bf8ed6adb140 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1405,16 +1405,17 @@ static void process_timeout(unsigned long __data)
 
 /**
  * schedule_timeout - sleep until timeout
- * @timeout: timeout value in jiffies
+ * @timeout: timeout value in "jiffie starts"
  *
- * Make the current task sleep until @timeout jiffies have
- * elapsed. The routine will return immediately unless
+ * Make the current task sleep until the start of the "timeout"
+ * jiffie from now.  This can take between (timeout-1) and (timeout)
+ * jiffies to occur. The routine will return immediately unless
  * the current task state has been set (see set_current_state()).
  *
  * You can set the task state as follows -
  *
  * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
- * pass before the routine returns. The routine will return 0
+ * have started before the routine returns. The routine will return 0
  *
  * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
  * delivered to the current task. In this case the remaining time

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH/RFC] wait_for_completion_timeout() considered harmful.
  2013-11-16 21:06 [PATCH/RFC] wait_for_completion_timeout() considered harmful NeilBrown
@ 2013-11-18 23:27 ` Andrew Morton
  2013-11-18 23:42   ` Peter Zijlstra
  2013-11-18 23:44   ` NeilBrown
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2013-11-18 23:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, lkml,
	Dr. H. Nikolaus Schaller, Marek Belisko, Mark Brown

On Sun, 17 Nov 2013 08:06:03 +1100 NeilBrown <neilb@suse.de> wrote:

> It would be reasonable to assume that
> 
>       wait_for_completion_timeout(&wm8350->auxadc_done, msecs_to_jiffies(5));
> 
> would wait at least 5 msecs for the auxadc_done to complete.  But it does not.
> With a HZ of 200 or less, msecs_to_jiffies(5) has value '1', and so this
> will only wait until the next "timer tick", which could happen immediately.
> 
> This can lead to incorrect results - and has done so in out-of-tree patches
> for drivers/misc/bmp085.c which uses a very similar construct to enable interrupt
> based result collection.
> 
> The documentation for several *_timeout* functions claim they will wait for
> "timeout jiffies" to have elapsed where this is not the case.  They will
> actually wait for "timeout" jiffies to have started implying an elapsed time
> between (timeout-1) and (timeout).
> 
> This patch corrects some of this documentation, and adds a collection of
>   wait_for_completion*_msecs()
> interfaces which wait at least the given number of milliseconds for the
> completion (or a signal).

Mutter.  wait_for_x(..., 5ms) should wait for a minimum of 5ms, no matter
what.

So I'd suggest we make that happen, rather than adding some new interfaces?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH/RFC] wait_for_completion_timeout() considered harmful.
  2013-11-18 23:27 ` Andrew Morton
@ 2013-11-18 23:42   ` Peter Zijlstra
  2013-11-19  0:49     ` Jonathan Corbet
  2013-11-18 23:44   ` NeilBrown
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2013-11-18 23:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: NeilBrown, Ingo Molnar, Thomas Gleixner, lkml,
	Dr. H. Nikolaus Schaller, Marek Belisko, Mark Brown

On Mon, Nov 18, 2013 at 03:27:46PM -0800, Andrew Morton wrote:
> On Sun, 17 Nov 2013 08:06:03 +1100 NeilBrown <neilb@suse.de> wrote:
> 
> > It would be reasonable to assume that
> > 
> >       wait_for_completion_timeout(&wm8350->auxadc_done, msecs_to_jiffies(5));
> > 
> > would wait at least 5 msecs for the auxadc_done to complete.  But it does not.
> > With a HZ of 200 or less, msecs_to_jiffies(5) has value '1', and so this
> > will only wait until the next "timer tick", which could happen immediately.
> > 
> > This can lead to incorrect results - and has done so in out-of-tree patches
> > for drivers/misc/bmp085.c which uses a very similar construct to enable interrupt
> > based result collection.
> > 
> > The documentation for several *_timeout* functions claim they will wait for
> > "timeout jiffies" to have elapsed where this is not the case.  They will
> > actually wait for "timeout" jiffies to have started implying an elapsed time
> > between (timeout-1) and (timeout).
> > 
> > This patch corrects some of this documentation, and adds a collection of
> >   wait_for_completion*_msecs()
> > interfaces which wait at least the given number of milliseconds for the
> > completion (or a signal).
> 
> Mutter.  wait_for_x(..., 5ms) should wait for a minimum of 5ms, no matter
> what.
> 
> So I'd suggest we make that happen, rather than adding some new interfaces?

Yeah, also the completion interface is just one of many that's buggered
this way.

I briefly talked to Thomas about this earlier today and we need to fix
this at a lower level -- the quick 'n dirty solution is to add 1 jiffy
down in the timer-wheel when we enqueue these things.

And yes, we very much don't want to create new interfaces with similar
but slightly different semantics, that's just asking for trouble.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH/RFC] wait_for_completion_timeout() considered harmful.
  2013-11-18 23:27 ` Andrew Morton
  2013-11-18 23:42   ` Peter Zijlstra
@ 2013-11-18 23:44   ` NeilBrown
  2013-11-19  8:25     ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: NeilBrown @ 2013-11-18 23:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, lkml,
	Dr. H. Nikolaus Schaller, Marek Belisko, Mark Brown

[-- Attachment #1: Type: text/plain, Size: 2534 bytes --]

On Mon, 18 Nov 2013 15:27:46 -0800 Andrew Morton <akpm@linux-foundation.org>
wrote:

> On Sun, 17 Nov 2013 08:06:03 +1100 NeilBrown <neilb@suse.de> wrote:
> 
> > It would be reasonable to assume that
> > 
> >       wait_for_completion_timeout(&wm8350->auxadc_done, msecs_to_jiffies(5));
> > 
> > would wait at least 5 msecs for the auxadc_done to complete.  But it does not.
> > With a HZ of 200 or less, msecs_to_jiffies(5) has value '1', and so this
> > will only wait until the next "timer tick", which could happen immediately.
> > 
> > This can lead to incorrect results - and has done so in out-of-tree patches
> > for drivers/misc/bmp085.c which uses a very similar construct to enable interrupt
> > based result collection.
> > 
> > The documentation for several *_timeout* functions claim they will wait for
> > "timeout jiffies" to have elapsed where this is not the case.  They will
> > actually wait for "timeout" jiffies to have started implying an elapsed time
> > between (timeout-1) and (timeout).
> > 
> > This patch corrects some of this documentation, and adds a collection of
> >   wait_for_completion*_msecs()
> > interfaces which wait at least the given number of milliseconds for the
> > completion (or a signal).
> 
> Mutter.  wait_for_x(..., 5ms) should wait for a minimum of 5ms, no matter
> what.
> 
> So I'd suggest we make that happen, rather than adding some new interfaces?

I thought of that.  It would certainly be nice.

However what we have is 
         XXX_timeout(...., jiffies).
And if we decided that
         XXX_timeout(...., msecs_to_jiffies(5))
would only timeout after at least 5ms, then
         schedule_timeout(1)
would have to wait at least one full jiffie, which is quite different to what
it currently does.

We have loops that have
    timeout = schedule_timeout(timeout)
in the middle and if we change the semantics of schedule_timeout() to round
up, those loops could wait quite a bit longer than expected.

So I think that we do need to add new interfaces just like msleep() was introduced
a while back to fix all the various misuses of
    schedule_timeout(msecs_to_jiffies(XX))).

Possibly we can also discard old bad interfaces.
Maybe the *_timeout() interfaces should become *_until() where the jiffies
number isn't a count but is a value that we wait for "jiffies" to exceed.

I don't think there is a really easy solution, but thanks for pushing the
discussion along towards trying to understand one.

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH/RFC] wait_for_completion_timeout() considered harmful.
  2013-11-18 23:42   ` Peter Zijlstra
@ 2013-11-19  0:49     ` Jonathan Corbet
  2013-11-19  7:05       ` Ingo Molnar
  2013-11-19  8:29       ` Peter Zijlstra
  0 siblings, 2 replies; 11+ messages in thread
From: Jonathan Corbet @ 2013-11-19  0:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, NeilBrown, Ingo Molnar, Thomas Gleixner, lkml,
	Dr. H. Nikolaus Schaller, Marek Belisko, Mark Brown

On Tue, 19 Nov 2013 00:42:09 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> I briefly talked to Thomas about this earlier today and we need to fix
> this at a lower level -- the quick 'n dirty solution is to add 1 jiffy
> down in the timer-wheel when we enqueue these things.

That can lead to situations like the one I encountered years ago where
msleep(1) would snooze for 20ms.  I didn't get much love for my idea of
switching msleep() to hrtimers back then, but I still think it might be be
better to provide the resolution that the interface appears to promise.

jon

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH/RFC] wait_for_completion_timeout() considered harmful.
  2013-11-19  0:49     ` Jonathan Corbet
@ 2013-11-19  7:05       ` Ingo Molnar
  2013-11-19  8:29       ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2013-11-19  7:05 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Peter Zijlstra, Andrew Morton, NeilBrown, Ingo Molnar,
	Thomas Gleixner, lkml, Dr. H. Nikolaus Schaller, Marek Belisko,
	Mark Brown, Linus Torvalds


* Jonathan Corbet <corbet@lwn.net> wrote:

> On Tue, 19 Nov 2013 00:42:09 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > I briefly talked to Thomas about this earlier today and we need to 
> > fix this at a lower level -- the quick 'n dirty solution is to add 
> > 1 jiffy down in the timer-wheel when we enqueue these things.
> 
> That can lead to situations like the one I encountered years ago 
> where msleep(1) would snooze for 20ms.  I didn't get much love for 
> my idea of switching msleep() to hrtimers back then, but I still 
> think it might be be better to provide the resolution that the 
> interface appears to promise.

That looks like a sensible approach - mind resending that patch? We 
can put it into the timer tree and see what happens.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH/RFC] wait_for_completion_timeout() considered harmful.
  2013-11-18 23:44   ` NeilBrown
@ 2013-11-19  8:25     ` Peter Zijlstra
  2013-11-19  8:58       ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2013-11-19  8:25 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, lkml,
	Dr. H. Nikolaus Schaller, Marek Belisko, Mark Brown

On Tue, Nov 19, 2013 at 10:44:38AM +1100, NeilBrown wrote:
> We have loops that have
>     timeout = schedule_timeout(timeout)
> in the middle and if we change the semantics of schedule_timeout() to round
> up, those loops could wait quite a bit longer than expected.

Depends on what you expect; most of these functions have documentation
that says they will sleep at least timeout amount of time.

schedule_timeout()'s version looks like:

 * Make the current task sleep until @timeout jiffies have
 * elapsed.

Clearly it doesn't do that currently, so adding 1 will actually make it
do what it says on the tin.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH/RFC] wait_for_completion_timeout() considered harmful.
  2013-11-19  0:49     ` Jonathan Corbet
  2013-11-19  7:05       ` Ingo Molnar
@ 2013-11-19  8:29       ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2013-11-19  8:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrew Morton, NeilBrown, Ingo Molnar, Thomas Gleixner, lkml,
	Dr. H. Nikolaus Schaller, Marek Belisko, Mark Brown

On Mon, Nov 18, 2013 at 05:49:02PM -0700, Jonathan Corbet wrote:
> On Tue, 19 Nov 2013 00:42:09 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > I briefly talked to Thomas about this earlier today and we need to fix
> > this at a lower level -- the quick 'n dirty solution is to add 1 jiffy
> > down in the timer-wheel when we enqueue these things.
> 
> That can lead to situations like the one I encountered years ago where
> msleep(1) would snooze for 20ms. 

No, for up to 20ms, since you get part of the first jiffy, which was the
whole problem (assuming we're talking HZ=100 here).

> I didn't get much love for my idea of
> switching msleep() to hrtimers back then, but I still think it might be be
> better to provide the resolution that the interface appears to promise.

It would actually make more sense to use hrtimers here than it does to
use the old timers since hrtimers has much better gurantees but also
because the timer is guaranteed to fire.

The use case where the old timers really beat hrtimers hands down is
for timeout timers that never actually fire.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH/RFC] wait_for_completion_timeout() considered harmful.
  2013-11-19  8:25     ` Peter Zijlstra
@ 2013-11-19  8:58       ` NeilBrown
  2013-11-19 12:23         ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2013-11-19  8:58 UTC (permalink / raw)
  To: Peter Zijlstra, Jonathan Corbet, Dean Nelson
  Cc: Andrew Morton, Ingo Molnar, Thomas Gleixner, lkml,
	Dr. H. Nikolaus Schaller, Marek Belisko, Mark Brown

[-- Attachment #1: Type: text/plain, Size: 2743 bytes --]

On Tue, 19 Nov 2013 09:25:48 +0100 Peter Zijlstra <peterz@infradead.org>
wrote:

> On Tue, Nov 19, 2013 at 10:44:38AM +1100, NeilBrown wrote:
> > We have loops that have
> >     timeout = schedule_timeout(timeout)
> > in the middle and if we change the semantics of schedule_timeout() to round
> > up, those loops could wait quite a bit longer than expected.
> 
> Depends on what you expect; most of these functions have documentation
> that says they will sleep at least timeout amount of time.
> 
> schedule_timeout()'s version looks like:
> 
>  * Make the current task sleep until @timeout jiffies have
>  * elapsed.
> 
> Clearly it doesn't do that currently, so adding 1 will actually make it
> do what it says on the tin.

Hmm.. maybe you are right, and I now see that my argument about loops isn't
nearly as convincing as it seemed at the time.

However there is still the risk that someone wrote code depending on the
current behaviour rather than the current documentation.  msleep() is one
such example (it has a "+1") but that is easily found and fixed.

I went hunting for code that uses a timeout of '1' which would be affected
the most (I found no use cases for '0').  Two interesting examples:

drivers/video/via/via-core.c: 

	/*
	 * Now we just wait until the interrupt handler says
	 * we're done.  Except that, actually, we need to wait a little
	 * longer: the interrupts seem to jump the gun a little and we
	 * get corrupted frames sometimes.
	 */
	wait_for_completion_timeout(&viafb_dma_completion, 1);
	msleep(1);

Maybe the 'msleep(1)' was only needed because of this 'bug' - Jon might know.
In that case the comment and  msleep could get removed.

drivers/misc/sgi-xp/xpc_channel.c:

/*
 * Wait for a message entry to become available for the specified channel,
 * but don't wait any longer than 1 jiffy.
 */

.....
	atomic_inc(&ch->n_on_msg_allocate_wq);
	ret = interruptible_sleep_on_timeout(&ch->msg_allocate_wq, 1);
	atomic_dec(&ch->n_on_msg_allocate_wq);

If interruptible_sleep_on_timeout were affected by the proposed change, then
the code would no longer match the comment.  That code is five years old ...
I wonder Dean remembers if it is important ... or is even still at SGI.

Also
     git grep '_timeout[_a-z]*(.*msecs_to_jiffies(1)'

finds 10 matches which would be significantly affected.  I can't easily say
if it would be for the better or not.

Another random piece of code that maybe could get simplified:
drivers/iio/light/tsl2563.c:
	/*
	 * TODO: Make sure that we wait at least required delay but why we
	 * have to extend it one tick more?
	 */
	schedule_timeout_interruptible(msecs_to_jiffies(delay) + 2);


NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH/RFC] wait_for_completion_timeout() considered harmful.
  2013-11-19  8:58       ` NeilBrown
@ 2013-11-19 12:23         ` Peter Zijlstra
  2013-11-19 14:39           ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2013-11-19 12:23 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jonathan Corbet, Dean Nelson, Andrew Morton, Ingo Molnar,
	Thomas Gleixner, lkml, Dr. H. Nikolaus Schaller, Marek Belisko,
	Mark Brown

On Tue, Nov 19, 2013 at 07:58:51PM +1100, NeilBrown wrote:
> 	/*
> 	 * TODO: Make sure that we wait at least required delay but why we
> 	 * have to extend it one tick more?
> 	 */
> 	schedule_timeout_interruptible(msecs_to_jiffies(delay) + 2);

What makes me sad is that clearly people knew stuff was broken but
somehow it never got properly fixed.

Yes, changing something like this is risky, but I prefer to fix the
implementation to the sane and documented behaviour and fix up whatever
fallout that generates. The end result is saner code in general and less
new bugs.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH/RFC] wait_for_completion_timeout() considered harmful.
  2013-11-19 12:23         ` Peter Zijlstra
@ 2013-11-19 14:39           ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2013-11-19 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: NeilBrown, Jonathan Corbet, Dean Nelson, Andrew Morton,
	Ingo Molnar, lkml, Dr. H. Nikolaus Schaller, Marek Belisko,
	Mark Brown

On Tue, 19 Nov 2013, Peter Zijlstra wrote:
> On Tue, Nov 19, 2013 at 07:58:51PM +1100, NeilBrown wrote:
> > 	/*
> > 	 * TODO: Make sure that we wait at least required delay but why we
> > 	 * have to extend it one tick more?
> > 	 */
> > 	schedule_timeout_interruptible(msecs_to_jiffies(delay) + 2);
> 
> What makes me sad is that clearly people knew stuff was broken but
> somehow it never got properly fixed.
> 
> Yes, changing something like this is risky, but I prefer to fix the
> implementation to the sane and documented behaviour and fix up whatever
> fallout that generates. The end result is saner code in general and less
> new bugs.
 
The underlying issue of the timer wheel is that it is driven by a
coarse grained tick.

So a schedule_timeout(1) is guaranteed to sleep until the next tick
arrives. There is no way to figure out whether the timer was started
right at the beginning of a tick period or right at the end.

This has been the case from day one of the timer wheel.

The user space interfaces [clock_]nanosleep and the various posix
timers had countermeasures based on gettimeofday() to guarantee the
correct behaviour. glibc still has some extra checks around some
timeout related syscalls for that reason. We killed that mess when we
converted them over to hrtimers.

So if you really want to make sure that you slept at minimum the given
number of ticks with the timer wheel, then you have only one choice:

       Add unconditionally 1 to the given number of ticks

I don't think that this will cause a massive fallout as the timing of
the timer_list timers already varies by an order of magnitude due to
the HZ settings. So anything which relies on the timer wheel in terms
of precision is hosed already. So adding 1 will just hose it some
more.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-11-19 14:40 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-16 21:06 [PATCH/RFC] wait_for_completion_timeout() considered harmful NeilBrown
2013-11-18 23:27 ` Andrew Morton
2013-11-18 23:42   ` Peter Zijlstra
2013-11-19  0:49     ` Jonathan Corbet
2013-11-19  7:05       ` Ingo Molnar
2013-11-19  8:29       ` Peter Zijlstra
2013-11-18 23:44   ` NeilBrown
2013-11-19  8:25     ` Peter Zijlstra
2013-11-19  8:58       ` NeilBrown
2013-11-19 12:23         ` Peter Zijlstra
2013-11-19 14:39           ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).