linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
@ 2013-06-04 19:28 Oleg Nesterov
  2013-06-04 21:35 ` Imre Deak
  0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2013-06-04 19:28 UTC (permalink / raw)
  To: Andrew Morton, Daniel Vetter, Dave Jones, David Howells,
	Imre Deak, Jens Axboe, Linus Torvalds, Lukas Czerner,
	Paul E. McKenney
  Cc: linux-kernel

Hello,

Just noticed this commit...

commit 4c663cfc523a88d97a8309b04a089c27dc57fd7e
Author: Imre Deak <imre.deak@intel.com>
Date:   Fri May 24 15:55:09 2013 -0700

    Many callers of the wait_event_timeout() and
    wait_event_interruptible_timeout() expect that the return value will be
    positive if the specified condition becomes true before the timeout
    elapses.  However, at the moment this isn't guaranteed.  If the wake-up
    handler is delayed enough, the time remaining until timeout will be
    calculated as 0 - and passed back as a return value - even if the
    condition became true before the timeout has passed.

OK, agreed.

	--- a/include/linux/wait.h
	+++ b/include/linux/wait.h
	@@ -217,6 +217,8 @@ do {						\
			if (!ret)						\
				break;						\
		}								\
	+	if (!ret && (condition))					\
	+		ret = 1;						\
		finish_wait(&wq, &__wait);					\
	 } while (0)

Well, this evaluates "condition" twice, perhaps it would be more
clean to do, say,

	#define __wait_event_timeout(wq, condition, ret)			\
	do {									\
		DEFINE_WAIT(__wait);						\
										\
		for (;;) {							\
			prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE);	\
			if (condition) {					\
				if (!ret)					\
					ret = 1;				\
				break;						\
			} else if (!ret)					\
				break;						\
			ret = schedule_timeout(ret);				\
		}								\
		finish_wait(&wq, &__wait);					\
	} while (0)

but this is minor.

	@@ -233,8 +235,9 @@ do {						\
	  * wake_up() has to be called after changing any variable that could
	  * change the result of the wait condition.
	  *
	- * The function returns 0 if the @timeout elapsed, and the remaining
	- * jiffies if the condition evaluated to true before the timeout elapsed.
	+ * The function returns 0 if the @timeout elapsed, or the remaining
	+ * jiffies (at least 1) if the @condition evaluated to %true before
	+ * the @timeout elapsed.

This is still not true if timeout == 0.

Shouldn't we also change wait_event_timeout() ? Say,

	#define wait_event_timeout(wq, condition, timeout)			\
	({									\
		long __ret = timeout;						\
		if (!(condition))						\
			__wait_event_timeout(wq, condition, __ret);		\
		else if (!__ret)						\
			__ret = 1;						\
		__ret;								\
	})

Or wait_event_timeout(timeout => 0) is not legal in a non-void context?

To me the code like

	long wait_for_something(bool nonblock)
	{
		timeout = nonblock ? 0 : DEFAULT_TIMEOUT;
		return wait_event_timeout(..., timeout);
	}

looks reasonable and correct. But it is not?

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-06-04 19:28 [PATCH] wait: fix false timeouts when using wait_event_timeout() Oleg Nesterov
@ 2013-06-04 21:35 ` Imre Deak
  2013-06-04 21:40   ` Imre Deak
  0 siblings, 1 reply; 23+ messages in thread
From: Imre Deak @ 2013-06-04 21:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Daniel Vetter, Dave Jones, David Howells,
	Jens Axboe, Linus Torvalds, Lukas Czerner, Paul E. McKenney,
	linux-kernel

On Tue, 2013-06-04 at 21:28 +0200, Oleg Nesterov wrote:
> Hello,
> 
> Just noticed this commit...
> 
> commit 4c663cfc523a88d97a8309b04a089c27dc57fd7e
> Author: Imre Deak <imre.deak@intel.com>
> Date:   Fri May 24 15:55:09 2013 -0700
> 
>     Many callers of the wait_event_timeout() and
>     wait_event_interruptible_timeout() expect that the return value will be
>     positive if the specified condition becomes true before the timeout
>     elapses.  However, at the moment this isn't guaranteed.  If the wake-up
>     handler is delayed enough, the time remaining until timeout will be
>     calculated as 0 - and passed back as a return value - even if the
>     condition became true before the timeout has passed.
> 
> OK, agreed.
> 
> 	--- a/include/linux/wait.h
> 	+++ b/include/linux/wait.h
> 	@@ -217,6 +217,8 @@ do {						\
> 			if (!ret)						\
> 				break;						\
> 		}								\
> 	+	if (!ret && (condition))					\
> 	+		ret = 1;						\
> 		finish_wait(&wq, &__wait);					\
> 	 } while (0)
> 
> Well, this evaluates "condition" twice, perhaps it would be more
> clean to do, say,
> 
> 	#define __wait_event_timeout(wq, condition, ret)			\
> 	do {									\
> 		DEFINE_WAIT(__wait);						\
> 										\
> 		for (;;) {							\
> 			prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE);	\
> 			if (condition) {					\
> 				if (!ret)					\
> 					ret = 1;				\
> 				break;						\
> 			} else if (!ret)					\
> 				break;						\
> 			ret = schedule_timeout(ret);				\
> 		}								\
> 		finish_wait(&wq, &__wait);					\
> 	} while (0)
> 
> but this is minor.
> 
> 	@@ -233,8 +235,9 @@ do {						\
> 	  * wake_up() has to be called after changing any variable that could
> 	  * change the result of the wait condition.
> 	  *
> 	- * The function returns 0 if the @timeout elapsed, and the remaining
> 	- * jiffies if the condition evaluated to true before the timeout elapsed.
> 	+ * The function returns 0 if the @timeout elapsed, or the remaining
> 	+ * jiffies (at least 1) if the @condition evaluated to %true before
> 	+ * the @timeout elapsed.
> 
> This is still not true if timeout == 0.
> 
> Shouldn't we also change wait_event_timeout() ? Say,
> 
> 	#define wait_event_timeout(wq, condition, timeout)			\
> 	({									\
> 		long __ret = timeout;						\
> 		if (!(condition))						\
> 			__wait_event_timeout(wq, condition, __ret);		\
> 		else if (!__ret)						\
> 			__ret = 1;						\
> 		__ret;								\
> 	})
> 
> Or wait_event_timeout(timeout => 0) is not legal in a non-void context?
> 
> To me the code like
> 
> 	long wait_for_something(bool nonblock)
> 	{
> 		timeout = nonblock ? 0 : DEFAULT_TIMEOUT;
> 		return wait_event_timeout(..., timeout);
> 	}
> 
> looks reasonable and correct. But it is not?

I don't see why it would be not legal. Note though that in the above
form wait_event_timeout(cond, 0) would still schedule() if cond is
false, which is not what I'd expect from a non-blocking function.

--Imre


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-06-04 21:35 ` Imre Deak
@ 2013-06-04 21:40   ` Imre Deak
  2013-06-05 16:37     ` Oleg Nesterov
  0 siblings, 1 reply; 23+ messages in thread
From: Imre Deak @ 2013-06-04 21:40 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Daniel Vetter, Dave Jones, David Howells,
	Jens Axboe, Linus Torvalds, Lukas Czerner, Paul E. McKenney,
	linux-kernel

On Wed, 2013-06-05 at 00:35 +0300, Imre Deak wrote:
> On Tue, 2013-06-04 at 21:28 +0200, Oleg Nesterov wrote:
> > Hello,
> > 
> > Just noticed this commit...
> > 
> > commit 4c663cfc523a88d97a8309b04a089c27dc57fd7e
> > Author: Imre Deak <imre.deak@intel.com>
> > Date:   Fri May 24 15:55:09 2013 -0700
> > 
> >     Many callers of the wait_event_timeout() and
> >     wait_event_interruptible_timeout() expect that the return value will be
> >     positive if the specified condition becomes true before the timeout
> >     elapses.  However, at the moment this isn't guaranteed.  If the wake-up
> >     handler is delayed enough, the time remaining until timeout will be
> >     calculated as 0 - and passed back as a return value - even if the
> >     condition became true before the timeout has passed.
> > 
> > OK, agreed.
> > 
> > 	--- a/include/linux/wait.h
> > 	+++ b/include/linux/wait.h
> > 	@@ -217,6 +217,8 @@ do {						\
> > 			if (!ret)						\
> > 				break;						\
> > 		}								\
> > 	+	if (!ret && (condition))					\
> > 	+		ret = 1;						\
> > 		finish_wait(&wq, &__wait);					\
> > 	 } while (0)
> > 
> > Well, this evaluates "condition" twice, perhaps it would be more
> > clean to do, say,
> > 
> > 	#define __wait_event_timeout(wq, condition, ret)			\
> > 	do {									\
> > 		DEFINE_WAIT(__wait);						\
> > 										\
> > 		for (;;) {							\
> > 			prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE);	\
> > 			if (condition) {					\
> > 				if (!ret)					\
> > 					ret = 1;				\
> > 				break;						\
> > 			} else if (!ret)					\
> > 				break;						\
> > 			ret = schedule_timeout(ret);				\
> > 		}								\
> > 		finish_wait(&wq, &__wait);					\
> > 	} while (0)
> > 
> > but this is minor.
> > 
> > 	@@ -233,8 +235,9 @@ do {						\
> > 	  * wake_up() has to be called after changing any variable that could
> > 	  * change the result of the wait condition.
> > 	  *
> > 	- * The function returns 0 if the @timeout elapsed, and the remaining
> > 	- * jiffies if the condition evaluated to true before the timeout elapsed.
> > 	+ * The function returns 0 if the @timeout elapsed, or the remaining
> > 	+ * jiffies (at least 1) if the @condition evaluated to %true before
> > 	+ * the @timeout elapsed.
> > 
> > This is still not true if timeout == 0.
> > 
> > Shouldn't we also change wait_event_timeout() ? Say,
> > 
> > 	#define wait_event_timeout(wq, condition, timeout)			\
> > 	({									\
> > 		long __ret = timeout;						\
> > 		if (!(condition))						\
> > 			__wait_event_timeout(wq, condition, __ret);		\
> > 		else if (!__ret)						\
> > 			__ret = 1;						\
> > 		__ret;								\
> > 	})
> > 
> > Or wait_event_timeout(timeout => 0) is not legal in a non-void context?
> > 
> > To me the code like
> > 
> > 	long wait_for_something(bool nonblock)
> > 	{
> > 		timeout = nonblock ? 0 : DEFAULT_TIMEOUT;
> > 		return wait_event_timeout(..., timeout);
> > 	}
> > 
> > looks reasonable and correct. But it is not?
> 
> I don't see why it would be not legal. Note though that in the above
> form wait_event_timeout(cond, 0) would still schedule() if cond is
> false, which is not what I'd expect from a non-blocking function.

Ah sorry, if you also rewrite __wait_event_timeout() then timeout=>0
wouldn't schedule(), so things would work as expected.

--Imre



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-06-04 21:40   ` Imre Deak
@ 2013-06-05 16:37     ` Oleg Nesterov
  2013-06-05 19:07       ` Oleg Nesterov
  0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2013-06-05 16:37 UTC (permalink / raw)
  To: Imre Deak
  Cc: Andrew Morton, Daniel Vetter, Dave Jones, David Howells,
	Jens Axboe, Linus Torvalds, Lukas Czerner, Paul E. McKenney,
	linux-kernel

On 06/05, Imre Deak wrote:
>
> On Wed, 2013-06-05 at 00:35 +0300, Imre Deak wrote:
> > On Tue, 2013-06-04 at 21:28 +0200, Oleg Nesterov wrote:
> > >
> > > Shouldn't we also change wait_event_timeout() ? Say,
> > >
> > > 	#define wait_event_timeout(wq, condition, timeout)			\
> > > 	({									\
> > > 		long __ret = timeout;						\
> > > 		if (!(condition))						\
> > > 			__wait_event_timeout(wq, condition, __ret);		\
> > > 		else if (!__ret)						\
> > > 			__ret = 1;						\
> > > 		__ret;								\
> > > 	})
> > >
> > > Or wait_event_timeout(timeout => 0) is not legal in a non-void context?
> > >
> > > To me the code like
> > >
> > > 	long wait_for_something(bool nonblock)
> > > 	{
> > > 		timeout = nonblock ? 0 : DEFAULT_TIMEOUT;
> > > 		return wait_event_timeout(..., timeout);
> > > 	}
> > >
> > > looks reasonable and correct. But it is not?
> >
> > I don't see why it would be not legal. Note though that in the above
> > form wait_event_timeout(cond, 0) would still schedule() if cond is
> > false, which is not what I'd expect from a non-blocking function.

Yes, if false. But what if it is true?

> Ah sorry, if you also rewrite __wait_event_timeout() then timeout=>0
> wouldn't schedule(), so things would work as expected.

Can't understand... probably you missed my point. Let me try again.

I think that wait_eveint_timeout(wq, COND, 0) should return !!(COND).
But it doesn't, for example wait_event_timeout(wq, true, 0) == 0, this
doesn'tlook right to me.

And, this is off-topic, but wait_eveint_timeout/__wait_eveint_timeout
do not match wait_event/__wait_event. IOW, you can't use
__wait_eveint_timeout() if you do not need the fast-path check.

So. How about

	#define __wait_event_timeout(wq, condition, timeout)			\
	({									\
		DEFINE_WAIT(__wait);						\
		long __ret = 0, __to = timeout;					\
										\
		for (;;) {							\
			prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE);	\
			if (condition) {					\
				__ret = __to ?: 1;				\
				break;						\
			}							\
			if (!__to)						\
				break;						\
			__to = schedule_timeout(__to);				\
		}								\
		finish_wait(&wq, &__wait);					\
		__ret;								\
	})

	#define wait_event_timeout(wq, condition, timeout)			\
	({									\
		long __ret;							\
		if (condition)							\
			__ret = (timeout) ?: 1;					\
		else								\
			__ret = __wait_event_timeout(wq, condition, timeout);	\
		__ret;								\
	})

?

Othwerwise we should document the fact that the caller should alvays verify
timeout != 0 if it checks the result of wait_event_timeout().


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-06-05 16:37     ` Oleg Nesterov
@ 2013-06-05 19:07       ` Oleg Nesterov
  2013-06-06  1:45         ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2013-06-05 19:07 UTC (permalink / raw)
  To: Imre Deak
  Cc: Andrew Morton, Daniel Vetter, Dave Jones, David Howells,
	Jens Axboe, Linus Torvalds, Lukas Czerner, Paul E. McKenney,
	linux-kernel, Tejun Heo

On 06/05, Oleg Nesterov wrote:
>
> I think that wait_eveint_timeout(wq, COND, 0) should return !!(COND).
> But it doesn't, for example wait_event_timeout(wq, true, 0) == 0, this
> doesn'tlook right to me.
>
> And, this is off-topic, but wait_eveint_timeout/__wait_eveint_timeout
> do not match wait_event/__wait_event. IOW, you can't use
> __wait_eveint_timeout() if you do not need the fast-path check.
>
> So. How about
>
> 	#define __wait_event_timeout(wq, condition, timeout)			\
> 	({									\
> 		DEFINE_WAIT(__wait);						\
> 		long __ret = 0, __to = timeout;					\
> 										\
> 		for (;;) {							\
> 			prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE);	\
> 			if (condition) {					\
> 				__ret = __to ?: 1;				\
> 				break;						\
> 			}							\
> 			if (!__to)						\
> 				break;						\
> 			__to = schedule_timeout(__to);				\
> 		}								\
> 		finish_wait(&wq, &__wait);					\
> 		__ret;								\
> 	})
>
> 	#define wait_event_timeout(wq, condition, timeout)			\
> 	({									\
> 		long __ret;							\
> 		if (condition)							\
> 			__ret = (timeout) ?: 1;					\
> 		else								\
> 			__ret = __wait_event_timeout(wq, condition, timeout);	\
> 		__ret;								\
> 	})
>
> ?
>
> Othwerwise we should document the fact that the caller should alvays verify
> timeout != 0 if it checks the result of wait_event_timeout().

And in fact, perhaps we can implement wait_event_common() and avoid the
code duplications?

	#define __wait_no_timeout(timeout)	\
		(__builtin_constant_p(timeout) && (timeout) == MAX_SCHEDULE_TIMEOUT)

	/* uglified signal_pending_state() */
	#define __wait_signal_pending(state)					\
		((state == TASK_INTERRUPTIBLE) ? signal_pending(current) :	\
		 (state == TASK_KILLABLE) ? fatal_signal_pending(current) :	\
		  0)

	#define __wait_event_common(wq, condition, state, tout)			\
	({									\
		DEFINE_WAIT(__wait);						\
		long __ret = 0, __tout = tout;					\
										\
		for (;;) {							\
			prepare_to_wait(&wq, &__wait, state);			\
			if (condition) {					\
				__ret = __wait_no_timeout(tout) ?: __tout ?: 1;	\
				break;						\
			}							\
										\
			if (__wait_signal_pending(state)) {			\
				__ret = -ERESTARTSYS;				\
				break;						\
			}							\
										\
			if (__wait_no_timeout(tout))				\
				schedule();					\
			else if (__tout)					\
				__tout = schedule_timeout(__tout);		\
			else							\
				break;						\
		}								\
		finish_wait(&wq, &__wait);					\
		__ret;								\
	})

	#define wait_event_common(wq, condition, state, tout)			\
	({									\
		long __ret;							\
		if (condition)							\
			__ret = __wait_no_timeout(tout) ?: (tout) ?: 1;		\
		else								\
			__ret = __wait_event_common(wq, condition, state, tout);\
		__ret;								\
	})

Then, for example, wait_event() becomes

	#define wait_event(wq, condition) 					\
		wait_event_common(wq, condition,				\
				TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)	\

Hmm. I compiled the kernel with the patch below,

		$ size vmlinux
		   text	   data	    bss	    dec	    hex	filename
	-	4978601	2935080	10104832	18018513	112f0d1	vmlinux
	+	4977769	2930984	10104832	18013585	112dd91	vmlinux

looks good...

Oleg.

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 1133695..359fffc 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -173,18 +173,52 @@ wait_queue_head_t *bit_waitqueue(void *, int);
 #define wake_up_interruptible_sync_poll(x, m)				\
 	__wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, (void *) (m))
 
-#define __wait_event(wq, condition) 					\
-do {									\
+#define __wait_no_timeout(timeout)	\
+	(__builtin_constant_p(timeout) && (timeout) == MAX_SCHEDULE_TIMEOUT)
+
+/* uglified signal_pending_state() */
+#define __wait_signal_pending(state)					\
+	((state == TASK_INTERRUPTIBLE) ? signal_pending(current) :	\
+	 (state == TASK_KILLABLE) ? fatal_signal_pending(current) :	\
+	  0)
+
+#define __wait_event_common(wq, condition, state, tout)			\
+({									\
 	DEFINE_WAIT(__wait);						\
+	long __ret = 0, __tout = tout;					\
 									\
 	for (;;) {							\
-		prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE);	\
-		if (condition)						\
+		prepare_to_wait(&wq, &__wait, state);			\
+		if (condition) {					\
+			__ret = __wait_no_timeout(tout) ?: __tout ?: 1;	\
+			break;						\
+		}							\
+									\
+		if (__wait_signal_pending(state)) {			\
+			__ret = -ERESTARTSYS;				\
+			break;						\
+		}							\
+									\
+		if (__wait_no_timeout(tout))				\
+			schedule();					\
+		else if (__tout)					\
+			__tout = schedule_timeout(__tout);		\
+		else							\
 			break;						\
-		schedule();						\
 	}								\
 	finish_wait(&wq, &__wait);					\
-} while (0)
+	__ret;								\
+})
+
+#define wait_event_common(wq, condition, state, tout)			\
+({									\
+	long __ret;							\
+	if (condition)							\
+		__ret = __wait_no_timeout(tout) ?: (tout) ?: 1;		\
+	else								\
+		__ret = __wait_event_common(wq, condition, state, tout);\
+	__ret;								\
+})
 
 /**
  * wait_event - sleep until a condition gets true
@@ -198,29 +232,13 @@ do {									\
  * wake_up() has to be called after changing any variable that could
  * change the result of the wait condition.
  */
-#define wait_event(wq, condition) 					\
-do {									\
-	if (condition)	 						\
-		break;							\
-	__wait_event(wq, condition);					\
-} while (0)
+#define __wait_event(wq, condition) 					\
+	__wait_event_common(wq, condition,				\
+			TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)	\
 
-#define __wait_event_timeout(wq, condition, ret)			\
-do {									\
-	DEFINE_WAIT(__wait);						\
-									\
-	for (;;) {							\
-		prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE);	\
-		if (condition)						\
-			break;						\
-		ret = schedule_timeout(ret);				\
-		if (!ret)						\
-			break;						\
-	}								\
-	if (!ret && (condition))					\
-		ret = 1;						\
-	finish_wait(&wq, &__wait);					\
-} while (0)
+#define wait_event(wq, condition) 					\
+	wait_event_common(wq, condition,				\
+			TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)	\
 
 /**
  * wait_event_timeout - sleep until a condition gets true or a timeout elapses
@@ -239,31 +257,13 @@ do {									\
  * jiffies (at least 1) if the @condition evaluated to %true before
  * the @timeout elapsed.
  */
-#define wait_event_timeout(wq, condition, timeout)			\
-({									\
-	long __ret = timeout;						\
-	if (!(condition)) 						\
-		__wait_event_timeout(wq, condition, __ret);		\
-	__ret;								\
-})
+#define __wait_event_timeout(wq, condition, timeout)			\
+	__wait_event_common(wq, condition,				\
+			TASK_UNINTERRUPTIBLE, timeout)
 
-#define __wait_event_interruptible(wq, condition, ret)			\
-do {									\
-	DEFINE_WAIT(__wait);						\
-									\
-	for (;;) {							\
-		prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE);	\
-		if (condition)						\
-			break;						\
-		if (!signal_pending(current)) {				\
-			schedule();					\
-			continue;					\
-		}							\
-		ret = -ERESTARTSYS;					\
-		break;							\
-	}								\
-	finish_wait(&wq, &__wait);					\
-} while (0)
+#define wait_event_timeout(wq, condition, timeout)			\
+	wait_event_common(wq, condition,				\
+			TASK_UNINTERRUPTIBLE, timeout)
 
 /**
  * wait_event_interruptible - sleep until a condition gets true
@@ -280,35 +280,13 @@ do {									\
  * The function will return -ERESTARTSYS if it was interrupted by a
  * signal and 0 if @condition evaluated to true.
  */
-#define wait_event_interruptible(wq, condition)				\
-({									\
-	int __ret = 0;							\
-	if (!(condition))						\
-		__wait_event_interruptible(wq, condition, __ret);	\
-	__ret;								\
-})
+#define __wait_event_interruptible(wq, condition)			\
+	__wait_event_common(wq, condition,				\
+			TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)
 
-#define __wait_event_interruptible_timeout(wq, condition, ret)		\
-do {									\
-	DEFINE_WAIT(__wait);						\
-									\
-	for (;;) {							\
-		prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE);	\
-		if (condition)						\
-			break;						\
-		if (!signal_pending(current)) {				\
-			ret = schedule_timeout(ret);			\
-			if (!ret)					\
-				break;					\
-			continue;					\
-		}							\
-		ret = -ERESTARTSYS;					\
-		break;							\
-	}								\
-	if (!ret && (condition))					\
-		ret = 1;						\
-	finish_wait(&wq, &__wait);					\
-} while (0)
+#define wait_event_interruptible(wq, condition)				\
+	wait_event_common(wq, condition,				\
+			TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)
 
 /**
  * wait_event_interruptible_timeout - sleep until a condition gets true or a timeout elapses
@@ -328,13 +306,13 @@ do {									\
  * a signal, or the remaining jiffies (at least 1) if the @condition
  * evaluated to %true before the @timeout elapsed.
  */
+#define __wait_event_interruptible_timeout(wq, condition, timeout)	\
+	__wait_event_common(wq, condition,				\
+			TASK_INTERRUPTIBLE, timeout)
+
 #define wait_event_interruptible_timeout(wq, condition, timeout)	\
-({									\
-	long __ret = timeout;						\
-	if (!(condition))						\
-		__wait_event_interruptible_timeout(wq, condition, __ret); \
-	__ret;								\
-})
+	wait_event_common(wq, condition,				\
+			TASK_INTERRUPTIBLE, timeout)
 
 #define __wait_event_hrtimeout(wq, condition, timeout, state)		\
 ({									\
@@ -601,24 +579,6 @@ do {									\
 
 
 
-#define __wait_event_killable(wq, condition, ret)			\
-do {									\
-	DEFINE_WAIT(__wait);						\
-									\
-	for (;;) {							\
-		prepare_to_wait(&wq, &__wait, TASK_KILLABLE);		\
-		if (condition)						\
-			break;						\
-		if (!fatal_signal_pending(current)) {			\
-			schedule();					\
-			continue;					\
-		}							\
-		ret = -ERESTARTSYS;					\
-		break;							\
-	}								\
-	finish_wait(&wq, &__wait);					\
-} while (0)
-
 /**
  * wait_event_killable - sleep until a condition gets true
  * @wq: the waitqueue to wait on
@@ -634,14 +594,13 @@ do {									\
  * The function will return -ERESTARTSYS if it was interrupted by a
  * signal and 0 if @condition evaluated to true.
  */
-#define wait_event_killable(wq, condition)				\
-({									\
-	int __ret = 0;							\
-	if (!(condition))						\
-		__wait_event_killable(wq, condition, __ret);		\
-	__ret;								\
-})
+#define __wait_event_killable(wq, condition)				\
+	__wait_event_common(wq, condition,				\
+			TASK_KILLABLE, MAX_SCHEDULE_TIMEOUT)
 
+#define wait_event_killable(wq, condition)				\
+	wait_event_common(wq, condition,				\
+			TASK_KILLABLE, MAX_SCHEDULE_TIMEOUT)
 
 #define __wait_event_lock_irq(wq, condition, lock, cmd)			\
 do {									\


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-06-05 19:07       ` Oleg Nesterov
@ 2013-06-06  1:45         ` Tejun Heo
  2013-06-06 18:47           ` Oleg Nesterov
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2013-06-06  1:45 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Imre Deak, Andrew Morton, Daniel Vetter, Dave Jones,
	David Howells, Jens Axboe, Linus Torvalds, Lukas Czerner,
	Paul E. McKenney, linux-kernel

Hello, Oleg.

On Wed, Jun 05, 2013 at 09:07:23PM +0200, Oleg Nesterov wrote:
> And in fact, perhaps we can implement wait_event_common() and avoid the
> code duplications?
> 
> 	#define __wait_no_timeout(timeout)	\
> 		(__builtin_constant_p(timeout) && (timeout) == MAX_SCHEDULE_TIMEOUT)
> 
> 	/* uglified signal_pending_state() */
> 	#define __wait_signal_pending(state)					\
> 		((state == TASK_INTERRUPTIBLE) ? signal_pending(current) :	\
> 		 (state == TASK_KILLABLE) ? fatal_signal_pending(current) :	\
> 		  0)
> 
> 	#define __wait_event_common(wq, condition, state, tout)			\
> 	({									\
> 		DEFINE_WAIT(__wait);						\
> 		long __ret = 0, __tout = tout;					\
> 										\
> 		for (;;) {							\
> 			prepare_to_wait(&wq, &__wait, state);			\
> 			if (condition) {					\
> 				__ret = __wait_no_timeout(tout) ?: __tout ?: 1;	\
> 				break;						\
> 			}							\
> 										\
> 			if (__wait_signal_pending(state)) {			\
> 				__ret = -ERESTARTSYS;				\
> 				break;						\
> 			}							\
> 										\
> 			if (__wait_no_timeout(tout))				\
> 				schedule();					\
> 			else if (__tout)					\
> 				__tout = schedule_timeout(__tout);		\
> 			else							\
> 				break;						\
> 		}								\
> 		finish_wait(&wq, &__wait);					\
> 		__ret;								\
> 	})

Heh, yeah, this looks good to me and a lot better than trying to do
the same thing over and over again and ending up with subtle
differences.

> Hmm. I compiled the kernel with the patch below,
> 
> 		$ size vmlinux
> 		   text	   data	    bss	    dec	    hex	filename
> 	-	4978601	2935080	10104832	18018513	112f0d1	vmlinux
> 	+	4977769	2930984	10104832	18013585	112dd91	vmlinux

Nice.  Provided you went over assembly outputs of at least some
combinations, please feel free to add

 Reviewed-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-06-06  1:45         ` Tejun Heo
@ 2013-06-06 18:47           ` Oleg Nesterov
  0 siblings, 0 replies; 23+ messages in thread
From: Oleg Nesterov @ 2013-06-06 18:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Imre Deak, Andrew Morton, Daniel Vetter, Dave Jones,
	David Howells, Jens Axboe, Linus Torvalds, Lukas Czerner,
	Paul E. McKenney, linux-kernel

Hi Tejun,

On 06/05, Tejun Heo wrote:
>
> Heh, yeah, this looks good to me and a lot better than trying to do
> the same thing over and over again and ending up with subtle
> differences.

Yes, this is the goal. Of course we could fix wait_event_timout() and
wait_event_interruptible_timeout() without unification, but this would
add more copy-and-paste.

> > Hmm. I compiled the kernel with the patch below,
> >
> > 		$ size vmlinux
> > 		   text	   data	    bss	    dec	    hex	filename
> > 	-	4978601	2935080	10104832	18018513	112f0d1	vmlinux
> > 	+	4977769	2930984	10104832	18013585	112dd91	vmlinux
>
> Nice.  Provided you went over assembly outputs of at least some
> combinations,

I did, and that is why I am surprized by the numbers above.

Yes, gcc optimizes out the unwanted checks but the generated code
differs, of course. And sometimes the difference looks "random" to me.

Simplest example:

	extern wait_queue_head_t WQ;
	extern int COND;

	void we(void)
	{
		wait_event(WQ, COND);
	}

before:
	we:
		pushq	%rbp	#
		movq	%rsp, %rbp	#,
		pushq	%rbx	#
		subq	$56, %rsp	#,
		call	mcount
		movl	COND(%rip), %edx	# COND,
		testl	%edx, %edx	#
		jne	.L5	#,
		leaq	-64(%rbp), %rbx	#, tmp66
		movq	$0, -64(%rbp)	#, __wait
		movq	$autoremove_wake_function, -48(%rbp)	#, __wait.func
	#APP
	# 14 "/home/tmp/K/misc/arch/x86/include/asm/current.h" 1
		movq %gs:current_task,%rax	#, pfo_ret__
	# 0 "" 2
	#NO_APP
		movq	%rax, -56(%rbp)	# pfo_ret__, __wait.private
		leaq	24(%rbx), %rax	#, tmp61
		movq	%rax, -40(%rbp)	# tmp61, __wait.task_list.next
		movq	%rax, -32(%rbp)	# tmp61, __wait.task_list.prev
		.p2align 4,,10
		.p2align 3
	.L4:
		movl	$2, %edx	#,
		movq	%rbx, %rsi	# tmp66,
		movq	$WQ, %rdi	#,
		call	prepare_to_wait	#
		movl	COND(%rip), %eax	# COND,
		testl	%eax, %eax	#
		jne	.L3	#,
		call	schedule	#
		jmp	.L4	#
		.p2align 4,,10
		.p2align 3
	.L3:
		movq	%rbx, %rsi	# tmp66,
		movq	$WQ, %rdi	#,
		call	finish_wait	#
	.L5:
		addq	$56, %rsp	#,
		popq	%rbx	#
		leave
		ret

after:
	we:
		pushq	%rbp	#
		movq	%rsp, %rbp	#,
		pushq	%rbx	#
		subq	$56, %rsp	#,
		call	mcount
		movl	COND(%rip), %edx	# COND,
		testl	%edx, %edx	#
		jne	.L5	#,
		leaq	-64(%rbp), %rbx	#, tmp66
		movq	$0, -64(%rbp)	#, __wait
		movq	$autoremove_wake_function, -48(%rbp)	#, __wait.func
	#APP
	# 14 "/home/tmp/K/misc/arch/x86/include/asm/current.h" 1
		movq %gs:current_task,%rax	#, pfo_ret__
	# 0 "" 2
	#NO_APP
		movq	%rax, -56(%rbp)	# pfo_ret__, __wait.private
		leaq	24(%rbx), %rax	#, tmp61
		movq	%rax, -40(%rbp)	# tmp61, __wait.task_list.next
		movq	%rax, -32(%rbp)	# tmp61, __wait.task_list.prev
		.p2align 4,,10
		.p2align 3
	.L4:
		movl	$2, %edx	#,
		movq	%rbx, %rsi	# tmp66,
		movq	$WQ, %rdi	#,
		call	prepare_to_wait	#
		movl	COND(%rip), %eax	# COND,
		testl	%eax, %eax	#
		je	.L3	#,
		movq	%rbx, %rsi	# tmp66,
		movq	$WQ, %rdi	#,
		call	finish_wait	#
		jmp	.L5	#
		.p2align 4,,10
		.p2align 3
	.L3:
		call	schedule	#
		.p2align 4,,8
		.p2align 3
		jmp	.L4	#
		.p2align 4,,10
		.p2align 3
	.L5:
		addq	$56, %rsp	#,
		popq	%rbx	#
		leave
		.p2align 4,,4
		.p2align 3
		ret

As you can see, with this patch gcc generates a bit more code. But
only because reorderes finish_wait() and inserts the additional nops,
otherwise code the same. I think this difference is not "reliable"
and can be ignored.

But, the code like "return wait_even_timeout(true, non_const_timeout)"
really generates more code and this is correct: the patch tries to fix
the bug (at least I think this is a bug) and the additional code ensures
that __ret != 0.

>  Reviewed-by: Tejun Heo <tj@kernel.org>

Thanks! I'll send this patch soon.

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-07 23:12   ` Andrew Morton
@ 2013-05-08  9:49     ` Imre Deak
  0 siblings, 0 replies; 23+ messages in thread
From: Imre Deak @ 2013-05-08  9:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Vetter, Paul E. McKenney, David Howells, Dave Jones,
	Jens Axboe, Lukas Czerner, Linux Kernel Mailing List

On Tue, 2013-05-07 at 16:12 -0700, Andrew Morton wrote:
> On Thu, 2 May 2013 11:36:56 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> 
> > On Thu, May 2, 2013 at 10:58 AM, Imre Deak <imre.deak@intel.com> wrote:
> > > Many callers of the wait_event_timeout() and
> > > wait_event_interruptible_timeout() expect that the return value will be
> > > positive if the specified condition becomes true before the timeout
> > > elapses. However, at the moment this isn't guaranteed. If the wake-up
> > > handler is delayed enough, the time remaining until timeout will be
> > > calculated as 0 - and passed back as a return value - even if the
> > > condition became true before the timeout has passed.
> > >
> > > Fix this by returning at least 1 if the condition becomes true. This
> > > semantic is in line with what wait_for_condition_timeout() does; see
> > > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > > failure under heavy load".
> > >
> > > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > 
> > We have 3 instances of this bug in drm/i915. One case even where we
> > switch between the interruptible and not interruptible
> > wait_event_timeout variants, foolishly presuming they have the same
> > semantics. I very much like this.
> 
> Let's think about scheduling this fix.
> 
> Are any of the bugs which we expect this patch fixes serious enough to
> warrant merging it into 3.10?  And -stable?

There is at least [1], but I'm sure there is more similar reports about
i915. I'd vote for -stable at least.

--Imre

[1] https://bugs.freedesktop.org/show_bug.cgi?id=64133



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02  9:36 ` Daniel Vetter
@ 2013-05-07 23:12   ` Andrew Morton
  2013-05-08  9:49     ` Imre Deak
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2013-05-07 23:12 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Imre Deak, Paul E. McKenney, David Howells, Dave Jones,
	Jens Axboe, Lukas Czerner, Linux Kernel Mailing List

On Thu, 2 May 2013 11:36:56 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> On Thu, May 2, 2013 at 10:58 AM, Imre Deak <imre.deak@intel.com> wrote:
> > Many callers of the wait_event_timeout() and
> > wait_event_interruptible_timeout() expect that the return value will be
> > positive if the specified condition becomes true before the timeout
> > elapses. However, at the moment this isn't guaranteed. If the wake-up
> > handler is delayed enough, the time remaining until timeout will be
> > calculated as 0 - and passed back as a return value - even if the
> > condition became true before the timeout has passed.
> >
> > Fix this by returning at least 1 if the condition becomes true. This
> > semantic is in line with what wait_for_condition_timeout() does; see
> > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > failure under heavy load".
> >
> > Signed-off-by: Imre Deak <imre.deak@intel.com>
> 
> We have 3 instances of this bug in drm/i915. One case even where we
> switch between the interruptible and not interruptible
> wait_event_timeout variants, foolishly presuming they have the same
> semantics. I very much like this.

Let's think about scheduling this fix.

Are any of the bugs which we expect this patch fixes serious enough to
warrant merging it into 3.10?  And -stable?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02 12:35   ` Jens Axboe
@ 2013-05-02 19:56     ` Imre Deak
  0 siblings, 0 replies; 23+ messages in thread
From: Imre Deak @ 2013-05-02 19:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: David Howells, Daniel Vetter, Paul E. McKenney, Dave Jones,
	Lukas Czerner, linux-kernel

On Thu, 2013-05-02 at 14:35 +0200, Jens Axboe wrote:
> On Thu, May 02 2013, David Howells wrote:
> > Imre Deak <imre.deak@intel.com> wrote:
> > 
> > > Many callers of the wait_event_timeout() and
> > > wait_event_interruptible_timeout() expect that the return value will be
> > > positive if the specified condition becomes true before the timeout
> > > elapses. However, at the moment this isn't guaranteed. If the wake-up
> > > handler is delayed enough, the time remaining until timeout will be
> > > calculated as 0 - and passed back as a return value - even if the
> > > condition became true before the timeout has passed.
> > > 
> > > Fix this by returning at least 1 if the condition becomes true. This
> > > semantic is in line with what wait_for_condition_timeout() does; see
> > > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > > failure under heavy load".
> > > 
> > > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > 
> > Acked-by: David Howells <dhowells@redhat.com>
> 
> You can add mine, too:
> 
> Acked-by: Jens Axboe <axboe@kernel.dk>

Ok, I think we agree that the +1 thing in schedule_timeout() discussed
in this thread should be handled separately, so if there is no other
objection I'd be happy if this patch was merged through someone's tree
as-is.

--Imre



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02 13:56           ` Imre Deak
@ 2013-05-02 14:04             ` Daniel Vetter
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2013-05-02 14:04 UTC (permalink / raw)
  To: Imre Deak
  Cc: Jens Axboe, David Howells, Paul E. McKenney, Dave Jones,
	Lukas Czerner, Linux Kernel Mailing List

On Thu, May 2, 2013 at 3:56 PM, Imre Deak <imre.deak@intel.com> wrote:
> On Thu, 2013-05-02 at 14:54 +0200, Jens Axboe wrote:
>> On Thu, May 02 2013, Imre Deak wrote:
>> > On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote:
>> > > On Thu, May 02 2013, Daniel Vetter wrote:
>> > > > On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote:
>> > > > >> Fix this by returning at least 1 if the condition becomes true. This
>> > > > >> semantic is in line with what wait_for_condition_timeout() does; see
>> > > > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
>> > > > >> failure under heavy load".
>> > > > >
>> > > > > But now you can't distinguish the timer expiring first, if the thread doing
>> > > > > the waiting gets delayed sufficiently long for the event to happen.
>> > > >
>> > > > That can already happen, e.g.
>> > > >
>> > > > 1. wakeup happens and condition is true.
>> > > > 2. we compute remaining jiffies > 0
>> > > > -> preempt
>> > > > 3. now wait_for_event_timeout returns.
>> > > >
>> > > > Only difference is that the delay/preempt happens in between 1. and
>> > > > 2., and then suddenly the wake up didn't happen in time (with the
>> > > > current return code semantics).
>> > > >
>> > > > So imo the current behaviour is simply a bug and will miss timely
>> > > > wakeups in some cases.
>> > > >
>> > > > The other way round, namely wait_for_event_timeout taking longer than
>> > > > the timeout is expected (and part of the interface for every timeout
>> > > > function). So all current callers already need to be able to cope with
>> > > > random preemption/delays pushing the total time before the call to
>> > > > wait_for_event and checking the return value over the timeout, even
>> > > > when condition was signalled in time.
>> > > >
>> > > > If there's any case which relies on accurate timeout detection that
>> > > > simply won't work with wait_for_event (they need an nmi or a hw
>> > > > timestamp counter or something similar).
>> > >
>> > > I seriously doubt that anyone is depending on any sort of accuracy on
>> > > the return. 1 jiffy is not going to make or break anything - in fact,
>> > > jiffies could be incremented nsecs after the initial call. So a
>> > > granularity of at least 1 is going to be expected in any case.
>> > >
>> > > The important bit here is that the API should behave as expected. And
>> > > the most logical way to code that is to check the return value. I can
>> > > easily see people forgetting to re-check the condition, hence you get a
>> > > bug. The fact that you and the original reporter already had accidents
>> > > with this is a clear sign that the logical way to use the API is not the
>> > > correct one.
>> > >
>> > > IMHO, the change definitely makes sense.
>> >
>> > Ok, so taking courage of this answer ;P How about also the following?
>> >
>> > diff --git a/kernel/timer.c b/kernel/timer.c
>> > index dbf7a78..5a62456 100644
>> > --- a/kernel/timer.c
>> > +++ b/kernel/timer.c
>> > @@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long
>> > timeout)
>> >             }
>> >     }
>> >
>> > -   expire = timeout + jiffies;
>> > +   /*
>> > +    * We can't be sure how close we are to the next tick, so +1 to
>> > +    * guarantee that we wait at least timeout amount.
>> > +    */
>> > +   expire = timeout + jiffies + 1;
>> >
>> >     setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
>> >     __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
>> >
>> >
>> > It'd plug a similar hole for wait_event_timeout() and similar users, who
>> > don't compensate for the above..
>>
>> Any jiffy based API is going to have this issue. I think it's different
>> from the original patch, which just makes the API potentially return
>> something that is confusing.
>
> Yea, at least those that take a relative time. Usually the timeout is
> given with a big overhead, so there it won't be a problem. But I also
> found users like drivers/acpi/ec.c, that will pass a timeout of 1
> jiffies, which may then result in premature timeouts.

In drm/i915 we have a bunch of those 1 jiffie timeouts, too,
especially with low HZ configs. For easy comparison with hw docs we
try to stick to the documented timeout limits and convert them with
msec_to_jiffies (which does the right thing wrt rounding). But since
these timeouts are often in the low ms range, that often means just 1
jiffie timeouts. So the timeout could be off by a large factor (and we
have the bugzillas to prove it happens in reality).

I'm not sure whether sprinkling +1 all over the codebase is a that
nice approach. Adding the +1 in sched_timeout otoh would give us a
nice idiot proof interface for driver guys like me. We also have some
custom register wait loops using msleep(1) and jiffy based timeouts,
and there our driver macro does the +1. That helped a lot in avoiding
stupid bugs, but now we've converted a few things over to interrupts +
timeout and the bugs are back ...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02 12:54         ` Jens Axboe
@ 2013-05-02 13:56           ` Imre Deak
  2013-05-02 14:04             ` Daniel Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: Imre Deak @ 2013-05-02 13:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Daniel Vetter, David Howells, Paul E. McKenney, Dave Jones,
	Lukas Czerner, Linux Kernel Mailing List

On Thu, 2013-05-02 at 14:54 +0200, Jens Axboe wrote:
> On Thu, May 02 2013, Imre Deak wrote:
> > On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote:
> > > On Thu, May 02 2013, Daniel Vetter wrote:
> > > > On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote:
> > > > >> Fix this by returning at least 1 if the condition becomes true. This
> > > > >> semantic is in line with what wait_for_condition_timeout() does; see
> > > > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > > > >> failure under heavy load".
> > > > >
> > > > > But now you can't distinguish the timer expiring first, if the thread doing
> > > > > the waiting gets delayed sufficiently long for the event to happen.
> > > > 
> > > > That can already happen, e.g.
> > > > 
> > > > 1. wakeup happens and condition is true.
> > > > 2. we compute remaining jiffies > 0
> > > > -> preempt
> > > > 3. now wait_for_event_timeout returns.
> > > > 
> > > > Only difference is that the delay/preempt happens in between 1. and
> > > > 2., and then suddenly the wake up didn't happen in time (with the
> > > > current return code semantics).
> > > > 
> > > > So imo the current behaviour is simply a bug and will miss timely
> > > > wakeups in some cases.
> > > > 
> > > > The other way round, namely wait_for_event_timeout taking longer than
> > > > the timeout is expected (and part of the interface for every timeout
> > > > function). So all current callers already need to be able to cope with
> > > > random preemption/delays pushing the total time before the call to
> > > > wait_for_event and checking the return value over the timeout, even
> > > > when condition was signalled in time.
> > > > 
> > > > If there's any case which relies on accurate timeout detection that
> > > > simply won't work with wait_for_event (they need an nmi or a hw
> > > > timestamp counter or something similar).
> > > 
> > > I seriously doubt that anyone is depending on any sort of accuracy on
> > > the return. 1 jiffy is not going to make or break anything - in fact,
> > > jiffies could be incremented nsecs after the initial call. So a
> > > granularity of at least 1 is going to be expected in any case.
> > > 
> > > The important bit here is that the API should behave as expected. And
> > > the most logical way to code that is to check the return value. I can
> > > easily see people forgetting to re-check the condition, hence you get a
> > > bug. The fact that you and the original reporter already had accidents
> > > with this is a clear sign that the logical way to use the API is not the
> > > correct one.
> > > 
> > > IMHO, the change definitely makes sense.
> > 
> > Ok, so taking courage of this answer ;P How about also the following?
> > 
> > diff --git a/kernel/timer.c b/kernel/timer.c
> > index dbf7a78..5a62456 100644
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long
> > timeout)
> >  		}
> >  	}
> >  
> > -	expire = timeout + jiffies;
> > +	/*
> > +	 * We can't be sure how close we are to the next tick, so +1 to
> > +	 * guarantee that we wait at least timeout amount.
> > +	 */
> > +	expire = timeout + jiffies + 1;
> >  
> >  	setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
> >  	__mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
> > 
> > 
> > It'd plug a similar hole for wait_event_timeout() and similar users, who
> > don't compensate for the above..
> 
> Any jiffy based API is going to have this issue. I think it's different
> from the original patch, which just makes the API potentially return
> something that is confusing.

Yea, at least those that take a relative time. Usually the timeout is
given with a big overhead, so there it won't be a problem. But I also
found users like drivers/acpi/ec.c, that will pass a timeout of 1
jiffies, which may then result in premature timeouts.

> So not sure on the above, sorry.

Ok. A WARN to wait_event_timeout for the above case could still be a
useful guard..

--Imre


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02 12:34       ` Imre Deak
@ 2013-05-02 12:54         ` Jens Axboe
  2013-05-02 13:56           ` Imre Deak
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2013-05-02 12:54 UTC (permalink / raw)
  To: Imre Deak
  Cc: Daniel Vetter, David Howells, Paul E. McKenney, Dave Jones,
	Lukas Czerner, Linux Kernel Mailing List

On Thu, May 02 2013, Imre Deak wrote:
> On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote:
> > On Thu, May 02 2013, Daniel Vetter wrote:
> > > On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote:
> > > >> Fix this by returning at least 1 if the condition becomes true. This
> > > >> semantic is in line with what wait_for_condition_timeout() does; see
> > > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > > >> failure under heavy load".
> > > >
> > > > But now you can't distinguish the timer expiring first, if the thread doing
> > > > the waiting gets delayed sufficiently long for the event to happen.
> > > 
> > > That can already happen, e.g.
> > > 
> > > 1. wakeup happens and condition is true.
> > > 2. we compute remaining jiffies > 0
> > > -> preempt
> > > 3. now wait_for_event_timeout returns.
> > > 
> > > Only difference is that the delay/preempt happens in between 1. and
> > > 2., and then suddenly the wake up didn't happen in time (with the
> > > current return code semantics).
> > > 
> > > So imo the current behaviour is simply a bug and will miss timely
> > > wakeups in some cases.
> > > 
> > > The other way round, namely wait_for_event_timeout taking longer than
> > > the timeout is expected (and part of the interface for every timeout
> > > function). So all current callers already need to be able to cope with
> > > random preemption/delays pushing the total time before the call to
> > > wait_for_event and checking the return value over the timeout, even
> > > when condition was signalled in time.
> > > 
> > > If there's any case which relies on accurate timeout detection that
> > > simply won't work with wait_for_event (they need an nmi or a hw
> > > timestamp counter or something similar).
> > 
> > I seriously doubt that anyone is depending on any sort of accuracy on
> > the return. 1 jiffy is not going to make or break anything - in fact,
> > jiffies could be incremented nsecs after the initial call. So a
> > granularity of at least 1 is going to be expected in any case.
> > 
> > The important bit here is that the API should behave as expected. And
> > the most logical way to code that is to check the return value. I can
> > easily see people forgetting to re-check the condition, hence you get a
> > bug. The fact that you and the original reporter already had accidents
> > with this is a clear sign that the logical way to use the API is not the
> > correct one.
> > 
> > IMHO, the change definitely makes sense.
> 
> Ok, so taking courage of this answer ;P How about also the following?
> 
> diff --git a/kernel/timer.c b/kernel/timer.c
> index dbf7a78..5a62456 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long
> timeout)
>  		}
>  	}
>  
> -	expire = timeout + jiffies;
> +	/*
> +	 * We can't be sure how close we are to the next tick, so +1 to
> +	 * guarantee that we wait at least timeout amount.
> +	 */
> +	expire = timeout + jiffies + 1;
>  
>  	setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
>  	__mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
> 
> 
> It'd plug a similar hole for wait_event_timeout() and similar users, who
> don't compensate for the above..

Any jiffy based API is going to have this issue. I think it's different
from the original patch, which just makes the API potentially return
something that is confusing.

So not sure on the above, sorry.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02 12:29 ` David Howells
@ 2013-05-02 12:35   ` Jens Axboe
  2013-05-02 19:56     ` Imre Deak
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2013-05-02 12:35 UTC (permalink / raw)
  To: David Howells
  Cc: Imre Deak, Daniel Vetter, Paul E. McKenney, Dave Jones,
	Lukas Czerner, linux-kernel

On Thu, May 02 2013, David Howells wrote:
> Imre Deak <imre.deak@intel.com> wrote:
> 
> > Many callers of the wait_event_timeout() and
> > wait_event_interruptible_timeout() expect that the return value will be
> > positive if the specified condition becomes true before the timeout
> > elapses. However, at the moment this isn't guaranteed. If the wake-up
> > handler is delayed enough, the time remaining until timeout will be
> > calculated as 0 - and passed back as a return value - even if the
> > condition became true before the timeout has passed.
> > 
> > Fix this by returning at least 1 if the condition becomes true. This
> > semantic is in line with what wait_for_condition_timeout() does; see
> > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > failure under heavy load".
> > 
> > Signed-off-by: Imre Deak <imre.deak@intel.com>
> 
> Acked-by: David Howells <dhowells@redhat.com>

You can add mine, too:

Acked-by: Jens Axboe <axboe@kernel.dk>

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02 12:23     ` Jens Axboe
@ 2013-05-02 12:34       ` Imre Deak
  2013-05-02 12:54         ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Imre Deak @ 2013-05-02 12:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Daniel Vetter, David Howells, Paul E. McKenney, Dave Jones,
	Lukas Czerner, Linux Kernel Mailing List

On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote:
> On Thu, May 02 2013, Daniel Vetter wrote:
> > On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote:
> > >> Fix this by returning at least 1 if the condition becomes true. This
> > >> semantic is in line with what wait_for_condition_timeout() does; see
> > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > >> failure under heavy load".
> > >
> > > But now you can't distinguish the timer expiring first, if the thread doing
> > > the waiting gets delayed sufficiently long for the event to happen.
> > 
> > That can already happen, e.g.
> > 
> > 1. wakeup happens and condition is true.
> > 2. we compute remaining jiffies > 0
> > -> preempt
> > 3. now wait_for_event_timeout returns.
> > 
> > Only difference is that the delay/preempt happens in between 1. and
> > 2., and then suddenly the wake up didn't happen in time (with the
> > current return code semantics).
> > 
> > So imo the current behaviour is simply a bug and will miss timely
> > wakeups in some cases.
> > 
> > The other way round, namely wait_for_event_timeout taking longer than
> > the timeout is expected (and part of the interface for every timeout
> > function). So all current callers already need to be able to cope with
> > random preemption/delays pushing the total time before the call to
> > wait_for_event and checking the return value over the timeout, even
> > when condition was signalled in time.
> > 
> > If there's any case which relies on accurate timeout detection that
> > simply won't work with wait_for_event (they need an nmi or a hw
> > timestamp counter or something similar).
> 
> I seriously doubt that anyone is depending on any sort of accuracy on
> the return. 1 jiffy is not going to make or break anything - in fact,
> jiffies could be incremented nsecs after the initial call. So a
> granularity of at least 1 is going to be expected in any case.
> 
> The important bit here is that the API should behave as expected. And
> the most logical way to code that is to check the return value. I can
> easily see people forgetting to re-check the condition, hence you get a
> bug. The fact that you and the original reporter already had accidents
> with this is a clear sign that the logical way to use the API is not the
> correct one.
> 
> IMHO, the change definitely makes sense.

Ok, so taking courage of this answer ;P How about also the following?

diff --git a/kernel/timer.c b/kernel/timer.c
index dbf7a78..5a62456 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long
timeout)
 		}
 	}
 
-	expire = timeout + jiffies;
+	/*
+	 * We can't be sure how close we are to the next tick, so +1 to
+	 * guarantee that we wait at least timeout amount.
+	 */
+	expire = timeout + jiffies + 1;
 
 	setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
 	__mod_timer(&timer, expire, false, TIMER_NOT_PINNED);


It'd plug a similar hole for wait_event_timeout() and similar users, who
don't compensate for the above..

--Imre



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02  8:58 Imre Deak
  2013-05-02  9:36 ` Daniel Vetter
  2013-05-02 10:29 ` David Howells
@ 2013-05-02 12:29 ` David Howells
  2013-05-02 12:35   ` Jens Axboe
  2 siblings, 1 reply; 23+ messages in thread
From: David Howells @ 2013-05-02 12:29 UTC (permalink / raw)
  To: Imre Deak
  Cc: dhowells, Daniel Vetter, Paul E. McKenney, Dave Jones,
	Jens Axboe, Lukas Czerner, linux-kernel

Imre Deak <imre.deak@intel.com> wrote:

> Many callers of the wait_event_timeout() and
> wait_event_interruptible_timeout() expect that the return value will be
> positive if the specified condition becomes true before the timeout
> elapses. However, at the moment this isn't guaranteed. If the wake-up
> handler is delayed enough, the time remaining until timeout will be
> calculated as 0 - and passed back as a return value - even if the
> condition became true before the timeout has passed.
> 
> Fix this by returning at least 1 if the condition becomes true. This
> semantic is in line with what wait_for_condition_timeout() does; see
> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> failure under heavy load".
> 
> Signed-off-by: Imre Deak <imre.deak@intel.com>

Acked-by: David Howells <dhowells@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02 12:13   ` Daniel Vetter
  2013-05-02 12:23     ` Jens Axboe
@ 2013-05-02 12:29     ` David Howells
  1 sibling, 0 replies; 23+ messages in thread
From: David Howells @ 2013-05-02 12:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: dhowells, Daniel Vetter, Imre Deak, Paul E. McKenney, Dave Jones,
	Lukas Czerner, Linux Kernel Mailing List

Jens Axboe <axboe@kernel.dk> wrote:

> IMHO, the change definitely makes sense.

Yeah...  I think I agree.

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02 12:13   ` Daniel Vetter
@ 2013-05-02 12:23     ` Jens Axboe
  2013-05-02 12:34       ` Imre Deak
  2013-05-02 12:29     ` David Howells
  1 sibling, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2013-05-02 12:23 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: David Howells, Imre Deak, Paul E. McKenney, Dave Jones,
	Lukas Czerner, Linux Kernel Mailing List

On Thu, May 02 2013, Daniel Vetter wrote:
> On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote:
> >> Fix this by returning at least 1 if the condition becomes true. This
> >> semantic is in line with what wait_for_condition_timeout() does; see
> >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> >> failure under heavy load".
> >
> > But now you can't distinguish the timer expiring first, if the thread doing
> > the waiting gets delayed sufficiently long for the event to happen.
> 
> That can already happen, e.g.
> 
> 1. wakeup happens and condition is true.
> 2. we compute remaining jiffies > 0
> -> preempt
> 3. now wait_for_event_timeout returns.
> 
> Only difference is that the delay/preempt happens in between 1. and
> 2., and then suddenly the wake up didn't happen in time (with the
> current return code semantics).
> 
> So imo the current behaviour is simply a bug and will miss timely
> wakeups in some cases.
> 
> The other way round, namely wait_for_event_timeout taking longer than
> the timeout is expected (and part of the interface for every timeout
> function). So all current callers already need to be able to cope with
> random preemption/delays pushing the total time before the call to
> wait_for_event and checking the return value over the timeout, even
> when condition was signalled in time.
> 
> If there's any case which relies on accurate timeout detection that
> simply won't work with wait_for_event (they need an nmi or a hw
> timestamp counter or something similar).

I seriously doubt that anyone is depending on any sort of accuracy on
the return. 1 jiffy is not going to make or break anything - in fact,
jiffies could be incremented nsecs after the initial call. So a
granularity of at least 1 is going to be expected in any case.

The important bit here is that the API should behave as expected. And
the most logical way to code that is to check the return value. I can
easily see people forgetting to re-check the condition, hence you get a
bug. The fact that you and the original reporter already had accidents
with this is a clear sign that the logical way to use the API is not the
correct one.

IMHO, the change definitely makes sense.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02 10:29 ` David Howells
  2013-05-02 12:02   ` Imre Deak
@ 2013-05-02 12:13   ` Daniel Vetter
  2013-05-02 12:23     ` Jens Axboe
  2013-05-02 12:29     ` David Howells
  1 sibling, 2 replies; 23+ messages in thread
From: Daniel Vetter @ 2013-05-02 12:13 UTC (permalink / raw)
  To: David Howells
  Cc: Imre Deak, Paul E. McKenney, Dave Jones, Jens Axboe,
	Lukas Czerner, Linux Kernel Mailing List

On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote:
>> Fix this by returning at least 1 if the condition becomes true. This
>> semantic is in line with what wait_for_condition_timeout() does; see
>> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
>> failure under heavy load".
>
> But now you can't distinguish the timer expiring first, if the thread doing
> the waiting gets delayed sufficiently long for the event to happen.

That can already happen, e.g.

1. wakeup happens and condition is true.
2. we compute remaining jiffies > 0
-> preempt
3. now wait_for_event_timeout returns.

Only difference is that the delay/preempt happens in between 1. and
2., and then suddenly the wake up didn't happen in time (with the
current return code semantics).

So imo the current behaviour is simply a bug and will miss timely
wakeups in some cases.

The other way round, namely wait_for_event_timeout taking longer than
the timeout is expected (and part of the interface for every timeout
function). So all current callers already need to be able to cope with
random preemption/delays pushing the total time before the call to
wait_for_event and checking the return value over the timeout, even
when condition was signalled in time.

If there's any case which relies on accurate timeout detection that
simply won't work with wait_for_event (they need an nmi or a hw
timestamp counter or something similar).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02 10:29 ` David Howells
@ 2013-05-02 12:02   ` Imre Deak
  2013-05-02 12:13   ` Daniel Vetter
  1 sibling, 0 replies; 23+ messages in thread
From: Imre Deak @ 2013-05-02 12:02 UTC (permalink / raw)
  To: David Howells
  Cc: Daniel Vetter, Paul E. McKenney, Dave Jones, Jens Axboe,
	Lukas Czerner, linux-kernel

On Thu, 2013-05-02 at 11:29 +0100, David Howells wrote:
> Imre Deak <imre.deak@intel.com> wrote:
> 
> > Many callers of the wait_event_timeout() and
> > wait_event_interruptible_timeout() expect that the return value will be
> > positive if the specified condition becomes true before the timeout
> > elapses. However, at the moment this isn't guaranteed. If the wake-up
> > handler is delayed enough, the time remaining until timeout will be
> > calculated as 0 - and passed back as a return value - even if the
> > condition became true before the timeout has passed.
> 
> Fun.
> 
> > Fix this by returning at least 1 if the condition becomes true. This
> > semantic is in line with what wait_for_condition_timeout() does; see
> > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > failure under heavy load".
> 
> But now you can't distinguish the timer expiring first, if the thread doing
> the waiting gets delayed sufficiently long for the event to happen.

I'm trying to understand what sequence do you mean. I can think of the
following - as an example - in case of starting a transaction that will
set a completion flag:

waiter                                               completion handler
start transaction
                                                     set completion_flag
ret = wait_event_timeout(timeout, completion_flag)

In this case ret will be timeout which is the original behavior, so
should be ok. One exception is if timeout=0 to begin with, since then -
after this change - ret will be 1. But I can't see how that use case is
useful. I guess I'm missing something, could you elaborate?

--Imre

> I'm not sure there's a good answer - except maybe making the timer expiry
> handler check the condition (which would likely get really yucky really
> quickly).
> 
> David



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02  8:58 Imre Deak
  2013-05-02  9:36 ` Daniel Vetter
@ 2013-05-02 10:29 ` David Howells
  2013-05-02 12:02   ` Imre Deak
  2013-05-02 12:13   ` Daniel Vetter
  2013-05-02 12:29 ` David Howells
  2 siblings, 2 replies; 23+ messages in thread
From: David Howells @ 2013-05-02 10:29 UTC (permalink / raw)
  To: Imre Deak
  Cc: dhowells, Daniel Vetter, Paul E. McKenney, Dave Jones,
	Jens Axboe, Lukas Czerner, linux-kernel

Imre Deak <imre.deak@intel.com> wrote:

> Many callers of the wait_event_timeout() and
> wait_event_interruptible_timeout() expect that the return value will be
> positive if the specified condition becomes true before the timeout
> elapses. However, at the moment this isn't guaranteed. If the wake-up
> handler is delayed enough, the time remaining until timeout will be
> calculated as 0 - and passed back as a return value - even if the
> condition became true before the timeout has passed.

Fun.

> Fix this by returning at least 1 if the condition becomes true. This
> semantic is in line with what wait_for_condition_timeout() does; see
> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> failure under heavy load".

But now you can't distinguish the timer expiring first, if the thread doing
the waiting gets delayed sufficiently long for the event to happen.

I'm not sure there's a good answer - except maybe making the timer expiry
handler check the condition (which would likely get really yucky really
quickly).

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
  2013-05-02  8:58 Imre Deak
@ 2013-05-02  9:36 ` Daniel Vetter
  2013-05-07 23:12   ` Andrew Morton
  2013-05-02 10:29 ` David Howells
  2013-05-02 12:29 ` David Howells
  2 siblings, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2013-05-02  9:36 UTC (permalink / raw)
  To: Imre Deak
  Cc: Paul E. McKenney, David Howells, Dave Jones, Jens Axboe,
	Lukas Czerner, Linux Kernel Mailing List

On Thu, May 2, 2013 at 10:58 AM, Imre Deak <imre.deak@intel.com> wrote:
> Many callers of the wait_event_timeout() and
> wait_event_interruptible_timeout() expect that the return value will be
> positive if the specified condition becomes true before the timeout
> elapses. However, at the moment this isn't guaranteed. If the wake-up
> handler is delayed enough, the time remaining until timeout will be
> calculated as 0 - and passed back as a return value - even if the
> condition became true before the timeout has passed.
>
> Fix this by returning at least 1 if the condition becomes true. This
> semantic is in line with what wait_for_condition_timeout() does; see
> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> failure under heavy load".
>
> Signed-off-by: Imre Deak <imre.deak@intel.com>

We have 3 instances of this bug in drm/i915. One case even where we
switch between the interruptible and not interruptible
wait_event_timeout variants, foolishly presuming they have the same
semantics. I very much like this.

Acked-by:  Daniel Vetter <daniel.vetter@ffwll.ch>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] wait: fix false timeouts when using wait_event_timeout()
@ 2013-05-02  8:58 Imre Deak
  2013-05-02  9:36 ` Daniel Vetter
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Imre Deak @ 2013-05-02  8:58 UTC (permalink / raw)
  To: Daniel Vetter, Paul E. McKenney, David Howells, Dave Jones,
	Jens Axboe, Lukas Czerner, linux-kernel
  Cc: Imre Deak

Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses. However, at the moment this isn't guaranteed. If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.

Fix this by returning at least 1 if the condition becomes true. This
semantic is in line with what wait_for_condition_timeout() does; see
commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
failure under heavy load".

Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 include/linux/wait.h |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 7cb64d4..5336842 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -217,6 +217,8 @@ do {									\
 		if (!ret)						\
 			break;						\
 	}								\
+	if (!ret && (condition))					\
+		ret = 1;						\
 	finish_wait(&wq, &__wait);					\
 } while (0)
 
@@ -233,8 +235,9 @@ do {									\
  * wake_up() has to be called after changing any variable that could
  * change the result of the wait condition.
  *
- * The function returns 0 if the @timeout elapsed, and the remaining
- * jiffies if the condition evaluated to true before the timeout elapsed.
+ * The function returns 0 if the @timeout elapsed, or the remaining
+ * jiffies (at least 1) if the @condition evaluated to %true before
+ * the @timeout elapsed.
  */
 #define wait_event_timeout(wq, condition, timeout)			\
 ({									\
@@ -302,6 +305,8 @@ do {									\
 		ret = -ERESTARTSYS;					\
 		break;							\
 	}								\
+	if (!ret && (condition))					\
+		ret = 1;						\
 	finish_wait(&wq, &__wait);					\
 } while (0)
 
@@ -318,9 +323,10 @@ do {									\
  * wake_up() has to be called after changing any variable that could
  * change the result of the wait condition.
  *
- * The function returns 0 if the @timeout elapsed, -ERESTARTSYS if it
- * was interrupted by a signal, and the remaining jiffies otherwise
- * if the condition evaluated to true before the timeout elapsed.
+ * Returns:
+ * 0 if the @timeout elapsed, -%ERESTARTSYS if it was interrupted by
+ * a signal, or the remaining jiffies (at least 1) if the @condition
+ * evaluated to %true before the @timeout elapsed.
  */
 #define wait_event_interruptible_timeout(wq, condition, timeout)	\
 ({									\
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2013-06-06 18:51 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-04 19:28 [PATCH] wait: fix false timeouts when using wait_event_timeout() Oleg Nesterov
2013-06-04 21:35 ` Imre Deak
2013-06-04 21:40   ` Imre Deak
2013-06-05 16:37     ` Oleg Nesterov
2013-06-05 19:07       ` Oleg Nesterov
2013-06-06  1:45         ` Tejun Heo
2013-06-06 18:47           ` Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2013-05-02  8:58 Imre Deak
2013-05-02  9:36 ` Daniel Vetter
2013-05-07 23:12   ` Andrew Morton
2013-05-08  9:49     ` Imre Deak
2013-05-02 10:29 ` David Howells
2013-05-02 12:02   ` Imre Deak
2013-05-02 12:13   ` Daniel Vetter
2013-05-02 12:23     ` Jens Axboe
2013-05-02 12:34       ` Imre Deak
2013-05-02 12:54         ` Jens Axboe
2013-05-02 13:56           ` Imre Deak
2013-05-02 14:04             ` Daniel Vetter
2013-05-02 12:29     ` David Howells
2013-05-02 12:29 ` David Howells
2013-05-02 12:35   ` Jens Axboe
2013-05-02 19:56     ` Imre Deak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).