All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH 0/7] nested sleeps, fixes and debug infra
@ 2014-08-04 10:30 Peter Zijlstra
  2014-08-04 10:30 ` [RFC][PATCH 1/7] wait: Provide infrastructure to deal with nested blocking Peter Zijlstra
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-04 10:30 UTC (permalink / raw)
  To: mingo, oleg, torvalds
  Cc: tglx, ilya.dryomov, umgwanakikbuti, linux-kernel, Peter Zijlstra

Hi,

Ilya recently tripped over a nested sleep which made Ingo suggest we should
have debug checks for that. So I did some, see patch 7. Of course that
triggered a whole bunch of fail the instant I tried to boot my machine.

With this series I can boot my test box and build a kernel on it, I'm fairly
sure that's far too limited a test to have found all, but its a start.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC][PATCH 1/7] wait: Provide infrastructure to deal with nested blocking
  2014-08-04 10:30 [RFC][PATCH 0/7] nested sleeps, fixes and debug infra Peter Zijlstra
@ 2014-08-04 10:30 ` Peter Zijlstra
  2014-08-04 13:44   ` Peter Zijlstra
  2014-08-04 10:30 ` [RFC][PATCH 2/7] wait: Provide Peter Zijlstra
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-04 10:30 UTC (permalink / raw)
  To: mingo, oleg, torvalds
  Cc: tglx, ilya.dryomov, umgwanakikbuti, linux-kernel, Eric Paris,
	John McCutchan, Robert Love, Greg Kroah-Hartman, Jiri Slaby,
	Peter Zijlstra

[-- Attachment #1: peterz-wait-nested.patch --]
[-- Type: text/plain, Size: 2745 bytes --]

There are a few places that call blocking primitives from wait loops,
provide infrastructure to support this without the typical
task_struct::state collision.

We record the wakeup in wait_queue_t::flags which leaves
task_struct::state free to be used by others.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/wait.h |    7 ++++++-
 kernel/sched/wait.c  |   29 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -13,9 +13,12 @@ typedef struct __wait_queue wait_queue_t
 typedef int (*wait_queue_func_t)(wait_queue_t *wait, unsigned mode, int flags, void *key);
 int default_wake_function(wait_queue_t *wait, unsigned mode, int flags, void *key);
 
+/* __wait_queue::flags */
+#define WQ_FLAG_EXCLUSIVE	0x01
+#define WQ_FLAG_WOKEN		0x02
+
 struct __wait_queue {
 	unsigned int		flags;
-#define WQ_FLAG_EXCLUSIVE	0x01
 	void			*private;
 	wait_queue_func_t	func;
 	struct list_head	task_list;
@@ -825,6 +828,8 @@ void prepare_to_wait_exclusive(wait_queu
 long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state);
 void finish_wait(wait_queue_head_t *q, wait_queue_t *wait);
 void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, unsigned int mode, void *key);
+long wait_woken(wait_queue_t *wait, unsigned mode, long timeout);
+int woken_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -297,6 +297,35 @@ int autoremove_wake_function(wait_queue_
 }
 EXPORT_SYMBOL(autoremove_wake_function);
 
+long wait_woken(wait_queue_t *wait, unsigned mode, long timeout)
+{
+	set_current_state(mode);
+	if (!(wait->flags & WQ_FLAG_WOKEN))
+		timeout = schedule_timeout(timeout);
+	else
+		wait->flags &= ~WQ_FLAG_WOKEN;
+	__set_current_state(TASK_RUNNING);
+
+	return timeout;
+}
+EXPORT_SYMBOL(wait_woken);
+
+int woken_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
+{
+	/*
+	 * Although this function is called under waitqueue lock, LOCK
+	 * doesn't imply write barrier and the users expect write
+	 * barrier semantics on wakeup functions.  The following
+	 * smp_wmb() is equivalent to smp_wmb() in try_to_wake_up()
+	 * and is paired with set_mb() in wait_woken().
+	 */
+	smp_wmb();
+	wait->flags |= WQ_FLAG_WOKEN;
+
+	return default_wake_function(wait, mode, sync, key);
+}
+EXPORT_SYMBOL(woken_wake_function);
+
 int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *arg)
 {
 	struct wait_bit_key *key = arg;



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC][PATCH 2/7] wait: Provide
  2014-08-04 10:30 [RFC][PATCH 0/7] nested sleeps, fixes and debug infra Peter Zijlstra
  2014-08-04 10:30 ` [RFC][PATCH 1/7] wait: Provide infrastructure to deal with nested blocking Peter Zijlstra
@ 2014-08-04 10:30 ` Peter Zijlstra
  2014-08-04 10:30 ` [RFC][PATCH 3/7] exit: Desl with nested sleeps Peter Zijlstra
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-04 10:30 UTC (permalink / raw)
  To: mingo, oleg, torvalds
  Cc: tglx, ilya.dryomov, umgwanakikbuti, linux-kernel, Peter Zijlstra

[-- Attachment #1: peterz-wait-might_sleep.patch --]
[-- Type: text/plain, Size: 4661 bytes --]

Add more might_sleep() checks, suppose someone put a wait_event() like
thing in a wait loop..

Can't put might_sleep() in ___wait_event() because there's the locked
primitives which call ___wait_event() with locks held.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/wait.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -272,6 +272,7 @@ __out:	__ret;								\
  */
 #define wait_event(wq, condition)					\
 do {									\
+	might_sleep();							\
 	if (condition)							\
 		break;							\
 	__wait_event(wq, condition);					\
@@ -302,6 +303,7 @@ do {									\
 #define wait_event_timeout(wq, condition, timeout)			\
 ({									\
 	long __ret = timeout;						\
+	might_sleep();							\
 	if (!___wait_cond_timeout(condition))				\
 		__ret = __wait_event_timeout(wq, condition, timeout);	\
 	__ret;								\
@@ -327,6 +329,7 @@ do {									\
  */
 #define wait_event_cmd(wq, condition, cmd1, cmd2)			\
 do {									\
+	might_sleep();							\
 	if (condition)							\
 		break;							\
 	__wait_event_cmd(wq, condition, cmd1, cmd2);			\
@@ -354,6 +357,7 @@ do {									\
 #define wait_event_interruptible(wq, condition)				\
 ({									\
 	int __ret = 0;							\
+	might_sleep();							\
 	if (!(condition))						\
 		__ret = __wait_event_interruptible(wq, condition);	\
 	__ret;								\
@@ -385,6 +389,7 @@ do {									\
 #define wait_event_interruptible_timeout(wq, condition, timeout)	\
 ({									\
 	long __ret = timeout;						\
+	might_sleep();							\
 	if (!___wait_cond_timeout(condition))				\
 		__ret = __wait_event_interruptible_timeout(wq,		\
 						condition, timeout);	\
@@ -435,6 +440,7 @@ do {									\
 #define wait_event_hrtimeout(wq, condition, timeout)			\
 ({									\
 	int __ret = 0;							\
+	might_sleep();							\
 	if (!(condition))						\
 		__ret = __wait_event_hrtimeout(wq, condition, timeout,	\
 					       TASK_UNINTERRUPTIBLE);	\
@@ -460,6 +466,7 @@ do {									\
 #define wait_event_interruptible_hrtimeout(wq, condition, timeout)	\
 ({									\
 	long __ret = 0;							\
+	might_sleep();							\
 	if (!(condition))						\
 		__ret = __wait_event_hrtimeout(wq, condition, timeout,	\
 					       TASK_INTERRUPTIBLE);	\
@@ -473,6 +480,7 @@ do {									\
 #define wait_event_interruptible_exclusive(wq, condition)		\
 ({									\
 	int __ret = 0;							\
+	might_sleep();							\
 	if (!(condition))						\
 		__ret = __wait_event_interruptible_exclusive(wq, condition);\
 	__ret;								\
@@ -647,6 +655,7 @@ do {									\
 #define wait_event_killable(wq, condition)				\
 ({									\
 	int __ret = 0;							\
+	might_sleep();							\
 	if (!(condition))						\
 		__ret = __wait_event_killable(wq, condition);		\
 	__ret;								\
@@ -895,6 +904,7 @@ extern int bit_wait_io(struct wait_bit_k
 static inline int
 wait_on_bit(void *word, int bit, unsigned mode)
 {
+	might_sleep();
 	if (!test_bit(bit, word))
 		return 0;
 	return out_of_line_wait_on_bit(word, bit,
@@ -919,6 +929,7 @@ wait_on_bit(void *word, int bit, unsigne
 static inline int
 wait_on_bit_io(void *word, int bit, unsigned mode)
 {
+	might_sleep();
 	if (!test_bit(bit, word))
 		return 0;
 	return out_of_line_wait_on_bit(word, bit,
@@ -945,6 +956,7 @@ wait_on_bit_io(void *word, int bit, unsi
 static inline int
 wait_on_bit_action(void *word, int bit, wait_bit_action_f *action, unsigned mode)
 {
+	might_sleep();
 	if (!test_bit(bit, word))
 		return 0;
 	return out_of_line_wait_on_bit(word, bit, action, mode);
@@ -972,6 +984,7 @@ wait_on_bit_action(void *word, int bit,
 static inline int
 wait_on_bit_lock(void *word, int bit, unsigned mode)
 {
+	might_sleep();
 	if (!test_and_set_bit(bit, word))
 		return 0;
 	return out_of_line_wait_on_bit_lock(word, bit, bit_wait, mode);
@@ -995,6 +1008,7 @@ wait_on_bit_lock(void *word, int bit, un
 static inline int
 wait_on_bit_lock_io(void *word, int bit, unsigned mode)
 {
+	might_sleep();
 	if (!test_and_set_bit(bit, word))
 		return 0;
 	return out_of_line_wait_on_bit_lock(word, bit, bit_wait_io, mode);
@@ -1020,6 +1034,7 @@ wait_on_bit_lock_io(void *word, int bit,
 static inline int
 wait_on_bit_lock_action(void *word, int bit, wait_bit_action_f *action, unsigned mode)
 {
+	might_sleep();
 	if (!test_and_set_bit(bit, word))
 		return 0;
 	return out_of_line_wait_on_bit_lock(word, bit, action, mode);
@@ -1038,6 +1053,7 @@ wait_on_bit_lock_action(void *word, int
 static inline
 int wait_on_atomic_t(atomic_t *val, int (*action)(atomic_t *), unsigned mode)
 {
+	might_sleep();
 	if (atomic_read(val) == 0)
 		return 0;
 	return out_of_line_wait_on_atomic_t(val, action, mode);



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC][PATCH 3/7] exit: Desl with nested sleeps
  2014-08-04 10:30 [RFC][PATCH 0/7] nested sleeps, fixes and debug infra Peter Zijlstra
  2014-08-04 10:30 ` [RFC][PATCH 1/7] wait: Provide infrastructure to deal with nested blocking Peter Zijlstra
  2014-08-04 10:30 ` [RFC][PATCH 2/7] wait: Provide Peter Zijlstra
@ 2014-08-04 10:30 ` Peter Zijlstra
  2014-08-04 18:53   ` Oleg Nesterov
  2014-08-04 10:30 ` [RFC][PATCH 4/7] inotify: Deal " Peter Zijlstra
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-04 10:30 UTC (permalink / raw)
  To: mingo, oleg, torvalds
  Cc: tglx, ilya.dryomov, umgwanakikbuti, linux-kernel, Peter Zijlstra

[-- Attachment #1: peterz-might_sleep-exit.patch --]
[-- Type: text/plain, Size: 2088 bytes --]

do_wait() is a big wait loop, but we set TASK_RUNNING too late; we end
up calling potential sleeps before we reset it.

Not strictly a bug in the current form, but clean it up to enable
debugging infrastructure and avoid it becoming a bug.

 WARNING: CPU: 0 PID: 1 at ../kernel/sched/core.c:7123 __might_sleep+0x7e/0x90()
 do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109a788>] do_wait+0x88/0x270

 Call Trace:
  [<ffffffff81694991>] dump_stack+0x4e/0x7a
  [<ffffffff8109877c>] warn_slowpath_common+0x8c/0xc0
  [<ffffffff8109886c>] warn_slowpath_fmt+0x4c/0x50
  [<ffffffff810bca6e>] __might_sleep+0x7e/0x90
  [<ffffffff811a1c15>] might_fault+0x55/0xb0
  [<ffffffff8109a3fb>] wait_consider_task+0x90b/0xc10
  [<ffffffff8109a804>] do_wait+0x104/0x270
  [<ffffffff8109b837>] SyS_wait4+0x77/0x100
  [<ffffffff8169d692>] system_call_fastpath+0x16/0x1b


Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/exit.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -991,6 +991,8 @@ static int wait_task_zombie(struct wait_
 
 		get_task_struct(p);
 		read_unlock(&tasklist_lock);
+		__set_current_state(TASK_RUNNING);
+
 		if ((exit_code & 0x7f) == 0) {
 			why = CLD_EXITED;
 			status = exit_code >> 8;
@@ -1071,6 +1073,7 @@ static int wait_task_zombie(struct wait_
 	 * thread can reap it because we its state == DEAD/TRACE.
 	 */
 	read_unlock(&tasklist_lock);
+	__set_current_state(TASK_RUNNING);
 
 	retval = wo->wo_rusage
 		? getrusage(p, RUSAGE_BOTH, wo->wo_rusage) : 0;
@@ -1202,6 +1205,7 @@ static int wait_task_stopped(struct wait
 	pid = task_pid_vnr(p);
 	why = ptrace ? CLD_TRAPPED : CLD_STOPPED;
 	read_unlock(&tasklist_lock);
+	__set_current_state(TASK_RUNNING);
 
 	if (unlikely(wo->wo_flags & WNOWAIT))
 		return wait_noreap_copyout(wo, p, pid, uid, why, exit_code);
@@ -1264,6 +1268,7 @@ static int wait_task_continued(struct wa
 	pid = task_pid_vnr(p);
 	get_task_struct(p);
 	read_unlock(&tasklist_lock);
+	__set_current_state(TASK_RUNNING);
 
 	if (!wo->wo_info) {
 		retval = wo->wo_rusage



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC][PATCH 4/7] inotify: Deal with nested sleeps
  2014-08-04 10:30 [RFC][PATCH 0/7] nested sleeps, fixes and debug infra Peter Zijlstra
                   ` (2 preceding siblings ...)
  2014-08-04 10:30 ` [RFC][PATCH 3/7] exit: Desl with nested sleeps Peter Zijlstra
@ 2014-08-04 10:30 ` Peter Zijlstra
  2014-08-04 19:23   ` Oleg Nesterov
  2014-08-05  2:22   ` Lai Jiangshan
  2014-08-04 10:30 ` [RFC][PATCH 5/7] tty: " Peter Zijlstra
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-04 10:30 UTC (permalink / raw)
  To: mingo, oleg, torvalds
  Cc: tglx, ilya.dryomov, umgwanakikbuti, linux-kernel, Eric Paris,
	John McCutchan, Robert Love, Peter Zijlstra

[-- Attachment #1: peterz-might_sleep-inotify.patch --]
[-- Type: text/plain, Size: 1714 bytes --]

inotify_read is a wait loop with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state and the wait loop breaks the sleep functions that assume
TASK_RUNNING (mutex_lock).

Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_state::state changes to the actual sleep part.

Cc: Eric Paris <eparis@parisplace.org>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 fs/notify/inotify/inotify_user.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -227,14 +227,13 @@ static ssize_t inotify_read(struct file
 	struct fsnotify_event *kevent;
 	char __user *start;
 	int ret;
-	DEFINE_WAIT(wait);
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 
 	start = buf;
 	group = file->private_data;
 
+	add_wait_queue(&group->notification_waitq, &wait);
 	while (1) {
-		prepare_to_wait(&group->notification_waitq, &wait, TASK_INTERRUPTIBLE);
-
 		mutex_lock(&group->notification_mutex);
 		kevent = get_one_event(group, count);
 		mutex_unlock(&group->notification_mutex);
@@ -264,10 +263,10 @@ static ssize_t inotify_read(struct file
 		if (start != buf)
 			break;
 
-		schedule();
+		wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
 	}
+	remove_wait_queue(&group->notification_waitq, &wait);
 
-	finish_wait(&group->notification_waitq, &wait);
 	if (start != buf && ret != -EFAULT)
 		ret = buf - start;
 	return ret;



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC][PATCH 5/7] tty: Deal with nested sleeps
  2014-08-04 10:30 [RFC][PATCH 0/7] nested sleeps, fixes and debug infra Peter Zijlstra
                   ` (3 preceding siblings ...)
  2014-08-04 10:30 ` [RFC][PATCH 4/7] inotify: Deal " Peter Zijlstra
@ 2014-08-04 10:30 ` Peter Zijlstra
  2014-08-05 23:29   ` Greg Kroah-Hartman
  2014-08-04 10:30 ` [RFC][PATCH 6/7] smp: Correctly deal " Peter Zijlstra
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-04 10:30 UTC (permalink / raw)
  To: mingo, oleg, torvalds
  Cc: tglx, ilya.dryomov, umgwanakikbuti, linux-kernel,
	Greg Kroah-Hartman, Jiri Slaby, Peter Zijlstra

[-- Attachment #1: peterz-might_sleep-tty.patch --]
[-- Type: text/plain, Size: 2917 bytes --]

n_tty_{read,write} are wait loops with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state and the wait loop breaks the sleep functions that assume
TASK_RUNNING (mutex_lock).

Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_state::state changes to the actual sleep part.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 drivers/tty/n_tty.c |   17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -2123,7 +2123,7 @@ static ssize_t n_tty_read(struct tty_str
 {
 	struct n_tty_data *ldata = tty->disc_data;
 	unsigned char __user *b = buf;
-	DECLARE_WAITQUEUE(wait, current);
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	int c;
 	int minimum, time;
 	ssize_t retval = 0;
@@ -2186,10 +2186,6 @@ static ssize_t n_tty_read(struct tty_str
 			nr--;
 			break;
 		}
-		/* This statement must be first before checking for input
-		   so that any interrupt will set the state back to
-		   TASK_RUNNING. */
-		set_current_state(TASK_INTERRUPTIBLE);
 
 		if (((minimum - (b - buf)) < ldata->minimum_to_wake) &&
 		    ((minimum - (b - buf)) >= 1))
@@ -2220,13 +2216,13 @@ static ssize_t n_tty_read(struct tty_str
 				n_tty_set_room(tty);
 				up_read(&tty->termios_rwsem);
 
-				timeout = schedule_timeout(timeout);
+				timeout = wait_woken(&wait, TASK_INTERRUPTIBLE,
+						     timeout);
 
 				down_read(&tty->termios_rwsem);
 				continue;
 			}
 		}
-		__set_current_state(TASK_RUNNING);
 
 		/* Deal with packet mode. */
 		if (packet && b == buf) {
@@ -2273,7 +2269,6 @@ static ssize_t n_tty_read(struct tty_str
 
 	mutex_unlock(&ldata->atomic_read_lock);
 
-	__set_current_state(TASK_RUNNING);
 	if (b - buf)
 		retval = b - buf;
 
@@ -2306,7 +2301,7 @@ static ssize_t n_tty_write(struct tty_st
 			   const unsigned char *buf, size_t nr)
 {
 	const unsigned char *b = buf;
-	DECLARE_WAITQUEUE(wait, current);
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	int c;
 	ssize_t retval = 0;
 
@@ -2324,7 +2319,6 @@ static ssize_t n_tty_write(struct tty_st
 
 	add_wait_queue(&tty->write_wait, &wait);
 	while (1) {
-		set_current_state(TASK_INTERRUPTIBLE);
 		if (signal_pending(current)) {
 			retval = -ERESTARTSYS;
 			break;
@@ -2378,12 +2372,11 @@ static ssize_t n_tty_write(struct tty_st
 		}
 		up_read(&tty->termios_rwsem);
 
-		schedule();
+		wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
 
 		down_read(&tty->termios_rwsem);
 	}
 break_out:
-	__set_current_state(TASK_RUNNING);
 	remove_wait_queue(&tty->write_wait, &wait);
 	if (b - buf != nr && tty->fasync)
 		set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC][PATCH 6/7] smp: Correctly deal with nested sleeps
  2014-08-04 10:30 [RFC][PATCH 0/7] nested sleeps, fixes and debug infra Peter Zijlstra
                   ` (4 preceding siblings ...)
  2014-08-04 10:30 ` [RFC][PATCH 5/7] tty: " Peter Zijlstra
@ 2014-08-04 10:30 ` Peter Zijlstra
  2014-08-04 10:30 ` [RFC][PATCH 7/7] sched: Debug " Peter Zijlstra
  2014-08-05  8:33   ` Ilya Dryomov
  7 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-04 10:30 UTC (permalink / raw)
  To: mingo, oleg, torvalds
  Cc: tglx, ilya.dryomov, umgwanakikbuti, linux-kernel, Peter Zijlstra

[-- Attachment #1: peterz-might_sleep_smpboot.patch --]
[-- Type: text/plain, Size: 1710 bytes --]

smp_hotplug_thread::{setup,unpark} functions can sleep too:

 __might_sleep+0x74/0x80
 kmem_cache_alloc_trace+0x4e/0x1c0
 perf_event_alloc+0x55/0x450
 perf_event_create_kernel_counter+0x2f/0x100
 watchdog_nmi_enable+0x8d/0x160
 watchdog_enable+0x45/0x90
 smpboot_thread_fn+0xec/0x2b0
 kthread+0xe4/0x100
 ret_from_fork+0x7c/0xb0

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/smpboot.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -110,7 +110,7 @@ static int smpboot_thread_fn(void *data)
 		set_current_state(TASK_INTERRUPTIBLE);
 		preempt_disable();
 		if (kthread_should_stop()) {
-			set_current_state(TASK_RUNNING);
+			__set_current_state(TASK_RUNNING);
 			preempt_enable();
 			if (ht->cleanup)
 				ht->cleanup(td->cpu, cpu_online(td->cpu));
@@ -136,26 +136,27 @@ static int smpboot_thread_fn(void *data)
 		/* Check for state change setup */
 		switch (td->status) {
 		case HP_THREAD_NONE:
+			__set_current_state(TASK_RUNNING);
 			preempt_enable();
 			if (ht->setup)
 				ht->setup(td->cpu);
 			td->status = HP_THREAD_ACTIVE;
-			preempt_disable();
-			break;
+			continue;
+
 		case HP_THREAD_PARKED:
+			__set_current_state(TASK_RUNNING);
 			preempt_enable();
 			if (ht->unpark)
 				ht->unpark(td->cpu);
 			td->status = HP_THREAD_ACTIVE;
-			preempt_disable();
-			break;
+			continue;
 		}
 
 		if (!ht->thread_should_run(td->cpu)) {
-			preempt_enable();
+			preempt_enable_no_resched();
 			schedule();
 		} else {
-			set_current_state(TASK_RUNNING);
+			__set_current_state(TASK_RUNNING);
 			preempt_enable();
 			ht->thread_fn(td->cpu);
 		}



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC][PATCH 7/7] sched: Debug nested sleeps
  2014-08-04 10:30 [RFC][PATCH 0/7] nested sleeps, fixes and debug infra Peter Zijlstra
                   ` (5 preceding siblings ...)
  2014-08-04 10:30 ` [RFC][PATCH 6/7] smp: Correctly deal " Peter Zijlstra
@ 2014-08-04 10:30 ` Peter Zijlstra
  2014-08-05  8:33   ` Ilya Dryomov
  7 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-04 10:30 UTC (permalink / raw)
  To: mingo, oleg, torvalds
  Cc: tglx, ilya.dryomov, umgwanakikbuti, linux-kernel, Peter Zijlstra

[-- Attachment #1: peterz-might_sleep_nesting.patch --]
[-- Type: text/plain, Size: 3670 bytes --]

Validate we call might_sleep() with TASK_RUNNING, which catches places
where we nest blocking primitives, eg. mutex usage in a wait loop.

Since all blocking is arranged through task_struct::state, nesting
this is going to cause two distinct issues:

 - the inner primitive will set TASK_RUNNING and the outer will not
   block

 - the outer sets !TASK_RUNNING and the inner expects to be called
   with TASK_RUNNING and blocks forever (mutex_lock).

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/sched.h |   46 ++++++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/core.c   |   13 +++++++++++++
 2 files changed, 57 insertions(+), 2 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -241,6 +241,43 @@ extern char ___assert_task_state[1 - 2*!
 				((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
 				 (task->flags & PF_FROZEN) == 0)
 
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+
+#define __set_task_state(tsk, state_value)			\
+	do {							\
+		(tsk)->task_state_change = _THIS_IP_;		\
+		(tsk)->state = (state_value);			\
+	} while (0)
+#define set_task_state(tsk, state_value)			\
+	do {							\
+		(tsk)->task_state_change = _THIS_IP_;		\
+		set_mb((tsk)->state, (state_value));		\
+	} while (0)
+
+/*
+ * set_current_state() includes a barrier so that the write of current->state
+ * is correctly serialised wrt the caller's subsequent test of whether to
+ * actually sleep:
+ *
+ *	set_current_state(TASK_UNINTERRUPTIBLE);
+ *	if (do_i_need_to_sleep())
+ *		schedule();
+ *
+ * If the caller does not need such serialisation then use __set_current_state()
+ */
+#define __set_current_state(state_value)			\
+	do {							\
+		current->task_state_change = _THIS_IP_;		\
+		current->state = (state_value);			\
+	} while (0)
+#define set_current_state(state_value)				\
+	do {							\
+		current->task_state_change = _THIS_IP_;		\
+		set_mb(current->state, (state_value));		\
+	} while (0)
+
+#else
+
 #define __set_task_state(tsk, state_value)		\
 	do { (tsk)->state = (state_value); } while (0)
 #define set_task_state(tsk, state_value)		\
@@ -257,11 +294,13 @@ extern char ___assert_task_state[1 - 2*!
  *
  * If the caller does not need such serialisation then use __set_current_state()
  */
-#define __set_current_state(state_value)			\
+#define __set_current_state(state_value)		\
 	do { current->state = (state_value); } while (0)
-#define set_current_state(state_value)		\
+#define set_current_state(state_value)			\
 	set_mb(current->state, (state_value))
 
+#endif
+
 /* Task command name length */
 #define TASK_COMM_LEN 16
 
@@ -1650,6 +1689,9 @@ struct task_struct {
 	unsigned int	sequential_io;
 	unsigned int	sequential_io_avg;
 #endif
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+	unsigned long	task_state_change;
+#endif
 };
 
 /* Future-safe accessor for struct task_struct's cpus_allowed. */
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7077,6 +7077,19 @@ void __might_sleep(const char *file, int
 {
 	static unsigned long prev_jiffy;	/* ratelimiting */
 
+	/*
+	 * Blocking primitives will set (and therefore destroy) current->state,
+	 * since we will exit with TASK_RUNNING make sure we enter with it,
+	 * otherwise we will destroy state.
+	 */
+	if (WARN(current->state != TASK_RUNNING,
+			"do not call blocking ops when !TASK_RUNNING; "
+			"state=%lx set at [<%p>] %pS\n",
+			current->state,
+			(void *)current->task_state_change,
+			(void *)current->task_state_change))
+		__set_current_state(TASK_RUNNING);
+
 	rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
 	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
 	     !is_idle_task(current)) ||



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 1/7] wait: Provide infrastructure to deal with nested blocking
  2014-08-04 10:30 ` [RFC][PATCH 1/7] wait: Provide infrastructure to deal with nested blocking Peter Zijlstra
@ 2014-08-04 13:44   ` Peter Zijlstra
  2014-08-04 18:35     ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-04 13:44 UTC (permalink / raw)
  To: mingo, oleg, torvalds
  Cc: tglx, ilya.dryomov, umgwanakikbuti, linux-kernel, Eric Paris,
	John McCutchan, Robert Love, Greg Kroah-Hartman, Jiri Slaby

[-- Attachment #1: Type: text/plain, Size: 1176 bytes --]

On Mon, Aug 04, 2014 at 12:30:26PM +0200, Peter Zijlstra wrote:

> +long wait_woken(wait_queue_t *wait, unsigned mode, long timeout)
> +{
> +	set_current_state(mode);
> +	if (!(wait->flags & WQ_FLAG_WOKEN))
> +		timeout = schedule_timeout(timeout);
> +	else
> +		wait->flags &= ~WQ_FLAG_WOKEN;

I just noticed that poll_schedule_timeout() uses set_mb() for clearing
its triggered variable. But I'm not entirely sure I see why..

> +	__set_current_state(TASK_RUNNING);
> +
> +	return timeout;
> +}
> +EXPORT_SYMBOL(wait_woken);
> +
> +int woken_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +	/*
> +	 * Although this function is called under waitqueue lock, LOCK
> +	 * doesn't imply write barrier and the users expect write
> +	 * barrier semantics on wakeup functions.  The following
> +	 * smp_wmb() is equivalent to smp_wmb() in try_to_wake_up()
> +	 * and is paired with set_mb() in wait_woken().
> +	 */
> +	smp_wmb();
> +	wait->flags |= WQ_FLAG_WOKEN;
> +
> +	return default_wake_function(wait, mode, sync, key);
> +}
> +EXPORT_SYMBOL(woken_wake_function);

So possibly we could also use this for poll_schedule_timeout() and
__pollwake().

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 1/7] wait: Provide infrastructure to deal with nested blocking
  2014-08-04 13:44   ` Peter Zijlstra
@ 2014-08-04 18:35     ` Oleg Nesterov
  0 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2014-08-04 18:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, torvalds, tglx, ilya.dryomov, umgwanakikbuti,
	linux-kernel, Eric Paris, John McCutchan, Robert Love,
	Greg Kroah-Hartman, Jiri Slaby

On 08/04, Peter Zijlstra wrote:
>
> I just noticed that poll_schedule_timeout() uses set_mb() for clearing
> its triggered variable. But I'm not entirely sure I see why..

At least we need to ensure that this "pwq->triggered = 0" can't be reordered
after the next ->poll() returns 0. In this case we should sleep unless pollwake()
was called and it set "->triggered = 1". So without this mb() we can miss an
event.

But I can hardly understand the "data written before wake up is always visible
after wake up" part... Probably this means that if we didn't actually sleep
because pwq->triggered == 1 we need to ensure that the next ->poll(file) should
see all changes which were the reason for wakeup.

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 3/7] exit: Desl with nested sleeps
  2014-08-04 10:30 ` [RFC][PATCH 3/7] exit: Desl with nested sleeps Peter Zijlstra
@ 2014-08-04 18:53   ` Oleg Nesterov
  0 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2014-08-04 18:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, torvalds, tglx, ilya.dryomov, umgwanakikbuti, linux-kernel

On 08/04, Peter Zijlstra wrote:
>
> Not strictly a bug in the current form,

Yes, this is the false positive.

> but clean it up to enable
> debugging infrastructure and avoid it becoming a bug.

OK, but

> @@ -991,6 +991,8 @@ static int wait_task_zombie(struct wait_
>
>  		get_task_struct(p);
>  		read_unlock(&tasklist_lock);
> +		__set_current_state(TASK_RUNNING);
> +

without a comment it is not clear why do we bother to set RUNNING here.

Perhaps we can add another helper which sets current->state = RUNNING?
Just to self-document the usage.

We can even make it depend on CONFIG_DEBUG_ATOMIC_SLEEP (not sure this
makes sense). But in this case it should set ->task_state_change = 0
and __might_sleep() should take !task_state_change into account.

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 4/7] inotify: Deal with nested sleeps
  2014-08-04 10:30 ` [RFC][PATCH 4/7] inotify: Deal " Peter Zijlstra
@ 2014-08-04 19:23   ` Oleg Nesterov
  2014-08-04 21:02     ` Peter Zijlstra
  2014-08-05  2:22   ` Lai Jiangshan
  1 sibling, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2014-08-04 19:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, torvalds, tglx, ilya.dryomov, umgwanakikbuti,
	linux-kernel, Eric Paris, John McCutchan, Robert Love

On 08/04, Peter Zijlstra wrote:
>
>  	while (1) {
> -		prepare_to_wait(&group->notification_waitq, &wait, TASK_INTERRUPTIBLE);
> -
>  		mutex_lock(&group->notification_mutex);

So yes, even these 2 lines look obviously buggy. Even if
fsnotify_add_notify_event()->wake_up(&group->notification_waitq) uses
TASK_NORMAL, so at least this can't miss an event.

It is too later for me, but I am wondering if we can do another thing.
Something like

		int state;

		prepare_to_wait(wait, TASK_INTERRUPTIBLE);

		PUSH(&wait, state);
		mutex_lock();
		mutex_unlock();
		POP(&wait, state);

and, ignoring all races, lack of barriers, etc

	#define PUSH(w, s)	s = current->state; current->state = RUNNING;

	#define POP(w, s)	current->state = WOKEN(w) ? RUNNING : s;

Probably not... just curious.

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 4/7] inotify: Deal with nested sleeps
  2014-08-04 19:23   ` Oleg Nesterov
@ 2014-08-04 21:02     ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-04 21:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: mingo, torvalds, tglx, ilya.dryomov, umgwanakikbuti,
	linux-kernel, Eric Paris, John McCutchan, Robert Love

On Mon, Aug 04, 2014 at 09:23:58PM +0200, Oleg Nesterov wrote:
> On 08/04, Peter Zijlstra wrote:
> >
> >  	while (1) {
> > -		prepare_to_wait(&group->notification_waitq, &wait, TASK_INTERRUPTIBLE);
> > -
> >  		mutex_lock(&group->notification_mutex);
> 
> So yes, even these 2 lines look obviously buggy. Even if
> fsnotify_add_notify_event()->wake_up(&group->notification_waitq) uses
> TASK_NORMAL, so at least this can't miss an event.

There's another problem, mutex_lock() actively assumes ->state ==
TASK_RUNNING and if its not can go to sleep, possibly without ever being
woken again (because nobody knows its sleeping).

We should probably fix that too, but then its not too weird an
assumption for a blocking primitive.

> It is too later for me, but I am wondering if we can do another thing.
> Something like
> 
> 		int state;
> 
> 		prepare_to_wait(wait, TASK_INTERRUPTIBLE);
> 
> 		PUSH(&wait, state);
> 		mutex_lock();
> 		mutex_unlock();
> 		POP(&wait, state);
> 
> and, ignoring all races, lack of barriers, etc
> 
> 	#define PUSH(w, s)	s = current->state; current->state = RUNNING;
> 
> 	#define POP(w, s)	current->state = WOKEN(w) ? RUNNING : s;
> 
> Probably not... just curious.

Sure we can do a state stack, but I'm not immediately seeing the benefit
of doing so. Also I don't think we want to encourage people to do things
like this.

-rt does something like that for its spinlock->rt_mutex conversion.

In fact, you only need to push/pop around mutex_lock(), unlock will
never change state.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 4/7] inotify: Deal with nested sleeps
  2014-08-04 10:30 ` [RFC][PATCH 4/7] inotify: Deal " Peter Zijlstra
  2014-08-04 19:23   ` Oleg Nesterov
@ 2014-08-05  2:22   ` Lai Jiangshan
  2014-08-05  7:28     ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Lai Jiangshan @ 2014-08-05  2:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, oleg, torvalds, tglx, ilya.dryomov, umgwanakikbuti,
	linux-kernel, Eric Paris, John McCutchan, Robert Love

I don't think this one needs nested sleeps.

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index cc423a3..1ca5888 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -233,15 +233,16 @@ static ssize_t inotify_read(struct file *file, char __user *buf,
 	group = file->private_data;
 
 	while (1) {
+		mutex_lock(&group->notification_mutex);
 		prepare_to_wait(&group->notification_waitq, &wait, TASK_INTERRUPTIBLE);
 
-		mutex_lock(&group->notification_mutex);
 		kevent = get_one_event(group, count);
 		mutex_unlock(&group->notification_mutex);
 
 		pr_debug("%s: group=%p kevent=%p\n", __func__, group, kevent);
 
 		if (kevent) {
+			__set_current_state(TASK_RUNNING);
 			ret = PTR_ERR(kevent);
 			if (IS_ERR(kevent))
 				break;



On 08/04/2014 06:30 PM, Peter Zijlstra wrote:
> inotify_read is a wait loop with sleeps in. Wait loops rely on
> task_struct::state and sleeps do too, since that's the only means of
> actually sleeping. Therefore the nested sleeps destroy the wait loop
> state and the wait loop breaks the sleep functions that assume
> TASK_RUNNING (mutex_lock).
> 
> Fix this by using the new woken_wake_function and wait_woken() stuff,
> which registers wakeups in wait and thereby allows shrinking the
> task_state::state changes to the actual sleep part.
> 
> Cc: Eric Paris <eparis@parisplace.org>
> Cc: John McCutchan <john@johnmccutchan.com>
> Cc: Robert Love <rlove@rlove.org>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  fs/notify/inotify/inotify_user.c |    9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> --- a/fs/notify/inotify/inotify_user.c
> +++ b/fs/notify/inotify/inotify_user.c
> @@ -227,14 +227,13 @@ static ssize_t inotify_read(struct file
>  	struct fsnotify_event *kevent;
>  	char __user *start;
>  	int ret;
> -	DEFINE_WAIT(wait);
> +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>  
>  	start = buf;
>  	group = file->private_data;
>  
> +	add_wait_queue(&group->notification_waitq, &wait);
>  	while (1) {
> -		prepare_to_wait(&group->notification_waitq, &wait, TASK_INTERRUPTIBLE);
> -
>  		mutex_lock(&group->notification_mutex);
>  		kevent = get_one_event(group, count);
>  		mutex_unlock(&group->notification_mutex);
> @@ -264,10 +263,10 @@ static ssize_t inotify_read(struct file
>  		if (start != buf)
>  			break;
>  
> -		schedule();
> +		wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
>  	}
> +	remove_wait_queue(&group->notification_waitq, &wait);
>  
> -	finish_wait(&group->notification_waitq, &wait);
>  	if (start != buf && ret != -EFAULT)
>  		ret = buf - start;
>  	return ret;
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> .
> 


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 4/7] inotify: Deal with nested sleeps
  2014-08-05  2:22   ` Lai Jiangshan
@ 2014-08-05  7:28     ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-05  7:28 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: mingo, oleg, torvalds, tglx, ilya.dryomov, umgwanakikbuti,
	linux-kernel, Eric Paris, John McCutchan, Robert Love

On Tue, Aug 05, 2014 at 10:22:11AM +0800, Lai Jiangshan wrote:
> I don't think this one needs nested sleeps.
> 
> diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
> index cc423a3..1ca5888 100644
> --- a/fs/notify/inotify/inotify_user.c
> +++ b/fs/notify/inotify/inotify_user.c
> @@ -233,15 +233,16 @@ static ssize_t inotify_read(struct file *file, char __user *buf,
>  	group = file->private_data;
>  
>  	while (1) {
> +		mutex_lock(&group->notification_mutex);
>  		prepare_to_wait(&group->notification_waitq, &wait, TASK_INTERRUPTIBLE);
>  
> -		mutex_lock(&group->notification_mutex);
>  		kevent = get_one_event(group, count);
>  		mutex_unlock(&group->notification_mutex);
>  
>  		pr_debug("%s: group=%p kevent=%p\n", __func__, group, kevent);
>  
>  		if (kevent) {
> +			__set_current_state(TASK_RUNNING);
>  			ret = PTR_ERR(kevent);
>  			if (IS_ERR(kevent))
>  				break;

Yeah, that'll probably work, but I'm not sure this the place to be
creative like that though. But whatever the inotify people prefer
really.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra
  2014-08-04 10:30 [RFC][PATCH 0/7] nested sleeps, fixes and debug infra Peter Zijlstra
@ 2014-08-05  8:33   ` Ilya Dryomov
  2014-08-04 10:30 ` [RFC][PATCH 2/7] wait: Provide Peter Zijlstra
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Ilya Dryomov @ 2014-08-05  8:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, oleg, Linus Torvalds, tglx, Mike Galbraith,
	Linux Kernel Mailing List, netdev, linux-mm

On Mon, Aug 4, 2014 at 2:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> Hi,
>
> Ilya recently tripped over a nested sleep which made Ingo suggest we should
> have debug checks for that. So I did some, see patch 7. Of course that
> triggered a whole bunch of fail the instant I tried to boot my machine.
>
> With this series I can boot my test box and build a kernel on it, I'm fairly
> sure that's far too limited a test to have found all, but its a start.

FWIW, I'm getting a lot of these during light rbd testing.  CC'ed
netdev and linux-mm.

WARNING: CPU: 2 PID: 1978 at kernel/sched/core.c:7094 __might_sleep+0x5b/0x1e0()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff81070640>] prepare_to_wait+0x50/0xa0
Modules linked in:
CPU: 2 PID: 1978 Comm: ceph-osd Not tainted 3.16.0-vm+ #109
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 0000000000001bb6 ffff8800126739e8 ffffffff8156ec1d 0000000000000000
 ffff880012673a38 ffff880012673a28 ffffffff81032c27 ffff880012673a58
 0000000000000200 ffff8800150fa060 00000000000007ad ffffffff817ed352
Call Trace:
 [<ffffffff8156ec1d>] dump_stack+0x4f/0x7c
 [<ffffffff81032c27>] warn_slowpath_common+0x87/0xb0
 [<ffffffff81032cf1>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff814f23cf>] ? tcp_v4_do_rcv+0x10f/0x4a0
 [<ffffffff81070640>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff81070640>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff8105b53b>] __might_sleep+0x5b/0x1e0
 [<ffffffff8148d73d>] release_sock+0x13d/0x200
 [<ffffffff81498223>] sk_stream_wait_memory+0x133/0x2d0
 [<ffffffff810701d0>] ? woken_wake_function+0x10/0x10
 [<ffffffff814dfdbf>] tcp_sendmsg+0xb6f/0xd70
 [<ffffffff815096cf>] inet_sendmsg+0xdf/0x100
 [<ffffffff815095f0>] ? inet_recvmsg+0x100/0x100
 [<ffffffff814896d7>] sock_sendmsg+0x67/0x90
 [<ffffffff810fd961>] ? might_fault+0x51/0xb0
 [<ffffffff81489a22>] ___sys_sendmsg+0x2d2/0x2e0
 [<ffffffff81095e58>] ? futex_wake+0x128/0x140
 [<ffffffff81095d31>] ? futex_wake+0x1/0x140
 [<ffffffff81141dd0>] ? do_dup2+0xd0/0xd0
 [<ffffffff8105fa31>] ? get_parent_ip+0x11/0x50
 [<ffffffff813cea27>] ? debug_smp_processor_id+0x17/0x20
 [<ffffffff813c33c5>] ? delay_tsc+0x85/0xb0
 [<ffffffff81141ead>] ? __fget+0xdd/0xf0
 [<ffffffff81141dd0>] ? do_dup2+0xd0/0xd0
 [<ffffffff81141f05>] ? __fget_light+0x45/0x60
 [<ffffffff81141f2e>] ? __fdget+0xe/0x10
 [<ffffffff8148a4e4>] __sys_sendmsg+0x44/0x70
 [<ffffffff8148a519>] SyS_sendmsg+0x9/0x10
 [<ffffffff81575b92>] system_call_fastpath+0x16/0x1b

WARNING: CPU: 0 PID: 380 at kernel/sched/core.c:7094 __might_sleep+0x5b/0x1e0()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff81070640>] prepare_to_wait+0x50/0xa0
Modules linked in:
CPU: 0 PID: 380 Comm: kswapd0 Tainted: G        W     3.16.0-vm+ #109
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 0000000000001bb6 ffff88007b64bc68 ffffffff8156ec1d 0000000000000000
 ffff88007b64bcb8 ffff88007b64bca8 ffffffff81032c27 0000000000000000
 0000000000000000 ffff88007c062060 0000000000000065 ffffffff8179ca1f
Call Trace:
 [<ffffffff8156ec1d>] dump_stack+0x4f/0x7c
 [<ffffffff81032c27>] warn_slowpath_common+0x87/0xb0
 [<ffffffff81032cf1>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff81070640>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff81070640>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff8105b53b>] __might_sleep+0x5b/0x1e0
 [<ffffffff810f7fd3>] __reset_isolation_suitable+0x83/0x140
 [<ffffffff810f83f3>] reset_isolation_suitable+0x33/0x50
 [<ffffffff810eb717>] kswapd+0x2e7/0x4d0
 [<ffffffff810701d0>] ? woken_wake_function+0x10/0x10
 [<ffffffff810eb430>] ? balance_pgdat+0x5b0/0x5b0
 [<ffffffff810539ab>] kthread+0xfb/0x110
 [<ffffffff810538b0>] ? flush_kthread_worker+0x130/0x130
 [<ffffffff81575aec>] ret_from_fork+0x7c/0xb0
 [<ffffffff810538b0>] ? flush_kthread_worker+0x130/0x130

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra
@ 2014-08-05  8:33   ` Ilya Dryomov
  0 siblings, 0 replies; 25+ messages in thread
From: Ilya Dryomov @ 2014-08-05  8:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, oleg, Linus Torvalds, tglx, Mike Galbraith,
	Linux Kernel Mailing List, netdev, linux-mm

On Mon, Aug 4, 2014 at 2:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> Hi,
>
> Ilya recently tripped over a nested sleep which made Ingo suggest we should
> have debug checks for that. So I did some, see patch 7. Of course that
> triggered a whole bunch of fail the instant I tried to boot my machine.
>
> With this series I can boot my test box and build a kernel on it, I'm fairly
> sure that's far too limited a test to have found all, but its a start.

FWIW, I'm getting a lot of these during light rbd testing.  CC'ed
netdev and linux-mm.

WARNING: CPU: 2 PID: 1978 at kernel/sched/core.c:7094 __might_sleep+0x5b/0x1e0()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff81070640>] prepare_to_wait+0x50/0xa0
Modules linked in:
CPU: 2 PID: 1978 Comm: ceph-osd Not tainted 3.16.0-vm+ #109
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 0000000000001bb6 ffff8800126739e8 ffffffff8156ec1d 0000000000000000
 ffff880012673a38 ffff880012673a28 ffffffff81032c27 ffff880012673a58
 0000000000000200 ffff8800150fa060 00000000000007ad ffffffff817ed352
Call Trace:
 [<ffffffff8156ec1d>] dump_stack+0x4f/0x7c
 [<ffffffff81032c27>] warn_slowpath_common+0x87/0xb0
 [<ffffffff81032cf1>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff814f23cf>] ? tcp_v4_do_rcv+0x10f/0x4a0
 [<ffffffff81070640>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff81070640>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff8105b53b>] __might_sleep+0x5b/0x1e0
 [<ffffffff8148d73d>] release_sock+0x13d/0x200
 [<ffffffff81498223>] sk_stream_wait_memory+0x133/0x2d0
 [<ffffffff810701d0>] ? woken_wake_function+0x10/0x10
 [<ffffffff814dfdbf>] tcp_sendmsg+0xb6f/0xd70
 [<ffffffff815096cf>] inet_sendmsg+0xdf/0x100
 [<ffffffff815095f0>] ? inet_recvmsg+0x100/0x100
 [<ffffffff814896d7>] sock_sendmsg+0x67/0x90
 [<ffffffff810fd961>] ? might_fault+0x51/0xb0
 [<ffffffff81489a22>] ___sys_sendmsg+0x2d2/0x2e0
 [<ffffffff81095e58>] ? futex_wake+0x128/0x140
 [<ffffffff81095d31>] ? futex_wake+0x1/0x140
 [<ffffffff81141dd0>] ? do_dup2+0xd0/0xd0
 [<ffffffff8105fa31>] ? get_parent_ip+0x11/0x50
 [<ffffffff813cea27>] ? debug_smp_processor_id+0x17/0x20
 [<ffffffff813c33c5>] ? delay_tsc+0x85/0xb0
 [<ffffffff81141ead>] ? __fget+0xdd/0xf0
 [<ffffffff81141dd0>] ? do_dup2+0xd0/0xd0
 [<ffffffff81141f05>] ? __fget_light+0x45/0x60
 [<ffffffff81141f2e>] ? __fdget+0xe/0x10
 [<ffffffff8148a4e4>] __sys_sendmsg+0x44/0x70
 [<ffffffff8148a519>] SyS_sendmsg+0x9/0x10
 [<ffffffff81575b92>] system_call_fastpath+0x16/0x1b

WARNING: CPU: 0 PID: 380 at kernel/sched/core.c:7094 __might_sleep+0x5b/0x1e0()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff81070640>] prepare_to_wait+0x50/0xa0
Modules linked in:
CPU: 0 PID: 380 Comm: kswapd0 Tainted: G        W     3.16.0-vm+ #109
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 0000000000001bb6 ffff88007b64bc68 ffffffff8156ec1d 0000000000000000
 ffff88007b64bcb8 ffff88007b64bca8 ffffffff81032c27 0000000000000000
 0000000000000000 ffff88007c062060 0000000000000065 ffffffff8179ca1f
Call Trace:
 [<ffffffff8156ec1d>] dump_stack+0x4f/0x7c
 [<ffffffff81032c27>] warn_slowpath_common+0x87/0xb0
 [<ffffffff81032cf1>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff81070640>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff81070640>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff8105b53b>] __might_sleep+0x5b/0x1e0
 [<ffffffff810f7fd3>] __reset_isolation_suitable+0x83/0x140
 [<ffffffff810f83f3>] reset_isolation_suitable+0x33/0x50
 [<ffffffff810eb717>] kswapd+0x2e7/0x4d0
 [<ffffffff810701d0>] ? woken_wake_function+0x10/0x10
 [<ffffffff810eb430>] ? balance_pgdat+0x5b0/0x5b0
 [<ffffffff810539ab>] kthread+0xfb/0x110
 [<ffffffff810538b0>] ? flush_kthread_worker+0x130/0x130
 [<ffffffff81575aec>] ret_from_fork+0x7c/0xb0
 [<ffffffff810538b0>] ? flush_kthread_worker+0x130/0x130

Thanks,

                Ilya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra
  2014-08-05  8:33   ` Ilya Dryomov
  (?)
@ 2014-08-05 13:06   ` Peter Zijlstra
  2014-08-06  7:51       ` Ilya Dryomov
  -1 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-05 13:06 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Ingo Molnar, oleg, Linus Torvalds, tglx, Mike Galbraith,
	Linux Kernel Mailing List, netdev, linux-mm

[-- Attachment #1: Type: text/plain, Size: 3776 bytes --]

On Tue, Aug 05, 2014 at 12:33:16PM +0400, Ilya Dryomov wrote:
> On Mon, Aug 4, 2014 at 2:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Hi,
> >
> > Ilya recently tripped over a nested sleep which made Ingo suggest we should
> > have debug checks for that. So I did some, see patch 7. Of course that
> > triggered a whole bunch of fail the instant I tried to boot my machine.
> >
> > With this series I can boot my test box and build a kernel on it, I'm fairly
> > sure that's far too limited a test to have found all, but its a start.
> 
> FWIW, I'm getting a lot of these during light rbd testing.  CC'ed
> netdev and linux-mm.

Both are cond_resched() calls, and that's not blocking as such, just a
preemption point, so lets exclude those.

From the school of '_' are free:

---
 include/linux/kernel.h |    3 +++
 include/linux/sched.h  |    6 +++---
 kernel/sched/core.c    |   12 +++++++++---
 3 files changed, 15 insertions(+), 6 deletions(-)

--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -162,6 +162,7 @@ extern int _cond_resched(void);
 #endif
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+  void ___might_sleep(const char *file, int line, int preempt_offset);
   void __might_sleep(const char *file, int line, int preempt_offset);
 /**
  * might_sleep - annotation for functions that can sleep
@@ -176,6 +177,8 @@ extern int _cond_resched(void);
 # define might_sleep() \
 	do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
 #else
+  static inline void ___might_sleep(const char *file, int line,
+				   int preempt_offset) { }
   static inline void __might_sleep(const char *file, int line,
 				   int preempt_offset) { }
 # define might_sleep() do { might_resched(); } while (0)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2754,7 +2754,7 @@ static inline int signal_pending_state(l
 extern int _cond_resched(void);
 
 #define cond_resched() ({			\
-	__might_sleep(__FILE__, __LINE__, 0);	\
+	___might_sleep(__FILE__, __LINE__, 0);	\
 	_cond_resched();			\
 })
 
@@ -2767,14 +2767,14 @@ extern int __cond_resched_lock(spinlock_
 #endif
 
 #define cond_resched_lock(lock) ({				\
-	__might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);	\
+	___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
 	__cond_resched_lock(lock);				\
 })
 
 extern int __cond_resched_softirq(void);
 
 #define cond_resched_softirq() ({					\
-	__might_sleep(__FILE__, __LINE__, SOFTIRQ_DISABLE_OFFSET);	\
+	___might_sleep(__FILE__, __LINE__, SOFTIRQ_DISABLE_OFFSET);	\
 	__cond_resched_softirq();					\
 })
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7078,8 +7078,6 @@ static inline int preempt_count_equals(i
 
 void __might_sleep(const char *file, int line, int preempt_offset)
 {
-	static unsigned long prev_jiffy;	/* ratelimiting */
-
 	/*
 	 * Blocking primitives will set (and therefore destroy) current->state,
 	 * since we will exit with TASK_RUNNING make sure we enter with it,
@@ -7093,6 +7091,14 @@ void __might_sleep(const char *file, int
 			(void *)current->task_state_change))
 		__set_current_state(TASK_RUNNING);
 
+	___might_sleep(file, line, preempt_offset);
+}
+EXPORT_SYMBOL(__might_sleep);
+
+void ___might_sleep(const char *file, int line, int preempt_offset)
+{
+	static unsigned long prev_jiffy;	/* ratelimiting */
+
 	rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
 	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
 	     !is_idle_task(current)) ||
@@ -7122,7 +7128,7 @@ void __might_sleep(const char *file, int
 #endif
 	dump_stack();
 }
-EXPORT_SYMBOL(__might_sleep);
+EXPORT_SYMBOL(___might_sleep);
 #endif
 
 #ifdef CONFIG_MAGIC_SYSRQ

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 5/7] tty: Deal with nested sleeps
  2014-08-04 10:30 ` [RFC][PATCH 5/7] tty: " Peter Zijlstra
@ 2014-08-05 23:29   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 25+ messages in thread
From: Greg Kroah-Hartman @ 2014-08-05 23:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, oleg, torvalds, tglx, ilya.dryomov, umgwanakikbuti,
	linux-kernel, Jiri Slaby

On Mon, Aug 04, 2014 at 12:30:30PM +0200, Peter Zijlstra wrote:
> n_tty_{read,write} are wait loops with sleeps in. Wait loops rely on
> task_struct::state and sleeps do too, since that's the only means of
> actually sleeping. Therefore the nested sleeps destroy the wait loop
> state and the wait loop breaks the sleep functions that assume
> TASK_RUNNING (mutex_lock).
> 
> Fix this by using the new woken_wake_function and wait_woken() stuff,
> which registers wakeups in wait and thereby allows shrinking the
> task_state::state changes to the actual sleep part.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Jiri Slaby <jslaby@suse.cz>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  drivers/tty/n_tty.c |   17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
> 

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra
  2014-08-05 13:06   ` Peter Zijlstra
@ 2014-08-06  7:51       ` Ilya Dryomov
  0 siblings, 0 replies; 25+ messages in thread
From: Ilya Dryomov @ 2014-08-06  7:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, oleg, Linus Torvalds, tglx, Mike Galbraith,
	Linux Kernel Mailing List, netdev, linux-mm

On Tue, Aug 5, 2014 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Aug 05, 2014 at 12:33:16PM +0400, Ilya Dryomov wrote:
>> On Mon, Aug 4, 2014 at 2:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > Hi,
>> >
>> > Ilya recently tripped over a nested sleep which made Ingo suggest we should
>> > have debug checks for that. So I did some, see patch 7. Of course that
>> > triggered a whole bunch of fail the instant I tried to boot my machine.
>> >
>> > With this series I can boot my test box and build a kernel on it, I'm fairly
>> > sure that's far too limited a test to have found all, but its a start.
>>
>> FWIW, I'm getting a lot of these during light rbd testing.  CC'ed
>> netdev and linux-mm.
>
> Both are cond_resched() calls, and that's not blocking as such, just a
> preemption point, so lets exclude those.

OK, this one is a bit different.

WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff81070e10>] prepare_to_wait+0x50 /0xa0
Modules linked in:
CPU: 1 PID: 1744 Comm: lt-ceph_test_li Not tainted 3.16.0-vm+ #113
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 0000000000001bc0 ffff88006c4479d8 ffffffff8156f455 0000000000000000
 ffff88006c447a28 ffff88006c447a18 ffffffff81033357 0000000000000001
 0000000000000000 0000000000000950 ffffffff817ee48a ffff88006dba6120
Call Trace:
 [<ffffffff8156f455>] dump_stack+0x4f/0x7c
 [<ffffffff81033357>] warn_slowpath_common+0x87/0xb0
 [<ffffffff81033421>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff81078bb2>] ? trace_hardirqs_on_caller+0x182/0x1f0
 [<ffffffff81070e10>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff81070e10>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff8105bc38>] __might_sleep+0x58/0x90
 [<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
 [<ffffffff8148dfeb>] ? release_sock+0x1bb/0x200
 [<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0
 [<ffffffff810709a0>] ? woken_wake_function+0x10/0x10
 [<ffffffff814e058f>] tcp_sendmsg+0xb6f/0xd70
 [<ffffffff81509e9f>] inet_sendmsg+0xdf/0x100
 [<ffffffff81509dc0>] ? inet_recvmsg+0x100/0x100
 [<ffffffff81489f07>] sock_sendmsg+0x67/0x90
 [<ffffffff810fe391>] ? might_fault+0x51/0xb0
 [<ffffffff8148a252>] ___sys_sendmsg+0x2d2/0x2e0
 [<ffffffff811428a0>] ? do_dup2+0xd0/0xd0
 [<ffffffff811428a0>] ? do_dup2+0xd0/0xd0
 [<ffffffff8105bfe0>] ? finish_task_switch+0x50/0x100
 [<ffffffff811429d5>] ? __fget_light+0x45/0x60
 [<ffffffff811429fe>] ? __fdget+0xe/0x10
 [<ffffffff8148ad14>] __sys_sendmsg+0x44/0x70
 [<ffffffff8148ad49>] SyS_sendmsg+0x9/0x10
 [<ffffffff815764d2>] system_call_fastpath+0x16/0x1b

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra
@ 2014-08-06  7:51       ` Ilya Dryomov
  0 siblings, 0 replies; 25+ messages in thread
From: Ilya Dryomov @ 2014-08-06  7:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, oleg, Linus Torvalds, tglx, Mike Galbraith,
	Linux Kernel Mailing List, netdev, linux-mm

On Tue, Aug 5, 2014 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Aug 05, 2014 at 12:33:16PM +0400, Ilya Dryomov wrote:
>> On Mon, Aug 4, 2014 at 2:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > Hi,
>> >
>> > Ilya recently tripped over a nested sleep which made Ingo suggest we should
>> > have debug checks for that. So I did some, see patch 7. Of course that
>> > triggered a whole bunch of fail the instant I tried to boot my machine.
>> >
>> > With this series I can boot my test box and build a kernel on it, I'm fairly
>> > sure that's far too limited a test to have found all, but its a start.
>>
>> FWIW, I'm getting a lot of these during light rbd testing.  CC'ed
>> netdev and linux-mm.
>
> Both are cond_resched() calls, and that's not blocking as such, just a
> preemption point, so lets exclude those.

OK, this one is a bit different.

WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff81070e10>] prepare_to_wait+0x50 /0xa0
Modules linked in:
CPU: 1 PID: 1744 Comm: lt-ceph_test_li Not tainted 3.16.0-vm+ #113
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 0000000000001bc0 ffff88006c4479d8 ffffffff8156f455 0000000000000000
 ffff88006c447a28 ffff88006c447a18 ffffffff81033357 0000000000000001
 0000000000000000 0000000000000950 ffffffff817ee48a ffff88006dba6120
Call Trace:
 [<ffffffff8156f455>] dump_stack+0x4f/0x7c
 [<ffffffff81033357>] warn_slowpath_common+0x87/0xb0
 [<ffffffff81033421>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff81078bb2>] ? trace_hardirqs_on_caller+0x182/0x1f0
 [<ffffffff81070e10>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff81070e10>] ? prepare_to_wait+0x50/0xa0
 [<ffffffff8105bc38>] __might_sleep+0x58/0x90
 [<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
 [<ffffffff8148dfeb>] ? release_sock+0x1bb/0x200
 [<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0
 [<ffffffff810709a0>] ? woken_wake_function+0x10/0x10
 [<ffffffff814e058f>] tcp_sendmsg+0xb6f/0xd70
 [<ffffffff81509e9f>] inet_sendmsg+0xdf/0x100
 [<ffffffff81509dc0>] ? inet_recvmsg+0x100/0x100
 [<ffffffff81489f07>] sock_sendmsg+0x67/0x90
 [<ffffffff810fe391>] ? might_fault+0x51/0xb0
 [<ffffffff8148a252>] ___sys_sendmsg+0x2d2/0x2e0
 [<ffffffff811428a0>] ? do_dup2+0xd0/0xd0
 [<ffffffff811428a0>] ? do_dup2+0xd0/0xd0
 [<ffffffff8105bfe0>] ? finish_task_switch+0x50/0x100
 [<ffffffff811429d5>] ? __fget_light+0x45/0x60
 [<ffffffff811429fe>] ? __fdget+0xe/0x10
 [<ffffffff8148ad14>] __sys_sendmsg+0x44/0x70
 [<ffffffff8148ad49>] SyS_sendmsg+0x9/0x10
 [<ffffffff815764d2>] system_call_fastpath+0x16/0x1b

Thanks,

                Ilya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra
  2014-08-06  7:51       ` Ilya Dryomov
  (?)
@ 2014-08-06  8:31       ` Peter Zijlstra
  2014-08-06 21:16           ` David Miller
  -1 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-06  8:31 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Ingo Molnar, oleg, Linus Torvalds, tglx, Mike Galbraith,
	Linux Kernel Mailing List, netdev, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1211 bytes --]

On Wed, Aug 06, 2014 at 11:51:29AM +0400, Ilya Dryomov wrote:

> OK, this one is a bit different.
> 
> WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
> do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff81070e10>] prepare_to_wait+0x50 /0xa0

>  [<ffffffff8105bc38>] __might_sleep+0x58/0x90
>  [<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
>  [<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0

Urgh, tedious. Its not an actual bug as is. Due to the condition check
in sk_wait_event() we can call lock_sock() with ->state != TASK_RUNNING.

I'm not entirely sure what the cleanest way is to make this go away.
Possibly something like so:

---
 include/net/sock.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 156350745700..37902176c5ab 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -886,6 +886,7 @@ static inline void sock_rps_reset_rxhash(struct sock *sk)
 		if (!__rc) {						\
 			*(__timeo) = schedule_timeout(*(__timeo));	\
 		}							\
+		__set_current_state(TASK_RUNNING);			\
 		lock_sock(__sk);					\
 		__rc = __condition;					\
 		__rc;							\

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra
  2014-08-06  8:31       ` Peter Zijlstra
@ 2014-08-06 21:16           ` David Miller
  0 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2014-08-06 21:16 UTC (permalink / raw)
  To: peterz
  Cc: ilya.dryomov, mingo, oleg, torvalds, tglx, umgwanakikbuti,
	linux-kernel, netdev, linux-mm

From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 6 Aug 2014 10:31:34 +0200

> On Wed, Aug 06, 2014 at 11:51:29AM +0400, Ilya Dryomov wrote:
> 
>> OK, this one is a bit different.
>> 
>> WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
>> do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff81070e10>] prepare_to_wait+0x50 /0xa0
> 
>>  [<ffffffff8105bc38>] __might_sleep+0x58/0x90
>>  [<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
>>  [<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0
> 
> Urgh, tedious. Its not an actual bug as is. Due to the condition check
> in sk_wait_event() we can call lock_sock() with ->state != TASK_RUNNING.
> 
> I'm not entirely sure what the cleanest way is to make this go away.
> Possibly something like so:

If you submit this formally to netdev with a signoff I'm willing to apply
this if it helps the debug infrastructure.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra
@ 2014-08-06 21:16           ` David Miller
  0 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2014-08-06 21:16 UTC (permalink / raw)
  To: peterz
  Cc: ilya.dryomov, mingo, oleg, torvalds, tglx, umgwanakikbuti,
	linux-kernel, netdev, linux-mm

From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 6 Aug 2014 10:31:34 +0200

> On Wed, Aug 06, 2014 at 11:51:29AM +0400, Ilya Dryomov wrote:
> 
>> OK, this one is a bit different.
>> 
>> WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
>> do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff81070e10>] prepare_to_wait+0x50 /0xa0
> 
>>  [<ffffffff8105bc38>] __might_sleep+0x58/0x90
>>  [<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
>>  [<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0
> 
> Urgh, tedious. Its not an actual bug as is. Due to the condition check
> in sk_wait_event() we can call lock_sock() with ->state != TASK_RUNNING.
> 
> I'm not entirely sure what the cleanest way is to make this go away.
> Possibly something like so:

If you submit this formally to netdev with a signoff I'm willing to apply
this if it helps the debug infrastructure.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra
  2014-08-06 21:16           ` David Miller
  (?)
@ 2014-08-07  8:10           ` Peter Zijlstra
  -1 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2014-08-07  8:10 UTC (permalink / raw)
  To: David Miller
  Cc: ilya.dryomov, mingo, oleg, torvalds, tglx, umgwanakikbuti,
	linux-kernel, netdev, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1188 bytes --]

On Wed, Aug 06, 2014 at 02:16:03PM -0700, David Miller wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed, 6 Aug 2014 10:31:34 +0200
> 
> > On Wed, Aug 06, 2014 at 11:51:29AM +0400, Ilya Dryomov wrote:
> > 
> >> OK, this one is a bit different.
> >> 
> >> WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
> >> do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff81070e10>] prepare_to_wait+0x50 /0xa0
> > 
> >>  [<ffffffff8105bc38>] __might_sleep+0x58/0x90
> >>  [<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
> >>  [<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0
> > 
> > Urgh, tedious. Its not an actual bug as is. Due to the condition check
> > in sk_wait_event() we can call lock_sock() with ->state != TASK_RUNNING.
> > 
> > I'm not entirely sure what the cleanest way is to make this go away.
> > Possibly something like so:
> 
> If you submit this formally to netdev with a signoff I'm willing to apply
> this if it helps the debug infrastructure.

Thanks, for now I'm just collecting things to see how far I can take
this. But I'll certainly include you and netdev on a next posting.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-08-07  8:11 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-04 10:30 [RFC][PATCH 0/7] nested sleeps, fixes and debug infra Peter Zijlstra
2014-08-04 10:30 ` [RFC][PATCH 1/7] wait: Provide infrastructure to deal with nested blocking Peter Zijlstra
2014-08-04 13:44   ` Peter Zijlstra
2014-08-04 18:35     ` Oleg Nesterov
2014-08-04 10:30 ` [RFC][PATCH 2/7] wait: Provide Peter Zijlstra
2014-08-04 10:30 ` [RFC][PATCH 3/7] exit: Desl with nested sleeps Peter Zijlstra
2014-08-04 18:53   ` Oleg Nesterov
2014-08-04 10:30 ` [RFC][PATCH 4/7] inotify: Deal " Peter Zijlstra
2014-08-04 19:23   ` Oleg Nesterov
2014-08-04 21:02     ` Peter Zijlstra
2014-08-05  2:22   ` Lai Jiangshan
2014-08-05  7:28     ` Peter Zijlstra
2014-08-04 10:30 ` [RFC][PATCH 5/7] tty: " Peter Zijlstra
2014-08-05 23:29   ` Greg Kroah-Hartman
2014-08-04 10:30 ` [RFC][PATCH 6/7] smp: Correctly deal " Peter Zijlstra
2014-08-04 10:30 ` [RFC][PATCH 7/7] sched: Debug " Peter Zijlstra
2014-08-05  8:33 ` [RFC][PATCH 0/7] nested sleeps, fixes and debug infra Ilya Dryomov
2014-08-05  8:33   ` Ilya Dryomov
2014-08-05 13:06   ` Peter Zijlstra
2014-08-06  7:51     ` Ilya Dryomov
2014-08-06  7:51       ` Ilya Dryomov
2014-08-06  8:31       ` Peter Zijlstra
2014-08-06 21:16         ` David Miller
2014-08-06 21:16           ` David Miller
2014-08-07  8:10           ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.