[PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK
@ 2013-02-16  9:53 Mandeep Singh Baines
  2013-02-16  9:53 ` [PATCH 2/5] lockdep: check that no locks held at freeze time Mandeep Singh Baines
                   ` (4 more replies)
  0 siblings, 5 replies; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-16  9:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

We don't need to call freezer_do_not_count() for in-kernel users
of CLONE_VFORK since exec will get called in bounded time.

We don't want to call freezer_count() for in-kernel users because
they may be holding locks. freezer_count() calls try_to_freeze().
We don't want to freeze an in-kernel user because it may be
holding locks.

Changes since v1:
* <20130215152840.GC30829@redhat.com> Oleg Nesterov
 * Use (p->flags & PF_KTHREAD) checks instead of p->mm.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Tejun Heo <tj@kernel.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: Ingo Molnar <mingo@redhat.com>
---
 kernel/fork.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index c535f33..c7ace33 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -722,9 +722,11 @@ static int wait_for_vfork_done(struct task_struct *child,
 {
 	int killed;
 
-	freezer_do_not_count();
+	if (!(current->flags & PF_KTHREAD))
+		freezer_do_not_count();
 	killed = wait_for_completion_killable(vfork);
-	freezer_count();
+	if (!(current->flags & PF_KTHREAD))
+		freezer_count();
 
 	if (killed) {
 		task_lock(child);
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/5] lockdep: check that no locks held at freeze time
  2013-02-16  9:53 [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK Mandeep Singh Baines
@ 2013-02-16  9:53 ` Mandeep Singh Baines
  2013-02-16 17:06   ` Oleg Nesterov
  2013-02-16  9:53 ` [PATCH 3/5] coredump: use a freezable_schedule for the coredump_finish wait Mandeep Singh Baines
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-16  9:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

We shouldn't try_to_freeze if locks are held. Verified that
I get no lockdep warnings after applying this patch and
"vfork: don't freezer_count() for in-kernel users of CLONE_VFORK".

Changes since v1:
* LKML: <20130215111635.GA26955@gmail.com> Ingo Molnar
  * Added a msg string that gets passed in.
* LKML: <20130215154449.GD30829@redhat.com> Oleg Nesterov
  * Check PF_NOFREEZE in try_to_freeze().

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Tejun Heo <tj@kernel.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: Ingo Molnar <mingo@redhat.com>
---
 include/linux/debug_locks.h |  5 +++--
 include/linux/freezer.h     |  4 ++++
 kernel/exit.c               |  2 +-
 kernel/lockdep.c            | 12 ++++++------
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 3bd46f7..e6c3c85 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -51,7 +51,8 @@ struct task_struct;
 extern void debug_show_all_locks(void);
 extern void debug_show_held_locks(struct task_struct *task);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
-extern void debug_check_no_locks_held(struct task_struct *task);
+extern void debug_check_no_locks_held(struct task_struct *task,
+				      const char *msg);
 #else
 static inline void debug_show_all_locks(void)
 {
@@ -67,7 +68,7 @@ debug_check_no_locks_freed(const void *from, unsigned long len)
 }
 
 static inline void
-debug_check_no_locks_held(struct task_struct *task)
+debug_check_no_locks_held(struct task_struct *task, const char *msg)
 {
 }
 #endif
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index e4238ce..e6f8256 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -3,6 +3,7 @@
 #ifndef FREEZER_H_INCLUDED
 #define FREEZER_H_INCLUDED
 
+#include <linux/debug_locks.h>
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/atomic.h>
@@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
 
 static inline bool try_to_freeze(void)
 {
+	if (current->flags & PF_NOFREEZE)
+		return false;
+	debug_check_no_locks_held(current, "lock held while trying to freeze");
 	might_sleep();
 	if (likely(!freezing(current)))
 		return false;
diff --git a/kernel/exit.c b/kernel/exit.c
index b4df219..88a706c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -833,7 +833,7 @@ void do_exit(long code)
 	/*
 	 * Make sure we are holding no locks:
 	 */
-	debug_check_no_locks_held(tsk);
+	debug_check_no_locks_held(tsk, "lock held at task exit time");
 	/*
 	 * We can do this unlocked here. The futex code uses this flag
 	 * just to verify whether the pi state cleanup has been done
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 7981e5b..2d4cfbf 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4083,7 +4083,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
 }
 EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
 
-static void print_held_locks_bug(struct task_struct *curr)
+static void print_held_locks_bug(struct task_struct *curr, const char *msg)
 {
 	if (!debug_locks_off())
 		return;
@@ -4091,10 +4091,10 @@ static void print_held_locks_bug(struct task_struct *curr)
 		return;
 
 	printk("\n");
-	printk("=====================================\n");
-	printk("[ BUG: lock held at task exit time! ]\n");
+	printk("=======================================\n");
+	printk("[ BUG: %s! ]\n", msg);
 	print_kernel_ident();
-	printk("-------------------------------------\n");
+	printk("---------------------------------------\n");
 	printk("%s/%d is exiting with locks still held!\n",
 		curr->comm, task_pid_nr(curr));
 	lockdep_print_held_locks(curr);
@@ -4103,10 +4103,10 @@ static void print_held_locks_bug(struct task_struct *curr)
 	dump_stack();
 }
 
-void debug_check_no_locks_held(struct task_struct *task)
+void debug_check_no_locks_held(struct task_struct *task, const char *msg)
 {
 	if (unlikely(task->lockdep_depth > 0))
-		print_held_locks_bug(task);
+		print_held_locks_bug(task, msg);
 }
 
 void debug_show_all_locks(void)
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 3/5] coredump: use a freezable_schedule for the coredump_finish wait
  2013-02-16  9:53 [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK Mandeep Singh Baines
  2013-02-16  9:53 ` [PATCH 2/5] lockdep: check that no locks held at freeze time Mandeep Singh Baines
@ 2013-02-16  9:53 ` Mandeep Singh Baines
  2013-02-16 17:17   ` Oleg Nesterov
  2013-02-16  9:53 ` [PATCH 4/5] freezer: clear fake signal on exit from __refrigerator Mandeep Singh Baines
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-16  9:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

Prevents hung_task detector from panicing the machine. This is also
needed to prevent this wait from blocking suspend.

(It doesnt' currently block suspend but it would once the next
patch in this series is applied.)

Changes since v1:
* <20130215145323.GA30829@redhat.com> Oleg Nesterov
 * Rebased after dropping earlier cleanup patch.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Tejun Heo <tj@kernel.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: Ingo Molnar <mingo@redhat.com>
---
 kernel/exit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 88a706c..89532e8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -20,6 +20,7 @@
 #include <linux/tsacct_kern.h>
 #include <linux/file.h>
 #include <linux/fdtable.h>
+#include <linux/freezer.h>
 #include <linux/binfmts.h>
 #include <linux/nsproxy.h>
 #include <linux/pid_namespace.h>
@@ -483,7 +484,7 @@ static void exit_mm(struct task_struct * tsk)
 			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
 			if (!self.task) /* see coredump_finish() */
 				break;
-			schedule();
+			freezable_schedule();
 		}
 		__set_task_state(tsk, TASK_RUNNING);
 		down_read(&mm->mmap_sem);
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 4/5] freezer: clear fake signal on exit from __refrigerator
  2013-02-16  9:53 [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK Mandeep Singh Baines
  2013-02-16  9:53 ` [PATCH 2/5] lockdep: check that no locks held at freeze time Mandeep Singh Baines
  2013-02-16  9:53 ` [PATCH 3/5] coredump: use a freezable_schedule for the coredump_finish wait Mandeep Singh Baines
@ 2013-02-16  9:53 ` Mandeep Singh Baines
  2013-02-16 17:06   ` Oleg Nesterov
  2013-02-16  9:53 ` [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe Mandeep Singh Baines
  2013-02-16 17:05 ` [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK Oleg Nesterov
  4 siblings, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-16  9:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

In freeze_task, a freeze request is sent as a fake signal.

Recalculate signal pending on exit from __refrigerator so that
TIF_SIGPENDING doesn't remain incorrectly set.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Tejun Heo <tj@kernel.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: Ingo Molnar <mingo@redhat.com>
---
 kernel/freezer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index c38893b..fa66d0f 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -81,6 +81,9 @@ bool __refrigerator(bool check_kthr_stop)
 	 */
 	set_current_state(save);
 
+	/* Clear fake signal from freeze_task(). */
+	recalc_sigpending();
+
 	return was_frozen;
 }
 EXPORT_SYMBOL(__refrigerator);
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-16  9:53 [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK Mandeep Singh Baines
                   ` (2 preceding siblings ...)
  2013-02-16  9:53 ` [PATCH 4/5] freezer: clear fake signal on exit from __refrigerator Mandeep Singh Baines
@ 2013-02-16  9:53 ` Mandeep Singh Baines
  2013-02-16 17:10   ` Oleg Nesterov
  2013-02-16 17:05 ` [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK Oleg Nesterov
  4 siblings, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-16  9:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ben Chan, Mandeep Singh Baines, Oleg Nesterov, Tejun Heo,
	Andrew Morton, Rafael J. Wysocki, Ingo Molnar

From: Ben Chan <benchan@chromium.org>

Make wait_for_dump_helpers() not abort piping the core dump data when the
crashing process has received a non-fatal signal.  The abort still occurs
in the case of SIGKILL.

Testing:

localhost ~ # echo "|/usr/bin/sleep 1d" > /proc/sys/kernel/core_pattern
localhost ~ # sleep 1d &
[1] 2514
localhost ~ # kill -ABRT $! # Cause coredump
localhost ~ # kill -USR1 $! # Send non-fatal signal
localhost ~ # top -p $! -n1 -b # Verify that we aren't dead or busy waiting
top - 16:45:34 up 2 min,  0 users,  load average: 0.71, 0.42, 0.17
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 26.0%us,  8.5%sy,  0.0%ni, 65.1%id,  0.2%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:    991516k total,   418556k used,   572960k free,     5948k buffers
Swap:        0k total,        0k used,        0k free,   289928k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2514 root      20   0  1868  392  336 S    0  0.0   0:00.00 sleep

localhost ~ # echo mem > /sys/power/state # Suspend
localhost ~ # top -p $! -n1 -b # Verify that we aren't dead or busy waiting
top - 16:46:46 up 3 min,  0 users,  load average: 1.68, 0.69, 0.28
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 24.1%us,  7.7%sy,  0.0%ni, 67.9%id,  0.2%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:    991516k total,   419956k used,   571560k free,     5996k buffers
Swap:        0k total,        0k used,        0k free,   290208k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2514 root      20   0  1868  392  336 S    0  0.0   0:00.00 sleep

localhost ~ # kill -KILL $!
[1]+  Aborted                 (core dumped) sleep 1d

Addresses http://crosbug.com/21559

Changes since v1:
* Mandeep Singh Baines
  * To prevent blocking suspend, add try_to_freeze().
Changes since v2:
* LKML: <20130215150117.GB30829@redhat.com> Oleg Nestorov
  * Block non-fatal signals to avoid poll_wait busy waiting.
* LKML: <20130215152538.9a61a44e.akpm@linux-foundation.org> Andrew Morton
  * Added comment re: try_to_freeze and clarified commit message.
Changes since v3:
* Mandeep Singh Baines
  * Clear signal pending caused by fake signal from freeze_task().
  * Document how the patch was tested.
Changes since v4:
* Mandeep Singh Baines
  * Moved clearing of fake signal to __refrigerator() (separate patch).
  * SIGKILL will remain in shared_pending since SIGNAL_GROUP_EXIT is set,
    so fatal_signal_pending() will return false. Add a sigkill_pending()
    helper that does the right thing.

Signed-off-by: Ben Chan <benchan@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Tejun Heo <tj@kernel.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: Ingo Molnar <mingo@redhat.com>
---
 fs/coredump.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 1774932..3eb799d 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -32,6 +32,7 @@
 #include <linux/pipe_fs_i.h>
 #include <linux/oom.h>
 #include <linux/compat.h>
+#include <linux/freezer.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -407,9 +408,21 @@ static void coredump_finish(struct mm_struct *mm)
 	mm->core_state = NULL;
 }
 
+static int sigkill_pending(struct task_struct *tsk)
+{
+	return	signal_pending(tsk) &&
+		(sigismember(&tsk->pending.signal, SIGKILL) ||
+		 sigismember(&tsk->signal->shared_pending.signal, SIGKILL));
+}
+
 static void wait_for_dump_helpers(struct file *file)
 {
 	struct pipe_inode_info *pipe;
+	sigset_t blocked, previous;
+
+	/* Block all but fatal signals. */
+	siginitsetinv(&blocked, sigmask(SIGKILL));
+	sigprocmask(SIG_BLOCK, &blocked, &previous);
 
 	pipe = file->f_path.dentry->d_inode->i_pipe;
 
@@ -417,16 +430,26 @@ static void wait_for_dump_helpers(struct file *file)
 	pipe->readers++;
 	pipe->writers--;
 
-	while ((pipe->readers > 1) && (!signal_pending(current))) {
+	while ((pipe->readers > 1) && (!sigkill_pending(current))) {
 		wake_up_interruptible_sync(&pipe->wait);
 		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
 		pipe_wait(pipe);
+
+		/*
+		 * Non-fatal signals are blocked. So we need to try
+		 * to freeze in order to not block suspend.
+		 */
+		pipe_unlock(pipe);
+		try_to_freeze();
+		pipe_lock(pipe);
 	}
 
 	pipe->readers--;
 	pipe->writers++;
 	pipe_unlock(pipe);
 
+	/* Restore signals. */
+	sigprocmask(SIG_SETMASK, &previous, NULL);
 }
 
 /*
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK
  2013-02-16  9:53 [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK Mandeep Singh Baines
                   ` (3 preceding siblings ...)
  2013-02-16  9:53 ` [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe Mandeep Singh Baines
@ 2013-02-16 17:05 ` Oleg Nesterov
  2013-02-20  0:07   ` Mandeep Singh Baines
  4 siblings, 1 reply; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-16 17:05 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki, Ingo Molnar

On 02/16, Mandeep Singh Baines wrote:
>
> We don't need to call freezer_do_not_count() for in-kernel users
> of CLONE_VFORK since exec will get called in bounded time.
>
> We don't want to call freezer_count() for in-kernel users because
> they may be holding locks. freezer_count() calls try_to_freeze().
> We don't want to freeze an in-kernel user because it may be
> holding locks.

I can only repeat my question ;)

	Who? We should not do this anyway. And __call_usermodehelper() doesn't
	afaics.

	OK, its caller (process_one_work) does lock_map_acquire() for debugging
	purposes, this can "confuse" print_held_locks_bug(). But this thread is
	PF_NOFREEZE ?

Previously this was needed to suppress the false positive. Now that 2/5
checks PF_NOFREEZE, why do we need this change?

> @@ -722,9 +722,11 @@ static int wait_for_vfork_done(struct task_struct *child,
>  {
>  	int killed;
>
> -	freezer_do_not_count();
> +	if (!(current->flags & PF_KTHREAD))
> +		freezer_do_not_count();

If I missed something and we really need this, imho this needs a comment.

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] lockdep: check that no locks held at freeze time
  2013-02-16  9:53 ` [PATCH 2/5] lockdep: check that no locks held at freeze time Mandeep Singh Baines
@ 2013-02-16 17:06   ` Oleg Nesterov
  2013-02-20  1:57     ` [PATCH v3] " Mandeep Singh Baines
  0 siblings, 1 reply; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-16 17:06 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki, Ingo Molnar

Well, this is almost cosmetics, and I am not maintaner, but...

On 02/16, Mandeep Singh Baines wrote:
>
>  static inline bool try_to_freeze(void)
>  {
> +	if (current->flags & PF_NOFREEZE)
> +		return false;
> +	debug_check_no_locks_held(current, "lock held while trying to freeze");

I think this should be

	if (!(current->flags & PF_NOFREEZE))
		debug_check_no_locks_held(...);

without "return". This way we avoid the unnecessary PF_NOFREEZE check
if !CONFIG_LOCKDEP. And perhaps more importantly, this way it is clear
that we check PF_NOFREEZE for debugging only and do not change the code
behaviour.

But I leave this to Rafael/Tejun.

And again, unless I missed something, this makes 1/5 unnecessary.

Oleg.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/5] freezer: clear fake signal on exit from __refrigerator
  2013-02-16  9:53 ` [PATCH 4/5] freezer: clear fake signal on exit from __refrigerator Mandeep Singh Baines
@ 2013-02-16 17:06   ` Oleg Nesterov
  2013-02-16 17:12     ` Oleg Nesterov
  0 siblings, 1 reply; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-16 17:06 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki, Ingo Molnar

On 02/16, Mandeep Singh Baines wrote:
>
> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -81,6 +81,9 @@ bool __refrigerator(bool check_kthr_stop)
>  	 */
>  	set_current_state(save);
>
> +	/* Clear fake signal from freeze_task(). */
> +	recalc_sigpending();

NACK. We can't do this lockless.

And I am not sure we should reinstantiate it here.

The coredumping is "special". It is not a kernel thread, but it does
a lot of in-kernel activity and never returns to user-mode. If it play
games with the freezer, perhaps it should take care itself...

Tejun?

Oleg.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-16  9:53 ` [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe Mandeep Singh Baines
@ 2013-02-16 17:10   ` Oleg Nesterov
  2013-02-16 19:46     ` Oleg Nesterov
  0 siblings, 1 reply; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-16 17:10 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Ben Chan, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

On 02/16, Mandeep Singh Baines wrote:
>
> Make wait_for_dump_helpers() not abort piping the core dump data when the
> crashing process has received a non-fatal signal.  The abort still occurs
> in the case of SIGKILL.
>
> Testing:
>
> localhost ~ # echo "|/usr/bin/sleep 1d" > /proc/sys/kernel/core_pattern
> localhost ~ # sleep 1d &

As I already said, this is not enough. And if we change send_signal() paths
to "ignore" the non-fatal signals sent to the dumping process (and I think
we should do this anyway), this change is not needed.

Except we have other problems with the freezer.

> +static int sigkill_pending(struct task_struct *tsk)
> +{
> +	return	signal_pending(tsk) &&
> +		(sigismember(&tsk->pending.signal, SIGKILL) ||
> +		 sigismember(&tsk->signal->shared_pending.signal, SIGKILL));
> +}

Why? __fatal_signal_pending() is enough, you do not need to check
->shared_pending. And once again, ignoring the freezer problems I
do not think we need this check at all.

IOW. Yes, we will probably need to do this change but only to be
freezer-friendly.

>  static void wait_for_dump_helpers(struct file *file)
>  {
>  	struct pipe_inode_info *pipe;
> +	sigset_t blocked, previous;
> +
> +	/* Block all but fatal signals. */
> +	siginitsetinv(&blocked, sigmask(SIGKILL));
> +	sigprocmask(SIG_BLOCK, &blocked, &previous);

(sigprocmask() must die, please never use, we have set_current_blocked().
 Although in this particular case this doesn't matter...)

Heh. When I suggested this change a looong ago, my attempt was NACK'ed
because the core handler looks at /proc/pid/status.

If we could do this, we could simply ignore all signals except SIGKILL
at the start, in zap_threads(). This could solve almost all problems
with the signals/SIGKILL.

But see above, we can't.

Anyway. Please look at the patch below. I need to recheck it, and I was
going to send it later, along with other changes I am _trying_ to do. But
if it is correct, it looks certainly better and perhaps it can go ahead.

Afaics it could equally fix the mentioned problems (again, if correct ;).
"equally" also means it is equally incomplete.

Oleg.

--- x/fs/coredump.c
+++ x/fs/coredump.c
@@ -416,17 +416,17 @@ static void wait_for_dump_helpers(struct
 	pipe_lock(pipe);
 	pipe->readers++;
 	pipe->writers--;
+	// TODO: wake_up_interruptible_sync_poll ?
+	wake_up_interruptible_sync(&pipe->wait);
+	kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
+	pipe_unlock(pipe);

-	while ((pipe->readers > 1) && (!signal_pending(current))) {
-		wake_up_interruptible_sync(&pipe->wait);
-		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
-		pipe_wait(pipe);
-	}
+	wait_event_freezekillable(&pipe->wait, pipe->readers == 1);

+	pipe_lock(pipe);
 	pipe->readers--;
 	pipe->writers++;
 	pipe_unlock(pipe);
-
 }

 /*

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/5] freezer: clear fake signal on exit from __refrigerator
  2013-02-16 17:06   ` Oleg Nesterov
@ 2013-02-16 17:12     ` Oleg Nesterov
  2013-02-20 18:09       ` Mandeep Singh Baines
  0 siblings, 1 reply; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-16 17:12 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki, Ingo Molnar

Forgot to mention...

On 02/16, Oleg Nesterov wrote:
> On 02/16, Mandeep Singh Baines wrote:
> >
> > --- a/kernel/freezer.c
> > +++ b/kernel/freezer.c
> > @@ -81,6 +81,9 @@ bool __refrigerator(bool check_kthr_stop)
> >  	 */
> >  	set_current_state(save);
> >
> > +	/* Clear fake signal from freeze_task(). */
> > +	recalc_sigpending();
>
> NACK. We can't do this lockless.
>
> And I am not sure we should reinstantiate it here.

And note that with the fix I propose in reply to 5/5 this change is
not needed.

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/5] coredump: use a freezable_schedule for the coredump_finish wait
  2013-02-16  9:53 ` [PATCH 3/5] coredump: use a freezable_schedule for the coredump_finish wait Mandeep Singh Baines
@ 2013-02-16 17:17   ` Oleg Nesterov
  0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-16 17:17 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki, Ingo Molnar

On 02/16, Mandeep Singh Baines wrote:
>
> @@ -483,7 +484,7 @@ static void exit_mm(struct task_struct * tsk)
>  			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
>  			if (!self.task) /* see coredump_finish() */
>  				break;
> -			schedule();
> +			freezable_schedule();

I think this particular change is fine in any case, no matter what
else will we do.

Acked-by: Oleg Nesterov <oleg@redhat.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-16 17:10   ` Oleg Nesterov
@ 2013-02-16 19:46     ` Oleg Nesterov
  2013-02-18 23:55       ` Mandeep Singh Baines
  2013-02-19  5:19       ` Mandeep Singh Baines
  0 siblings, 2 replies; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-16 19:46 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Ben Chan, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

On 02/16, Oleg Nesterov wrote:
>
> On 02/16, Mandeep Singh Baines wrote:
> >
> > +static int sigkill_pending(struct task_struct *tsk)
> > +{
> > +	return	signal_pending(tsk) &&
> > +		(sigismember(&tsk->pending.signal, SIGKILL) ||
> > +		 sigismember(&tsk->signal->shared_pending.signal, SIGKILL));
> > +}
>
> Why? __fatal_signal_pending() is enough, you do not need to check
> ->shared_pending. And once again, ignoring the freezer problems I
> do not think we need this check at all.
>
> IOW. Yes, we will probably need to do this change but only to be
> freezer-friendly.

And, forgot to mention, this logic is not right in the multi-
threaded case. I mean, you can't assume that 'kill -9 dumpingtask"
will wake the coredumping thread up. So this sigkill_pending() or
__fatal_signal_pending() check can only work in the single-threaded
case.

> --- x/fs/coredump.c
> +++ x/fs/coredump.c
> @@ -416,17 +416,17 @@ static void wait_for_dump_helpers(struct
>  	pipe_lock(pipe);
>  	pipe->readers++;
>  	pipe->writers--;
> +	// TODO: wake_up_interruptible_sync_poll ?
> +	wake_up_interruptible_sync(&pipe->wait);
> +	kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> +	pipe_unlock(pipe);
>
> -	while ((pipe->readers > 1) && (!signal_pending(current))) {
> -		wake_up_interruptible_sync(&pipe->wait);
> -		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> -		pipe_wait(pipe);
> -	}
> +	wait_event_freezekillable(&pipe->wait, pipe->readers == 1);

I tried to check (but didn't even try to test). I think this should
work. Assuming that we teach SIGKILL to actually kill the dumper, but
we need this in any case.

But. Then we need to change pipe_release() to use wake_up_sync_poll()
(which we do not have). Probably we can do this... but otoh if we protect
the dumping thread from the non-fatal signals (and again, we need this
anyway ;) then we can simply do wait_event_freezable().

Damn. I need to think more.

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-16 19:46     ` Oleg Nesterov
@ 2013-02-18 23:55       ` Mandeep Singh Baines
  2013-02-19 14:18         ` Oleg Nesterov
  2013-02-19  5:19       ` Mandeep Singh Baines
  1 sibling, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-18 23:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Ben Chan, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

On Sat, Feb 16, 2013 at 11:46 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/16, Oleg Nesterov wrote:
>>
>> On 02/16, Mandeep Singh Baines wrote:
>> >
>> > +static int sigkill_pending(struct task_struct *tsk)
>> > +{
>> > +   return  signal_pending(tsk) &&
>> > +           (sigismember(&tsk->pending.signal, SIGKILL) ||
>> > +            sigismember(&tsk->signal->shared_pending.signal, SIGKILL));
>> > +}
>>
>> Why? __fatal_signal_pending() is enough, you do not need to check
>> ->shared_pending. And once again, ignoring the freezer problems I
>> do not think we need this check at all.
>>

The problem is that the kill signal remains in shared pending since
it'll never get dequeued.

localhost ~ # kill -KILL $!
localhost ~ # cat /proc/$!/status | grep -A4 SigPnd
SigPnd: 0000000000000000
ShdPnd: 0000000000000100
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000

Normally a fatal signal will get propagated to the whole group but
that doesn't happen here because GROUP_EXIT is set:

        /*
         * Found a killable thread.  If the signal will be fatal,
         * then start taking the whole group down immediately.
         */
        if (sig_fatal(p, sig) &&
            !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
            !sigismember(&t->real_blocked, sig) &&
            (sig == SIGKILL || !t->ptrace)) {
                /*
                 * This signal will be fatal to the whole group.
                 */


GROUP_EXIT is set in do_coredump->coredump_wait->zap_threads->zap_process.

What if complete_signal was changed to propagate KILL even if
SIGNAL_GROUP_EXIT is set?

Regards,
Mandeep

>> IOW. Yes, we will probably need to do this change but only to be
>> freezer-friendly.
>
> And, forgot to mention, this logic is not right in the multi-
> threaded case. I mean, you can't assume that 'kill -9 dumpingtask"
> will wake the coredumping thread up. So this sigkill_pending() or
> __fatal_signal_pending() check can only work in the single-threaded
> case.
>
>> --- x/fs/coredump.c
>> +++ x/fs/coredump.c
>> @@ -416,17 +416,17 @@ static void wait_for_dump_helpers(struct
>>       pipe_lock(pipe);
>>       pipe->readers++;
>>       pipe->writers--;
>> +     // TODO: wake_up_interruptible_sync_poll ?
>> +     wake_up_interruptible_sync(&pipe->wait);
>> +     kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
>> +     pipe_unlock(pipe);
>>
>> -     while ((pipe->readers > 1) && (!signal_pending(current))) {
>> -             wake_up_interruptible_sync(&pipe->wait);
>> -             kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
>> -             pipe_wait(pipe);
>> -     }
>> +     wait_event_freezekillable(&pipe->wait, pipe->readers == 1);
>
> I tried to check (but didn't even try to test). I think this should
> work. Assuming that we teach SIGKILL to actually kill the dumper, but
> we need this in any case.
>
> But. Then we need to change pipe_release() to use wake_up_sync_poll()
> (which we do not have). Probably we can do this... but otoh if we protect
> the dumping thread from the non-fatal signals (and again, we need this
> anyway ;) then we can simply do wait_event_freezable().
>
> Damn. I need to think more.
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-16 19:46     ` Oleg Nesterov
  2013-02-18 23:55       ` Mandeep Singh Baines
@ 2013-02-19  5:19       ` Mandeep Singh Baines
  2013-02-19 14:27         ` Oleg Nesterov
  1 sibling, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-19  5:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Ben Chan, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

On Sat, Feb 16, 2013 at 11:46 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/16, Oleg Nesterov wrote:
>>
>> On 02/16, Mandeep Singh Baines wrote:
>> >
>> > +static int sigkill_pending(struct task_struct *tsk)
>> > +{
>> > +   return  signal_pending(tsk) &&
>> > +           (sigismember(&tsk->pending.signal, SIGKILL) ||
>> > +            sigismember(&tsk->signal->shared_pending.signal, SIGKILL));
>> > +}
>>
>> Why? __fatal_signal_pending() is enough, you do not need to check
>> ->shared_pending. And once again, ignoring the freezer problems I
>> do not think we need this check at all.
>>
>> IOW. Yes, we will probably need to do this change but only to be
>> freezer-friendly.
>
> And, forgot to mention, this logic is not right in the multi-
> threaded case. I mean, you can't assume that 'kill -9 dumpingtask"
> will wake the coredumping thread up. So this sigkill_pending() or
> __fatal_signal_pending() check can only work in the single-threaded
> case.
>
>> --- x/fs/coredump.c
>> +++ x/fs/coredump.c
>> @@ -416,17 +416,17 @@ static void wait_for_dump_helpers(struct
>>       pipe_lock(pipe);
>>       pipe->readers++;
>>       pipe->writers--;
>> +     // TODO: wake_up_interruptible_sync_poll ?
>> +     wake_up_interruptible_sync(&pipe->wait);
>> +     kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
>> +     pipe_unlock(pipe);
>>
>> -     while ((pipe->readers > 1) && (!signal_pending(current))) {
>> -             wake_up_interruptible_sync(&pipe->wait);
>> -             kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
>> -             pipe_wait(pipe);
>> -     }
>> +     wait_event_freezekillable(&pipe->wait, pipe->readers == 1);
>
> I tried to check (but didn't even try to test). I think this should
> work. Assuming that we teach SIGKILL to actually kill the dumper, but
> we need this in any case.
>
> But. Then we need to change pipe_release() to use wake_up_sync_poll()
> (which we do not have). Probably we can do this... but otoh if we protect
> the dumping thread from the non-fatal signals (and again, we need this
> anyway ;) then we can simply do wait_event_freezable().
>

I like this patch.

Could we ignore/drop signals when SIGNAL_GROUP_EXIT but allow SIGKILL.

For SIGKILL, wake_up everybody (signal_complete sort of already does this).

You'd need to prevent the fake signal from freeezer from setting
TIF_SIGPENDING. Maybe just add a SIGNAL_GROUP_EXIT check in freezer.c.

Regards,
Mandeep

> Damn. I need to think more.
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-18 23:55       ` Mandeep Singh Baines
@ 2013-02-19 14:18         ` Oleg Nesterov
  0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-19 14:18 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Ben Chan, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

On 02/18, Mandeep Singh Baines wrote:
>
> On Sat, Feb 16, 2013 at 11:46 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >>
> >> Why? __fatal_signal_pending() is enough, you do not need to check
> >> ->shared_pending. And once again, ignoring the freezer problems I
> >> do not think we need this check at all.
> >>
>
> The problem is that the kill signal remains in shared pending since
> it'll never get dequeued.
>
> localhost ~ # kill -KILL $!
> localhost ~ # cat /proc/$!/status | grep -A4 SigPnd
> SigPnd: 0000000000000000
> ShdPnd: 0000000000000100
> SigBlk: 0000000000000000
> SigIgn: 0000000000000000
> SigCgt: 0000000000000000
>
> Normally a fatal signal will get propagated to the whole group but
> that doesn't happen here because GROUP_EXIT is set:

Exactly!

>From the changelog in
"[PATCH 2/3] coredump: ensure that SIGKILL always kills the dumping thread"

	even if the dumping process is single-threaded
	...
	the group-wide SIGKILL is not recorded in task->pending
	and thus __fatal_signal_pending() won't be true.

Another reason why I think we should fix the underlying problem(s)
instead of adding more hacks,

> What if complete_signal was changed to propagate KILL even if
> SIGNAL_GROUP_EXIT is set?

See above, I think we can do better. And once again, 1/3 alone should
fix this problem with the non-fatal signals.

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-19  5:19       ` Mandeep Singh Baines
@ 2013-02-19 14:27         ` Oleg Nesterov
  2013-02-19 19:33           ` Mandeep Singh Baines
  0 siblings, 1 reply; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-19 14:27 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Ben Chan, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

On 02/18, Mandeep Singh Baines wrote:
>
> On Sat, Feb 16, 2013 at 11:46 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >> --- x/fs/coredump.c
> >> +++ x/fs/coredump.c
> >> @@ -416,17 +416,17 @@ static void wait_for_dump_helpers(struct
> >>       pipe_lock(pipe);
> >>       pipe->readers++;
> >>       pipe->writers--;
> >> +     // TODO: wake_up_interruptible_sync_poll ?
> >> +     wake_up_interruptible_sync(&pipe->wait);
> >> +     kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> >> +     pipe_unlock(pipe);
> >>
> >> -     while ((pipe->readers > 1) && (!signal_pending(current))) {
> >> -             wake_up_interruptible_sync(&pipe->wait);
> >> -             kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> >> -             pipe_wait(pipe);
> >> -     }
> >> +     wait_event_freezekillable(&pipe->wait, pipe->readers == 1);
> >
> > I tried to check (but didn't even try to test). I think this should
> > work. Assuming that we teach SIGKILL to actually kill the dumper, but
> > we need this in any case.
> >
> > But. Then we need to change pipe_release() to use wake_up_sync_poll()
> > (which we do not have). Probably we can do this... but otoh if we protect
> > the dumping thread from the non-fatal signals (and again, we need this
> > anyway ;) then we can simply do wait_event_freezable().
> >
>
> I like this patch.
>
> Could we ignore/drop signals when SIGNAL_GROUP_EXIT but allow SIGKILL.
>
> For SIGKILL, wake_up everybody (signal_complete sort of already does this).

Please look at 1-3 I sent. Btw, I slightly tested this series, seems
to work...

> You'd need to prevent the fake signal from freeezer from setting
> TIF_SIGPENDING. Maybe just add a SIGNAL_GROUP_EXIT check in freezer.c.

I am thinking about checking SIGNAL_GROUP_COREDUMP but I am not sure,
perhaps we can make a simpler solution. As for wait_for_dump_helper()
we do not need any check at all, but we should either fix
wait_event_freezable (it is actually not right) or change pipe_release()
to use TASK_NORMAL instead of TASK_INTERRUPTIBLE.



Mandeep, Andrew, I am really sorry.

I tried to do these changes many times, but _every_ time I had some
urgent and unexpected work. This time too. I'll try very much to return
on Friday.

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-19 14:27         ` Oleg Nesterov
@ 2013-02-19 19:33           ` Mandeep Singh Baines
  2013-02-19 19:45             ` Oleg Nesterov
  0 siblings, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-19 19:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Ben Chan, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

On Tue, Feb 19, 2013 at 6:27 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/18, Mandeep Singh Baines wrote:
>>
>> On Sat, Feb 16, 2013 at 11:46 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> >> --- x/fs/coredump.c
>> >> +++ x/fs/coredump.c
>> >> @@ -416,17 +416,17 @@ static void wait_for_dump_helpers(struct
>> >>       pipe_lock(pipe);
>> >>       pipe->readers++;
>> >>       pipe->writers--;
>> >> +     // TODO: wake_up_interruptible_sync_poll ?
>> >> +     wake_up_interruptible_sync(&pipe->wait);
>> >> +     kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
>> >> +     pipe_unlock(pipe);
>> >>
>> >> -     while ((pipe->readers > 1) && (!signal_pending(current))) {
>> >> -             wake_up_interruptible_sync(&pipe->wait);
>> >> -             kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
>> >> -             pipe_wait(pipe);
>> >> -     }
>> >> +     wait_event_freezekillable(&pipe->wait, pipe->readers == 1);
>> >
>> > I tried to check (but didn't even try to test). I think this should
>> > work. Assuming that we teach SIGKILL to actually kill the dumper, but
>> > we need this in any case.
>> >
>> > But. Then we need to change pipe_release() to use wake_up_sync_poll()
>> > (which we do not have). Probably we can do this... but otoh if we protect
>> > the dumping thread from the non-fatal signals (and again, we need this
>> > anyway ;) then we can simply do wait_event_freezable().
>> >
>>
>> I like this patch.
>>
>> Could we ignore/drop signals when SIGNAL_GROUP_EXIT but allow SIGKILL.
>>
>> For SIGKILL, wake_up everybody (signal_complete sort of already does this).
>
> Please look at 1-3 I sent. Btw, I slightly tested this series, seems
> to work...
>

They look good to me. I plan on applying them to our tree since we
need a fix ASAP.

>> You'd need to prevent the fake signal from freeezer from setting
>> TIF_SIGPENDING. Maybe just add a SIGNAL_GROUP_EXIT check in freezer.c.
>
> I am thinking about checking SIGNAL_GROUP_COREDUMP but I am not sure,
> perhaps we can make a simpler solution. As for wait_for_dump_helper()
> we do not need any check at all, but we should either fix
> wait_event_freezable (it is actually not right) or change pipe_release()

Is the bug that it will exit on the fake_signal.

> to use TASK_NORMAL instead of TASK_INTERRUPTIBLE.
>

I don't think that bug will affects this patch though.  I think this
should all work if we add a check to freezer.c (or something similar
that is cleaner).

If you add SIGNAL_GROUP_COREDUMP check to freezer.c:

static void fake_signal_wake_up(struct task_struct *p)
{
        unsigned long flags;

        if (lock_task_sighand(p, &flags)) {
-                signal_wake_up(p, 0);
+              if (!p->signal->flags & SIGNAL_GROUP_COREDUMP)
+                              signal_wake_up(p, 0);
                unlock_task_sighand(p, &flags);
        }
}

And change the wait_event_freezekillable() in this patch to just
wait_event_freezable(), shouldn't that just work.

The fake signal will never get sent.

Regards,
Mandeep

>
>
> Mandeep, Andrew, I am really sorry.
>
> I tried to do these changes many times, but _every_ time I had some
> urgent and unexpected work. This time too. I'll try very much to return
> on Friday.
>
> Oleg.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-19 19:33           ` Mandeep Singh Baines
@ 2013-02-19 19:45             ` Oleg Nesterov
  2013-02-19 20:20               ` Mandeep Singh Baines
  0 siblings, 1 reply; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-19 19:45 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Ben Chan, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

On 02/19, Mandeep Singh Baines wrote:
>
> On Tue, Feb 19, 2013 at 6:27 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > Please look at 1-3 I sent. Btw, I slightly tested this series, seems
> > to work...
> >
>
> They look good to me. I plan on applying them to our tree since we
> need a fix ASAP.

Great!

> >> You'd need to prevent the fake signal from freeezer from setting
> >> TIF_SIGPENDING. Maybe just add a SIGNAL_GROUP_EXIT check in freezer.c.
> >
> > I am thinking about checking SIGNAL_GROUP_COREDUMP but I am not sure,
> > perhaps we can make a simpler solution. As for wait_for_dump_helper()
> > we do not need any check at all, but we should either fix
> > wait_event_freezable (it is actually not right) or change pipe_release()
>
> Is the bug that it will exit on the fake_signal.

Yes, I understand, but

> I don't think that bug will affects this patch though.  I think this
> should all work if we add a check to freezer.c (or something similar
> that is cleaner).
>
> If you add SIGNAL_GROUP_COREDUMP check to freezer.c:
>
> static void fake_signal_wake_up(struct task_struct *p)
> {
>         unsigned long flags;
>
>         if (lock_task_sighand(p, &flags)) {
> -                signal_wake_up(p, 0);
> +              if (!p->signal->flags & SIGNAL_GROUP_COREDUMP)
> +                              signal_wake_up(p, 0);
>                 unlock_task_sighand(p, &flags);
>         }
> }
>
> And change the wait_event_freezekillable() in this patch to just
> wait_event_freezable(), shouldn't that just work.

I doubt,

> The fake signal will never get sent.

Yes but try_to_freeze_tasks() can fail.

And once again, if wait_event_freezable() was correct we do not care
about the fake signal (in wait_for_dump_helper), so we do not need
to change fake_signal_wake_up. So perhaps we should fix it but this
needs some discussion.

Sorry again for the terse reply (and perhaps I misunderstood you),
I'll try to return to this problem asap. In any case I still think
we should do the freezer fixes on top of signal fixes I sent, and
you seem to agree. Good ;)

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-19 19:45             ` Oleg Nesterov
@ 2013-02-19 20:20               ` Mandeep Singh Baines
  2013-02-20 23:30                 ` Mandeep Singh Baines
  0 siblings, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-19 20:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Ben Chan, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

On Tue, Feb 19, 2013 at 11:45 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/19, Mandeep Singh Baines wrote:
>>
>> On Tue, Feb 19, 2013 at 6:27 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> > Please look at 1-3 I sent. Btw, I slightly tested this series, seems
>> > to work...
>> >
>>
>> They look good to me. I plan on applying them to our tree since we
>> need a fix ASAP.
>
> Great!
>
>> >> You'd need to prevent the fake signal from freeezer from setting
>> >> TIF_SIGPENDING. Maybe just add a SIGNAL_GROUP_EXIT check in freezer.c.
>> >
>> > I am thinking about checking SIGNAL_GROUP_COREDUMP but I am not sure,
>> > perhaps we can make a simpler solution. As for wait_for_dump_helper()
>> > we do not need any check at all, but we should either fix
>> > wait_event_freezable (it is actually not right) or change pipe_release()
>>
>> Is the bug that it will exit on the fake_signal.
>
> Yes, I understand, but
>
>> I don't think that bug will affects this patch though.  I think this
>> should all work if we add a check to freezer.c (or something similar
>> that is cleaner).
>>
>> If you add SIGNAL_GROUP_COREDUMP check to freezer.c:
>>
>> static void fake_signal_wake_up(struct task_struct *p)
>> {
>>         unsigned long flags;
>>
>>         if (lock_task_sighand(p, &flags)) {
>> -                signal_wake_up(p, 0);
>> +              if (!p->signal->flags & SIGNAL_GROUP_COREDUMP)
>> +                              signal_wake_up(p, 0);
>>                 unlock_task_sighand(p, &flags);
>>         }
>> }
>>
>> And change the wait_event_freezekillable() in this patch to just
>> wait_event_freezable(), shouldn't that just work.
>
> I doubt,
>
>> The fake signal will never get sent.
>
> Yes but try_to_freeze_tasks() can fail.
>

Ah. Good point. How about this then:

/* can't use wait_event_freezable since we suppress the fake signal on
SIGNAL_GROUP_COREDUMP */
freezer_do_not_count();
wait_event_interruptible(pipe->wait, pipe->readers == 1);
freezer_count();

Regards,
Mandeep

> And once again, if wait_event_freezable() was correct we do not care
> about the fake signal (in wait_for_dump_helper), so we do not need
> to change fake_signal_wake_up. So perhaps we should fix it but this
> needs some discussion.
>
> Sorry again for the terse reply (and perhaps I misunderstood you),
> I'll try to return to this problem asap. In any case I still think
> we should do the freezer fixes on top of signal fixes I sent, and
> you seem to agree. Good ;)
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK
  2013-02-16 17:05 ` [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK Oleg Nesterov
@ 2013-02-20  0:07   ` Mandeep Singh Baines
  2013-02-20  1:41     ` Mandeep Singh Baines
  0 siblings, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-20  0:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki, Ingo Molnar

On Sat, Feb 16, 2013 at 9:05 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/16, Mandeep Singh Baines wrote:
>>
>> We don't need to call freezer_do_not_count() for in-kernel users
>> of CLONE_VFORK since exec will get called in bounded time.
>>
>> We don't want to call freezer_count() for in-kernel users because
>> they may be holding locks. freezer_count() calls try_to_freeze().
>> We don't want to freeze an in-kernel user because it may be
>> holding locks.
>
> I can only repeat my question ;)
>
>         Who? We should not do this anyway. And __call_usermodehelper() doesn't
>         afaics.
>
>         OK, its caller (process_one_work) does lock_map_acquire() for debugging
>         purposes, this can "confuse" print_held_locks_bug(). But this thread is
>         PF_NOFREEZE ?
>
> Previously this was needed to suppress the false positive. Now that 2/5
> checks PF_NOFREEZE, why do we need this change?
>

After applying the PF_NOFREEZE check, I still get the following:

[    1.001030] =======================================
[    1.001039] [ BUG: lock held while trying to freeze! ]
[    1.001048] 3.4.0 #24 Not tainted
[    1.001053] ---------------------------------------
[    1.001060] kworker/u:0/5 is exiting with locks still held!
[    1.001068] 2 locks held by kworker/u:0/5:
[    1.001073]  #0:  (khelper){.+.+.+}, at: [<8103896f>]
process_one_work+0x108/0
x2ee
[    1.001095]  #1:  ((&sub_info->work)){+.+.+.}, at: [<8103896f>]
process_one_wo
rk+0x108/0x2ee
[    1.001111]
[    1.001113] stack backtrace:
[    1.001119] Pid: 5, comm: kworker/u:0 Not tainted 3.4.0 #24
[    1.001124] Call Trace:
[    1.001135]  [<81025bd6>] ? console_unlock+0x17a/0x18b
[    1.001146]  [<8105d68e>] debug_check_no_locks_held+0x82/0x8a
[    1.001156]  [<8102493f>] do_fork+0x20d/0x2ac
[    1.001167]  [<810366cb>] ? call_usermodehelper_setup+0x8c/0x8c
[    1.001177]  [<81008310>] kernel_thread+0x7a/0x82
[    1.001186]  [<810366cb>] ? call_usermodehelper_setup+0x8c/0x8c
[    1.001198]  [<814b45dc>] ? common_interrupt+0x3c/0x3c
[    1.001208]  [<810365e8>] __call_usermodehelper+0x3b/0x71
[    1.001216]  [<810389ce>] process_one_work+0x167/0x2ee
[    1.001226]  [<810365ad>] ? call_usermodehelper_freeinfo+0x1e/0x1e
[    1.001235]  [<81038dbc>] worker_thread+0xbd/0x18b
[    1.001244]  [<81038cff>] ? rescuer_thread+0x184/0x184
[    1.001254]  [<8103c636>] kthread+0x77/0x7c
[    1.001264]  [<8103c5bf>] ? kthread_freezable_should_stop+0x4a/0x4a
[    1.001273]  [<814b45e2>] kernel_thread_helper+0x6/0x10

Regards,
Mandeep

>> @@ -722,9 +722,11 @@ static int wait_for_vfork_done(struct task_struct *child,
>>  {
>>       int killed;
>>
>> -     freezer_do_not_count();
>> +     if (!(current->flags & PF_KTHREAD))
>> +             freezer_do_not_count();
>
> If I missed something and we really need this, imho this needs a comment.
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK
  2013-02-20  0:07   ` Mandeep Singh Baines
@ 2013-02-20  1:41     ` Mandeep Singh Baines
  0 siblings, 0 replies; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-20  1:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki, Ingo Molnar

On Tue, Feb 19, 2013 at 4:07 PM, Mandeep Singh Baines <msb@chromium.org> wrote:
> On Sat, Feb 16, 2013 at 9:05 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> On 02/16, Mandeep Singh Baines wrote:
>>>
>>> We don't need to call freezer_do_not_count() for in-kernel users
>>> of CLONE_VFORK since exec will get called in bounded time.
>>>
>>> We don't want to call freezer_count() for in-kernel users because
>>> they may be holding locks. freezer_count() calls try_to_freeze().
>>> We don't want to freeze an in-kernel user because it may be
>>> holding locks.
>>
>> I can only repeat my question ;)
>>
>>         Who? We should not do this anyway. And __call_usermodehelper() doesn't
>>         afaics.
>>
>>         OK, its caller (process_one_work) does lock_map_acquire() for debugging
>>         purposes, this can "confuse" print_held_locks_bug(). But this thread is
>>         PF_NOFREEZE ?
>>
>> Previously this was needed to suppress the false positive. Now that 2/5
>> checks PF_NOFREEZE, why do we need this change?
>>
>
> After applying the PF_NOFREEZE check, I still get the following:
>
> [    1.001030] =======================================
> [    1.001039] [ BUG: lock held while trying to freeze! ]
> [    1.001048] 3.4.0 #24 Not tainted
> [    1.001053] ---------------------------------------
> [    1.001060] kworker/u:0/5 is exiting with locks still held!
> [    1.001068] 2 locks held by kworker/u:0/5:
> [    1.001073]  #0:  (khelper){.+.+.+}, at: [<8103896f>]
> process_one_work+0x108/0
> x2ee
> [    1.001095]  #1:  ((&sub_info->work)){+.+.+.}, at: [<8103896f>]
> process_one_wo
> rk+0x108/0x2ee
> [    1.001111]
> [    1.001113] stack backtrace:
> [    1.001119] Pid: 5, comm: kworker/u:0 Not tainted 3.4.0 #24
> [    1.001124] Call Trace:
> [    1.001135]  [<81025bd6>] ? console_unlock+0x17a/0x18b
> [    1.001146]  [<8105d68e>] debug_check_no_locks_held+0x82/0x8a
> [    1.001156]  [<8102493f>] do_fork+0x20d/0x2ac
> [    1.001167]  [<810366cb>] ? call_usermodehelper_setup+0x8c/0x8c
> [    1.001177]  [<81008310>] kernel_thread+0x7a/0x82
> [    1.001186]  [<810366cb>] ? call_usermodehelper_setup+0x8c/0x8c
> [    1.001198]  [<814b45dc>] ? common_interrupt+0x3c/0x3c
> [    1.001208]  [<810365e8>] __call_usermodehelper+0x3b/0x71
> [    1.001216]  [<810389ce>] process_one_work+0x167/0x2ee
> [    1.001226]  [<810365ad>] ? call_usermodehelper_freeinfo+0x1e/0x1e
> [    1.001235]  [<81038dbc>] worker_thread+0xbd/0x18b
> [    1.001244]  [<81038cff>] ? rescuer_thread+0x184/0x184
> [    1.001254]  [<8103c636>] kthread+0x77/0x7c
> [    1.001264]  [<8103c5bf>] ? kthread_freezable_should_stop+0x4a/0x4a
> [    1.001273]  [<814b45e2>] kernel_thread_helper+0x6/0x10
>

D'oh. I had the logic in my patch inverted. Ignore the trace.

> Regards,
> Mandeep
>
>>> @@ -722,9 +722,11 @@ static int wait_for_vfork_done(struct task_struct *child,
>>>  {
>>>       int killed;
>>>
>>> -     freezer_do_not_count();
>>> +     if (!(current->flags & PF_KTHREAD))
>>> +             freezer_do_not_count();
>>
>> If I missed something and we really need this, imho this needs a comment.
>>
>> Oleg.
>>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v3] lockdep: check that no locks held at freeze time
  2013-02-16 17:06   ` Oleg Nesterov
@ 2013-02-20  1:57     ` Mandeep Singh Baines
  2013-02-20 10:37       ` Ingo Molnar
  2013-02-20 22:30       ` Oleg Nesterov
  0 siblings, 2 replies; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-20  1:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

We shouldn't try_to_freeze if locks are held. Verified that
I get no lockdep warnings after applying this patch and
"vfork: don't freezer_count() for in-kernel users of CLONE_VFORK".

Changes since v1:
* LKML: <20130215111635.GA26955@gmail.com> Ingo Molnar
  * Added a msg string that gets passed in.
* LKML: <20130215154449.GD30829@redhat.com> Oleg Nesterov
  * Check PF_NOFREEZE in try_to_freeze().
Changes since v2:
* LKML: <20130216170605.GC4910@redhat.com> Oleg Nesterov
  * Avoid unnecessary PF_NOFREEZE check when !CONFIG_LOCKDEP.
* Mandeep Singh Baines
  * Generalize an exit specific printk.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Tejun Heo <tj@kernel.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: Ingo Molnar <mingo@redhat.com>
---
 include/linux/debug_locks.h |  5 +++--
 include/linux/freezer.h     |  4 ++++
 kernel/exit.c               |  2 +-
 kernel/lockdep.c            | 10 +++++-----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 3bd46f7..e6c3c85 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -51,7 +51,8 @@ struct task_struct;
 extern void debug_show_all_locks(void);
 extern void debug_show_held_locks(struct task_struct *task);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
-extern void debug_check_no_locks_held(struct task_struct *task);
+extern void debug_check_no_locks_held(struct task_struct *task,
+				      const char *msg);
 #else
 static inline void debug_show_all_locks(void)
 {
@@ -67,7 +68,7 @@ debug_check_no_locks_freed(const void *from, unsigned long len)
 }
 
 static inline void
-debug_check_no_locks_held(struct task_struct *task)
+debug_check_no_locks_held(struct task_struct *task, const char *msg)
 {
 }
 #endif
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index e4238ce..27ea080 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -3,6 +3,7 @@
 #ifndef FREEZER_H_INCLUDED
 #define FREEZER_H_INCLUDED
 
+#include <linux/debug_locks.h>
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/atomic.h>
@@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
 
 static inline bool try_to_freeze(void)
 {
+	if (!(current->flags & PF_NOFREEZE))
+		debug_check_no_locks_held(current,
+					  "lock held while trying to freeze");
 	might_sleep();
 	if (likely(!freezing(current)))
 		return false;
diff --git a/kernel/exit.c b/kernel/exit.c
index b4df219..88a706c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -833,7 +833,7 @@ void do_exit(long code)
 	/*
 	 * Make sure we are holding no locks:
 	 */
-	debug_check_no_locks_held(tsk);
+	debug_check_no_locks_held(tsk, "lock held at task exit time");
 	/*
 	 * We can do this unlocked here. The futex code uses this flag
 	 * just to verify whether the pi state cleanup has been done
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 7981e5b..f257b6d 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4083,7 +4083,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
 }
 EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
 
-static void print_held_locks_bug(struct task_struct *curr)
+static void print_held_locks_bug(struct task_struct *curr, const char *msg)
 {
 	if (!debug_locks_off())
 		return;
@@ -4092,10 +4092,10 @@ static void print_held_locks_bug(struct task_struct *curr)
 
 	printk("\n");
 	printk("=====================================\n");
-	printk("[ BUG: lock held at task exit time! ]\n");
+	printk("[ BUG: %s! ]\n", msg);
 	print_kernel_ident();
 	printk("-------------------------------------\n");
-	printk("%s/%d is exiting with locks still held!\n",
+	printk("%s/%d still has locks held!\n",
 		curr->comm, task_pid_nr(curr));
 	lockdep_print_held_locks(curr);
 
@@ -4103,10 +4103,10 @@ static void print_held_locks_bug(struct task_struct *curr)
 	dump_stack();
 }
 
-void debug_check_no_locks_held(struct task_struct *task)
+void debug_check_no_locks_held(struct task_struct *task, const char *msg)
 {
 	if (unlikely(task->lockdep_depth > 0))
-		print_held_locks_bug(task);
+		print_held_locks_bug(task, msg);
 }
 
 void debug_show_all_locks(void)
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v3] lockdep: check that no locks held at freeze time
  2013-02-20  1:57     ` [PATCH v3] " Mandeep Singh Baines
@ 2013-02-20 10:37       ` Ingo Molnar
  2013-02-20 12:55         ` Rafael J. Wysocki
  2013-02-20 22:30       ` Oleg Nesterov
  1 sibling, 1 reply; 45+ messages in thread
From: Ingo Molnar @ 2013-02-20 10:37 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar


* Mandeep Singh Baines <msb@chromium.org> wrote:

> We shouldn't try_to_freeze if locks are held. Verified that
> I get no lockdep warnings after applying this patch and
> "vfork: don't freezer_count() for in-kernel users of CLONE_VFORK".
> 
> Changes since v1:
> * LKML: <20130215111635.GA26955@gmail.com> Ingo Molnar
>   * Added a msg string that gets passed in.
> * LKML: <20130215154449.GD30829@redhat.com> Oleg Nesterov
>   * Check PF_NOFREEZE in try_to_freeze().
> Changes since v2:
> * LKML: <20130216170605.GC4910@redhat.com> Oleg Nesterov
>   * Avoid unnecessary PF_NOFREEZE check when !CONFIG_LOCKDEP.
> * Mandeep Singh Baines
>   * Generalize an exit specific printk.
> 
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Tejun Heo <tj@kernel.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Rafael J. Wysocki <rjw@sisk.pl>
> CC: Ingo Molnar <mingo@redhat.com>

Looks good to me now.

Acked-by: Ingo Molnar <mingo@kernel.org>

Which tree should this go through?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3] lockdep: check that no locks held at freeze time
  2013-02-20 10:37       ` Ingo Molnar
@ 2013-02-20 12:55         ` Rafael J. Wysocki
  2013-02-20 13:52           ` Ingo Molnar
  0 siblings, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2013-02-20 12:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mandeep Singh Baines, linux-kernel, Oleg Nesterov, Tejun Heo,
	Andrew Morton, Ingo Molnar

On Wednesday, February 20, 2013 11:37:19 AM Ingo Molnar wrote:
> 
> * Mandeep Singh Baines <msb@chromium.org> wrote:
> 
> > We shouldn't try_to_freeze if locks are held. Verified that
> > I get no lockdep warnings after applying this patch and
> > "vfork: don't freezer_count() for in-kernel users of CLONE_VFORK".
> > 
> > Changes since v1:
> > * LKML: <20130215111635.GA26955@gmail.com> Ingo Molnar
> >   * Added a msg string that gets passed in.
> > * LKML: <20130215154449.GD30829@redhat.com> Oleg Nesterov
> >   * Check PF_NOFREEZE in try_to_freeze().
> > Changes since v2:
> > * LKML: <20130216170605.GC4910@redhat.com> Oleg Nesterov
> >   * Avoid unnecessary PF_NOFREEZE check when !CONFIG_LOCKDEP.
> > * Mandeep Singh Baines
> >   * Generalize an exit specific printk.
> > 
> > Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> > CC: Oleg Nesterov <oleg@redhat.com>
> > CC: Tejun Heo <tj@kernel.org>
> > CC: Andrew Morton <akpm@linux-foundation.org>
> > CC: Rafael J. Wysocki <rjw@sisk.pl>
> > CC: Ingo Molnar <mingo@redhat.com>
> 
> Looks good to me now.
> 
> Acked-by: Ingo Molnar <mingo@kernel.org>
> 
> Which tree should this go through?

Well, I can take it if that's OK.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3] lockdep: check that no locks held at freeze time
  2013-02-20 12:55         ` Rafael J. Wysocki
@ 2013-02-20 13:52           ` Ingo Molnar
  0 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2013-02-20 13:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mandeep Singh Baines, linux-kernel, Oleg Nesterov, Tejun Heo,
	Andrew Morton, Ingo Molnar


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Wednesday, February 20, 2013 11:37:19 AM Ingo Molnar wrote:
> > 
> > * Mandeep Singh Baines <msb@chromium.org> wrote:
> > 
> > > We shouldn't try_to_freeze if locks are held. Verified that
> > > I get no lockdep warnings after applying this patch and
> > > "vfork: don't freezer_count() for in-kernel users of CLONE_VFORK".
> > > 
> > > Changes since v1:
> > > * LKML: <20130215111635.GA26955@gmail.com> Ingo Molnar
> > >   * Added a msg string that gets passed in.
> > > * LKML: <20130215154449.GD30829@redhat.com> Oleg Nesterov
> > >   * Check PF_NOFREEZE in try_to_freeze().
> > > Changes since v2:
> > > * LKML: <20130216170605.GC4910@redhat.com> Oleg Nesterov
> > >   * Avoid unnecessary PF_NOFREEZE check when !CONFIG_LOCKDEP.
> > > * Mandeep Singh Baines
> > >   * Generalize an exit specific printk.
> > > 
> > > Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> > > CC: Oleg Nesterov <oleg@redhat.com>
> > > CC: Tejun Heo <tj@kernel.org>
> > > CC: Andrew Morton <akpm@linux-foundation.org>
> > > CC: Rafael J. Wysocki <rjw@sisk.pl>
> > > CC: Ingo Molnar <mingo@redhat.com>
> > 
> > Looks good to me now.
> > 
> > Acked-by: Ingo Molnar <mingo@kernel.org>
> > 
> > Which tree should this go through?
> 
> Well, I can take it if that's OK.

Sure, that's fine with me - this is mainly a freezer feature 
after all.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/5] freezer: clear fake signal on exit from __refrigerator
  2013-02-16 17:12     ` Oleg Nesterov
@ 2013-02-20 18:09       ` Mandeep Singh Baines
  2013-02-23 19:41         ` Oleg Nesterov
  0 siblings, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-20 18:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki,
	Ingo Molnar, adurbin

On Sat, Feb 16, 2013 at 9:12 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> Forgot to mention...
>
> On 02/16, Oleg Nesterov wrote:
>> On 02/16, Mandeep Singh Baines wrote:
>> >
>> > --- a/kernel/freezer.c
>> > +++ b/kernel/freezer.c
>> > @@ -81,6 +81,9 @@ bool __refrigerator(bool check_kthr_stop)
>> >      */
>> >     set_current_state(save);
>> >
>> > +   /* Clear fake signal from freeze_task(). */
>> > +   recalc_sigpending();
>>
>> NACK. We can't do this lockless.
>>
>> And I am not sure we should reinstantiate it here.
>
> And note that with the fix I propose in reply to 5/5 this change is
> not needed.
>

I think we need something like this in order to be able to fix
wait_event_freezable and friends. Here is one idea:

#define __wait_event_freezable(wq, condition, ret)                  \
do {                                                                    \
        DEFINE_WAIT(__wait);                                            \
                                                                        \
        for (;;) {                                                      \
                prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE);      \
                if (condition)                                          \
                        break;                                          \
                if (!signal_pending(current)) {                         \
                        freezable_schedule();
           \
                        continue;                                       \
                }                                                       \
                ret = -ERESTARTSYS;                                     \
                break;                                                  \
        }                                                               \
        finish_wait(&wq, &__wait);                                      \
} while (0)


#define wait_event_freezable(wq, condition)                         \
({                                                                      \
        int __ret = 0;                                                  \
        if (!(condition))                                               \
                __wait_event_freezable(wq, condition, __ret);       \
        __ret;                                                          \
})


If you cleaned up the fake signal in __refrigerator() then this
implementation of wait_event_freezable should address all the fake
signal bugs.
By the time you get to signal_pending() the fake signal is cleared and
the bug it avoided.

Regards,
Mandeep

> Oleg.
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3] lockdep: check that no locks held at freeze time
  2013-02-20  1:57     ` [PATCH v3] " Mandeep Singh Baines
  2013-02-20 10:37       ` Ingo Molnar
@ 2013-02-20 22:30       ` Oleg Nesterov
  2013-02-20 23:11         ` Mandeep Singh Baines
  2013-02-20 23:17         ` [PATCH v4] " Mandeep Singh Baines
  1 sibling, 2 replies; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-20 22:30 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki, Ingo Molnar

On 02/19, Mandeep Singh Baines wrote:
>
> We shouldn't try_to_freeze if locks are held. Verified that
> I get no lockdep warnings after applying this patch and
> "vfork: don't freezer_count() for in-kernel users of CLONE_VFORK".

I believe the patch is fine.

But I don't understand the changelog, unless I am totally confused
(and I have a lot of unread emails so perhaps I missed something),
this version do not need the changes in CLONE_VFORK.

And, Mandeep, I'll try to answer your emails asap...

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v3] lockdep: check that no locks held at freeze time
  2013-02-20 22:30       ` Oleg Nesterov
@ 2013-02-20 23:11         ` Mandeep Singh Baines
  2013-02-20 23:17         ` [PATCH v4] " Mandeep Singh Baines
  1 sibling, 0 replies; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-20 23:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki, Ingo Molnar

On Wed, Feb 20, 2013 at 2:30 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/19, Mandeep Singh Baines wrote:
>>
>> We shouldn't try_to_freeze if locks are held. Verified that
>> I get no lockdep warnings after applying this patch and
>> "vfork: don't freezer_count() for in-kernel users of CLONE_VFORK".
>
> I believe the patch is fine.
>
> But I don't understand the changelog, unless I am totally confused
> (and I have a lot of unread emails so perhaps I missed something),
> this version do not need the changes in CLONE_VFORK.
>

Thanks for catching that. The changelog is stale. I'll resend with a
new changelog that omits that sentence.

> And, Mandeep, I'll try to answer your emails asap...
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v4] lockdep: check that no locks held at freeze time
  2013-02-20 22:30       ` Oleg Nesterov
  2013-02-20 23:11         ` Mandeep Singh Baines
@ 2013-02-20 23:17         ` Mandeep Singh Baines
  2013-02-20 23:24           ` Andrew Morton
  1 sibling, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-20 23:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Oleg Nesterov, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki

We shouldn't try_to_freeze if locks are held.

Changes since v1:
* LKML: <20130215111635.GA26955@gmail.com> Ingo Molnar
  * Added a msg string that gets passed in.
* LKML: <20130215154449.GD30829@redhat.com> Oleg Nesterov
  * Check PF_NOFREEZE in try_to_freeze().
Changes since v2:
* LKML: <20130216170605.GC4910@redhat.com> Oleg Nesterovw
  * Avoid unnecessary PF_NOFREEZE check when !CONFIG_LOCKDEP.
* Mandeep Singh Baines
  * Generalize an exit specific printk.
Changes since v3:
* LKML: <20130220223013.GA15760@redhat.com> Oleg Nesterovw
  * Remove stale vfork comment from commit message.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Tejun Heo <tj@kernel.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/debug_locks.h |  5 +++--
 include/linux/freezer.h     |  4 ++++
 kernel/exit.c               |  2 +-
 kernel/lockdep.c            | 10 +++++-----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 3bd46f7..e6c3c85 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -51,7 +51,8 @@ struct task_struct;
 extern void debug_show_all_locks(void);
 extern void debug_show_held_locks(struct task_struct *task);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
-extern void debug_check_no_locks_held(struct task_struct *task);
+extern void debug_check_no_locks_held(struct task_struct *task,
+				      const char *msg);
 #else
 static inline void debug_show_all_locks(void)
 {
@@ -67,7 +68,7 @@ debug_check_no_locks_freed(const void *from, unsigned long len)
 }
 
 static inline void
-debug_check_no_locks_held(struct task_struct *task)
+debug_check_no_locks_held(struct task_struct *task, const char *msg)
 {
 }
 #endif
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index e4238ce..27ea080 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -3,6 +3,7 @@
 #ifndef FREEZER_H_INCLUDED
 #define FREEZER_H_INCLUDED
 
+#include <linux/debug_locks.h>
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/atomic.h>
@@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
 
 static inline bool try_to_freeze(void)
 {
+	if (!(current->flags & PF_NOFREEZE))
+		debug_check_no_locks_held(current,
+					  "lock held while trying to freeze");
 	might_sleep();
 	if (likely(!freezing(current)))
 		return false;
diff --git a/kernel/exit.c b/kernel/exit.c
index b4df219..88a706c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -833,7 +833,7 @@ void do_exit(long code)
 	/*
 	 * Make sure we are holding no locks:
 	 */
-	debug_check_no_locks_held(tsk);
+	debug_check_no_locks_held(tsk, "lock held at task exit time");
 	/*
 	 * We can do this unlocked here. The futex code uses this flag
 	 * just to verify whether the pi state cleanup has been done
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 7981e5b..f257b6d 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4083,7 +4083,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
 }
 EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
 
-static void print_held_locks_bug(struct task_struct *curr)
+static void print_held_locks_bug(struct task_struct *curr, const char *msg)
 {
 	if (!debug_locks_off())
 		return;
@@ -4092,10 +4092,10 @@ static void print_held_locks_bug(struct task_struct *curr)
 
 	printk("\n");
 	printk("=====================================\n");
-	printk("[ BUG: lock held at task exit time! ]\n");
+	printk("[ BUG: %s! ]\n", msg);
 	print_kernel_ident();
 	printk("-------------------------------------\n");
-	printk("%s/%d is exiting with locks still held!\n",
+	printk("%s/%d still has locks held!\n",
 		curr->comm, task_pid_nr(curr));
 	lockdep_print_held_locks(curr);
 
@@ -4103,10 +4103,10 @@ static void print_held_locks_bug(struct task_struct *curr)
 	dump_stack();
 }
 
-void debug_check_no_locks_held(struct task_struct *task)
+void debug_check_no_locks_held(struct task_struct *task, const char *msg)
 {
 	if (unlikely(task->lockdep_depth > 0))
-		print_held_locks_bug(task);
+		print_held_locks_bug(task, msg);
 }
 
 void debug_show_all_locks(void)
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] lockdep: check that no locks held at freeze time
  2013-02-20 23:17         ` [PATCH v4] " Mandeep Singh Baines
@ 2013-02-20 23:24           ` Andrew Morton
  2013-02-21  0:17             ` Mandeep Singh Baines
  2013-02-21  3:17             ` [PATCH v5] " Mandeep Singh Baines
  0 siblings, 2 replies; 45+ messages in thread
From: Andrew Morton @ 2013-02-20 23:24 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Oleg Nesterov, Tejun Heo, Rafael J. Wysocki

On Wed, 20 Feb 2013 15:17:16 -0800
Mandeep Singh Baines <msb@chromium.org> wrote:

> We shouldn't try_to_freeze if locks are held.
> 
> ...
>
> @@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
>  
> +	if (!(current->flags & PF_NOFREEZE))
> +		debug_check_no_locks_held(current,
> +
>					  "lock held while trying to freeze");
> ...
>
> +	debug_check_no_locks_held(tsk, "lock held at task exit time");

There doesn't seem much point in adding the `msg' to
debug_check_no_locks_held() - the dump_stack() in
print_held_locks_bug() will tell us the same thing.  Maybe just change
the print_held_locks_bug() messages so they stop assuming they were
called from do_exit()?

Also, I wonder if the `tsk' arg is needed.  In both callers
tsk==current.  Is it likely that we'll ever call
debug_check_no_locks_held() for any task other than `current'?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-19 20:20               ` Mandeep Singh Baines
@ 2013-02-20 23:30                 ` Mandeep Singh Baines
  2013-02-23 19:21                   ` Oleg Nesterov
  0 siblings, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-20 23:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Ben Chan, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

On Tue, Feb 19, 2013 at 12:20 PM, Mandeep Singh Baines <msb@chromium.org> wrote:
> On Tue, Feb 19, 2013 at 11:45 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> On 02/19, Mandeep Singh Baines wrote:
>>>
>>> On Tue, Feb 19, 2013 at 6:27 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>> > Please look at 1-3 I sent. Btw, I slightly tested this series, seems
>>> > to work...
>>> >
>>>
>>> They look good to me. I plan on applying them to our tree since we
>>> need a fix ASAP.
>>
>> Great!
>>
>>> >> You'd need to prevent the fake signal from freeezer from setting
>>> >> TIF_SIGPENDING. Maybe just add a SIGNAL_GROUP_EXIT check in freezer.c.
>>> >
>>> > I am thinking about checking SIGNAL_GROUP_COREDUMP but I am not sure,
>>> > perhaps we can make a simpler solution. As for wait_for_dump_helper()
>>> > we do not need any check at all, but we should either fix
>>> > wait_event_freezable (it is actually not right) or change pipe_release()
>>>
>>> Is the bug that it will exit on the fake_signal.
>>
>> Yes, I understand, but
>>
>>> I don't think that bug will affects this patch though.  I think this
>>> should all work if we add a check to freezer.c (or something similar
>>> that is cleaner).
>>>
>>> If you add SIGNAL_GROUP_COREDUMP check to freezer.c:
>>>
>>> static void fake_signal_wake_up(struct task_struct *p)
>>> {
>>>         unsigned long flags;
>>>
>>>         if (lock_task_sighand(p, &flags)) {
>>> -                signal_wake_up(p, 0);
>>> +              if (!p->signal->flags & SIGNAL_GROUP_COREDUMP)
>>> +                              signal_wake_up(p, 0);
>>>                 unlock_task_sighand(p, &flags);
>>>         }
>>> }
>>>
>>> And change the wait_event_freezekillable() in this patch to just
>>> wait_event_freezable(), shouldn't that just work.
>>
>> I doubt,
>>
>>> The fake signal will never get sent.
>>
>> Yes but try_to_freeze_tasks() can fail.
>>
>
> Ah. Good point. How about this then:
>
> /* can't use wait_event_freezable since we suppress the fake signal on
> SIGNAL_GROUP_COREDUMP */
> freezer_do_not_count();
> wait_event_interruptible(pipe->wait, pipe->readers == 1);
> freezer_count();
>

No that won't work either. You can't block the fake signal. Since the
dump catcher can keep pipe_write blocked for unbounded time,
user-space can block suspend for unbounded time.

Even worse than bad core dumps is unreliable suspend/resume.

Here the patch I eventually applied to out tree:

{
        struct pipe_inode_info *pipe;

        pipe = file->f_path.dentry->d_inode->i_pipe;

        pipe_lock(pipe);
        pipe->readers++;
        pipe->writers--;

        while (pipe->readers > 1) {
                unsigned long flags;

                wake_up_interruptible_sync(&pipe->wait);
                kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
                pipe_wait(pipe);

                pipe_unlock(pipe);
                try_to_freeze();
                pipe_lock(pipe);

                if (fatal_signal_pending(current))
                        break;

                /* Clear fake signal from freeze_task(). */
                spin_lock_irqsave(&current->sighand->siglock, flags);
                recalc_sigpending();
                spin_unlock_irqrestore(&current->sighand->siglock, flags);
        }

        pipe->readers--;
        pipe->writers++;
        pipe_unlock(pipe);

}

On suspend, you might truncate a core-dump but at least you can
reliably suspend.

Regards,
Mandeep

> Regards,
> Mandeep
>
>> And once again, if wait_event_freezable() was correct we do not care
>> about the fake signal (in wait_for_dump_helper), so we do not need
>> to change fake_signal_wake_up. So perhaps we should fix it but this
>> needs some discussion.
>>
>> Sorry again for the terse reply (and perhaps I misunderstood you),
>> I'll try to return to this problem asap. In any case I still think
>> we should do the freezer fixes on top of signal fixes I sent, and
>> you seem to agree. Good ;)
>>
>> Oleg.
>>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] lockdep: check that no locks held at freeze time
  2013-02-20 23:24           ` Andrew Morton
@ 2013-02-21  0:17             ` Mandeep Singh Baines
  2013-02-21  0:20               ` Andrew Morton
  2013-02-21  3:17             ` [PATCH v5] " Mandeep Singh Baines
  1 sibling, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-21  0:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Oleg Nesterov, Tejun Heo, Rafael J. Wysocki

On Wed, Feb 20, 2013 at 3:24 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 20 Feb 2013 15:17:16 -0800
> Mandeep Singh Baines <msb@chromium.org> wrote:
>
>> We shouldn't try_to_freeze if locks are held.
>>
>> ...
>>
>> @@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
>>
>> +     if (!(current->flags & PF_NOFREEZE))
>> +             debug_check_no_locks_held(current,
>> +
>>                                         "lock held while trying to freeze");
>> ...
>>
>> +     debug_check_no_locks_held(tsk, "lock held at task exit time");
>
> There doesn't seem much point in adding the `msg' to
> debug_check_no_locks_held() - the dump_stack() in
> print_held_locks_bug() will tell us the same thing.  Maybe just change

dump_stack() can be confusing when there is inlining. On occasion I've
looked at the wrong mutex_lock, for example, when there was another
mutex_lock that was inlined. Of course, you can start objdump and
verify the offsets. But that requires that you have the object file.
You could have a try_to_freeze added to do_exit. I was thinking of
adding another locks_held in the return from syscall path.

How about if we did some inlining and printed out the function, file
and line number where the check was placed:

#define debug_check_no_locks_held() do { \
        if (unlikely(current->lockdep_depth > 0)) { \
                printk("BUG: locks helds at %s:%d/%s()!\n", __FILE__,
__LINE__, __func__); \
                print_held_locks_bug(); \
        } \
} while (0)

That we avoid any potential confusion.

> the print_held_locks_bug() messages so they stop assuming they were
> called from do_exit()?
>
> Also, I wonder if the `tsk' arg is needed.  In both callers
> tsk==current.  Is it likely that we'll ever call
> debug_check_no_locks_held() for any task other than `current'?
>

I agree. I'll add that change to the patch once we decide what to about msg.

Regards,
Mandeep

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] lockdep: check that no locks held at freeze time
  2013-02-21  0:17             ` Mandeep Singh Baines
@ 2013-02-21  0:20               ` Andrew Morton
  2013-02-21  0:28                 ` Mandeep Singh Baines
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2013-02-21  0:20 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Oleg Nesterov, Tejun Heo, Rafael J. Wysocki

On Wed, 20 Feb 2013 16:17:39 -0800
Mandeep Singh Baines <msb@chromium.org> wrote:

> On Wed, Feb 20, 2013 at 3:24 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Wed, 20 Feb 2013 15:17:16 -0800
> > Mandeep Singh Baines <msb@chromium.org> wrote:
> >
> >> We shouldn't try_to_freeze if locks are held.
> >>
> >> ...
> >>
> >> @@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
> >>
> >> +     if (!(current->flags & PF_NOFREEZE))
> >> +             debug_check_no_locks_held(current,
> >> +
> >>                                         "lock held while trying to freeze");
> >> ...
> >>
> >> +     debug_check_no_locks_held(tsk, "lock held at task exit time");
> >
> > There doesn't seem much point in adding the `msg' to
> > debug_check_no_locks_held() - the dump_stack() in
> > print_held_locks_bug() will tell us the same thing.  Maybe just change
> 
> dump_stack() can be confusing when there is inlining. On occasion I've
> looked at the wrong mutex_lock, for example, when there was another
> mutex_lock that was inlined. Of course, you can start objdump and
> verify the offsets. But that requires that you have the object file.
> You could have a try_to_freeze added to do_exit. I was thinking of
> adding another locks_held in the return from syscall path.

Backtraces aren't *that* bad.  We'll easily be able to tell which of
the two callsites triggered the trace.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] lockdep: check that no locks held at freeze time
  2013-02-21  0:20               ` Andrew Morton
@ 2013-02-21  0:28                 ` Mandeep Singh Baines
  2013-02-21  0:42                   ` Andrew Morton
  0 siblings, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-21  0:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Oleg Nesterov, Tejun Heo, Rafael J. Wysocki

On Wed, Feb 20, 2013 at 4:20 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 20 Feb 2013 16:17:39 -0800
> Mandeep Singh Baines <msb@chromium.org> wrote:
>
>> On Wed, Feb 20, 2013 at 3:24 PM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>> > On Wed, 20 Feb 2013 15:17:16 -0800
>> > Mandeep Singh Baines <msb@chromium.org> wrote:
>> >
>> >> We shouldn't try_to_freeze if locks are held.
>> >>
>> >> ...
>> >>
>> >> @@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
>> >>
>> >> +     if (!(current->flags & PF_NOFREEZE))
>> >> +             debug_check_no_locks_held(current,
>> >> +
>> >>                                         "lock held while trying to freeze");
>> >> ...
>> >>
>> >> +     debug_check_no_locks_held(tsk, "lock held at task exit time");
>> >
>> > There doesn't seem much point in adding the `msg' to
>> > debug_check_no_locks_held() - the dump_stack() in
>> > print_held_locks_bug() will tell us the same thing.  Maybe just change
>>
>> dump_stack() can be confusing when there is inlining. On occasion I've
>> looked at the wrong mutex_lock, for example, when there was another
>> mutex_lock that was inlined. Of course, you can start objdump and
>> verify the offsets. But that requires that you have the object file.
>> You could have a try_to_freeze added to do_exit. I was thinking of
>> adding another locks_held in the return from syscall path.
>
> Backtraces aren't *that* bad.  We'll easily be able to tell which of
> the two callsites triggered the trace.
>

Let's say there was a try_to_freeze() that got inlined indirectly
(multiple levels of inline) into do_exit. Wouldn't the backtraces for
the regular exit check and the try_to_freeze check be identical except
for the offset (do_exit+0x45 versus do_exit+0x88)? So unless you had
an object file you wouldn't know which check you hit.

Regards,
Mandeep

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] lockdep: check that no locks held at freeze time
  2013-02-21  0:28                 ` Mandeep Singh Baines
@ 2013-02-21  0:42                   ` Andrew Morton
  2013-02-21  3:19                     ` Mandeep Singh Baines
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2013-02-21  0:42 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Oleg Nesterov, Tejun Heo, Rafael J. Wysocki

On Wed, 20 Feb 2013 16:28:07 -0800
Mandeep Singh Baines <msb@chromium.org> wrote:

> > Backtraces aren't *that* bad.  We'll easily be able to tell which of
> > the two callsites triggered the trace.
> >
> 
> Let's say there was a try_to_freeze() that got inlined indirectly
> (multiple levels of inline) into do_exit. Wouldn't the backtraces for
> the regular exit check and the try_to_freeze check be identical except
> for the offset (do_exit+0x45 versus do_exit+0x88)? So unless you had
> an object file you wouldn't know which check you hit.

Mutter.  Spose so.  Vaguely possible.  Yes, if we want to avoid a
wont-happen, use __FILE__ and __LINE__.  Or, probably more sanely,
__func__.

Or uninline try_to_freeze().  If anything's calling that at high
frequency, we have a problem.  And given the number of callsites,
getting it into icache might result in a faster kernel...

(Someone needs to teach __might_sleep() about __ratelimit())

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v5] lockdep: check that no locks held at freeze time
  2013-02-20 23:24           ` Andrew Morton
  2013-02-21  0:17             ` Mandeep Singh Baines
@ 2013-02-21  3:17             ` Mandeep Singh Baines
  2013-02-21 15:42               ` Rafael J. Wysocki
  1 sibling, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-21  3:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Ingo Molnar, Oleg Nesterov, Tejun Heo,
	Andrew Morton, Rafael J. Wysocki

We shouldn't try_to_freeze if locks are held.

Changes since v1:
* LKML: <20130215111635.GA26955@gmail.com> Ingo Molnar
  * Added a msg string that gets passed in.
* LKML: <20130215154449.GD30829@redhat.com> Oleg Nesterov
  * Check PF_NOFREEZE in try_to_freeze().
Changes since v2:
* LKML: <20130216170605.GC4910@redhat.com> Oleg Nesterovw
  * Avoid unnecessary PF_NOFREEZE check when !CONFIG_LOCKDEP.
* Mandeep Singh Baines
  * Generalize an exit specific printk.
Changes since v3:
* LKML: <20130220223013.GA15760@redhat.com> Oleg Nesterovw
  * Remove stale vfork comment from commit message.
Changes since v4:
* LKML: <20130220152446.a65ff84f.akpm@linux-foundation.org> Andrew Morton
  * Remove tsk param since tsk is always current.
  * Remove msg param, dump_stack() should tell us all we need to know.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Tejun Heo <tj@kernel.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/debug_locks.h |  4 ++--
 include/linux/freezer.h     |  3 +++
 kernel/exit.c               |  2 +-
 kernel/lockdep.c            | 16 +++++++---------
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 3bd46f7..a975de1 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -51,7 +51,7 @@ struct task_struct;
 extern void debug_show_all_locks(void);
 extern void debug_show_held_locks(struct task_struct *task);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
-extern void debug_check_no_locks_held(struct task_struct *task);
+extern void debug_check_no_locks_held(void);
 #else
 static inline void debug_show_all_locks(void)
 {
@@ -67,7 +67,7 @@ debug_check_no_locks_freed(const void *from, unsigned long len)
 }
 
 static inline void
-debug_check_no_locks_held(struct task_struct *task)
+debug_check_no_locks_held(void)
 {
 }
 #endif
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index e4238ce..c5bd118 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -3,6 +3,7 @@
 #ifndef FREEZER_H_INCLUDED
 #define FREEZER_H_INCLUDED
 
+#include <linux/debug_locks.h>
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/atomic.h>
@@ -43,6 +44,8 @@ extern void thaw_kernel_threads(void);
 
 static inline bool try_to_freeze(void)
 {
+	if (!(current->flags & PF_NOFREEZE))
+		debug_check_no_locks_held();
 	might_sleep();
 	if (likely(!freezing(current)))
 		return false;
diff --git a/kernel/exit.c b/kernel/exit.c
index b4df219..aff5bdb 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -833,7 +833,7 @@ void do_exit(long code)
 	/*
 	 * Make sure we are holding no locks:
 	 */
-	debug_check_no_locks_held(tsk);
+	debug_check_no_locks_held();
 	/*
 	 * We can do this unlocked here. The futex code uses this flag
 	 * just to verify whether the pi state cleanup has been done
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 7981e5b..8e28f56 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4083,7 +4083,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
 }
 EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
 
-static void print_held_locks_bug(struct task_struct *curr)
+static void print_held_locks_bug(void)
 {
 	if (!debug_locks_off())
 		return;
@@ -4092,21 +4092,19 @@ static void print_held_locks_bug(struct task_struct *curr)
 
 	printk("\n");
 	printk("=====================================\n");
-	printk("[ BUG: lock held at task exit time! ]\n");
+	printk("[ BUG: %s/%d still has locks held! ]\n",
+	       current->comm, task_pid_nr(current));
 	print_kernel_ident();
 	printk("-------------------------------------\n");
-	printk("%s/%d is exiting with locks still held!\n",
-		curr->comm, task_pid_nr(curr));
-	lockdep_print_held_locks(curr);
-
+	lockdep_print_held_locks(current);
 	printk("\nstack backtrace:\n");
 	dump_stack();
 }
 
-void debug_check_no_locks_held(struct task_struct *task)
+void debug_check_no_locks_held(void)
 {
-	if (unlikely(task->lockdep_depth > 0))
-		print_held_locks_bug(task);
+	if (unlikely(current->lockdep_depth > 0))
+		print_held_locks_bug();
 }
 
 void debug_show_all_locks(void)
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] lockdep: check that no locks held at freeze time
  2013-02-21  0:42                   ` Andrew Morton
@ 2013-02-21  3:19                     ` Mandeep Singh Baines
  0 siblings, 0 replies; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-21  3:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Oleg Nesterov, Tejun Heo, Rafael J. Wysocki

On Wed, Feb 20, 2013 at 4:42 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 20 Feb 2013 16:28:07 -0800
> Mandeep Singh Baines <msb@chromium.org> wrote:
>
>> > Backtraces aren't *that* bad.  We'll easily be able to tell which of
>> > the two callsites triggered the trace.
>> >
>>
>> Let's say there was a try_to_freeze() that got inlined indirectly
>> (multiple levels of inline) into do_exit. Wouldn't the backtraces for
>> the regular exit check and the try_to_freeze check be identical except
>> for the offset (do_exit+0x45 versus do_exit+0x88)? So unless you had
>> an object file you wouldn't know which check you hit.
>
> Mutter.  Spose so.  Vaguely possible.  Yes, if we want to avoid a
> wont-happen, use __FILE__ and __LINE__.  Or, probably more sanely,
> __func__.
>

Fair enough. I'll avoid using a macro unless/until its actually needed.

Regards,
Mandeep

> Or uninline try_to_freeze().  If anything's calling that at high
> frequency, we have a problem.  And given the number of callsites,
> getting it into icache might result in a faster kernel...
>
> (Someone needs to teach __might_sleep() about __ratelimit())

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5] lockdep: check that no locks held at freeze time
  2013-02-21  3:17             ` [PATCH v5] " Mandeep Singh Baines
@ 2013-02-21 15:42               ` Rafael J. Wysocki
  2013-02-21 16:24                 ` Mandeep Singh Baines
  2013-02-21 16:51                 ` [PATCH v6] " Mandeep Singh Baines
  0 siblings, 2 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2013-02-21 15:42 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Ingo Molnar, Oleg Nesterov, Tejun Heo, Andrew Morton

On Wednesday, February 20, 2013 07:17:07 PM Mandeep Singh Baines wrote:
> We shouldn't try_to_freeze if locks are held.

Has Ingo acked one of the previous versions or is my memory doing tricks?

Anyway, can you please write something more about what the patch is doing
in the changelog?  While the statement above is correct, it doesn't really
explain why it will be a good idea to apply your patch, does it?

Rafael


> Changes since v1:
> * LKML: <20130215111635.GA26955@gmail.com> Ingo Molnar
>   * Added a msg string that gets passed in.
> * LKML: <20130215154449.GD30829@redhat.com> Oleg Nesterov
>   * Check PF_NOFREEZE in try_to_freeze().
> Changes since v2:
> * LKML: <20130216170605.GC4910@redhat.com> Oleg Nesterovw
>   * Avoid unnecessary PF_NOFREEZE check when !CONFIG_LOCKDEP.
> * Mandeep Singh Baines
>   * Generalize an exit specific printk.
> Changes since v3:
> * LKML: <20130220223013.GA15760@redhat.com> Oleg Nesterovw
>   * Remove stale vfork comment from commit message.
> Changes since v4:
> * LKML: <20130220152446.a65ff84f.akpm@linux-foundation.org> Andrew Morton
>   * Remove tsk param since tsk is always current.
>   * Remove msg param, dump_stack() should tell us all we need to know.
> 
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Tejun Heo <tj@kernel.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  include/linux/debug_locks.h |  4 ++--
>  include/linux/freezer.h     |  3 +++
>  kernel/exit.c               |  2 +-
>  kernel/lockdep.c            | 16 +++++++---------
>  4 files changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
> index 3bd46f7..a975de1 100644
> --- a/include/linux/debug_locks.h
> +++ b/include/linux/debug_locks.h
> @@ -51,7 +51,7 @@ struct task_struct;
>  extern void debug_show_all_locks(void);
>  extern void debug_show_held_locks(struct task_struct *task);
>  extern void debug_check_no_locks_freed(const void *from, unsigned long len);
> -extern void debug_check_no_locks_held(struct task_struct *task);
> +extern void debug_check_no_locks_held(void);
>  #else
>  static inline void debug_show_all_locks(void)
>  {
> @@ -67,7 +67,7 @@ debug_check_no_locks_freed(const void *from, unsigned long len)
>  }
>  
>  static inline void
> -debug_check_no_locks_held(struct task_struct *task)
> +debug_check_no_locks_held(void)
>  {
>  }
>  #endif
> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> index e4238ce..c5bd118 100644
> --- a/include/linux/freezer.h
> +++ b/include/linux/freezer.h
> @@ -3,6 +3,7 @@
>  #ifndef FREEZER_H_INCLUDED
>  #define FREEZER_H_INCLUDED
>  
> +#include <linux/debug_locks.h>
>  #include <linux/sched.h>
>  #include <linux/wait.h>
>  #include <linux/atomic.h>
> @@ -43,6 +44,8 @@ extern void thaw_kernel_threads(void);
>  
>  static inline bool try_to_freeze(void)
>  {
> +	if (!(current->flags & PF_NOFREEZE))
> +		debug_check_no_locks_held();
>  	might_sleep();
>  	if (likely(!freezing(current)))
>  		return false;
> diff --git a/kernel/exit.c b/kernel/exit.c
> index b4df219..aff5bdb 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -833,7 +833,7 @@ void do_exit(long code)
>  	/*
>  	 * Make sure we are holding no locks:
>  	 */
> -	debug_check_no_locks_held(tsk);
> +	debug_check_no_locks_held();
>  	/*
>  	 * We can do this unlocked here. The futex code uses this flag
>  	 * just to verify whether the pi state cleanup has been done
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 7981e5b..8e28f56 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -4083,7 +4083,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
>  }
>  EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
>  
> -static void print_held_locks_bug(struct task_struct *curr)
> +static void print_held_locks_bug(void)
>  {
>  	if (!debug_locks_off())
>  		return;
> @@ -4092,21 +4092,19 @@ static void print_held_locks_bug(struct task_struct *curr)
>  
>  	printk("\n");
>  	printk("=====================================\n");
> -	printk("[ BUG: lock held at task exit time! ]\n");
> +	printk("[ BUG: %s/%d still has locks held! ]\n",
> +	       current->comm, task_pid_nr(current));
>  	print_kernel_ident();
>  	printk("-------------------------------------\n");
> -	printk("%s/%d is exiting with locks still held!\n",
> -		curr->comm, task_pid_nr(curr));
> -	lockdep_print_held_locks(curr);
> -
> +	lockdep_print_held_locks(current);
>  	printk("\nstack backtrace:\n");
>  	dump_stack();
>  }
>  
> -void debug_check_no_locks_held(struct task_struct *task)
> +void debug_check_no_locks_held(void)
>  {
> -	if (unlikely(task->lockdep_depth > 0))
> -		print_held_locks_bug(task);
> +	if (unlikely(current->lockdep_depth > 0))
> +		print_held_locks_bug();
>  }
>  
>  void debug_show_all_locks(void)
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5] lockdep: check that no locks held at freeze time
  2013-02-21 15:42               ` Rafael J. Wysocki
@ 2013-02-21 16:24                 ` Mandeep Singh Baines
  2013-02-21 16:51                 ` [PATCH v6] " Mandeep Singh Baines
  1 sibling, 0 replies; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-21 16:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Ingo Molnar, Oleg Nesterov, Tejun Heo, Andrew Morton

On Thu, Feb 21, 2013 at 7:42 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Wednesday, February 20, 2013 07:17:07 PM Mandeep Singh Baines wrote:
>> We shouldn't try_to_freeze if locks are held.
>
> Has Ingo acked one of the previous versions or is my memory doing tricks?
>

Yes, Ingo had acked a previous version. However, the patch has changed
non-trivially since his ack. I wasn't really sure what do in this
situation. Is it OK to keep the ack?

> Anyway, can you please write something more about what the patch is doing
> in the changelog?  While the statement above is correct, it doesn't really
> explain why it will be a good idea to apply your patch, does it?
>

k, will do

> Rafael
>
>
>> Changes since v1:
>> * LKML: <20130215111635.GA26955@gmail.com> Ingo Molnar
>>   * Added a msg string that gets passed in.
>> * LKML: <20130215154449.GD30829@redhat.com> Oleg Nesterov
>>   * Check PF_NOFREEZE in try_to_freeze().
>> Changes since v2:
>> * LKML: <20130216170605.GC4910@redhat.com> Oleg Nesterovw
>>   * Avoid unnecessary PF_NOFREEZE check when !CONFIG_LOCKDEP.
>> * Mandeep Singh Baines
>>   * Generalize an exit specific printk.
>> Changes since v3:
>> * LKML: <20130220223013.GA15760@redhat.com> Oleg Nesterovw
>>   * Remove stale vfork comment from commit message.
>> Changes since v4:
>> * LKML: <20130220152446.a65ff84f.akpm@linux-foundation.org> Andrew Morton
>>   * Remove tsk param since tsk is always current.
>>   * Remove msg param, dump_stack() should tell us all we need to know.
>>
>> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
>> CC: Ingo Molnar <mingo@redhat.com>
>> CC: Oleg Nesterov <oleg@redhat.com>
>> CC: Tejun Heo <tj@kernel.org>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Rafael J. Wysocki <rjw@sisk.pl>
>> ---
>>  include/linux/debug_locks.h |  4 ++--
>>  include/linux/freezer.h     |  3 +++
>>  kernel/exit.c               |  2 +-
>>  kernel/lockdep.c            | 16 +++++++---------
>>  4 files changed, 13 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
>> index 3bd46f7..a975de1 100644
>> --- a/include/linux/debug_locks.h
>> +++ b/include/linux/debug_locks.h
>> @@ -51,7 +51,7 @@ struct task_struct;
>>  extern void debug_show_all_locks(void);
>>  extern void debug_show_held_locks(struct task_struct *task);
>>  extern void debug_check_no_locks_freed(const void *from, unsigned long len);
>> -extern void debug_check_no_locks_held(struct task_struct *task);
>> +extern void debug_check_no_locks_held(void);
>>  #else
>>  static inline void debug_show_all_locks(void)
>>  {
>> @@ -67,7 +67,7 @@ debug_check_no_locks_freed(const void *from, unsigned long len)
>>  }
>>
>>  static inline void
>> -debug_check_no_locks_held(struct task_struct *task)
>> +debug_check_no_locks_held(void)
>>  {
>>  }
>>  #endif
>> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
>> index e4238ce..c5bd118 100644
>> --- a/include/linux/freezer.h
>> +++ b/include/linux/freezer.h
>> @@ -3,6 +3,7 @@
>>  #ifndef FREEZER_H_INCLUDED
>>  #define FREEZER_H_INCLUDED
>>
>> +#include <linux/debug_locks.h>
>>  #include <linux/sched.h>
>>  #include <linux/wait.h>
>>  #include <linux/atomic.h>
>> @@ -43,6 +44,8 @@ extern void thaw_kernel_threads(void);
>>
>>  static inline bool try_to_freeze(void)
>>  {
>> +     if (!(current->flags & PF_NOFREEZE))
>> +             debug_check_no_locks_held();
>>       might_sleep();
>>       if (likely(!freezing(current)))
>>               return false;
>> diff --git a/kernel/exit.c b/kernel/exit.c
>> index b4df219..aff5bdb 100644
>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -833,7 +833,7 @@ void do_exit(long code)
>>       /*
>>        * Make sure we are holding no locks:
>>        */
>> -     debug_check_no_locks_held(tsk);
>> +     debug_check_no_locks_held();
>>       /*
>>        * We can do this unlocked here. The futex code uses this flag
>>        * just to verify whether the pi state cleanup has been done
>> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
>> index 7981e5b..8e28f56 100644
>> --- a/kernel/lockdep.c
>> +++ b/kernel/lockdep.c
>> @@ -4083,7 +4083,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
>>  }
>>  EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
>>
>> -static void print_held_locks_bug(struct task_struct *curr)
>> +static void print_held_locks_bug(void)
>>  {
>>       if (!debug_locks_off())
>>               return;
>> @@ -4092,21 +4092,19 @@ static void print_held_locks_bug(struct task_struct *curr)
>>
>>       printk("\n");
>>       printk("=====================================\n");
>> -     printk("[ BUG: lock held at task exit time! ]\n");
>> +     printk("[ BUG: %s/%d still has locks held! ]\n",
>> +            current->comm, task_pid_nr(current));
>>       print_kernel_ident();
>>       printk("-------------------------------------\n");
>> -     printk("%s/%d is exiting with locks still held!\n",
>> -             curr->comm, task_pid_nr(curr));
>> -     lockdep_print_held_locks(curr);
>> -
>> +     lockdep_print_held_locks(current);
>>       printk("\nstack backtrace:\n");
>>       dump_stack();
>>  }
>>
>> -void debug_check_no_locks_held(struct task_struct *task)
>> +void debug_check_no_locks_held(void)
>>  {
>> -     if (unlikely(task->lockdep_depth > 0))
>> -             print_held_locks_bug(task);
>> +     if (unlikely(current->lockdep_depth > 0))
>> +             print_held_locks_bug();
>>  }
>>
>>  void debug_show_all_locks(void)
>>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v6] lockdep: check that no locks held at freeze time
  2013-02-21 15:42               ` Rafael J. Wysocki
  2013-02-21 16:24                 ` Mandeep Singh Baines
@ 2013-02-21 16:51                 ` Mandeep Singh Baines
  2013-02-21 21:42                   ` Andrew Morton
  1 sibling, 1 reply; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-21 16:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Ingo Molnar, Oleg Nesterov, Tejun Heo,
	Andrew Morton, Rafael J. Wysocki

We shouldn't try_to_freeze if locks are held. Holding a lock
can cause a deadlock if the lock is later acquired in the
suspend or hibernate path (e.g. by dpm). Holding a lock can
also cause a deadlock in the case of cgroup_freezer if a
lock is held inside a frozen cgroup that is later acquired by
a process outside that group.

Changes since v1:
* LKML: <20130215111635.GA26955@gmail.com> Ingo Molnar
  * Added a msg string that gets passed in.
* LKML: <20130215154449.GD30829@redhat.com> Oleg Nesterov
  * Check PF_NOFREEZE in try_to_freeze().
Changes since v2:
* LKML: <20130216170605.GC4910@redhat.com> Oleg Nesterovw
  * Avoid unnecessary PF_NOFREEZE check when !CONFIG_LOCKDEP.
* Mandeep Singh Baines
  * Generalize an exit specific printk.
Changes since v3:
* LKML: <20130220223013.GA15760@redhat.com> Oleg Nesterovw
  * Remove stale vfork comment from commit message.
Changes since v4:
* LKML: <20130220152446.a65ff84f.akpm@linux-foundation.org> Andrew Morton
  * Remove tsk param since tsk is always current.
  * Remove msg param, dump_stack() should tell us all we need to know.
Changes since v5:
* LKML: <3556388.kjl7k6r8W6@vostro.rjw.lan> Rafael J. Wysocki
  * Updated changelog explaining why we can't freeze with locks held.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Tejun Heo <tj@kernel.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/debug_locks.h |  4 ++--
 include/linux/freezer.h     |  3 +++
 kernel/exit.c               |  2 +-
 kernel/lockdep.c            | 16 +++++++---------
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 3bd46f7..a975de1 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -51,7 +51,7 @@ struct task_struct;
 extern void debug_show_all_locks(void);
 extern void debug_show_held_locks(struct task_struct *task);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
-extern void debug_check_no_locks_held(struct task_struct *task);
+extern void debug_check_no_locks_held(void);
 #else
 static inline void debug_show_all_locks(void)
 {
@@ -67,7 +67,7 @@ debug_check_no_locks_freed(const void *from, unsigned long len)
 }
 
 static inline void
-debug_check_no_locks_held(struct task_struct *task)
+debug_check_no_locks_held(void)
 {
 }
 #endif
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index e4238ce..c5bd118 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -3,6 +3,7 @@
 #ifndef FREEZER_H_INCLUDED
 #define FREEZER_H_INCLUDED
 
+#include <linux/debug_locks.h>
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/atomic.h>
@@ -43,6 +44,8 @@ extern void thaw_kernel_threads(void);
 
 static inline bool try_to_freeze(void)
 {
+	if (!(current->flags & PF_NOFREEZE))
+		debug_check_no_locks_held();
 	might_sleep();
 	if (likely(!freezing(current)))
 		return false;
diff --git a/kernel/exit.c b/kernel/exit.c
index b4df219..aff5bdb 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -833,7 +833,7 @@ void do_exit(long code)
 	/*
 	 * Make sure we are holding no locks:
 	 */
-	debug_check_no_locks_held(tsk);
+	debug_check_no_locks_held();
 	/*
 	 * We can do this unlocked here. The futex code uses this flag
 	 * just to verify whether the pi state cleanup has been done
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 7981e5b..8e28f56 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4083,7 +4083,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
 }
 EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
 
-static void print_held_locks_bug(struct task_struct *curr)
+static void print_held_locks_bug(void)
 {
 	if (!debug_locks_off())
 		return;
@@ -4092,21 +4092,19 @@ static void print_held_locks_bug(struct task_struct *curr)
 
 	printk("\n");
 	printk("=====================================\n");
-	printk("[ BUG: lock held at task exit time! ]\n");
+	printk("[ BUG: %s/%d still has locks held! ]\n",
+	       current->comm, task_pid_nr(current));
 	print_kernel_ident();
 	printk("-------------------------------------\n");
-	printk("%s/%d is exiting with locks still held!\n",
-		curr->comm, task_pid_nr(curr));
-	lockdep_print_held_locks(curr);
-
+	lockdep_print_held_locks(current);
 	printk("\nstack backtrace:\n");
 	dump_stack();
 }
 
-void debug_check_no_locks_held(struct task_struct *task)
+void debug_check_no_locks_held(void)
 {
-	if (unlikely(task->lockdep_depth > 0))
-		print_held_locks_bug(task);
+	if (unlikely(current->lockdep_depth > 0))
+		print_held_locks_bug();
 }
 
 void debug_show_all_locks(void)
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v6] lockdep: check that no locks held at freeze time
  2013-02-21 16:51                 ` [PATCH v6] " Mandeep Singh Baines
@ 2013-02-21 21:42                   ` Andrew Morton
  2013-02-21 21:57                     ` Mandeep Singh Baines
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2013-02-21 21:42 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Ingo Molnar, Oleg Nesterov, Tejun Heo, Rafael J. Wysocki

On Thu, 21 Feb 2013 08:51:41 -0800
Mandeep Singh Baines <msb@chromium.org> wrote:

> We shouldn't try_to_freeze if locks are held. Holding a lock
> can cause a deadlock if the lock is later acquired in the
> suspend or hibernate path (e.g. by dpm). Holding a lock can
> also cause a deadlock in the case of cgroup_freezer if a
> lock is held inside a frozen cgroup that is later acquired by
> a process outside that group.
> 
> ...
>
> --- a/include/linux/freezer.h
> +++ b/include/linux/freezer.h
>
> ...
>
> @@ -43,6 +44,8 @@ extern void thaw_kernel_threads(void);
>  
>  static inline bool try_to_freeze(void)
>  {
> +	if (!(current->flags & PF_NOFREEZE))
> +		debug_check_no_locks_held();
>  	might_sleep();
>  	if (likely(!freezing(current)))
>  		return false;
>
> ...
>

It still needs
http://ozlabs.org/~akpm/mmots/broken-out/lockdep-check-that-no-locks-held-at-freeze-time-fix.patch

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v6] lockdep: check that no locks held at freeze time
  2013-02-21 21:42                   ` Andrew Morton
@ 2013-02-21 21:57                     ` Mandeep Singh Baines
  0 siblings, 0 replies; 45+ messages in thread
From: Mandeep Singh Baines @ 2013-02-21 21:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Ingo Molnar, Oleg Nesterov, Tejun Heo, Rafael J. Wysocki

On Thu, Feb 21, 2013 at 1:42 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 21 Feb 2013 08:51:41 -0800
> Mandeep Singh Baines <msb@chromium.org> wrote:
>
>> We shouldn't try_to_freeze if locks are held. Holding a lock
>> can cause a deadlock if the lock is later acquired in the
>> suspend or hibernate path (e.g. by dpm). Holding a lock can
>> also cause a deadlock in the case of cgroup_freezer if a
>> lock is held inside a frozen cgroup that is later acquired by
>> a process outside that group.
>>
>> ...
>>
>> --- a/include/linux/freezer.h
>> +++ b/include/linux/freezer.h
>>
>> ...
>>
>> @@ -43,6 +44,8 @@ extern void thaw_kernel_threads(void);
>>
>>  static inline bool try_to_freeze(void)
>>  {
>> +     if (!(current->flags & PF_NOFREEZE))
>> +             debug_check_no_locks_held();
>>       might_sleep();
>>       if (likely(!freezing(current)))
>>               return false;
>>
>> ...
>>
>
> It still needs
> http://ozlabs.org/~akpm/mmots/broken-out/lockdep-check-that-no-locks-held-at-freeze-time-fix.patch

D'oh. I tricked myself into thinking its part of freezer so no export
needed but its inlined so can be pulled in anywhere.

Thanks for the fix:

Acked-by: Mandeep Singh Baines <msb@chromium.org>

Or should I roll in the fix and re-send with your Sign-off.

Regards,
Mandeep

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe
  2013-02-20 23:30                 ` Mandeep Singh Baines
@ 2013-02-23 19:21                   ` Oleg Nesterov
  0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-23 19:21 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Ben Chan, Tejun Heo, Andrew Morton,
	Rafael J. Wysocki, Ingo Molnar

Mandeep, sorry for delay.

On 02/20, Mandeep Singh Baines wrote:
>
> On Tue, Feb 19, 2013 at 12:20 PM, Mandeep Singh Baines <msb@chromium.org> wrote:
> > Ah. Good point. How about this then:
> >
> > /* can't use wait_event_freezable since we suppress the fake signal on
> > SIGNAL_GROUP_COREDUMP */
> > freezer_do_not_count();
> > wait_event_interruptible(pipe->wait, pipe->readers == 1);
> > freezer_count();

Yes, I thought about freezer_do_not_count() + wait_event_interruptible()
too, but

> No that won't work either. You can't block the fake signal.

Yes.

And wait_event_freezable() can't work because it is buggy, it doesn't
clear TIF_SIGPENDING so it becomes a busy-wait loop after freezing.
34b087e48 broke the logic. Perhaps my fault, I do not remember when
I sent the patch which was applied as 24b7ead3 (but the code was wrong
anyway). And its usage is wrong too, unless I missed something. But
lets discuss this later.

As for coredump, we can fix wait_event_freezable() or reintroduce
recalc_sigpending() in __refrigerator() as you suggested.

But this doesn't solve other problems, we need more try_to_freeze's
before wait_for_dump_helper() is called.

> On suspend, you might truncate a core-dump but at least you can
> reliably suspend.

Btw, yes. Perhaps coredump should simply fail if it races with
freezer, in this case the necessary changes are simple.

But so far I'll try to make the fixes which will allow to freeze
the dumping task correctly, I hope I'll send the patches tomorrow.

It is still not clear to me whether we should change freeze_task(),
but most probably yes, this would be simpler.

Oleg.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/5] freezer: clear fake signal on exit from __refrigerator
  2013-02-20 18:09       ` Mandeep Singh Baines
@ 2013-02-23 19:41         ` Oleg Nesterov
  2013-02-23 19:59           ` Oleg Nesterov
  0 siblings, 1 reply; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-23 19:41 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki,
	Ingo Molnar, adurbin

On 02/20, Mandeep Singh Baines wrote:
>
> I think we need something like this in order to be able to fix
> wait_event_freezable and friends. Here is one idea:
>
> #define __wait_event_freezable(wq, condition, ret)                  \
> do {                                                                    \
>         DEFINE_WAIT(__wait);                                            \
>                                                                         \
>         for (;;) {                                                      \
>                 prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE);      \
>                 if (condition)                                          \
>                         break;                                          \
>                 if (!signal_pending(current)) {                         \
>                         freezable_schedule();

...

> If you cleaned up the fake signal in __refrigerator()

Perhaps. Or we can add recalc_sigpending into wait_freezable().
But note that

	if (!signal_pending(current))
		freezable_schedule();

is not actually right, wait_event_freezable() should be interruptible,
but not by freezer.

And let me repeat, as for coredump this can only solve the problems in
wait_for_dump_handler(). As for other users, I simply do not see any
valid user today, so perhaps wait_event_freezable() should die... Or
I missed something.

I'll try to make the coredumping fixes tomorrow, then we discuss this
again.

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/5] freezer: clear fake signal on exit from __refrigerator
  2013-02-23 19:41         ` Oleg Nesterov
@ 2013-02-23 19:59           ` Oleg Nesterov
  0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2013-02-23 19:59 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Tejun Heo, Andrew Morton, Rafael J. Wysocki,
	Ingo Molnar, adurbin

Sorry, forgot to mention...

On 02/23, Oleg Nesterov wrote:
>
> And let me repeat, as for coredump this can only solve the problems in
> wait_for_dump_handler().

And, otoh, if we change freeze_task() then wait_event_freezable()
should work just fine in wait_for_dump_handler().

> I'll try to make the coredumping fixes tomorrow, then we discuss this
> again.

Yes, I'm afraid my comments were very confusing ;) I think the patches
will explain what I mean.

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2013-02-23 20:00 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-16  9:53 [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK Mandeep Singh Baines
2013-02-16  9:53 ` [PATCH 2/5] lockdep: check that no locks held at freeze time Mandeep Singh Baines
2013-02-16 17:06   ` Oleg Nesterov
2013-02-20  1:57     ` [PATCH v3] " Mandeep Singh Baines
2013-02-20 10:37       ` Ingo Molnar
2013-02-20 12:55         ` Rafael J. Wysocki
2013-02-20 13:52           ` Ingo Molnar
2013-02-20 22:30       ` Oleg Nesterov
2013-02-20 23:11         ` Mandeep Singh Baines
2013-02-20 23:17         ` [PATCH v4] " Mandeep Singh Baines
2013-02-20 23:24           ` Andrew Morton
2013-02-21  0:17             ` Mandeep Singh Baines
2013-02-21  0:20               ` Andrew Morton
2013-02-21  0:28                 ` Mandeep Singh Baines
2013-02-21  0:42                   ` Andrew Morton
2013-02-21  3:19                     ` Mandeep Singh Baines
2013-02-21  3:17             ` [PATCH v5] " Mandeep Singh Baines
2013-02-21 15:42               ` Rafael J. Wysocki
2013-02-21 16:24                 ` Mandeep Singh Baines
2013-02-21 16:51                 ` [PATCH v6] " Mandeep Singh Baines
2013-02-21 21:42                   ` Andrew Morton
2013-02-21 21:57                     ` Mandeep Singh Baines
2013-02-16  9:53 ` [PATCH 3/5] coredump: use a freezable_schedule for the coredump_finish wait Mandeep Singh Baines
2013-02-16 17:17   ` Oleg Nesterov
2013-02-16  9:53 ` [PATCH 4/5] freezer: clear fake signal on exit from __refrigerator Mandeep Singh Baines
2013-02-16 17:06   ` Oleg Nesterov
2013-02-16 17:12     ` Oleg Nesterov
2013-02-20 18:09       ` Mandeep Singh Baines
2013-02-23 19:41         ` Oleg Nesterov
2013-02-23 19:59           ` Oleg Nesterov
2013-02-16  9:53 ` [PATCH 5/5] coredump: ignore non-fatal signals when core dumping to a pipe Mandeep Singh Baines
2013-02-16 17:10   ` Oleg Nesterov
2013-02-16 19:46     ` Oleg Nesterov
2013-02-18 23:55       ` Mandeep Singh Baines
2013-02-19 14:18         ` Oleg Nesterov
2013-02-19  5:19       ` Mandeep Singh Baines
2013-02-19 14:27         ` Oleg Nesterov
2013-02-19 19:33           ` Mandeep Singh Baines
2013-02-19 19:45             ` Oleg Nesterov
2013-02-19 20:20               ` Mandeep Singh Baines
2013-02-20 23:30                 ` Mandeep Singh Baines
2013-02-23 19:21                   ` Oleg Nesterov
2013-02-16 17:05 ` [PATCH 1/5] vfork: don't freezer_count() for in-kernel users of CLONE_VFORK Oleg Nesterov
2013-02-20  0:07   ` Mandeep Singh Baines
2013-02-20  1:41     ` Mandeep Singh Baines

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.