linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] exec: make de_thread() freezable
@ 2018-11-12  3:54 Chanho Min
  2018-11-12  8:15 ` Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Chanho Min @ 2018-11-12  3:54 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Andrew Morton,
	Eric W. Biederman, Christian Brauner, Oleg Nesterov,
	Anna-Maria Gleixner, Alexander Viro
  Cc: linux-pm, linux-kernel, linux-fsdevel, Seungho Park, Inkyu Hwang,
	Donghwan Jung, Jongsung Kim, Chanho Min

Suspend fails due to the exec family of functions blocking the freezer.
The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for
all sub-threads to die, and we have the deadlock if one of them is frozen.
This also can occur with the schedule() waiting for the group thread leader
to exit if it is frozen.

In our machine, it causes freeze timeout as bellows.

Freezing of tasks failed after 20.010 seconds (1 tasks refusing to freeze, wq_busy=0):
setcpushares-ls D ffffffc00008ed70     0  5817   1483 0x0040000d
 Call trace:
[<ffffffc00008ed70>] __switch_to+0x88/0xa0
[<ffffffc000d1c30c>] __schedule+0x1bc/0x720
[<ffffffc000d1ca90>] schedule+0x40/0xa8
[<ffffffc0001cd784>] flush_old_exec+0xdc/0x640
[<ffffffc000220360>] load_elf_binary+0x2a8/0x1090
[<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
[<ffffffc00021c584>] load_script+0x20c/0x228
[<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
[<ffffffc0001ce8e0>] do_execveat_common.isra.14+0x4f8/0x6e8
[<ffffffc0001cedd0>] compat_SyS_execve+0x38/0x48
[<ffffffc00008de30>] el0_svc_naked+0x24/0x28

To fix this, make de_thread() freezable. It looks safe and works fine.

Changes in v2:
 - changes for the same reason in "if (!thread_group_leader(tsk))" branch.
   (reported by Oleg)

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Chanho Min <chanho.min@lge.com>
---
 fs/exec.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 1ebf6e5..6da8745 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -62,6 +62,7 @@
 #include <linux/oom.h>
 #include <linux/compat.h>
 #include <linux/vmalloc.h>
+#include <linux/freezer.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -1083,7 +1084,7 @@ static int de_thread(struct task_struct *tsk)
 	while (sig->notify_count) {
 		__set_current_state(TASK_KILLABLE);
 		spin_unlock_irq(lock);
-		schedule();
+		freezable_schedule();
 		if (unlikely(__fatal_signal_pending(tsk)))
 			goto killed;
 		spin_lock_irq(lock);
@@ -1111,7 +1112,7 @@ static int de_thread(struct task_struct *tsk)
 			__set_current_state(TASK_KILLABLE);
 			write_unlock_irq(&tasklist_lock);
 			cgroup_threadgroup_change_end(tsk);
-			schedule();
+			freezable_schedule();
 			if (unlikely(__fatal_signal_pending(tsk)))
 				goto killed;
 		}
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] exec: make de_thread() freezable
  2018-11-12  3:54 [PATCH v2] exec: make de_thread() freezable Chanho Min
@ 2018-11-12  8:15 ` Oleg Nesterov
  2018-11-21 23:35   ` Rafael J. Wysocki
  2018-11-12  9:51 ` Pavel Machek
  2018-11-13 14:53 ` Michal Hocko
  2 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2018-11-12  8:15 UTC (permalink / raw)
  To: Chanho Min
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Andrew Morton,
	Eric W. Biederman, Christian Brauner, Anna-Maria Gleixner,
	Alexander Viro, linux-pm, linux-kernel, linux-fsdevel,
	Seungho Park, Inkyu Hwang, Donghwan Jung, Jongsung Kim

On 11/12, Chanho Min wrote:
>
> @@ -1083,7 +1084,7 @@ static int de_thread(struct task_struct *tsk)
>  	while (sig->notify_count) {
>  		__set_current_state(TASK_KILLABLE);
>  		spin_unlock_irq(lock);
> -		schedule();
> +		freezable_schedule();
>  		if (unlikely(__fatal_signal_pending(tsk)))
>  			goto killed;
>  		spin_lock_irq(lock);
> @@ -1111,7 +1112,7 @@ static int de_thread(struct task_struct *tsk)
>  			__set_current_state(TASK_KILLABLE);
>  			write_unlock_irq(&tasklist_lock);
>  			cgroup_threadgroup_change_end(tsk);
> -			schedule();
> +			freezable_schedule();
>  			if (unlikely(__fatal_signal_pending(tsk)))
>  				goto killed;
>  		}

Thanks, looks good to me.

Acked-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] exec: make de_thread() freezable
  2018-11-12  3:54 [PATCH v2] exec: make de_thread() freezable Chanho Min
  2018-11-12  8:15 ` Oleg Nesterov
@ 2018-11-12  9:51 ` Pavel Machek
  2018-11-13 14:53 ` Michal Hocko
  2 siblings, 0 replies; 12+ messages in thread
From: Pavel Machek @ 2018-11-12  9:51 UTC (permalink / raw)
  To: Chanho Min
  Cc: Rafael J. Wysocki, Len Brown, Andrew Morton, Eric W. Biederman,
	Christian Brauner, Oleg Nesterov, Anna-Maria Gleixner,
	Alexander Viro, linux-pm, linux-kernel, linux-fsdevel,
	Seungho Park, Inkyu Hwang, Donghwan Jung, Jongsung Kim

[-- Attachment #1: Type: text/plain, Size: 1675 bytes --]

On Mon 2018-11-12 12:54:45, Chanho Min wrote:
> Suspend fails due to the exec family of functions blocking the freezer.
> The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for
> all sub-threads to die, and we have the deadlock if one of them is frozen.
> This also can occur with the schedule() waiting for the group thread leader
> to exit if it is frozen.
> 
> In our machine, it causes freeze timeout as bellows.
> 
> Freezing of tasks failed after 20.010 seconds (1 tasks refusing to freeze, wq_busy=0):
> setcpushares-ls D ffffffc00008ed70     0  5817   1483 0x0040000d
>  Call trace:
> [<ffffffc00008ed70>] __switch_to+0x88/0xa0
> [<ffffffc000d1c30c>] __schedule+0x1bc/0x720
> [<ffffffc000d1ca90>] schedule+0x40/0xa8
> [<ffffffc0001cd784>] flush_old_exec+0xdc/0x640
> [<ffffffc000220360>] load_elf_binary+0x2a8/0x1090
> [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
> [<ffffffc00021c584>] load_script+0x20c/0x228
> [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
> [<ffffffc0001ce8e0>] do_execveat_common.isra.14+0x4f8/0x6e8
> [<ffffffc0001cedd0>] compat_SyS_execve+0x38/0x48
> [<ffffffc00008de30>] el0_svc_naked+0x24/0x28
> 
> To fix this, make de_thread() freezable. It looks safe and works fine.
> 
> Changes in v2:
>  - changes for the same reason in "if (!thread_group_leader(tsk))" branch.
>    (reported by Oleg)
> 
> Suggested-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Chanho Min <chanho.min@lge.com>

Acked-by: Pavel Machek <pavel@ucw.cz>


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] exec: make de_thread() freezable
  2018-11-12  3:54 [PATCH v2] exec: make de_thread() freezable Chanho Min
  2018-11-12  8:15 ` Oleg Nesterov
  2018-11-12  9:51 ` Pavel Machek
@ 2018-11-13 14:53 ` Michal Hocko
  2018-11-13 16:18   ` Oleg Nesterov
  2018-11-14 11:31   ` Michal Hocko
  2 siblings, 2 replies; 12+ messages in thread
From: Michal Hocko @ 2018-11-13 14:53 UTC (permalink / raw)
  To: Chanho Min
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Andrew Morton,
	Eric W. Biederman, Christian Brauner, Oleg Nesterov,
	Anna-Maria Gleixner, Alexander Viro, linux-pm, linux-kernel,
	linux-fsdevel, Seungho Park, Inkyu Hwang, Donghwan Jung,
	Jongsung Kim

On Mon 12-11-18 12:54:45, Chanho Min wrote:
> Suspend fails due to the exec family of functions blocking the freezer.
> The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for
> all sub-threads to die, and we have the deadlock if one of them is frozen.
> This also can occur with the schedule() waiting for the group thread leader
> to exit if it is frozen.
> 
> In our machine, it causes freeze timeout as bellows.
> 
> Freezing of tasks failed after 20.010 seconds (1 tasks refusing to freeze, wq_busy=0):
> setcpushares-ls D ffffffc00008ed70     0  5817   1483 0x0040000d
>  Call trace:
> [<ffffffc00008ed70>] __switch_to+0x88/0xa0
> [<ffffffc000d1c30c>] __schedule+0x1bc/0x720
> [<ffffffc000d1ca90>] schedule+0x40/0xa8
> [<ffffffc0001cd784>] flush_old_exec+0xdc/0x640
> [<ffffffc000220360>] load_elf_binary+0x2a8/0x1090
> [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
> [<ffffffc00021c584>] load_script+0x20c/0x228
> [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
> [<ffffffc0001ce8e0>] do_execveat_common.isra.14+0x4f8/0x6e8
> [<ffffffc0001cedd0>] compat_SyS_execve+0x38/0x48
> [<ffffffc00008de30>] el0_svc_naked+0x24/0x28
> 
> To fix this, make de_thread() freezable. It looks safe and works fine.

It's been some time since I have looked into this code so bear with me.
One thing is not really clear to me. Why does it help to exclude this
particular task from the freezer when it is not sleeping in the freezer.
I can see how other threads need to be zapped and TASK_WAKEKILL doesn't
do that but shouldn't we fix that instead?

Or maybe I am missing something important here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] exec: make de_thread() freezable
  2018-11-13 14:53 ` Michal Hocko
@ 2018-11-13 16:18   ` Oleg Nesterov
  2018-11-13 18:00     ` Michal Hocko
  2018-11-14 11:31   ` Michal Hocko
  1 sibling, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2018-11-13 16:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Chanho Min, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Andrew Morton, Eric W. Biederman, Christian Brauner,
	Anna-Maria Gleixner, Alexander Viro, linux-pm, linux-kernel,
	linux-fsdevel, Seungho Park, Inkyu Hwang, Donghwan Jung,
	Jongsung Kim

On 11/13, Michal Hocko wrote:
>
> On Mon 12-11-18 12:54:45, Chanho Min wrote:
> > Suspend fails due to the exec family of functions blocking the freezer.
> > The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for
> > all sub-threads to die, and we have the deadlock if one of them is frozen.
> > This also can occur with the schedule() waiting for the group thread leader
> > to exit if it is frozen.
> >
> > In our machine, it causes freeze timeout as bellows.
> >
> > Freezing of tasks failed after 20.010 seconds (1 tasks refusing to freeze, wq_busy=0):
> > setcpushares-ls D ffffffc00008ed70     0  5817   1483 0x0040000d
> >  Call trace:
> > [<ffffffc00008ed70>] __switch_to+0x88/0xa0
> > [<ffffffc000d1c30c>] __schedule+0x1bc/0x720
> > [<ffffffc000d1ca90>] schedule+0x40/0xa8
> > [<ffffffc0001cd784>] flush_old_exec+0xdc/0x640
> > [<ffffffc000220360>] load_elf_binary+0x2a8/0x1090
> > [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
> > [<ffffffc00021c584>] load_script+0x20c/0x228
> > [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
> > [<ffffffc0001ce8e0>] do_execveat_common.isra.14+0x4f8/0x6e8
> > [<ffffffc0001cedd0>] compat_SyS_execve+0x38/0x48
> > [<ffffffc00008de30>] el0_svc_naked+0x24/0x28
> >
> > To fix this, make de_thread() freezable. It looks safe and works fine.
>
> It's been some time since I have looked into this code so bear with me.
> One thing is not really clear to me. Why does it help to exclude this
> particular task from the freezer

we don't exclude it,

> when it is not sleeping in the freezer.

Yes, it is not sleeping in __refrigerator(), but it does

	schedule();
	freezer_count();

so it will enter __refrigerator() right after wakeup. If it won't be woken
up we do not care, we can consider it "frozen".

> I can see how other threads need to be zapped and TASK_WAKEKILL doesn't
> do that but shouldn't we fix that instead?

Not sure I understand, but unlikely we can (or want) to make __refrigerator()
killable.

Otherwise, how can we fix that?

Oleg.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] exec: make de_thread() freezable
  2018-11-13 16:18   ` Oleg Nesterov
@ 2018-11-13 18:00     ` Michal Hocko
  2018-11-14 10:18       ` Chanho Min
  2018-11-14 14:29       ` Oleg Nesterov
  0 siblings, 2 replies; 12+ messages in thread
From: Michal Hocko @ 2018-11-13 18:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Chanho Min, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Andrew Morton, Eric W. Biederman, Christian Brauner,
	Anna-Maria Gleixner, Alexander Viro, linux-pm, linux-kernel,
	linux-fsdevel, Seungho Park, Inkyu Hwang, Donghwan Jung,
	Jongsung Kim

On Tue 13-11-18 17:18:58, Oleg Nesterov wrote:
> On 11/13, Michal Hocko wrote:
> >
> > On Mon 12-11-18 12:54:45, Chanho Min wrote:
> > > Suspend fails due to the exec family of functions blocking the freezer.
> > > The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for
> > > all sub-threads to die, and we have the deadlock if one of them is frozen.
> > > This also can occur with the schedule() waiting for the group thread leader
> > > to exit if it is frozen.
> > >
> > > In our machine, it causes freeze timeout as bellows.
> > >
> > > Freezing of tasks failed after 20.010 seconds (1 tasks refusing to freeze, wq_busy=0):
> > > setcpushares-ls D ffffffc00008ed70     0  5817   1483 0x0040000d
> > >  Call trace:
> > > [<ffffffc00008ed70>] __switch_to+0x88/0xa0
> > > [<ffffffc000d1c30c>] __schedule+0x1bc/0x720
> > > [<ffffffc000d1ca90>] schedule+0x40/0xa8
> > > [<ffffffc0001cd784>] flush_old_exec+0xdc/0x640
> > > [<ffffffc000220360>] load_elf_binary+0x2a8/0x1090
> > > [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
> > > [<ffffffc00021c584>] load_script+0x20c/0x228
> > > [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
> > > [<ffffffc0001ce8e0>] do_execveat_common.isra.14+0x4f8/0x6e8
> > > [<ffffffc0001cedd0>] compat_SyS_execve+0x38/0x48
> > > [<ffffffc00008de30>] el0_svc_naked+0x24/0x28
> > >
> > > To fix this, make de_thread() freezable. It looks safe and works fine.
> >
> > It's been some time since I have looked into this code so bear with me.
> > One thing is not really clear to me. Why does it help to exclude this
> > particular task from the freezer
> 
> we don't exclude it,
> 
> > when it is not sleeping in the freezer.
> 
> Yes, it is not sleeping in __refrigerator(), but it does
> 
> 	schedule();
> 	freezer_count();
> 
> so it will enter __refrigerator() right after wakeup. If it won't be woken
> up we do not care, we can consider it "frozen".

Right, but this is just silencing the freezing code to exclude this
task, right?

> > I can see how other threads need to be zapped and TASK_WAKEKILL doesn't
> > do that but shouldn't we fix that instead?
> 
> Not sure I understand, but unlikely we can (or want) to make __refrigerator()
> killable.

Why would that be a problem. If the kill is fatal then why to keep the
killed task in the fridge?

> Otherwise, how can we fix that?

We can mark all threads PF_NOFREEZE and wake them up. This would require
some more changes of course but wouldn't that be a more appropriate
solution? Do we want to block exec for ever just because some threads
are in the fridge?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH v2] exec: make de_thread() freezable
  2018-11-13 18:00     ` Michal Hocko
@ 2018-11-14 10:18       ` Chanho Min
  2018-11-14 10:30         ` Michal Hocko
  2018-11-14 14:29       ` Oleg Nesterov
  1 sibling, 1 reply; 12+ messages in thread
From: Chanho Min @ 2018-11-14 10:18 UTC (permalink / raw)
  To: 'Michal Hocko', 'Oleg Nesterov'
  Cc: 'Rafael J. Wysocki', 'Pavel Machek',
	'Len Brown', 'Andrew Morton',
	'Eric W. Biederman', 'Christian Brauner',
	'Anna-Maria Gleixner', 'Alexander Viro',
	linux-pm, linux-kernel, linux-fsdevel, 'Seungho Park',
	'Inkyu Hwang', 'Donghwan Jung',
	'Jongsung Kim'

> > > It's been some time since I have looked into this code so bear with
me.
> > > One thing is not really clear to me. Why does it help to exclude this
> > > particular task from the freezer
> >
> > we don't exclude it,
> >
> > > when it is not sleeping in the freezer.
> >
> > Yes, it is not sleeping in __refrigerator(), but it does
> >
> > 	schedule();
> > 	freezer_count();
> >
> > so it will enter __refrigerator() right after wakeup. If it won't be
> woken
> > up we do not care, we can consider it "frozen".
> 
> Right, but this is just silencing the freezing code to exclude this
> task, right?
> 
> > > I can see how other threads need to be zapped and TASK_WAKEKILL
> doesn't
> > > do that but shouldn't we fix that instead?
> >
> > Not sure I understand, but unlikely we can (or want) to make
> __refrigerator()
> > killable.
> 
> Why would that be a problem. If the kill is fatal then why to keep the
> killed task in the fridge?
> 

Is it  different between 'the killed task is frozen' and '__refrigerator()
is killable'?
From a general '__refrigerator()' implementation point of view I know that
it should not be killable.

> > Otherwise, how can we fix that?
> 
> We can mark all threads PF_NOFREEZE and wake them up. This would require
> some more changes of course but wouldn't that be a more appropriate
> solution? Do we want to block exec for ever just because some threads
> are in the fridge?
> 

IMHO, It seems to be difficult and buggy to control with PF_NOFREEZE.
Because,
The sub-thread can freeze and receive SIG_KILL before the marking of
PF_NOFREEZE
and it should be freezable in other cases. I don't understand why it isn't
appropriate
for exec to block. The exec can freeze. When tasks are thawed, the killed
sub-thread
will die and wake de_thread(). The exec will continue to work from resume.

Chanho


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] exec: make de_thread() freezable
  2018-11-14 10:18       ` Chanho Min
@ 2018-11-14 10:30         ` Michal Hocko
  2018-11-14 14:37           ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2018-11-14 10:30 UTC (permalink / raw)
  To: Chanho Min
  Cc: 'Oleg Nesterov', 'Rafael J. Wysocki',
	'Pavel Machek', 'Len Brown',
	'Andrew Morton', 'Eric W. Biederman',
	'Christian Brauner', 'Anna-Maria Gleixner',
	'Alexander Viro',
	linux-pm, linux-kernel, linux-fsdevel, 'Seungho Park',
	'Inkyu Hwang', 'Donghwan Jung',
	'Jongsung Kim'

On Wed 14-11-18 19:18:42, Chanho Min wrote:
> > > > It's been some time since I have looked into this code so bear with
> me.
> > > > One thing is not really clear to me. Why does it help to exclude this
> > > > particular task from the freezer
> > >
> > > we don't exclude it,
> > >
> > > > when it is not sleeping in the freezer.
> > >
> > > Yes, it is not sleeping in __refrigerator(), but it does
> > >
> > > 	schedule();
> > > 	freezer_count();
> > >
> > > so it will enter __refrigerator() right after wakeup. If it won't be
> > woken
> > > up we do not care, we can consider it "frozen".
> > 
> > Right, but this is just silencing the freezing code to exclude this
> > task, right?
> > 
> > > > I can see how other threads need to be zapped and TASK_WAKEKILL
> > doesn't
> > > > do that but shouldn't we fix that instead?
> > >
> > > Not sure I understand, but unlikely we can (or want) to make
> > __refrigerator()
> > > killable.
> > 
> > Why would that be a problem. If the kill is fatal then why to keep the
> > killed task in the fridge?
> > 
> 
> Is it  different between 'the killed task is frozen' and '__refrigerator()
> is killable'?
> From a general '__refrigerator()' implementation point of view I know that
> it should not be killable.

Is that because there are many paths that do not terminate right after
the task get out of the fridge? Like the signal path?

> > > Otherwise, how can we fix that?
> > 
> > We can mark all threads PF_NOFREEZE and wake them up. This would require
> > some more changes of course but wouldn't that be a more appropriate
> > solution? Do we want to block exec for ever just because some threads
> > are in the fridge?
> > 
> 
> IMHO, It seems to be difficult and buggy to control with PF_NOFREEZE.
> Because,
> The sub-thread can freeze and receive SIG_KILL before the marking of
> PF_NOFREEZE
> and it should be freezable in other cases.

But we do control the ordering in this path no?

> I don't understand why it isn't appropriate for exec to block. The
> exec can freeze. When tasks are thawed, the killed sub-thread will die
> and wake de_thread(). The exec will continue to work from resume.

Because this is fragile. I haven't checked the full set of resources the
task holds when in this path but I can imagine we can introduce lock
dependency on freezing really easily.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] exec: make de_thread() freezable
  2018-11-13 14:53 ` Michal Hocko
  2018-11-13 16:18   ` Oleg Nesterov
@ 2018-11-14 11:31   ` Michal Hocko
  1 sibling, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2018-11-14 11:31 UTC (permalink / raw)
  To: Chanho Min
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Andrew Morton,
	Eric W. Biederman, Christian Brauner, Oleg Nesterov,
	Anna-Maria Gleixner, Alexander Viro, linux-pm, linux-kernel,
	linux-fsdevel, Seungho Park, Inkyu Hwang, Donghwan Jung,
	Jongsung Kim

On Tue 13-11-18 15:53:39, Michal Hocko wrote:
> On Mon 12-11-18 12:54:45, Chanho Min wrote:
> > Suspend fails due to the exec family of functions blocking the freezer.
> > The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for
> > all sub-threads to die, and we have the deadlock if one of them is frozen.
> > This also can occur with the schedule() waiting for the group thread leader
> > to exit if it is frozen.
> > 
> > In our machine, it causes freeze timeout as bellows.
> > 
> > Freezing of tasks failed after 20.010 seconds (1 tasks refusing to freeze, wq_busy=0):
> > setcpushares-ls D ffffffc00008ed70     0  5817   1483 0x0040000d
> >  Call trace:
> > [<ffffffc00008ed70>] __switch_to+0x88/0xa0
> > [<ffffffc000d1c30c>] __schedule+0x1bc/0x720
> > [<ffffffc000d1ca90>] schedule+0x40/0xa8
> > [<ffffffc0001cd784>] flush_old_exec+0xdc/0x640
> > [<ffffffc000220360>] load_elf_binary+0x2a8/0x1090
> > [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
> > [<ffffffc00021c584>] load_script+0x20c/0x228
> > [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240
> > [<ffffffc0001ce8e0>] do_execveat_common.isra.14+0x4f8/0x6e8
> > [<ffffffc0001cedd0>] compat_SyS_execve+0x38/0x48
> > [<ffffffc00008de30>] el0_svc_naked+0x24/0x28
> > 
> > To fix this, make de_thread() freezable. It looks safe and works fine.
> 
> It's been some time since I have looked into this code so bear with me.
> One thing is not really clear to me. Why does it help to exclude this
> particular task from the freezer when it is not sleeping in the freezer.
> I can see how other threads need to be zapped and TASK_WAKEKILL doesn't
> do that but shouldn't we fix that instead?
> 
> Or maybe I am missing something important here.

Just to not distract from the quick fix. This will silence the freezer
but I believe that we need a better fix to not block exec by freezer
long term. It seems much more complicated though so a quick fix makes
some sense for now.

Acked-by: Michal Hocko <mhocko@suse.com>
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] exec: make de_thread() freezable
  2018-11-13 18:00     ` Michal Hocko
  2018-11-14 10:18       ` Chanho Min
@ 2018-11-14 14:29       ` Oleg Nesterov
  1 sibling, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2018-11-14 14:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Chanho Min, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Andrew Morton, Eric W. Biederman, Christian Brauner,
	Anna-Maria Gleixner, Alexander Viro, linux-pm, linux-kernel,
	linux-fsdevel, Seungho Park, Inkyu Hwang, Donghwan Jung,
	Jongsung Kim

On 11/13, Michal Hocko wrote:
>
> > > >
> > > > To fix this, make de_thread() freezable. It looks safe and works fine.
> > >
> > > It's been some time since I have looked into this code so bear with me.
> > > One thing is not really clear to me. Why does it help to exclude this
> > > particular task from the freezer
> >
> > we don't exclude it,
> >
> > > when it is not sleeping in the freezer.
> >
> > Yes, it is not sleeping in __refrigerator(), but it does
> >
> > 	schedule();
> > 	freezer_count();
> >
> > so it will enter __refrigerator() right after wakeup. If it won't be woken
> > up we do not care, we can consider it "frozen".
>
> Right, but this is just silencing the freezing code to exclude this
> task, right?

Well yes... but I'd say this tells the freezing code that the caller is frozen,
because it can do nothing till thaw_processes(). Except it can actually call
__refrigerator() if, say, it is killed.

> > > I can see how other threads need to be zapped and TASK_WAKEKILL doesn't
> > > do that but shouldn't we fix that instead?
> >
> > Not sure I understand, but unlikely we can (or want) to make __refrigerator()
> > killable.
>
> Why would that be a problem. If the kill is fatal then why to keep the
> killed task in the fridge?

This is the question to Rafael, but I think that uninterruptible fridge
makes sense.

Because the exiting task can do a lot of things, say IO. So at least we need
to ensure that nobody can be killed after try_to_freeze_tasks() succeeds, and
this needs the changes in kernel/power/process.c and can lead to other problems.

And it is not clear to me why would we want to do this.

> > Otherwise, how can we fix that?
>
> We can mark all threads PF_NOFREEZE and wake them up.

We can't mark them PF_NOFREEZE but of course we could do something else
for de_thread() in particular, see the 1st version of Chanho's fix:
https://lore.kernel.org/lkml/1541671796-8725-1-git-send-email-chanho.min@lge.com/

> This would require
> some more changes of course

Yes,

> but wouldn't that be a more appropriate
> solution? Do we want to block exec for ever just because some threads
> are in the fridge?

Why not?

------------------------------------------------------------------------------
To clarify. speaking of de_thread() in particular, this change can not solve
all problems with freezer because de_thread() is called with cred_guard_mutex
held. And this obviously means that try_to_freeze_tasks() still can fail if
another task waits for this mutex.

But. freezable_schedule() doesn't make the thing worse, we have a lot more
problems (deadlocks) exactly because de_thread() sleeps wating for other threads
with this mutex held.

So I didn't even mention this problem, we need to narrow the scope of this mutex
in any case, so imo this has nothing to do with s/schedule/freezable_schedule/.

Oleg.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] exec: make de_thread() freezable
  2018-11-14 10:30         ` Michal Hocko
@ 2018-11-14 14:37           ` Oleg Nesterov
  0 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2018-11-14 14:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Chanho Min, 'Rafael J. Wysocki', 'Pavel Machek',
	'Len Brown', 'Andrew Morton',
	'Eric W. Biederman', 'Christian Brauner',
	'Anna-Maria Gleixner', 'Alexander Viro',
	linux-pm, linux-kernel, linux-fsdevel, 'Seungho Park',
	'Inkyu Hwang', 'Donghwan Jung',
	'Jongsung Kim'

On 11/14, Michal Hocko wrote:
>
> > I don't understand why it isn't appropriate for exec to block. The
> > exec can freeze. When tasks are thawed, the killed sub-thread will die
> > and wake de_thread(). The exec will continue to work from resume.
>
> Because this is fragile.

I don't really agree, but...

> I haven't checked the full set of resources the
> task holds when in this path but I can imagine we can introduce lock
> dependency on freezing really easily.

And you are right, see another email I sent you a minute ago.

But again, we need to change de_thread to sleep without cred_guard_mutex
anyway, so I think this change is fine. At least in that it can't add the
new problems.

Oleg.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] exec: make de_thread() freezable
  2018-11-12  8:15 ` Oleg Nesterov
@ 2018-11-21 23:35   ` Rafael J. Wysocki
  0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2018-11-21 23:35 UTC (permalink / raw)
  To: Oleg Nesterov, Chanho Min
  Cc: Pavel Machek, Len Brown, Andrew Morton, Eric W. Biederman,
	Christian Brauner, Anna-Maria Gleixner, Alexander Viro, linux-pm,
	linux-kernel, linux-fsdevel, Seungho Park, Inkyu Hwang,
	Donghwan Jung, Jongsung Kim

On Monday, November 12, 2018 9:15:18 AM CET Oleg Nesterov wrote:
> On 11/12, Chanho Min wrote:
> >
> > @@ -1083,7 +1084,7 @@ static int de_thread(struct task_struct *tsk)
> >  	while (sig->notify_count) {
> >  		__set_current_state(TASK_KILLABLE);
> >  		spin_unlock_irq(lock);
> > -		schedule();
> > +		freezable_schedule();
> >  		if (unlikely(__fatal_signal_pending(tsk)))
> >  			goto killed;
> >  		spin_lock_irq(lock);
> > @@ -1111,7 +1112,7 @@ static int de_thread(struct task_struct *tsk)
> >  			__set_current_state(TASK_KILLABLE);
> >  			write_unlock_irq(&tasklist_lock);
> >  			cgroup_threadgroup_change_end(tsk);
> > -			schedule();
> > +			freezable_schedule();
> >  			if (unlikely(__fatal_signal_pending(tsk)))
> >  				goto killed;
> >  		}
> 
> Thanks, looks good to me.
> 
> Acked-by: Oleg Nesterov <oleg@redhat.com>

Patch applied, thanks!


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-11-21 23:35 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-12  3:54 [PATCH v2] exec: make de_thread() freezable Chanho Min
2018-11-12  8:15 ` Oleg Nesterov
2018-11-21 23:35   ` Rafael J. Wysocki
2018-11-12  9:51 ` Pavel Machek
2018-11-13 14:53 ` Michal Hocko
2018-11-13 16:18   ` Oleg Nesterov
2018-11-13 18:00     ` Michal Hocko
2018-11-14 10:18       ` Chanho Min
2018-11-14 10:30         ` Michal Hocko
2018-11-14 14:37           ` Oleg Nesterov
2018-11-14 14:29       ` Oleg Nesterov
2018-11-14 11:31   ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).