linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] livepatch: Kick idle cpu's tasks to perform transition
@ 2021-07-07 12:49 Vasily Gorbik
  2021-07-08 10:12 ` Petr Mladek
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Vasily Gorbik @ 2021-07-07 12:49 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek
  Cc: Joe Lawrence, Heiko Carstens, Sven Schnelle, Sumanth Korikkar,
	live-patching, linux-kernel

On an idle system with large amount of cpus it might happen that
klp_update_patch_state() is not reached in do_idle() for a long periods
of time. With debug messages enabled log is filled with:
[  499.442643] livepatch: klp_try_switch_task: swapper/63:0 is running

without any signs of progress. Ending up with "failed to complete
transition".

On s390 LPAR with 128 cpus not a single transition is able to complete
and livepatch kselftests fail.

To deal with that, make sure we break out of do_idle() inner loop to
reach klp_update_patch_state() by marking idle tasks as NEED_RESCHED
as well as kick cpus out of idle state.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
---
 kernel/livepatch/transition.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 3a4beb9395c4..793eba46e970 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -415,8 +415,11 @@ void klp_try_complete_transition(void)
 	for_each_possible_cpu(cpu) {
 		task = idle_task(cpu);
 		if (cpu_online(cpu)) {
-			if (!klp_try_switch_task(task))
+			if (!klp_try_switch_task(task)) {
 				complete = false;
+				set_tsk_need_resched(task);
+				kick_process(task);
+			}
 		} else if (task->patch_state != klp_target_state) {
 			/* offline idle tasks can be switched immediately */
 			clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
-- 
2.25.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] livepatch: Kick idle cpu's tasks to perform transition
  2021-07-07 12:49 [RFC PATCH] livepatch: Kick idle cpu's tasks to perform transition Vasily Gorbik
@ 2021-07-08 10:12 ` Petr Mladek
  2021-08-06 23:59 ` Suraj Jitindar Singh
  2021-08-27 12:54 ` Petr Mladek
  2 siblings, 0 replies; 6+ messages in thread
From: Petr Mladek @ 2021-07-08 10:12 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Heiko Carstens, Sven Schnelle, Sumanth Korikkar, live-patching,
	linux-kernel

On Wed 2021-07-07 14:49:38, Vasily Gorbik wrote:
> On an idle system with large amount of cpus it might happen that
> klp_update_patch_state() is not reached in do_idle() for a long periods
> of time. With debug messages enabled log is filled with:
> [  499.442643] livepatch: klp_try_switch_task: swapper/63:0 is running

I see. I guess that the problem is only when CONFIG_NO_HZ is enabled.
Do I get it correctly, please?

> without any signs of progress. Ending up with "failed to complete
> transition".
> 
> On s390 LPAR with 128 cpus not a single transition is able to complete
> and livepatch kselftests fail.
> 
> To deal with that, make sure we break out of do_idle() inner loop to
> reach klp_update_patch_state() by marking idle tasks as NEED_RESCHED
> as well as kick cpus out of idle state.
>
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> ---
>  kernel/livepatch/transition.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 3a4beb9395c4..793eba46e970 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -415,8 +415,11 @@ void klp_try_complete_transition(void)
>  	for_each_possible_cpu(cpu) {
>  		task = idle_task(cpu);
>  		if (cpu_online(cpu)) {
> -			if (!klp_try_switch_task(task))
> +			if (!klp_try_switch_task(task)) {
>  				complete = false;
> +				set_tsk_need_resched(task);
> +				kick_process(task);

First, we should kick the idle threads in klp_send_signals().
It already solves similar problem when normal threads and kthreads
stay in the incorruptible sleep for too long.

Second, the way looks a bit hacky to me. need_resched() depends on
the currect implementation of the idle loop. kick_process() has
a completely different purpose and does checks that do not fit well
this use-case.

I wonder if wake_up_nohz_cpu() would fit better here. Please, add
scheduler people into CC, namely:

    Ingo Molnar <mingo@redhat.com>
    Peter Zijlstra <peterz@infradead.org>

and NOHZ guys:

    Frederic Weisbecker <fweisbec@gmail.com>
    Thomas Gleixner <tglx@linutronix.de>


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] livepatch: Kick idle cpu's tasks to perform transition
  2021-07-07 12:49 [RFC PATCH] livepatch: Kick idle cpu's tasks to perform transition Vasily Gorbik
  2021-07-08 10:12 ` Petr Mladek
@ 2021-08-06 23:59 ` Suraj Jitindar Singh
  2021-08-27 12:54 ` Petr Mladek
  2 siblings, 0 replies; 6+ messages in thread
From: Suraj Jitindar Singh @ 2021-08-06 23:59 UTC (permalink / raw)
  To: Vasily Gorbik, Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek
  Cc: Joe Lawrence, Heiko Carstens, Sven Schnelle, Sumanth Korikkar,
	live-patching, linux-kernel

On Wed, 2021-07-07 at 14:49 +0200, Vasily Gorbik wrote:
> On an idle system with large amount of cpus it might happen that
> klp_update_patch_state() is not reached in do_idle() for a long
> periods
> of time. With debug messages enabled log is filled with:
> [  499.442643] livepatch: klp_try_switch_task: swapper/63:0 is
> running
> 
> without any signs of progress. Ending up with "failed to complete
> transition".
> 
> On s390 LPAR with 128 cpus not a single transition is able to
> complete
> and livepatch kselftests fail.

This also speeds up completion of the transition on high cpu count arm
instances.

Interested in seeing the correct way to address this.

Suraj.

> 
> To deal with that, make sure we break out of do_idle() inner loop to
> reach klp_update_patch_state() by marking idle tasks as NEED_RESCHED
> as well as kick cpus out of idle state.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> ---
>  kernel/livepatch/transition.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/livepatch/transition.c
> b/kernel/livepatch/transition.c
> index 3a4beb9395c4..793eba46e970 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -415,8 +415,11 @@ void klp_try_complete_transition(void)
>  	for_each_possible_cpu(cpu) {
>  		task = idle_task(cpu);
>  		if (cpu_online(cpu)) {
> -			if (!klp_try_switch_task(task))
> +			if (!klp_try_switch_task(task)) {
>  				complete = false;
> +				set_tsk_need_resched(task);
> +				kick_process(task);
> +			}
>  		} else if (task->patch_state != klp_target_state) {
>  			/* offline idle tasks can be switched
> immediately */
>  			clear_tsk_thread_flag(task, TIF_PATCH_PENDING);


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] livepatch: Kick idle cpu's tasks to perform transition
  2021-07-07 12:49 [RFC PATCH] livepatch: Kick idle cpu's tasks to perform transition Vasily Gorbik
  2021-07-08 10:12 ` Petr Mladek
  2021-08-06 23:59 ` Suraj Jitindar Singh
@ 2021-08-27 12:54 ` Petr Mladek
  2021-09-09  8:54   ` Vasily Gorbik
  2 siblings, 1 reply; 6+ messages in thread
From: Petr Mladek @ 2021-08-27 12:54 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Heiko Carstens, Sven Schnelle, Sumanth Korikkar, live-patching,
	linux-kernel

On Wed 2021-07-07 14:49:38, Vasily Gorbik wrote:
> On an idle system with large amount of cpus it might happen that
> klp_update_patch_state() is not reached in do_idle() for a long periods
> of time. With debug messages enabled log is filled with:
> [  499.442643] livepatch: klp_try_switch_task: swapper/63:0 is running
> 
> without any signs of progress. Ending up with "failed to complete
> transition".
> 
> On s390 LPAR with 128 cpus not a single transition is able to complete
> and livepatch kselftests fail.
> 
> To deal with that, make sure we break out of do_idle() inner loop to
> reach klp_update_patch_state() by marking idle tasks as NEED_RESCHED
> as well as kick cpus out of idle state.

I see.

> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> ---
>  kernel/livepatch/transition.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 3a4beb9395c4..793eba46e970 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -415,8 +415,11 @@ void klp_try_complete_transition(void)
>  	for_each_possible_cpu(cpu) {
>  		task = idle_task(cpu);
>  		if (cpu_online(cpu)) {
> -			if (!klp_try_switch_task(task))
> +			if (!klp_try_switch_task(task)) {
>  				complete = false;
> +				set_tsk_need_resched(task);

Is this really needed?

> +				kick_process(task);

This would probably do the job. Well, I wonder if the following is
a bit cleaner.

		wake_up_if_idle(cpu);


Also, please do this in klp_send_signals(). We kick there all other
tasks that block the transition for too long.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] livepatch: Kick idle cpu's tasks to perform transition
  2021-08-27 12:54 ` Petr Mladek
@ 2021-09-09  8:54   ` Vasily Gorbik
  2021-09-10  8:08     ` Petr Mladek
  0 siblings, 1 reply; 6+ messages in thread
From: Vasily Gorbik @ 2021-09-09  8:54 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Heiko Carstens, Sven Schnelle, Sumanth Korikkar, live-patching,
	linux-kernel

On Fri, Aug 27, 2021 at 02:54:39PM +0200, Petr Mladek wrote:
> On Wed 2021-07-07 14:49:38, Vasily Gorbik wrote:
> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -415,8 +415,11 @@ void klp_try_complete_transition(void)
> >  	for_each_possible_cpu(cpu) {
> >  		task = idle_task(cpu);
> >  		if (cpu_online(cpu)) {
> > -			if (!klp_try_switch_task(task))
> > +			if (!klp_try_switch_task(task)) {
> >  				complete = false;
> > +				set_tsk_need_resched(task);
> 
> Is this really needed?

Yes, otherwise the inner idle loop is not left and
klp_update_patch_state() is not reached. Only waking up idle
cpus is not enough.

> > +				kick_process(task);
> 
> This would probably do the job. Well, I wonder if the following is
> a bit cleaner.
> 
> 		wake_up_if_idle(cpu);

wake_up_if_idle() is nice way to identify idle cpus, but it does not
force idle task rescheduling in our case.

> Also, please do this in klp_send_signals(). We kick there all other
> tasks that block the transition for too long.

#define SIGNALS_TIMEOUT 15

Hm, kicking the idle threads in klp_send_signals() means extra 15 seconds
delay for every transition in our case and failing kselftests:

# --- expected
# +++ result
...
#  livepatch: 'test_klp_livepatch': starting patching transition
# +livepatch: signaling remaining tasks
#  livepatch: 'test_klp_livepatch': completing patching transition

BTW, for x86 I made a lousy tests with 128 cpus in kvm. With default
CONFIG_NO_HZ_IDLE=y kselftests are failing the same way. Sometimes
transition times out as well, despite NO_HZ options configurations.

Jul 09 11:43:33 q.q kernel: livepatch: 'test_klp_livepatch': starting patching transition
Jul 09 11:44:37 q.q kernel: livepatch: 'test_klp_livepatch': starting unpatching transition
Jul 09 11:45:40 q.q ERROR: failed to disable livepatch test_klp_livepatch

I understand this 15 seconds delay for loaded system and tasks doing real
work is good, but those lazy idle "running" tasks could be kicked right
away with no harm done, right? Given we are able to identify them reliably.
I'll send another patch version.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] livepatch: Kick idle cpu's tasks to perform transition
  2021-09-09  8:54   ` Vasily Gorbik
@ 2021-09-10  8:08     ` Petr Mladek
  0 siblings, 0 replies; 6+ messages in thread
From: Petr Mladek @ 2021-09-10  8:08 UTC (permalink / raw)
  To: Vasily Gorbik
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Heiko Carstens, Sven Schnelle, Sumanth Korikkar, live-patching,
	linux-kernel

On Thu 2021-09-09 10:54:05, Vasily Gorbik wrote:
> On Fri, Aug 27, 2021 at 02:54:39PM +0200, Petr Mladek wrote:
> > On Wed 2021-07-07 14:49:38, Vasily Gorbik wrote:
> > > --- a/kernel/livepatch/transition.c
> > > +++ b/kernel/livepatch/transition.c
> > > @@ -415,8 +415,11 @@ void klp_try_complete_transition(void)
> > >  	for_each_possible_cpu(cpu) {
> > >  		task = idle_task(cpu);
> > >  		if (cpu_online(cpu)) {
> > > -			if (!klp_try_switch_task(task))
> > > +			if (!klp_try_switch_task(task)) {
> > >  				complete = false;
> > > +				set_tsk_need_resched(task);
> > 
> > Is this really needed?
> 
> Yes, otherwise the inner idle loop is not left and
> klp_update_patch_state() is not reached. Only waking up idle
> cpus is not enough.

I see.

> > Also, please do this in klp_send_signals(). We kick there all other
> > tasks that block the transition for too long.
> 
> #define SIGNALS_TIMEOUT 15
> 
> Hm, kicking the idle threads in klp_send_signals() means extra 15 seconds
> delay for every transition in our case and failing kselftests:
>
> I understand this 15 seconds delay for loaded system and tasks doing real
> work is good,

Yup. Also normal processes should not stay in the running state
for this long. They are typically migrated quickly. But the idle task is
special.

> but those lazy idle "running" tasks could be kicked right
> away with no harm done, right?

Fair enough.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-09-10  8:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-07 12:49 [RFC PATCH] livepatch: Kick idle cpu's tasks to perform transition Vasily Gorbik
2021-07-08 10:12 ` Petr Mladek
2021-08-06 23:59 ` Suraj Jitindar Singh
2021-08-27 12:54 ` Petr Mladek
2021-09-09  8:54   ` Vasily Gorbik
2021-09-10  8:08     ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).