* [RFC PATCH] livepatch: Speed up transition retries
From: Vasily Gorbik @ 2021-07-07 12:49 UTC
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek
  Cc: Joe Lawrence, Heiko Carstens, Sven Schnelle, Sumanth Korikkar,
	live-patching, linux-kernel

For now, this is just a racy hack for demonstration purposes.

On an s390 system with a large number of cpus,
klp_try_complete_transition() often cannot complete on the first
attempt. When it fails, klp_try_complete_transition() reschedules itself
as delayed work with a one-second delay. This adds up to a significant
amount of time when there is a large number of livepatching transitions.

This patch tries to minimize this delay by counting the processes which
still need to be transitioned and scheduling
klp_try_complete_transition() right away once that count drops to zero.

For an s390 LPAR with 128 cpus this reduces the livepatch kselftest run time
from
real    1m11.837s
user    0m0.603s
sys     0m10.940s

to
real    0m14.550s
user    0m0.420s
sys     0m5.779s

And the qa_test_klp run time from
real    5m15.950s
user    0m34.447s
sys     15m11.345s

to
real    3m51.987s
user    0m27.074s
sys     9m37.301s

Would something like that be useful for production use cases?
Any ideas on how to approach this more gracefully?

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
---
 kernel/livepatch/transition.c | 41 +++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 793eba46e970..fc4bb7a4a116 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -26,6 +26,8 @@ static int klp_target_state = KLP_UNDEFINED;
 
 static unsigned int klp_signals_cnt;
 
+static atomic_t klp_procs;
+
 /*
  * This work can be performed periodically to finish patching or unpatching any
  * "straggler" tasks which failed to transition in the first attempt.
@@ -181,8 +183,15 @@ void klp_update_patch_state(struct task_struct *task)
 	 *    of func->transition, if klp_ftrace_handler() is called later on
 	 *    the same CPU.  See __klp_disable_patch().
 	 */
-	if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING))
+	if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING)) {
 		task->patch_state = READ_ONCE(klp_target_state);
+		if (atomic_read(&klp_procs) == 0)
+			pr_err("klp_procs misaccounting\n");
+		else if (atomic_sub_return(1, &klp_procs) == 0) {
+			if (delayed_work_pending(&klp_transition_work))
+				mod_delayed_work(system_wq, &klp_transition_work, 0);
+		}
+	}
 
 	preempt_enable_notrace();
 }
@@ -320,7 +329,8 @@ static bool klp_try_switch_task(struct task_struct *task)
 
 	success = true;
 
-	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
+	if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING))
+		atomic_sub(1, &klp_procs);
 	task->patch_state = klp_target_state;
 
 done:
@@ -402,11 +412,6 @@ void klp_try_complete_transition(void)
 	 * Usually this will transition most (or all) of the tasks on a system
 	 * unless the patch includes changes to a very common function.
 	 */
-	read_lock(&tasklist_lock);
-	for_each_process_thread(g, task)
-		if (!klp_try_switch_task(task))
-			complete = false;
-	read_unlock(&tasklist_lock);
 
 	/*
 	 * Ditto for the idle "swapper" tasks.
@@ -424,10 +429,17 @@ void klp_try_complete_transition(void)
 			/* offline idle tasks can be switched immediately */
 			clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
 			task->patch_state = klp_target_state;
+			atomic_sub(1, &klp_procs);
 		}
 	}
 	put_online_cpus();
 
+	read_lock(&tasklist_lock);
+	for_each_process_thread(g, task)
+		if (!klp_try_switch_task(task))
+			complete = false;
+	read_unlock(&tasklist_lock);
+
 	if (!complete) {
 		if (klp_signals_cnt && !(klp_signals_cnt % SIGNALS_TIMEOUT))
 			klp_send_signals();
@@ -438,8 +450,8 @@ void klp_try_complete_transition(void)
 		 * later and/or wait for other methods like kernel exit
 		 * switching.
 		 */
-		schedule_delayed_work(&klp_transition_work,
-				      round_jiffies_relative(HZ));
+		schedule_delayed_work(&klp_transition_work, atomic_read(&klp_procs) ?
+				      round_jiffies_relative(HZ) : 0);
 		return;
 	}
 
@@ -473,6 +485,7 @@ void klp_start_transition(void)
 		  klp_transition_patch->mod->name,
 		  klp_target_state == KLP_PATCHED ? "patching" : "unpatching");
 
+	atomic_set(&klp_procs, 0);
 	/*
 	 * Mark all normal tasks as needing a patch state update.  They'll
 	 * switch either in klp_try_complete_transition() or as they exit the
@@ -480,8 +493,10 @@ void klp_start_transition(void)
 	 */
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, task)
-		if (task->patch_state != klp_target_state)
+		if (task->patch_state != klp_target_state) {
 			set_tsk_thread_flag(task, TIF_PATCH_PENDING);
+			atomic_inc(&klp_procs);
+		}
 	read_unlock(&tasklist_lock);
 
 	/*
@@ -491,8 +506,10 @@ void klp_start_transition(void)
 	 */
 	for_each_possible_cpu(cpu) {
 		task = idle_task(cpu);
-		if (task->patch_state != klp_target_state)
+		if (task->patch_state != klp_target_state) {
 			set_tsk_thread_flag(task, TIF_PATCH_PENDING);
+			atomic_inc(&klp_procs);
+		}
 	}
 
 	klp_signals_cnt = 0;
@@ -614,6 +631,8 @@ void klp_reverse_transition(void)
 void klp_copy_process(struct task_struct *child)
 {
 	child->patch_state = current->patch_state;
+	if (child->patch_state != klp_target_state)
+		atomic_add(1, &klp_procs);
 
 	/* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
 }
-- 
2.25.4


* Re: [RFC PATCH] livepatch: Speed up transition retries
From: Petr Mladek @ 2021-07-08 10:35 UTC
  To: Vasily Gorbik
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Heiko Carstens, Sven Schnelle, Sumanth Korikkar, live-patching,
	linux-kernel

On Wed 2021-07-07 14:49:41, Vasily Gorbik wrote:
> For now, this is just a racy hack for demonstration purposes.
> 
> On an s390 system with a large number of cpus,
> klp_try_complete_transition() often cannot complete on the first
> attempt. When it fails, klp_try_complete_transition() reschedules itself
> as delayed work with a one-second delay. This adds up to a significant
> amount of time when there is a large number of livepatching transitions.
> 
> This patch tries to minimize this delay by counting the processes which
> still need to be transitioned and scheduling
> klp_try_complete_transition() right away once that count drops to zero.
> 
> For an s390 LPAR with 128 cpus this reduces the livepatch kselftest run time
> from
> real    1m11.837s
> user    0m0.603s
> sys     0m10.940s
> 
> to
> real    0m14.550s
> user    0m0.420s
> sys     0m5.779s
> 
> And the qa_test_klp run time from
> real    5m15.950s
> user    0m34.447s
> sys     15m11.345s
> 
> to
> real    3m51.987s
> user    0m27.074s
> sys     9m37.301s
> 
> Would something like that be useful for production use cases?
> Any ideas on how to approach this more gracefully?

Honestly, I do not see a real-life use case for this, except maybe
speeding up a test suite.

The livepatch transition is more about reliability than about speed.
In real life, a livepatch is applied only once in a while.

We have spent weeks thinking about and discussing the consistency
model, code, and barriers to handle races correctly. In particular,
klp_update_patch_state() is a super-sensitive beast because it is
called without klp_lock, so it might be pretty hard to synchronize it
with klp_reverse_transition() or klp_force_transition(). For example,
klp_reverse_transition() clears TIF_PATCH_PENDING and restarts the
transition, which with this patch would reset klp_procs while
klp_update_patch_state() may still be decrementing it concurrently;
presumably that is the misaccounting your pr_err() guards against.

You would need to come up with a really convincing use case and
numbers to make it worth the effort.

Best Regards,
Petr



* Re: [RFC PATCH] livepatch: Speed up transition retries
From: Vasily Gorbik @ 2021-07-08 13:19 UTC
  To: Petr Mladek
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Heiko Carstens, Sven Schnelle, Sumanth Korikkar, live-patching,
	linux-kernel

On Thu, Jul 08, 2021 at 12:35:24PM +0200, Petr Mladek wrote:
> On Wed 2021-07-07 14:49:41, Vasily Gorbik wrote:
> > For now, this is just a racy hack for demonstration purposes.
> > 
> > For an s390 LPAR with 128 cpus this reduces the livepatch kselftest run time
> > from
> > real    1m11.837s
> > user    0m0.603s
> > sys     0m10.940s
> > 
> > to
> > real    0m14.550s
> > user    0m0.420s
> > sys     0m5.779s
> > 
> > Would something like that be useful for production use cases?
> > Any ideas on how to approach this more gracefully?
> 
> Honestly, I do not see a real-life use case for this, except maybe
> speeding up a test suite.
> 
> The livepatch transition is more about reliability than about speed.
> In real life, a livepatch is applied only once in a while.

That's what I thought. Thanks for looking. Dropping this one.


* Re: [RFC PATCH] livepatch: Speed up transition retries
From: Petr Mladek @ 2021-07-08 14:57 UTC
  To: Vasily Gorbik
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence,
	Heiko Carstens, Sven Schnelle, Sumanth Korikkar, live-patching,
	linux-kernel

On Thu 2021-07-08 15:19:25, Vasily Gorbik wrote:
> On Thu, Jul 08, 2021 at 12:35:24PM +0200, Petr Mladek wrote:
> > On Wed 2021-07-07 14:49:41, Vasily Gorbik wrote:
> > > For now, this is just a racy hack for demonstration purposes.
> > > 
> > > For an s390 LPAR with 128 cpus this reduces the livepatch kselftest run time
> > > from
> > > real    1m11.837s
> > > user    0m0.603s
> > > sys     0m10.940s
> > > 
> > > to
> > > real    0m14.550s
> > > user    0m0.420s
> > > sys     0m5.779s
> > > 
> > > Would something like that be useful for production use cases?
> > > Any ideas on how to approach this more gracefully?
> > 
> > Honestly, I do not see a real-life use case for this, except maybe
> > speeding up a test suite.
> > 
> > The livepatch transition is more about reliability than about speed.
> > In real life, a livepatch is applied only once in a while.
> 
> That's what I thought. Thanks for looking. Dropping this one.

If you still wanted to speed up the transition for some reason,
an easy win might be to call klp_send_signals() earlier.

Well, my view is the following. The primary livepatching task is
to fix some broken/vulnerable functionality on a running kernel.
It should ideally happen in the background and not affect or slow
down the existing workload.

klp_send_signals() is not ideal. The fake signal interrupts syscalls,
which then need to be restarted. The function also wakes up a lot of
tasks and might increase load. Hence, it is used as a last resort that
allows the transition to finish in a reasonable time frame.

That said, the current timeouts are arbitrarily chosen values based
on common sense rather than on measurement. I could imagine that
we could modify them, or allow triggering klp_send_signals() via
sysfs when there is a good reason.
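
Just to illustrate the idea, a minimal sketch of such a sysfs trigger
could look like the code below. Everything here is an assumption for
illustration only: the attribute name, placing it next to the existing
livepatch sysfs attributes in kernel/livepatch/core.c, and making
klp_send_signals() callable from outside transition.c.

#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/mutex.h>
#include <linux/string.h>
#include <linux/sysfs.h>

#include "core.h"		/* klp_mutex */
#include "transition.h"		/* klp_transition_patch; assumes
				   klp_send_signals() is made non-static */

/*
 * Hypothetical write-only attribute: writing "1" sends the fake signal
 * immediately instead of waiting for the next SIGNALS_TIMEOUT retry.
 */
static ssize_t signal_store(struct kobject *kobj, struct kobj_attribute *attr,
			    const char *buf, size_t count)
{
	bool val;
	int ret;

	ret = kstrtobool(buf, &val);
	if (ret)
		return ret;

	mutex_lock(&klp_mutex);
	/* Only makes sense while a transition is actually in progress. */
	if (val && klp_transition_patch)
		klp_send_signals();
	mutex_unlock(&klp_mutex);

	return count;
}
static struct kobj_attribute signal_kobj_attr = __ATTR_WO(signal);

An admin could then nudge the remaining stragglers with something like
"echo 1 > /sys/kernel/livepatch/signal" (path hypothetical as well)
instead of waiting out the periodic retries.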

Best Regards,
Petr
