[PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
@ 2023-01-20 22:12 Seth Forshee (DigitalOcean)
  2023-01-20 22:12 ` [PATCH 1/2] livepatch: add an interface for safely switching kthreads Seth Forshee (DigitalOcean)
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Seth Forshee (DigitalOcean) @ 2023-01-20 22:12 UTC (permalink / raw)
  To: Petr Mladek, Jason Wang, Michael S. Tsirkin, Jiri Kosina,
	Miroslav Benes, Joe Lawrence, Josh Poimboeuf
  Cc: virtualization, kvm, Seth Forshee (DigitalOcean),
	netdev, live-patching, linux-kernel

We've fairly regularaly seen liveptches which cannot transition within kpatch's
timeout period due to busy vhost worker kthreads. In looking for a solution the
only answer I found was to call klp_update_patch_state() from a safe location.
I tried adding this call to vhost_worker(), and it works, but this creates the
potential for problems if a livepatch attempted to patch vhost_worker().
Without a call to klp_update_patch_state() fully loaded vhost kthreads can
never switch because vhost_worker() will always appear on the stack, but with
the call these kthreads can switch but will still be running the old version of
vhost_worker().

To avoid this situation I've added a new function, klp_switch_current(), which
switches the current task only if its stack does not include any function being
patched. This allows kthreads to safely attempt switching themselves if a patch
is pending. There is at least one downside, however. Since there's no way for
the kthread to track whether it has already tried to switch for a pending patch
it can end up calling klp_switch_current() repeatedly when it can never be
safely switched.

I don't know whether this is the right solution, and I'm happy to try out other
suggestions. But in my testing these patches proved effective in consistently
switching heavily loaded vhost kthreads almost immediately.

To: Josh Poimboeuf <jpoimboe@kernel.org>
To: Jiri Kosina <jikos@kernel.org>
To: Miroslav Benes <mbenes@suse.cz>
To: Petr Mladek <pmladek@suse.com>
To: Joe Lawrence <joe.lawrence@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: live-patching@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>

---
Seth Forshee (DigitalOcean) (2):
      livepatch: add an interface for safely switching kthreads
      vhost: check for pending livepatches from vhost worker kthreads

 drivers/vhost/vhost.c         |  4 ++++
 include/linux/livepatch.h     |  2 ++
 kernel/livepatch/transition.c | 11 +++++++++++
 3 files changed, 17 insertions(+)
---
base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4
change-id: 20230120-vhost-klp-switching-ba9a3ae38b8a

Best regards,
-- 
Seth Forshee (DigitalOcean) <sforshee@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/2] livepatch: add an interface for safely switching kthreads
  2023-01-20 22:12 [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads Seth Forshee (DigitalOcean)
@ 2023-01-20 22:12 ` Seth Forshee (DigitalOcean)
  2023-01-20 22:12 ` [PATCH 2/2] vhost: check for pending livepatches from vhost worker kthreads Seth Forshee (DigitalOcean)
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Seth Forshee (DigitalOcean) @ 2023-01-20 22:12 UTC (permalink / raw)
  To: Petr Mladek, Jason Wang, Michael S. Tsirkin, Jiri Kosina,
	Miroslav Benes, Joe Lawrence, Josh Poimboeuf
  Cc: virtualization, kvm, Seth Forshee (DigitalOcean),
	netdev, live-patching, linux-kernel

The only existing solution for transitioning a busy kernel thread is to
call klp_update_patch_state() from a safe location. However, this does
not account for the fact that even the main function of the kthread
could potentially be patched, leaving the patch switched but still
running the old version of a patched function.

To address this, add klp_switch_current() for use by kthreads to safely
transition themselves. This is just a wrapper around
klp_try_switch_task(), which can already transition the current task
with stack checking.

Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
---
 include/linux/livepatch.h     |  2 ++
 kernel/livepatch/transition.c | 11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 293e29960c6e..00b5981684a4 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -199,6 +199,7 @@ void klp_module_going(struct module *mod);
 
 void klp_copy_process(struct task_struct *child);
 void klp_update_patch_state(struct task_struct *task);
+void klp_switch_current(void);
 
 static inline bool klp_patch_pending(struct task_struct *task)
 {
@@ -240,6 +241,7 @@ static inline int klp_module_coming(struct module *mod) { return 0; }
 static inline void klp_module_going(struct module *mod) {}
 static inline bool klp_patch_pending(struct task_struct *task) { return false; }
 static inline void klp_update_patch_state(struct task_struct *task) {}
+static inline void klp_switch_current(void) {}
 static inline void klp_copy_process(struct task_struct *child) {}
 
 static inline
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index f1b25ec581e0..ff328b912916 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -455,6 +455,17 @@ void klp_try_complete_transition(void)
 		klp_free_replaced_patches_async(patch);
 }
 
+/*
+ * Safely try to switch the current task to the target patch state. This can
+ * be used by kthreads to transition if no to-be-patched or to-be-unpatched
+ * functions are on the call stack.
+ */
+void klp_switch_current(void)
+{
+	klp_try_switch_task(current);
+}
+EXPORT_SYMBOL_GPL(klp_switch_current);
+
 /*
  * Start the transition to the specified target patch state so tasks can begin
  * switching to it.

-- 
b4 0.10.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/2] vhost: check for pending livepatches from vhost worker kthreads
  2023-01-20 22:12 [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads Seth Forshee (DigitalOcean)
  2023-01-20 22:12 ` [PATCH 1/2] livepatch: add an interface for safely switching kthreads Seth Forshee (DigitalOcean)
@ 2023-01-20 22:12 ` Seth Forshee (DigitalOcean)
  2023-01-24 14:17   ` Petr Mladek
  2023-01-22  8:34 ` [PATCH 0/2] vhost: improve livepatch switching for heavily loaded " Michael S. Tsirkin
  2023-01-26 17:03 ` Petr Mladek
  3 siblings, 1 reply; 36+ messages in thread
From: Seth Forshee (DigitalOcean) @ 2023-01-20 22:12 UTC (permalink / raw)
  To: Petr Mladek, Jason Wang, Michael S. Tsirkin, Jiri Kosina,
	Miroslav Benes, Joe Lawrence, Josh Poimboeuf
  Cc: virtualization, kvm, Seth Forshee (DigitalOcean),
	netdev, live-patching, linux-kernel

Livepatch relies on stack checking of sleeping tasks to switch kthreads,
so a busy kthread can block a livepatch transition indefinitely. We've
seen this happen fairly often with busy vhost kthreads.

Add a check to call klp_switch_current() from vhost_worker() when a
livepatch is pending. In testing this allowed vhost kthreads to switch
immediately when they had previously blocked livepatch transitions for
long periods of time.

Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
---
 drivers/vhost/vhost.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index cbe72bfd2f1f..d8624f1f2d64 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -30,6 +30,7 @@
 #include <linux/interval_tree_generic.h>
 #include <linux/nospec.h>
 #include <linux/kcov.h>
+#include <linux/livepatch.h>
 
 #include "vhost.h"
 
@@ -366,6 +367,9 @@ static int vhost_worker(void *data)
 			if (need_resched())
 				schedule();
 		}
+
+		if (unlikely(klp_patch_pending(current)))
+			klp_switch_current();
 	}
 	kthread_unuse_mm(dev->mm);
 	return 0;

-- 
b4 0.10.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-20 22:12 [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads Seth Forshee (DigitalOcean)
  2023-01-20 22:12 ` [PATCH 1/2] livepatch: add an interface for safely switching kthreads Seth Forshee (DigitalOcean)
  2023-01-20 22:12 ` [PATCH 2/2] vhost: check for pending livepatches from vhost worker kthreads Seth Forshee (DigitalOcean)
@ 2023-01-22  8:34 ` Michael S. Tsirkin
  2023-01-26 17:03 ` Petr Mladek
  3 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2023-01-22  8:34 UTC (permalink / raw)
  To: Seth Forshee (DigitalOcean)
  Cc: Petr Mladek, Jason Wang, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm,
	Seth Forshee (DigitalOcean),
	netdev, live-patching, linux-kernel

On Fri, Jan 20, 2023 at 04:12:20PM -0600, Seth Forshee (DigitalOcean) wrote:
> We've fairly regularaly seen liveptches which cannot transition within kpatch's
> timeout period due to busy vhost worker kthreads. In looking for a solution the
> only answer I found was to call klp_update_patch_state() from a safe location.
> I tried adding this call to vhost_worker(), and it works, but this creates the
> potential for problems if a livepatch attempted to patch vhost_worker().
> Without a call to klp_update_patch_state() fully loaded vhost kthreads can
> never switch because vhost_worker() will always appear on the stack, but with
> the call these kthreads can switch but will still be running the old version of
> vhost_worker().
> 
> To avoid this situation I've added a new function, klp_switch_current(), which
> switches the current task only if its stack does not include any function being
> patched. This allows kthreads to safely attempt switching themselves if a patch
> is pending. There is at least one downside, however. Since there's no way for
> the kthread to track whether it has already tried to switch for a pending patch
> it can end up calling klp_switch_current() repeatedly when it can never be
> safely switched.
> 
> I don't know whether this is the right solution, and I'm happy to try out other
> suggestions. But in my testing these patches proved effective in consistently
> switching heavily loaded vhost kthreads almost immediately.
> 
> To: Josh Poimboeuf <jpoimboe@kernel.org>
> To: Jiri Kosina <jikos@kernel.org>
> To: Miroslav Benes <mbenes@suse.cz>
> To: Petr Mladek <pmladek@suse.com>
> To: Joe Lawrence <joe.lawrence@redhat.com>
> To: "Michael S. Tsirkin" <mst@redhat.com>
> To: Jason Wang <jasowang@redhat.com>
> Cc: live-patching@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: kvm@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>

Don't know enough about live patching to judge this.

I'll let livepatch maintainers judge this, and merge through
the livepatch tree if appropriate. For that:
Acked-by: Michael S. Tsirkin <mst@redhat.com>
but pls underestand this is more a 'looks ok superficially and
I don't have better ideas'  than 'I have reviewed this thoroughly'.

> 
> ---
> Seth Forshee (DigitalOcean) (2):
>       livepatch: add an interface for safely switching kthreads
>       vhost: check for pending livepatches from vhost worker kthreads
> 
>  drivers/vhost/vhost.c         |  4 ++++
>  include/linux/livepatch.h     |  2 ++
>  kernel/livepatch/transition.c | 11 +++++++++++
>  3 files changed, 17 insertions(+)
> ---
> base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4
> change-id: 20230120-vhost-klp-switching-ba9a3ae38b8a
> 
> Best regards,
> -- 
> Seth Forshee (DigitalOcean) <sforshee@kernel.org>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/2] vhost: check for pending livepatches from vhost worker kthreads
  2023-01-20 22:12 ` [PATCH 2/2] vhost: check for pending livepatches from vhost worker kthreads Seth Forshee (DigitalOcean)
@ 2023-01-24 14:17   ` Petr Mladek
  2023-01-24 17:21     ` Seth Forshee
  0 siblings, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2023-01-24 14:17 UTC (permalink / raw)
  To: Seth Forshee (DigitalOcean)
  Cc: Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm,
	Seth Forshee (DigitalOcean),
	netdev, live-patching, linux-kernel

On Fri 2023-01-20 16:12:22, Seth Forshee (DigitalOcean) wrote:
> Livepatch relies on stack checking of sleeping tasks to switch kthreads,
> so a busy kthread can block a livepatch transition indefinitely. We've
> seen this happen fairly often with busy vhost kthreads.

To be precise, it would be "indefinitely" only when the kthread never
sleeps.

But yes. I believe that the problem is real. It might be almost
impossible to livepatch some busy kthreads, especially when they
have a dedicated CPU.


> Add a check to call klp_switch_current() from vhost_worker() when a
> livepatch is pending. In testing this allowed vhost kthreads to switch
> immediately when they had previously blocked livepatch transitions for
> long periods of time.
> 
> Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
> ---
>  drivers/vhost/vhost.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index cbe72bfd2f1f..d8624f1f2d64 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -366,6 +367,9 @@ static int vhost_worker(void *data)
>  			if (need_resched())
>  				schedule();
>  		}
> +
> +		if (unlikely(klp_patch_pending(current)))
> +			klp_switch_current();

I suggest to use the following intead:

		if (unlikely(klp_patch_pending(current)))
			klp_update_patch_state(current);

We already use this in do_idle(). The reason is basically the same.
It is almost impossible to livepatch the idle task when a CPU is
very idle.

klp_update_patch_state(current) does not check the stack.
It switches the task immediately.

It should be safe because the kthread never leaves vhost_worker().
It means that the same kthread could never re-enter this function
and use the new code.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/2] vhost: check for pending livepatches from vhost worker kthreads
  2023-01-24 14:17   ` Petr Mladek
@ 2023-01-24 17:21     ` Seth Forshee
  2023-01-25 11:34       ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Seth Forshee @ 2023-01-24 17:21 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm, netdev,
	live-patching, linux-kernel

On Tue, Jan 24, 2023 at 03:17:43PM +0100, Petr Mladek wrote:
> On Fri 2023-01-20 16:12:22, Seth Forshee (DigitalOcean) wrote:
> > Livepatch relies on stack checking of sleeping tasks to switch kthreads,
> > so a busy kthread can block a livepatch transition indefinitely. We've
> > seen this happen fairly often with busy vhost kthreads.
> 
> To be precise, it would be "indefinitely" only when the kthread never
> sleeps.
> 
> But yes. I believe that the problem is real. It might be almost
> impossible to livepatch some busy kthreads, especially when they
> have a dedicated CPU.
> 
> 
> > Add a check to call klp_switch_current() from vhost_worker() when a
> > livepatch is pending. In testing this allowed vhost kthreads to switch
> > immediately when they had previously blocked livepatch transitions for
> > long periods of time.
> > 
> > Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
> > ---
> >  drivers/vhost/vhost.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index cbe72bfd2f1f..d8624f1f2d64 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -366,6 +367,9 @@ static int vhost_worker(void *data)
> >  			if (need_resched())
> >  				schedule();
> >  		}
> > +
> > +		if (unlikely(klp_patch_pending(current)))
> > +			klp_switch_current();
> 
> I suggest to use the following intead:
> 
> 		if (unlikely(klp_patch_pending(current)))
> 			klp_update_patch_state(current);
> 
> We already use this in do_idle(). The reason is basically the same.
> It is almost impossible to livepatch the idle task when a CPU is
> very idle.
> 
> klp_update_patch_state(current) does not check the stack.
> It switches the task immediately.
> 
> It should be safe because the kthread never leaves vhost_worker().
> It means that the same kthread could never re-enter this function
> and use the new code.

My knowledge of livepatching internals is fairly limited, so I'll accept
it if you say that it's safe to do it this way. But let me ask about one
scenario.

Let's say that a livepatch is loaded which replaces vhost_worker(). New
vhost worker threads are started which use the replacement function. Now
if the patch is disabled, these new worker threads would be switched
despite still running the code from the patch module, correct? Could the
module then be unloaded, freeing the memory containing the code these
kthreads are executing?

Thanks,
Seth

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/2] vhost: check for pending livepatches from vhost worker kthreads
  2023-01-24 17:21     ` Seth Forshee
@ 2023-01-25 11:34       ` Petr Mladek
  2023-01-25 16:57         ` Seth Forshee
  0 siblings, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2023-01-25 11:34 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm, netdev,
	live-patching, linux-kernel

On Tue 2023-01-24 11:21:39, Seth Forshee wrote:
> On Tue, Jan 24, 2023 at 03:17:43PM +0100, Petr Mladek wrote:
> > On Fri 2023-01-20 16:12:22, Seth Forshee (DigitalOcean) wrote:
> > > Livepatch relies on stack checking of sleeping tasks to switch kthreads,
> > > so a busy kthread can block a livepatch transition indefinitely. We've
> > > seen this happen fairly often with busy vhost kthreads.
> > 
> > To be precise, it would be "indefinitely" only when the kthread never
> > sleeps.
> > 
> > But yes. I believe that the problem is real. It might be almost
> > impossible to livepatch some busy kthreads, especially when they
> > have a dedicated CPU.
> > 
> > 
> > > Add a check to call klp_switch_current() from vhost_worker() when a
> > > livepatch is pending. In testing this allowed vhost kthreads to switch
> > > immediately when they had previously blocked livepatch transitions for
> > > long periods of time.
> > > 
> > > Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
> > > ---
> > >  drivers/vhost/vhost.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index cbe72bfd2f1f..d8624f1f2d64 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -366,6 +367,9 @@ static int vhost_worker(void *data)
> > >  			if (need_resched())
> > >  				schedule();
> > >  		}
> > > +
> > > +		if (unlikely(klp_patch_pending(current)))
> > > +			klp_switch_current();
> > 
> > I suggest to use the following intead:
> > 
> > 		if (unlikely(klp_patch_pending(current)))
> > 			klp_update_patch_state(current);
> > 
> > We already use this in do_idle(). The reason is basically the same.
> > It is almost impossible to livepatch the idle task when a CPU is
> > very idle.
> > 
> > klp_update_patch_state(current) does not check the stack.
> > It switches the task immediately.
> > 
> > It should be safe because the kthread never leaves vhost_worker().
> > It means that the same kthread could never re-enter this function
> > and use the new code.
> 
> My knowledge of livepatching internals is fairly limited, so I'll accept
> it if you say that it's safe to do it this way. But let me ask about one
> scenario.
> 
> Let's say that a livepatch is loaded which replaces vhost_worker(). New
> vhost worker threads are started which use the replacement function. Now
> if the patch is disabled, these new worker threads would be switched
> despite still running the code from the patch module, correct? Could the
> module then be unloaded, freeing the memory containing the code these
> kthreads are executing?

Great catch! Yes, this might theoretically happen.

The above scenario would require calling klp_update_patch_state() from
the code in the livepatch module. It is not possible at the moment because
this function is not exported for modules.

Hmm, the same problem might be when we livepatch a function that calls
another function that calls klp_update_patch_state(). But in this case
it would be kthread() from kernel/kthread.c. It would affect any
running kthread. I doubt that anyone would seriously think about
livepatching this function.

A good enough solution might be to document this. Livepatches could
not be created blindly. There are more situations where the
livepatch is tricky or not possible at all.

Crazy idea. We could prevent this problem even technically. A solution
would be to increment a per-process counter in klp_ftrace_handler() when a
function is redirected(). And klp_update_patch_state() might refuse
the migration when this counter is not zero. But it would require
to use a trampoline on return that would decrement the counter.
I am not sure if this is worth the complexity.

One the other hand, this counter might actually remove the need
of the reliable backtrace. It is possible that I miss something
or that it is not easy/possible to implement the return trampoline.

Back to the original problem. I still consider calling
klp_update_patch_state(current) in vhost_worker() safe.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/2] vhost: check for pending livepatches from vhost worker kthreads
  2023-01-25 11:34       ` Petr Mladek
@ 2023-01-25 16:57         ` Seth Forshee
  2023-01-26 11:16           ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Seth Forshee @ 2023-01-25 16:57 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm, netdev,
	live-patching, linux-kernel

On Wed, Jan 25, 2023 at 12:34:26PM +0100, Petr Mladek wrote:
> On Tue 2023-01-24 11:21:39, Seth Forshee wrote:
> > On Tue, Jan 24, 2023 at 03:17:43PM +0100, Petr Mladek wrote:
> > > On Fri 2023-01-20 16:12:22, Seth Forshee (DigitalOcean) wrote:
> > > > Livepatch relies on stack checking of sleeping tasks to switch kthreads,
> > > > so a busy kthread can block a livepatch transition indefinitely. We've
> > > > seen this happen fairly often with busy vhost kthreads.
> > > 
> > > To be precise, it would be "indefinitely" only when the kthread never
> > > sleeps.
> > > 
> > > But yes. I believe that the problem is real. It might be almost
> > > impossible to livepatch some busy kthreads, especially when they
> > > have a dedicated CPU.
> > > 
> > > 
> > > > Add a check to call klp_switch_current() from vhost_worker() when a
> > > > livepatch is pending. In testing this allowed vhost kthreads to switch
> > > > immediately when they had previously blocked livepatch transitions for
> > > > long periods of time.
> > > > 
> > > > Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
> > > > ---
> > > >  drivers/vhost/vhost.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > > 
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index cbe72bfd2f1f..d8624f1f2d64 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -366,6 +367,9 @@ static int vhost_worker(void *data)
> > > >  			if (need_resched())
> > > >  				schedule();
> > > >  		}
> > > > +
> > > > +		if (unlikely(klp_patch_pending(current)))
> > > > +			klp_switch_current();
> > > 
> > > I suggest to use the following intead:
> > > 
> > > 		if (unlikely(klp_patch_pending(current)))
> > > 			klp_update_patch_state(current);
> > > 
> > > We already use this in do_idle(). The reason is basically the same.
> > > It is almost impossible to livepatch the idle task when a CPU is
> > > very idle.
> > > 
> > > klp_update_patch_state(current) does not check the stack.
> > > It switches the task immediately.
> > > 
> > > It should be safe because the kthread never leaves vhost_worker().
> > > It means that the same kthread could never re-enter this function
> > > and use the new code.
> > 
> > My knowledge of livepatching internals is fairly limited, so I'll accept
> > it if you say that it's safe to do it this way. But let me ask about one
> > scenario.
> > 
> > Let's say that a livepatch is loaded which replaces vhost_worker(). New
> > vhost worker threads are started which use the replacement function. Now
> > if the patch is disabled, these new worker threads would be switched
> > despite still running the code from the patch module, correct? Could the
> > module then be unloaded, freeing the memory containing the code these
> > kthreads are executing?
> 
> Great catch! Yes, this might theoretically happen.
> 
> The above scenario would require calling klp_update_patch_state() from
> the code in the livepatch module. It is not possible at the moment because
> this function is not exported for modules.

vhost can be built as a module, so in order to call
klp_update_patch_state() from vhost_worker() it would have to be
exported to modules.

> Hmm, the same problem might be when we livepatch a function that calls
> another function that calls klp_update_patch_state(). But in this case
> it would be kthread() from kernel/kthread.c. It would affect any
> running kthread. I doubt that anyone would seriously think about
> livepatching this function.

Yes, there are clearly certain functions that are not safe/practical to
patch, and authors need to know what they are doing. Most kthread main()
functions probably qualify as impractical at best, at least without a
strategy to restart relevant kthreads.

But a livepatch transition will normally stall if patching these
functions when a relevant kthread is running (unless the patch is
forced), so a patch author who made a mistake should quickly notice.
vhost_worker() would behave differently.

> A good enough solution might be to document this. Livepatches could
> not be created blindly. There are more situations where the
> livepatch is tricky or not possible at all.

I can add this if you like. Is Documentation/livepatch/livepatch.rst the
right place for this?

> Crazy idea. We could prevent this problem even technically. A solution
> would be to increment a per-process counter in klp_ftrace_handler() when a
> function is redirected(). And klp_update_patch_state() might refuse
> the migration when this counter is not zero. But it would require
> to use a trampoline on return that would decrement the counter.
> I am not sure if this is worth the complexity.
> 
> One the other hand, this counter might actually remove the need
> of the reliable backtrace. It is possible that I miss something
> or that it is not easy/possible to implement the return trampoline.

I agree this should work for unpatching, and even for patching a
function which is already patched.

Maybe I'm misunderstanding, but this would only work for unpatching or
patching an already-patched function, wouldn't it? Because the original
functions would not increment the counter so you would not know if tasks
still had those on their call stacks.

> Back to the original problem. I still consider calling
> klp_update_patch_state(current) in vhost_worker() safe.

Okay, I can send a v2 which does this, so long as it's okay to export
klp_update_patch_state() to modules.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/2] vhost: check for pending livepatches from vhost worker kthreads
  2023-01-25 16:57         ` Seth Forshee
@ 2023-01-26 11:16           ` Petr Mladek
  2023-01-26 11:49             ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2023-01-26 11:16 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm, netdev,
	live-patching, linux-kernel

On Wed 2023-01-25 10:57:30, Seth Forshee wrote:
> On Wed, Jan 25, 2023 at 12:34:26PM +0100, Petr Mladek wrote:
> > On Tue 2023-01-24 11:21:39, Seth Forshee wrote:
> > > On Tue, Jan 24, 2023 at 03:17:43PM +0100, Petr Mladek wrote:
> > > > On Fri 2023-01-20 16:12:22, Seth Forshee (DigitalOcean) wrote:
> > > > > Livepatch relies on stack checking of sleeping tasks to switch kthreads,
> > > > > so a busy kthread can block a livepatch transition indefinitely. We've
> > > > > seen this happen fairly often with busy vhost kthreads.
> > > > 
> > > > > --- a/drivers/vhost/vhost.c
> > > > > +++ b/drivers/vhost/vhost.c
> > > > > @@ -366,6 +367,9 @@ static int vhost_worker(void *data)
> > > > >  			if (need_resched())
> > > > >  				schedule();
> > > > >  		}
> > > > > +
> > > > > +		if (unlikely(klp_patch_pending(current)))
> > > > > +			klp_switch_current();
> > > > 
> > > > I suggest to use the following intead:
> > > > 
> > > > 		if (unlikely(klp_patch_pending(current)))
> > > > 			klp_update_patch_state(current);
> > > > 
> > > > We already use this in do_idle(). The reason is basically the same.
> > > > It is almost impossible to livepatch the idle task when a CPU is
> > > > very idle.
> > > > 
> > > > klp_update_patch_state(current) does not check the stack.
> > > > It switches the task immediately.
> > > > 
> > > > It should be safe because the kthread never leaves vhost_worker().
> > > > It means that the same kthread could never re-enter this function
> > > > and use the new code.
> > > 
> > > My knowledge of livepatching internals is fairly limited, so I'll accept
> > > it if you say that it's safe to do it this way. But let me ask about one
> > > scenario.
> > > 
> > > Let's say that a livepatch is loaded which replaces vhost_worker(). New
> > > vhost worker threads are started which use the replacement function. Now
> > > if the patch is disabled, these new worker threads would be switched
> > > despite still running the code from the patch module, correct? Could the
> > > module then be unloaded, freeing the memory containing the code these
> > > kthreads are executing?
> > 
> > The above scenario would require calling klp_update_patch_state() from
> > the code in the livepatch module. It is not possible at the moment because
> > this function is not exported for modules.
> 
> vhost can be built as a module, so in order to call
> klp_update_patch_state() from vhost_worker() it would have to be
> exported to modules.

I see.

> > Hmm, the same problem might be when we livepatch a function that calls
> > another function that calls klp_update_patch_state(). But in this case
> > it would be kthread() from kernel/kthread.c. It would affect any
> > running kthread. I doubt that anyone would seriously think about
> > livepatching this function.
> 
> Yes, there are clearly certain functions that are not safe/practical to
> patch, and authors need to know what they are doing. Most kthread main()
> functions probably qualify as impractical at best, at least without a
> strategy to restart relevant kthreads.
> 
> But a livepatch transition will normally stall if patching these
> functions when a relevant kthread is running (unless the patch is
> forced), so a patch author who made a mistake should quickly notice.
> vhost_worker() would behave differently.

Another crazy idea:

/**
 * klp_update_patch_state_safe() - do not update the path state when
 *	called from a livepatch.
 * @task: task_struct to be updated
 * @calller_addr: address of the function which  calls this one
 *
 * Do not update the patch set when called from a livepatch.
 * It would allow to remove the livepatch module even when
 * the code still might be in use.
 */
void klp_update_patch_state_safe(struct task_struct *task, void *caller_addr)
{
	static bool checked;
	static bool safe;

	if (unlikely(!checked)) {
		struct module *mod;

		preempt_disable();
		mod = __module_address(caller_addr);
		if (!mod || !is_livepatch_module(mod))
			safe = true;
		checked = true;
		preempt_enable();
	}

	if (safe)
		klp_update_patch_state(task);
}

and use in vhost_worker()

		if (unlikely(klp_patch_pending(current)))
			klp_update_patch_state_safe(current, vhost_worker);

Even better might be to get the caller address using some compiler
macro. I guess that it should be possible.

And even better would be to detect this at the compile time. But
I do not know how to do so.

> > A good enough solution might be to document this. Livepatches could
> > not be created blindly. There are more situations where the
> > livepatch is tricky or not possible at all.
> 
> I can add this if you like. Is Documentation/livepatch/livepatch.rst the
> right place for this?

Yes, the best place probably would be "7. Limitations" section in
Documentation/livepatch/livepatch.rst.

Even better would be to add a document about the best practices.
We have dreamed about it for years ;-)

> > Crazy idea. We could prevent this problem even technically. A solution
> > would be to increment a per-process counter in klp_ftrace_handler() when a
> > function is redirected(). And klp_update_patch_state() might refuse
> > the migration when this counter is not zero. But it would require
> > to use a trampoline on return that would decrement the counter.
> > I am not sure if this is worth the complexity.
> > 
> > One the other hand, this counter might actually remove the need
> > of the reliable backtrace. It is possible that I miss something
> > or that it is not easy/possible to implement the return trampoline.
> 
> I agree this should work for unpatching, and even for patching a
> function which is already patched.
> 
> Maybe I'm misunderstanding, but this would only work for unpatching or
> patching an already-patched function, wouldn't it? Because the original
> functions would not increment the counter so you would not know if tasks
> still had those on their call stacks.

Right. I knew that it could not be that easy. Otherwise, we would have
used it. I just did not spent enough cycles on the idea yesterday.

> > Back to the original problem. I still consider calling
> > klp_update_patch_state(current) in vhost_worker() safe.
> 
> Okay, I can send a v2 which does this, so long as it's okay to export
> klp_update_patch_state() to modules.

It would be acceptable for me if we added a warning above the function
definition and into the livepatch documentation.

But I would prefer klp_update_patch_state_safe() if it worked. It is
possible that I have missed something.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/2] vhost: check for pending livepatches from vhost worker kthreads
  2023-01-26 11:16           ` Petr Mladek
@ 2023-01-26 11:49             ` Petr Mladek
  0 siblings, 0 replies; 36+ messages in thread
From: Petr Mladek @ 2023-01-26 11:49 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm, netdev,
	live-patching, linux-kernel

On Thu 2023-01-26 12:16:36, Petr Mladek wrote:
> On Wed 2023-01-25 10:57:30, Seth Forshee wrote:
> > On Wed, Jan 25, 2023 at 12:34:26PM +0100, Petr Mladek wrote:
> > > On Tue 2023-01-24 11:21:39, Seth Forshee wrote:
> > > > On Tue, Jan 24, 2023 at 03:17:43PM +0100, Petr Mladek wrote:
> > > > > On Fri 2023-01-20 16:12:22, Seth Forshee (DigitalOcean) wrote:
> > > > > > Livepatch relies on stack checking of sleeping tasks to switch kthreads,
> > > > > > so a busy kthread can block a livepatch transition indefinitely. We've
> > > > > > seen this happen fairly often with busy vhost kthreads.
> > > > > 
> > > > > > --- a/drivers/vhost/vhost.c
> > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > @@ -366,6 +367,9 @@ static int vhost_worker(void *data)
> > > > > >  			if (need_resched())
> > > > > >  				schedule();
> > > > > >  		}
> > > > > > +
> > > > > > +		if (unlikely(klp_patch_pending(current)))
> > > > > > +			klp_switch_current();
> > > > > 
> > > > > I suggest to use the following intead:
> > > > > 
> > > > > 		if (unlikely(klp_patch_pending(current)))
> > > > > 			klp_update_patch_state(current);
> > > > > 
> > > > > We already use this in do_idle(). The reason is basically the same.
> > > > > It is almost impossible to livepatch the idle task when a CPU is
> > > > > very idle.
> > > > > 
> > > > Let's say that a livepatch is loaded which replaces vhost_worker(). New
> > > > vhost worker threads are started which use the replacement function. Now
> > > > if the patch is disabled, these new worker threads would be switched
> > > > despite still running the code from the patch module, correct? Could the
> > > > module then be unloaded, freeing the memory containing the code these
> > > > kthreads are executing?
> > > 
> > > Hmm, the same problem might be when we livepatch a function that calls
> > > another function that calls klp_update_patch_state(). But in this case
> > > it would be kthread() from kernel/kthread.c. It would affect any
> > > running kthread. I doubt that anyone would seriously think about
> > > livepatching this function.

And I missed something. klp_update_patch_state_safe(), proposed below,
would not cover the above scenario.

It might be possible to add something similar to kthread()
function. I think that it is the only "livepatchable" function
that might call vhost_worker(). We could block
klp_update_patch_state() for the entire kthread when the kthread()
function is called from a livepatch.

Well, it is all just the best effort. The reference counting in
the ftrace handler would be more reliable. But it would require
adding the trampoline on the return.

> /**
>  * klp_update_patch_state_safe() - do not update the path state when
>  *	called from a livepatch.
>  * @task: task_struct to be updated
>  * @calller_addr: address of the function which  calls this one
>  *
>  * Do not update the patch set when called from a livepatch.
>  * It would allow to remove the livepatch module even when
>  * the code still might be in use.
>  */
> void klp_update_patch_state_safe(struct task_struct *task, void *caller_addr)
> {
> 	static bool checked;
> 	static bool safe;
> 
> 	if (unlikely(!checked)) {
> 		struct module *mod;
> 
> 		preempt_disable();
> 		mod = __module_address(caller_addr);
> 		if (!mod || !is_livepatch_module(mod))
> 			safe = true;
> 		checked = true;
> 		preempt_enable();
> 	}
> 
> 	if (safe)
> 		klp_update_patch_state(task);
> }
> 
> and use in vhost_worker()
> 
> 		if (unlikely(klp_patch_pending(current)))
> 			klp_update_patch_state_safe(current, vhost_worker);
> 
> Even better might be to get the caller address using some compiler
> macro. I guess that it should be possible.
> 
> And even better would be to detect this at the compile time. But
> I do not know how to do so.
> 
> > Okay, I can send a v2 which does this, so long as it's okay to export
> > klp_update_patch_state() to modules.
> 
> It would be acceptable for me if we added a warning above the function
> definition and into the livepatch documentation.

I would probably go this way after all. Still thinking...

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-20 22:12 [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads Seth Forshee (DigitalOcean)
                   ` (2 preceding siblings ...)
  2023-01-22  8:34 ` [PATCH 0/2] vhost: improve livepatch switching for heavily loaded " Michael S. Tsirkin
@ 2023-01-26 17:03 ` Petr Mladek
  2023-01-26 21:12   ` Seth Forshee (DigitalOcean)
  3 siblings, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2023-01-26 17:03 UTC (permalink / raw)
  To: Seth Forshee (DigitalOcean)
  Cc: Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm,
	Seth Forshee (DigitalOcean),
	netdev, live-patching, linux-kernel

On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> We've fairly regularaly seen liveptches which cannot transition within kpatch's
> timeout period due to busy vhost worker kthreads.

I have missed this detail. Miroslav told me that we have solved
something similar some time ago, see
https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/

Honestly, kpatch's timeout 1 minute looks incredible low to me. Note
that the transition is tried only once per minute. It means that there
are "only" 60 attempts.

Just by chance, does it help you to increase the timeout, please?

This low timeout might be useful for testing. But in practice, it does
not matter when the transition is lasting one hour or even longer.
It takes much longer time to prepare the livepatch.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-26 17:03 ` Petr Mladek
@ 2023-01-26 21:12   ` Seth Forshee (DigitalOcean)
  2023-01-27  4:43     ` Josh Poimboeuf
  2023-01-27 11:19     ` Petr Mladek
  0 siblings, 2 replies; 36+ messages in thread
From: Seth Forshee (DigitalOcean) @ 2023-01-26 21:12 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm, netdev,
	live-patching, linux-kernel

On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > We've fairly regularaly seen liveptches which cannot transition within kpatch's
> > timeout period due to busy vhost worker kthreads.
> 
> I have missed this detail. Miroslav told me that we have solved
> something similar some time ago, see
> https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/

Interesting thread. I had thought about something along the lines of the
original patch, but there are some ideas in there that I hadn't
considered.

> Honestly, kpatch's timeout 1 minute looks incredible low to me. Note
> that the transition is tried only once per minute. It means that there
> are "only" 60 attempts.
> 
> Just by chance, does it help you to increase the timeout, please?

To be honest my test setup reproduces the problem well enough to make
KLP wait significant time due to vhost threads, but it seldom causes it
to hit kpatch's timeout.

Our system management software will try to load a patch tens of times in
a day, and we've seen real-world cases where patches couldn't load
within kpatch's timeout for multiple days. But I don't have such an
environment readily accessible for my own testing. I can try to refine
my test case and see if I can get it to that point.

> This low timeout might be useful for testing. But in practice, it does
> not matter when the transition is lasting one hour or even longer.
> It takes much longer time to prepare the livepatch.

Agreed. And to be clear, we cope with the fact that patches may take
hours or even days to get applied in some cases. The patches I sent are
just about improving the only case I've identified which has lead to
kpatch failing to load a patch for a day or longer.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-26 21:12   ` Seth Forshee (DigitalOcean)
@ 2023-01-27  4:43     ` Josh Poimboeuf
  2023-01-27 10:37       ` Peter Zijlstra
  2023-01-27 11:19     ` Petr Mladek
  1 sibling, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2023-01-27  4:43 UTC (permalink / raw)
  To: Seth Forshee (DigitalOcean)
  Cc: Petr Mladek, Jason Wang, Michael S. Tsirkin, Jiri Kosina,
	Miroslav Benes, Joe Lawrence, virtualization, kvm, netdev,
	live-patching, linux-kernel, Peter Zijlstra

On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote:
> On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > We've fairly regularaly seen liveptches which cannot transition within kpatch's
> > > timeout period due to busy vhost worker kthreads.
> > 
> > I have missed this detail. Miroslav told me that we have solved
> > something similar some time ago, see
> > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/
> 
> Interesting thread. I had thought about something along the lines of the
> original patch, but there are some ideas in there that I hadn't
> considered.

Here's another idea, have we considered this?  Have livepatch set
TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then
have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is
set.

Not sure how scheduler folks would feel about that ;-)

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-27  4:43     ` Josh Poimboeuf
@ 2023-01-27 10:37       ` Peter Zijlstra
  2023-01-27 12:09         ` Petr Mladek
                           ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Peter Zijlstra @ 2023-01-27 10:37 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Seth Forshee (DigitalOcean),
	Petr Mladek, Jason Wang, Michael S. Tsirkin, Jiri Kosina,
	Miroslav Benes, Joe Lawrence, virtualization, kvm, netdev,
	live-patching, linux-kernel

On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote:
> On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote:
> > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > > We've fairly regularaly seen liveptches which cannot transition within kpatch's
> > > > timeout period due to busy vhost worker kthreads.
> > > 
> > > I have missed this detail. Miroslav told me that we have solved
> > > something similar some time ago, see
> > > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/
> > 
> > Interesting thread. I had thought about something along the lines of the
> > original patch, but there are some ideas in there that I hadn't
> > considered.
> 
> Here's another idea, have we considered this?  Have livepatch set
> TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then
> have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is
> set.
> 
> Not sure how scheduler folks would feel about that ;-)

So, let me try and page all that back in.... :-)

KLP needs to unwind the stack to see if any of the patched functions are
active, if not, flip task to new set.

Unwinding the stack of a task can be done when:

 - task is inactive (stable reg and stack) -- provided it stays inactive
   while unwinding etc..

 - task is current (guarantees stack doesn't dip below where we started
   due to being busy on top etc..)

Can NOT be done from interrupt context, because can hit in the middle of
setting up stack frames etc..

The issue at hand is that some tasks run for a long time without passing
through an explicit check.

The thread above tried sticking something in cond_resched() which is a
problem for PREEMPT=y since cond_resched() is a no-op.

Preempt notifiers were raised, and those would actually be nice, except
you can only install a notifier on current and you need some memory
allocated per task, which makes it less than ideal. Plus ...

... putting something in finish_task_switch() wouldn't be the end of the
world I suppose, but then you still need to force schedule the task --
imagine it being the only runnable task on the CPU, there's nothing
going to make it actually switch.

Which then leads me to suggest something daft like this.. does that
help?


diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index f1b25ec581e0..06746095a724 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@
 
 #include <linux/cpu.h>
 #include <linux/stacktrace.h>
+#include <linux/stop_machine.h>
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
 	return !ret;
 }
 
+static int __stop_try_switch(void *arg)
+{
+	return klp_try_switch_task(arg) ? 0 : -EBUSY;
+}
+
+static bool klp_try_switch_task_harder(struct task_struct *task)
+{
+	return !stop_one_cpu(task_cpu(task), __stop_try_switch, task);
+}
+
 /*
  * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
  * Kthreads with TIF_PATCH_PENDING set are woken up.


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-26 21:12   ` Seth Forshee (DigitalOcean)
  2023-01-27  4:43     ` Josh Poimboeuf
@ 2023-01-27 11:19     ` Petr Mladek
  2023-01-27 14:57       ` Seth Forshee
  1 sibling, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2023-01-27 11:19 UTC (permalink / raw)
  To: Seth Forshee (DigitalOcean)
  Cc: Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm, netdev,
	live-patching, linux-kernel

On Thu 2023-01-26 15:12:35, Seth Forshee (DigitalOcean) wrote:
> On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > We've fairly regularaly seen liveptches which cannot transition within kpatch's
> > > timeout period due to busy vhost worker kthreads.
> > 
> > I have missed this detail. Miroslav told me that we have solved
> > something similar some time ago, see
> > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/
> 
> Interesting thread. I had thought about something along the lines of the
> original patch, but there are some ideas in there that I hadn't
> considered.

Could you please provide some more details about the test system?
Is there anything important to make it reproducible?

The following aspects come to my mind. It might require:

   + more workers running on the same system
   + have a dedicated CPU for the worker
   + livepatching the function called by work->fn()
   + running the same work again and again
   + huge and overloaded system


> > Honestly, kpatch's timeout 1 minute looks incredible low to me. Note
> > that the transition is tried only once per minute. It means that there
> > are "only" 60 attempts.
> > 
> > Just by chance, does it help you to increase the timeout, please?
> 
> To be honest my test setup reproduces the problem well enough to make
> KLP wait significant time due to vhost threads, but it seldom causes it
> to hit kpatch's timeout.
> 
> Our system management software will try to load a patch tens of times in
> a day, and we've seen real-world cases where patches couldn't load
> within kpatch's timeout for multiple days. But I don't have such an
> environment readily accessible for my own testing. I can try to refine
> my test case and see if I can get it to that point.

My understanding is that you try to load the patch repeatedly but
it always fails after the 1 minute timeout. It means that it always
starts from the beginning (no livepatched process).

Is there any chance to try it with a longer timeout, for example, one
hour? It should increase the chance if there are more problematic kthreads.

> > This low timeout might be useful for testing. But in practice, it does
> > not matter when the transition is lasting one hour or even longer.
> > It takes much longer time to prepare the livepatch.
> 
> Agreed. And to be clear, we cope with the fact that patches may take
> hours or even days to get applied in some cases. The patches I sent are
> just about improving the only case I've identified which has lead to
> kpatch failing to load a patch for a day or longer.

If it is acceptable to wait hours or even days then the 1 minute
timeout is quite contra-productive. We actually do not use any timeout
at all in livepatches provided by SUSE.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-27 10:37       ` Peter Zijlstra
@ 2023-01-27 12:09         ` Petr Mladek
  2023-01-27 14:37           ` Seth Forshee
  2023-01-27 16:52         ` Josh Poimboeuf
  2023-01-27 20:02         ` Seth Forshee
  2 siblings, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2023-01-27 12:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, Seth Forshee (DigitalOcean),
	Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, virtualization, kvm, netdev, live-patching,
	linux-kernel

On Fri 2023-01-27 11:37:02, Peter Zijlstra wrote:
> On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote:
> > On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote:
> > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > > > We've fairly regularaly seen liveptches which cannot transition within kpatch's
> > > > > timeout period due to busy vhost worker kthreads.
> > > > 
> > > > I have missed this detail. Miroslav told me that we have solved
> > > > something similar some time ago, see
> > > > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/
> > > 
> > > Interesting thread. I had thought about something along the lines of the
> > > original patch, but there are some ideas in there that I hadn't
> > > considered.
> > 
> > Here's another idea, have we considered this?  Have livepatch set
> > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then
> > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is
> > set.
> > 
> > Not sure how scheduler folks would feel about that ;-)
> 
> So, let me try and page all that back in.... :-)
> 
> KLP needs to unwind the stack to see if any of the patched functions are
> active, if not, flip task to new set.
> 
> Unwinding the stack of a task can be done when:
> 
>  - task is inactive (stable reg and stack) -- provided it stays inactive
>    while unwinding etc..
> 
>  - task is current (guarantees stack doesn't dip below where we started
>    due to being busy on top etc..)
> 
> Can NOT be done from interrupt context, because can hit in the middle of
> setting up stack frames etc..

All the above seems correct.

> The issue at hand is that some tasks run for a long time without passing
> through an explicit check.

There might actually be two possibilities why the transition fails
too often:

1. The task might be in the running state most of the time. Therefore
   the backtrace is not reliable most of the time.

   In this case, some cooperation with the scheduler would really
   help. We would need to stop the task and check the stack
   when it is stopped. Something like the patch you proposed.


2. The task might be sleeping but almost always in a livepatched
   function. Therefore it could not be transitioned.

   It might be the case with vhost_worker(). The main loop is "tiny".
   The kthread probaly spends most of the time with processing
   a vhost_work. And if the "works" are livepatched...

   In this case, it would help to call klp_try_switch_task(current)
   in the main loop in vhost_worker(). It would always succeed
   when vhost_worker() is not livepatched on its own.

   Note that even this would not help with kPatch when a single
   vhost_work might need more than the 1 minute timout to get proceed.

> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index f1b25ec581e0..06746095a724 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -9,6 +9,7 @@
>  
>  #include <linux/cpu.h>
>  #include <linux/stacktrace.h>
> +#include <linux/stop_machine.h>
>  #include "core.h"
>  #include "patch.h"
>  #include "transition.h"
> @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
>  	return !ret;
>  }
>  
> +static int __stop_try_switch(void *arg)
> +{
> +	return klp_try_switch_task(arg) ? 0 : -EBUSY;
> +}
> +
> +static bool klp_try_switch_task_harder(struct task_struct *task)
> +{
> +	return !stop_one_cpu(task_cpu(task), __stop_try_switch, task);
> +}
> +
>  /*
>   * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
>   * Kthreads with TIF_PATCH_PENDING set are woken up.

Nice. I am surprised that it can be implemented so easily.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-27 12:09         ` Petr Mladek
@ 2023-01-27 14:37           ` Seth Forshee
  0 siblings, 0 replies; 36+ messages in thread
From: Seth Forshee @ 2023-01-27 14:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Peter Zijlstra, Josh Poimboeuf, Jason Wang, Michael S. Tsirkin,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, virtualization, kvm,
	netdev, live-patching, linux-kernel

On Fri, Jan 27, 2023 at 01:09:02PM +0100, Petr Mladek wrote:
> There might actually be two possibilities why the transition fails
> too often:
> 
> 1. The task might be in the running state most of the time. Therefore
>    the backtrace is not reliable most of the time.
> 
>    In this case, some cooperation with the scheduler would really
>    help. We would need to stop the task and check the stack
>    when it is stopped. Something like the patch you proposed.

This is the situation we are encountering.

> 2. The task might be sleeping but almost always in a livepatched
>    function. Therefore it could not be transitioned.
> 
>    It might be the case with vhost_worker(). The main loop is "tiny".
>    The kthread probaly spends most of the time with processing
>    a vhost_work. And if the "works" are livepatched...
> 
>    In this case, it would help to call klp_try_switch_task(current)
>    in the main loop in vhost_worker(). It would always succeed
>    when vhost_worker() is not livepatched on its own.
> 
>    Note that even this would not help with kPatch when a single
>    vhost_work might need more than the 1 minute timout to get proceed.
> 
> > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > index f1b25ec581e0..06746095a724 100644
> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -9,6 +9,7 @@
> >  
> >  #include <linux/cpu.h>
> >  #include <linux/stacktrace.h>
> > +#include <linux/stop_machine.h>
> >  #include "core.h"
> >  #include "patch.h"
> >  #include "transition.h"
> > @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
> >  	return !ret;
> >  }
> >  
> > +static int __stop_try_switch(void *arg)
> > +{
> > +	return klp_try_switch_task(arg) ? 0 : -EBUSY;
> > +}
> > +
> > +static bool klp_try_switch_task_harder(struct task_struct *task)
> > +{
> > +	return !stop_one_cpu(task_cpu(task), __stop_try_switch, task);
> > +}
> > +
> >  /*
> >   * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
> >   * Kthreads with TIF_PATCH_PENDING set are woken up.
> 
> Nice. I am surprised that it can be implemented so easily.

Yes, that's a neat solution. I will give it a try.

AIUI this still doesn't help for architectures without a reliable
stacktrace though, right? So we probably should only try this for
architectures which do have relaible stacktraces.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-27 11:19     ` Petr Mladek
@ 2023-01-27 14:57       ` Seth Forshee
  2023-01-30  9:55         ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Seth Forshee @ 2023-01-27 14:57 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm, netdev,
	live-patching, linux-kernel

On Fri, Jan 27, 2023 at 12:19:03PM +0100, Petr Mladek wrote:
> On Thu 2023-01-26 15:12:35, Seth Forshee (DigitalOcean) wrote:
> > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > > We've fairly regularaly seen liveptches which cannot transition within kpatch's
> > > > timeout period due to busy vhost worker kthreads.
> > > 
> > > I have missed this detail. Miroslav told me that we have solved
> > > something similar some time ago, see
> > > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/
> > 
> > Interesting thread. I had thought about something along the lines of the
> > original patch, but there are some ideas in there that I hadn't
> > considered.
> 
> Could you please provide some more details about the test system?
> Is there anything important to make it reproducible?
> 
> The following aspects come to my mind. It might require:
> 
>    + more workers running on the same system
>    + have a dedicated CPU for the worker
>    + livepatching the function called by work->fn()
>    + running the same work again and again
>    + huge and overloaded system

I'm isolating a CPU, starting a KVM guest with a virtio-net device, and
setting the affinity of the vhost worker thread to only the isolated
CPU. Thus the vhost-worker thread has a dedicated CPU, as you say. (I'll
note that in real-world cases the systems have many CPUs, and while the
vhost threads aren't each given a dedicated CPU, if the system load is
light enough a thread can end up with exlusive use of a CPU).

Then all I do is run iperf between the guest and the host with several
parallel streams. I seem to be hitting the limits of the guest vCPUs
before the vhost thread is fully saturated, as this gets it to about 90%
CPU utilization by the vhost thread.

> > > Honestly, kpatch's timeout 1 minute looks incredible low to me. Note
> > > that the transition is tried only once per minute. It means that there
> > > are "only" 60 attempts.
> > > 
> > > Just by chance, does it help you to increase the timeout, please?
> > 
> > To be honest my test setup reproduces the problem well enough to make
> > KLP wait significant time due to vhost threads, but it seldom causes it
> > to hit kpatch's timeout.
> > 
> > Our system management software will try to load a patch tens of times in
> > a day, and we've seen real-world cases where patches couldn't load
> > within kpatch's timeout for multiple days. But I don't have such an
> > environment readily accessible for my own testing. I can try to refine
> > my test case and see if I can get it to that point.
> 
> My understanding is that you try to load the patch repeatedly but
> it always fails after the 1 minute timeout. It means that it always
> starts from the beginning (no livepatched process).
> 
> Is there any chance to try it with a longer timeout, for example, one
> hour? It should increase the chance if there are more problematic kthreads.

Yes, I can try it. But I think I already mentioned that we are somewhat
limited by our system management software and how livepatch loading is
currently implemented there. I'd need to consult with others about how
long we could make the timeout, but 1 hour is definitely too long under
our current system.

> > > This low timeout might be useful for testing. But in practice, it does
> > > not matter when the transition is lasting one hour or even longer.
> > > It takes much longer time to prepare the livepatch.
> > 
> > Agreed. And to be clear, we cope with the fact that patches may take
> > hours or even days to get applied in some cases. The patches I sent are
> > just about improving the only case I've identified which has lead to
> > kpatch failing to load a patch for a day or longer.
> 
> If it is acceptable to wait hours or even days then the 1 minute
> timeout is quite contra-productive. We actually do not use any timeout
> at all in livepatches provided by SUSE.

I agree, though I'd still prefer it didn't take days. Based on this
discussion I do plan to look at changing how we load livepatches to make
this possible, but it will take some time.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-27 10:37       ` Peter Zijlstra
  2023-01-27 12:09         ` Petr Mladek
@ 2023-01-27 16:52         ` Josh Poimboeuf
  2023-01-27 17:09           ` Josh Poimboeuf
  2023-01-27 20:02         ` Seth Forshee
  2 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2023-01-27 16:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Joe Lawrence, kvm, Michael S. Tsirkin, netdev,
	Jiri Kosina, linux-kernel, virtualization,
	Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Fri, Jan 27, 2023 at 11:37:02AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote:
> > Here's another idea, have we considered this?  Have livepatch set
> > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then
> > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is
> > set.
> > 
> > Not sure how scheduler folks would feel about that ;-)

Hmmmm, with preemption I guess the above doesn't work for kthreads
calling cond_resched() instead of what vhost_worker() does (explicit
need_resched/schedule).

> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index f1b25ec581e0..06746095a724 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -9,6 +9,7 @@
>  
>  #include <linux/cpu.h>
>  #include <linux/stacktrace.h>
> +#include <linux/stop_machine.h>
>  #include "core.h"
>  #include "patch.h"
>  #include "transition.h"
> @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
>  	return !ret;
>  }
>  
> +static int __stop_try_switch(void *arg)
> +{
> +	return klp_try_switch_task(arg) ? 0 : -EBUSY;
> +}
> +
> +static bool klp_try_switch_task_harder(struct task_struct *task)
> +{
> +	return !stop_one_cpu(task_cpu(task), __stop_try_switch, task);
> +}
> +
>  /*
>   * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
>   * Kthreads with TIF_PATCH_PENDING set are woken up.

Doesn't work for PREEMPT+!ORC.  Non-ORC reliable unwinders will detect
preemption on the stack and automatically report unreliable.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-27 16:52         ` Josh Poimboeuf
@ 2023-01-27 17:09           ` Josh Poimboeuf
  2023-01-27 22:11             ` Josh Poimboeuf
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2023-01-27 17:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Joe Lawrence, kvm, Michael S. Tsirkin, netdev,
	Jiri Kosina, linux-kernel, virtualization,
	Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Fri, Jan 27, 2023 at 08:52:38AM -0800, Josh Poimboeuf wrote:
> On Fri, Jan 27, 2023 at 11:37:02AM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote:
> > > Here's another idea, have we considered this?  Have livepatch set
> > > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then
> > > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is
> > > set.
> > > 
> > > Not sure how scheduler folks would feel about that ;-)
> 
> Hmmmm, with preemption I guess the above doesn't work for kthreads
> calling cond_resched() instead of what vhost_worker() does (explicit
> need_resched/schedule).

Though I guess we could hook into cond_resched() too if we make it a
non-NOP for PREEMPT+LIVEPATCH?

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-27 10:37       ` Peter Zijlstra
  2023-01-27 12:09         ` Petr Mladek
  2023-01-27 16:52         ` Josh Poimboeuf
@ 2023-01-27 20:02         ` Seth Forshee
  2 siblings, 0 replies; 36+ messages in thread
From: Seth Forshee @ 2023-01-27 20:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, Petr Mladek, Jason Wang, Michael S. Tsirkin,
	Jiri Kosina, Miroslav Benes, Joe Lawrence, virtualization, kvm,
	netdev, live-patching, linux-kernel

On Fri, Jan 27, 2023 at 11:37:02AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote:
> > On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote:
> > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > > > We've fairly regularaly seen liveptches which cannot transition within kpatch's
> > > > > timeout period due to busy vhost worker kthreads.
> > > > 
> > > > I have missed this detail. Miroslav told me that we have solved
> > > > something similar some time ago, see
> > > > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/
> > > 
> > > Interesting thread. I had thought about something along the lines of the
> > > original patch, but there are some ideas in there that I hadn't
> > > considered.
> > 
> > Here's another idea, have we considered this?  Have livepatch set
> > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then
> > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is
> > set.
> > 
> > Not sure how scheduler folks would feel about that ;-)
> 
> So, let me try and page all that back in.... :-)
> 
> KLP needs to unwind the stack to see if any of the patched functions are
> active, if not, flip task to new set.
> 
> Unwinding the stack of a task can be done when:
> 
>  - task is inactive (stable reg and stack) -- provided it stays inactive
>    while unwinding etc..
> 
>  - task is current (guarantees stack doesn't dip below where we started
>    due to being busy on top etc..)
> 
> Can NOT be done from interrupt context, because can hit in the middle of
> setting up stack frames etc..
> 
> The issue at hand is that some tasks run for a long time without passing
> through an explicit check.
> 
> The thread above tried sticking something in cond_resched() which is a
> problem for PREEMPT=y since cond_resched() is a no-op.
> 
> Preempt notifiers were raised, and those would actually be nice, except
> you can only install a notifier on current and you need some memory
> allocated per task, which makes it less than ideal. Plus ...
> 
> ... putting something in finish_task_switch() wouldn't be the end of the
> world I suppose, but then you still need to force schedule the task --
> imagine it being the only runnable task on the CPU, there's nothing
> going to make it actually switch.
> 
> Which then leads me to suggest something daft like this.. does that
> help?

The changes below are working well for me. The context has a read lock
on tasklist_lock so it can't sleep, so I'm using stop_one_cpu_nowait()
with per-CPU state.

Based on Josh's comments it sounds like the klp_have_reliable_stack()
check probably isn't quite right, and we might want to add something
else for PREEMPT+!ORC. But I wanted to go ahead and see if this approach
seems reasonable to everyone.

Thanks,
Seth


diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index f1b25ec581e0..9f3898f02828 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@
 
 #include <linux/cpu.h>
 #include <linux/stacktrace.h>
+#include <linux/stop_machine.h>
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -334,9 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
 	return !ret;
 }
 
+static int __try_switch_kthread(void *arg)
+{
+	return klp_try_switch_task(arg) ? 0 : -EBUSY;
+}
+
+static DEFINE_PER_CPU(struct cpu_stop_work, klp_stop_work);
+
 /*
  * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
- * Kthreads with TIF_PATCH_PENDING set are woken up.
+ * Kthreads with TIF_PATCH_PENDING set are preempted or woken up.
  */
 static void klp_send_signals(void)
 {
@@ -357,11 +365,22 @@ static void klp_send_signals(void)
 		 * would be meaningless. It is not serious though.
 		 */
 		if (task->flags & PF_KTHREAD) {
-			/*
-			 * Wake up a kthread which sleeps interruptedly and
-			 * still has not been migrated.
-			 */
-			wake_up_state(task, TASK_INTERRUPTIBLE);
+			if (task_curr(task) && klp_have_reliable_stack()) {
+				/*
+				 * kthread is currently running on a CPU; try
+				 * to preempt it.
+				 */
+				stop_one_cpu_nowait(task_cpu(task),
+						    __try_switch_kthread,
+						    task,
+						    this_cpu_ptr(&klp_stop_work));
+			} else {
+				/*
+				 * Wake up a kthread which sleeps interruptedly
+				 * and still has not been migrated.
+				 */
+				wake_up_state(task, TASK_INTERRUPTIBLE);
+			}
 		} else {
 			/*
 			 * Send fake signal to all non-kthread tasks which are

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-27 17:09           ` Josh Poimboeuf
@ 2023-01-27 22:11             ` Josh Poimboeuf
  2023-01-30 12:40               ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2023-01-27 22:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Joe Lawrence, kvm, Michael S. Tsirkin, netdev,
	Jiri Kosina, linux-kernel, virtualization,
	Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Fri, Jan 27, 2023 at 09:09:48AM -0800, Josh Poimboeuf wrote:
> On Fri, Jan 27, 2023 at 08:52:38AM -0800, Josh Poimboeuf wrote:
> > On Fri, Jan 27, 2023 at 11:37:02AM +0100, Peter Zijlstra wrote:
> > > On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote:
> > > > Here's another idea, have we considered this?  Have livepatch set
> > > > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then
> > > > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is
> > > > set.
> > > > 
> > > > Not sure how scheduler folks would feel about that ;-)
> > 
> > Hmmmm, with preemption I guess the above doesn't work for kthreads
> > calling cond_resched() instead of what vhost_worker() does (explicit
> > need_resched/schedule).
> 
> Though I guess we could hook into cond_resched() too if we make it a
> non-NOP for PREEMPT+LIVEPATCH?

I discussed this idea with Peter on IRC and he didn't immediately shoot
it down.

It compiles...

Thoughts?

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 293e29960c6e..937816d0867c 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -14,6 +14,8 @@
 #include <linux/completion.h>
 #include <linux/list.h>
 
+#include <linux/livepatch_sched.h>
+
 #if IS_ENABLED(CONFIG_LIVEPATCH)
 
 /* task patch states */
diff --git a/include/linux/livepatch_sched.h b/include/linux/livepatch_sched.h
new file mode 100644
index 000000000000..3237bc6a5b01
--- /dev/null
+++ b/include/linux/livepatch_sched.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_LIVEPATCH_SCHED_H_
+#define _LINUX_LIVEPATCH_SCHED_H_
+
+#include <linux/static_call_types.h>
+
+#ifdef CONFIG_LIVEPATCH
+
+void __klp_sched_try_switch(void);
+DECLARE_STATIC_CALL(klp_sched_try_switch, __klp_sched_try_switch);
+
+static __always_inline void klp_sched_try_switch(void)
+{
+	//FIXME need static_call_cond_mod() ?
+	static_call_mod(klp_sched_try_switch)();
+}
+
+#else /* !CONFIG_LIVEPATCH */
+static inline void klp_sched_try_switch(void) {}
+#endif /* CONFIG_LIVEPATCH */
+
+#endif /* _LINUX_LIVEPATCH_SCHED_H_ */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4df2b3e76b30..fbcd3acca25c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -36,6 +36,7 @@
 #include <linux/seqlock.h>
 #include <linux/kcsan.h>
 #include <linux/rv.h>
+#include <linux/livepatch_sched.h>
 #include <asm/kmap_size.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
@@ -2074,6 +2075,9 @@ DECLARE_STATIC_CALL(cond_resched, __cond_resched);
 
 static __always_inline int _cond_resched(void)
 {
+	//FIXME this is a bit redundant with preemption disabled
+	klp_sched_try_switch();
+
 	return static_call_mod(cond_resched)();
 }
 
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index f1b25ec581e0..042e34c9389c 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@
 
 #include <linux/cpu.h>
 #include <linux/stacktrace.h>
+#include <linux/static_call.h>
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -24,6 +25,9 @@ static int klp_target_state = KLP_UNDEFINED;
 
 static unsigned int klp_signals_cnt;
 
+DEFINE_STATIC_CALL_NULL(klp_sched_try_switch, __klp_sched_try_switch);
+EXPORT_STATIC_CALL_TRAMP(klp_sched_try_switch);
+
 /*
  * This work can be performed periodically to finish patching or unpatching any
  * "straggler" tasks which failed to transition in the first attempt.
@@ -76,6 +80,8 @@ static void klp_complete_transition(void)
 		 klp_transition_patch->mod->name,
 		 klp_target_state == KLP_PATCHED ? "patching" : "unpatching");
 
+	static_call_update(klp_sched_try_switch, NULL);
+
 	if (klp_transition_patch->replace && klp_target_state == KLP_PATCHED) {
 		klp_unpatch_replaced_patches(klp_transition_patch);
 		klp_discard_nops(klp_transition_patch);
@@ -256,7 +262,8 @@ static int klp_check_stack(struct task_struct *task, const char **oldname)
 		klp_for_each_func(obj, func) {
 			ret = klp_check_stack_func(func, entries, nr_entries);
 			if (ret) {
-				*oldname = func->old_name;
+				if (oldname)
+					*oldname = func->old_name;
 				return -EADDRINUSE;
 			}
 		}
@@ -307,7 +314,11 @@ static bool klp_try_switch_task(struct task_struct *task)
 	 * functions.  If all goes well, switch the task to the target patch
 	 * state.
 	 */
-	ret = task_call_func(task, klp_check_and_switch_task, &old_name);
+	if (task == current)
+		ret = klp_check_and_switch_task(current, &old_name);
+	else
+		ret = task_call_func(task, klp_check_and_switch_task, &old_name);
+
 	switch (ret) {
 	case 0:		/* success */
 		break;
@@ -334,6 +345,15 @@ static bool klp_try_switch_task(struct task_struct *task)
 	return !ret;
 }
 
+void __klp_sched_try_switch(void)
+{
+	if (likely(!klp_patch_pending(current)))
+		return;
+
+	//FIXME locking
+	klp_try_switch_task(current);
+}
+
 /*
  * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
  * Kthreads with TIF_PATCH_PENDING set are woken up.
@@ -401,8 +421,10 @@ void klp_try_complete_transition(void)
 	 */
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, task)
-		if (!klp_try_switch_task(task))
+		if (!klp_try_switch_task(task)) {
+			set_tsk_need_resched(task);
 			complete = false;
+		}
 	read_unlock(&tasklist_lock);
 
 	/*
@@ -413,6 +435,7 @@ void klp_try_complete_transition(void)
 		task = idle_task(cpu);
 		if (cpu_online(cpu)) {
 			if (!klp_try_switch_task(task)) {
+				set_tsk_need_resched(task);
 				complete = false;
 				/* Make idle task go through the main loop. */
 				wake_up_if_idle(cpu);
@@ -492,6 +515,8 @@ void klp_start_transition(void)
 			set_tsk_thread_flag(task, TIF_PATCH_PENDING);
 	}
 
+	static_call_update(klp_sched_try_switch, __klp_sched_try_switch);
+
 	klp_signals_cnt = 0;
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3a0ef2fefbd5..01e32d242ef6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6506,6 +6506,8 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	struct rq *rq;
 	int cpu;
 
+	klp_sched_try_switch();
+
 	cpu = smp_processor_id();
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
@@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
 static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
 int __sched dynamic_cond_resched(void)
 {
-	if (!static_branch_unlikely(&sk_dynamic_cond_resched))
+	if (!static_branch_unlikely(&sk_dynamic_cond_resched)) {
+		klp_sched_try_switch();
 		return 0;
+	}
 	return __cond_resched();
 }
 EXPORT_SYMBOL(dynamic_cond_resched);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index e9ef66be2870..27ba93930584 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -308,9 +308,6 @@ static void do_idle(void)
 	 */
 	flush_smp_call_function_queue();
 	schedule_idle();
-
-	if (unlikely(klp_patch_pending(current)))
-		klp_update_patch_state(current);
 }
 
 bool cpu_in_idle(unsigned long pc)

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-27 14:57       ` Seth Forshee
@ 2023-01-30  9:55         ` Petr Mladek
  0 siblings, 0 replies; 36+ messages in thread
From: Petr Mladek @ 2023-01-30  9:55 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Jason Wang, Michael S. Tsirkin, Jiri Kosina, Miroslav Benes,
	Joe Lawrence, Josh Poimboeuf, virtualization, kvm, netdev,
	live-patching, linux-kernel

On Fri 2023-01-27 08:57:40, Seth Forshee wrote:
> On Fri, Jan 27, 2023 at 12:19:03PM +0100, Petr Mladek wrote:
> > Could you please provide some more details about the test system?
> > Is there anything important to make it reproducible?
> > 
> > The following aspects come to my mind. It might require:
> > 
> >    + more workers running on the same system
> >    + have a dedicated CPU for the worker
> >    + livepatching the function called by work->fn()
> >    + running the same work again and again
> >    + huge and overloaded system
> 
> I'm isolating a CPU, starting a KVM guest with a virtio-net device, and
> setting the affinity of the vhost worker thread to only the isolated
> CPU. Thus the vhost-worker thread has a dedicated CPU, as you say. (I'll
> note that in real-world cases the systems have many CPUs, and while the
> vhost threads aren't each given a dedicated CPU, if the system load is
> light enough a thread can end up with exlusive use of a CPU).
> 
> Then all I do is run iperf between the guest and the host with several
> parallel streams. I seem to be hitting the limits of the guest vCPUs
> before the vhost thread is fully saturated, as this gets it to about 90%
> CPU utilization by the vhost thread.

Thanks for the info!

> > > > Honestly, kpatch's timeout 1 minute looks incredible low to me. Note
> > > > that the transition is tried only once per minute. It means that there
> > > > are "only" 60 attempts.
> > > > 
> > > > Just by chance, does it help you to increase the timeout, please?
> > > 
> > > To be honest my test setup reproduces the problem well enough to make
> > > KLP wait significant time due to vhost threads, but it seldom causes it
> > > to hit kpatch's timeout.
> > > 
> > > Our system management software will try to load a patch tens of times in
> > > a day, and we've seen real-world cases where patches couldn't load
> > > within kpatch's timeout for multiple days. But I don't have such an
> > > environment readily accessible for my own testing. I can try to refine
> > > my test case and see if I can get it to that point.
> > 
> > My understanding is that you try to load the patch repeatedly but
> > it always fails after the 1 minute timeout. It means that it always
> > starts from the beginning (no livepatched process).
> > 
> > Is there any chance to try it with a longer timeout, for example, one
> > hour? It should increase the chance if there are more problematic kthreads.
> 
> Yes, I can try it. But I think I already mentioned that we are somewhat
> limited by our system management software and how livepatch loading is
> currently implemented there. I'd need to consult with others about how
> long we could make the timeout, but 1 hour is definitely too long under
> our current system.

Another possibility is to do not wait at all. SUSE livepatch packages load
the livepatch module, remove not longer used livepatch modules and are
done with it.

Note that the module is loaded quickly. The transition is finished
asynchronously using workqueues.

Of course, there is a risk that the transition will never finish.
It would prevent loading any newer livepatch. But it might be handled
when the the newer livepatch is loaded. It might revert the pending
transition, ...

Of course, it would be great to make the transition more reliable.
It would be nice to add the hook into the scheduler as discussed
in another branch of this thread. But it might bring another problems,
for example, affect the system performance. Well, it probably can
be optimized or ratelimited.

Anyway, I wanted to say that there is a way to get rid of the timeout
completely.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-27 22:11             ` Josh Poimboeuf
@ 2023-01-30 12:40               ` Peter Zijlstra
  2023-01-30 17:50                 ` Seth Forshee
                                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Peter Zijlstra @ 2023-01-30 12:40 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Petr Mladek, Joe Lawrence, kvm, Michael S. Tsirkin, netdev,
	Jiri Kosina, linux-kernel, virtualization,
	Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote:


> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 4df2b3e76b30..fbcd3acca25c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -36,6 +36,7 @@
>  #include <linux/seqlock.h>
>  #include <linux/kcsan.h>
>  #include <linux/rv.h>
> +#include <linux/livepatch_sched.h>
>  #include <asm/kmap_size.h>
>  
>  /* task_struct member predeclarations (sorted alphabetically): */
> @@ -2074,6 +2075,9 @@ DECLARE_STATIC_CALL(cond_resched, __cond_resched);
>  
>  static __always_inline int _cond_resched(void)
>  {
> +	//FIXME this is a bit redundant with preemption disabled
> +	klp_sched_try_switch();
> +
>  	return static_call_mod(cond_resched)();
>  }

Right, I was thinking you'd do something like:

	static_call_update(cond_resched, klp_cond_resched);

With:

static int klp_cond_resched(void)
{
	klp_try_switch_task(current);
	return __cond_resched();
}

That would force cond_resched() into doing the transition thing,
irrespective of the preemption mode at hand. And then, when KLP be done,
re-run sched_dynamic_update() to reset it to whatever it ought to be.

> @@ -401,8 +421,10 @@ void klp_try_complete_transition(void)
>  	 */
>  	read_lock(&tasklist_lock);
>  	for_each_process_thread(g, task)
> -		if (!klp_try_switch_task(task))
> +		if (!klp_try_switch_task(task)) {
> +			set_tsk_need_resched(task);
>  			complete = false;
> +		}

Yeah, no, that's broken -- preemption state live in more than just the
TIF bit.

>  	read_unlock(&tasklist_lock);
>  
>  	/*
> @@ -413,6 +435,7 @@ void klp_try_complete_transition(void)
>  		task = idle_task(cpu);
>  		if (cpu_online(cpu)) {
>  			if (!klp_try_switch_task(task)) {
> +				set_tsk_need_resched(task);
>  				complete = false;
>  				/* Make idle task go through the main loop. */
>  				wake_up_if_idle(cpu);

Idem.

Also, I don't see the point of this and the __schedule() hook here:

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3a0ef2fefbd5..01e32d242ef6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6506,6 +6506,8 @@ static void __sched notrace __schedule(unsigned int sched_mode)
>  	struct rq *rq;
>  	int cpu;
>  
> +	klp_sched_try_switch();
> +
>  	cpu = smp_processor_id();
>  	rq = cpu_rq(cpu);
>  	prev = rq->curr;

If it schedules, you'll get it with the normal switcheroo, because it'll
be inactive some of the time. If it doesn't schedule, it'll run into
cond_resched().

> @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
>  static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
>  int __sched dynamic_cond_resched(void)
>  {
> -	if (!static_branch_unlikely(&sk_dynamic_cond_resched))
> +	if (!static_branch_unlikely(&sk_dynamic_cond_resched)) {
> +		klp_sched_try_switch();
>  		return 0;
> +	}
>  	return __cond_resched();
>  }
>  EXPORT_SYMBOL(dynamic_cond_resched);

I would make the klp_sched_try_switch() not depend on
sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed
pass through __schedule().

But you'll probably want to check with Mark here, this all might
generate crap code on arm64.

Both ways this seems to make KLP 'depend' (or at least work lots better)
when PREEMPT_DYNAMIC=y. Do we want a PREEMPT_DYNAMIC=n fallback for
_cond_resched() too?



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-30 12:40               ` Peter Zijlstra
@ 2023-01-30 17:50                 ` Seth Forshee
  2023-01-30 18:18                 ` Josh Poimboeuf
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Seth Forshee @ 2023-01-30 17:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, Petr Mladek, Joe Lawrence, kvm,
	Michael S. Tsirkin, netdev, Jiri Kosina, linux-kernel,
	virtualization, live-patching, Miroslav Benes

On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote:
> Both ways this seems to make KLP 'depend' (or at least work lots better)
> when PREEMPT_DYNAMIC=y. Do we want a PREEMPT_DYNAMIC=n fallback for
> _cond_resched() too?

I would like this, as we have no other need for PREEMPT_DYNAMIC=y and
currently have it disabled.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-30 12:40               ` Peter Zijlstra
  2023-01-30 17:50                 ` Seth Forshee
@ 2023-01-30 18:18                 ` Josh Poimboeuf
  2023-01-30 18:36                 ` Mark Rutland
  2023-01-30 19:59                 ` Josh Poimboeuf
  3 siblings, 0 replies; 36+ messages in thread
From: Josh Poimboeuf @ 2023-01-30 18:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Joe Lawrence, kvm, Michael S. Tsirkin, netdev,
	Jiri Kosina, linux-kernel, virtualization,
	Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote:
> 
> 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 4df2b3e76b30..fbcd3acca25c 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -36,6 +36,7 @@
> >  #include <linux/seqlock.h>
> >  #include <linux/kcsan.h>
> >  #include <linux/rv.h>
> > +#include <linux/livepatch_sched.h>
> >  #include <asm/kmap_size.h>
> >  
> >  /* task_struct member predeclarations (sorted alphabetically): */
> > @@ -2074,6 +2075,9 @@ DECLARE_STATIC_CALL(cond_resched, __cond_resched);
> >  
> >  static __always_inline int _cond_resched(void)
> >  {
> > +	//FIXME this is a bit redundant with preemption disabled
> > +	klp_sched_try_switch();
> > +
> >  	return static_call_mod(cond_resched)();
> >  }
> 
> Right, I was thinking you'd do something like:
> 
> 	static_call_update(cond_resched, klp_cond_resched);
> 
> With:
> 
> static int klp_cond_resched(void)
> {
> 	klp_try_switch_task(current);
> 	return __cond_resched();
> }
> 
> That would force cond_resched() into doing the transition thing,
> irrespective of the preemption mode at hand. And then, when KLP be done,
> re-run sched_dynamic_update() to reset it to whatever it ought to be.

Ok, makes sense.

> 
> > @@ -401,8 +421,10 @@ void klp_try_complete_transition(void)
> >  	 */
> >  	read_lock(&tasklist_lock);
> >  	for_each_process_thread(g, task)
> > -		if (!klp_try_switch_task(task))
> > +		if (!klp_try_switch_task(task)) {
> > +			set_tsk_need_resched(task);
> >  			complete = false;
> > +		}
> 
> Yeah, no, that's broken -- preemption state live in more than just the
> TIF bit.

Oops.

> >  	read_unlock(&tasklist_lock);
> >  
> >  	/*
> > @@ -413,6 +435,7 @@ void klp_try_complete_transition(void)
> >  		task = idle_task(cpu);
> >  		if (cpu_online(cpu)) {
> >  			if (!klp_try_switch_task(task)) {
> > +				set_tsk_need_resched(task);
> >  				complete = false;
> >  				/* Make idle task go through the main loop. */
> >  				wake_up_if_idle(cpu);
> 
> Idem.
> 
> Also, I don't see the point of this and the __schedule() hook here:

The (poorly executed) idea was to catch kthreads which do

	if (need_resched())
		schedule();

but I guess we can just replace those usages with cond_resched()?

> > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
> >  static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
> >  int __sched dynamic_cond_resched(void)
> >  {
> > -	if (!static_branch_unlikely(&sk_dynamic_cond_resched))
> > +	if (!static_branch_unlikely(&sk_dynamic_cond_resched)) {
> > +		klp_sched_try_switch();
> >  		return 0;
> > +	}
> >  	return __cond_resched();
> >  }
> >  EXPORT_SYMBOL(dynamic_cond_resched);
> 
> I would make the klp_sched_try_switch() not depend on
> sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed
> pass through __schedule().
> 
> But you'll probably want to check with Mark here, this all might
> generate crap code on arm64.
> 
> Both ways this seems to make KLP 'depend' (or at least work lots better)
> when PREEMPT_DYNAMIC=y. Do we want a PREEMPT_DYNAMIC=n fallback for
> _cond_resched() too?

That was the intent but I obviously failed.  Let me go rework it a bit.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-30 12:40               ` Peter Zijlstra
  2023-01-30 17:50                 ` Seth Forshee
  2023-01-30 18:18                 ` Josh Poimboeuf
@ 2023-01-30 18:36                 ` Mark Rutland
  2023-01-30 19:48                   ` Josh Poimboeuf
  2023-01-30 19:59                 ` Josh Poimboeuf
  3 siblings, 1 reply; 36+ messages in thread
From: Mark Rutland @ 2023-01-30 18:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, Petr Mladek, Joe Lawrence, kvm,
	Michael S. Tsirkin, netdev, Jiri Kosina, linux-kernel,
	virtualization, Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote:
> > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
> >  static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
> >  int __sched dynamic_cond_resched(void)
> >  {
> > -	if (!static_branch_unlikely(&sk_dynamic_cond_resched))
> > +	if (!static_branch_unlikely(&sk_dynamic_cond_resched)) {
> > +		klp_sched_try_switch();
> >  		return 0;
> > +	}
> >  	return __cond_resched();
> >  }
> >  EXPORT_SYMBOL(dynamic_cond_resched);
> 
> I would make the klp_sched_try_switch() not depend on
> sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed
> pass through __schedule().
> 
> But you'll probably want to check with Mark here, this all might
> generate crap code on arm64.

IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll generate
at least a load, a conditional branch, and an indirect branch. That's not
ideal, but I'd have to benchmark it to find out whether it's a significant
overhead relative to the baseline of PREEMPT_DYNAMIC.

For arm64 it'd be a bit nicer to have another static key check, and a call to
__klp_sched_try_switch(). That way the static key check gets turned into a NOP
in the common case, and the call to __klp_sched_try_switch() can be a direct
call (potentially a tail-call if we made it return 0).

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-30 18:36                 ` Mark Rutland
@ 2023-01-30 19:48                   ` Josh Poimboeuf
  2023-01-31  1:53                     ` Song Liu
  2023-01-31 10:22                     ` Mark Rutland
  0 siblings, 2 replies; 36+ messages in thread
From: Josh Poimboeuf @ 2023-01-30 19:48 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Peter Zijlstra, Petr Mladek, Joe Lawrence, kvm,
	Michael S. Tsirkin, netdev, Jiri Kosina, linux-kernel,
	virtualization, Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Mon, Jan 30, 2023 at 06:36:32PM +0000, Mark Rutland wrote:
> On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote:
> > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote:
> > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
> > >  static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
> > >  int __sched dynamic_cond_resched(void)
> > >  {
> > > -	if (!static_branch_unlikely(&sk_dynamic_cond_resched))
> > > +	if (!static_branch_unlikely(&sk_dynamic_cond_resched)) {
> > > +		klp_sched_try_switch();
> > >  		return 0;
> > > +	}
> > >  	return __cond_resched();
> > >  }
> > >  EXPORT_SYMBOL(dynamic_cond_resched);
> > 
> > I would make the klp_sched_try_switch() not depend on
> > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed
> > pass through __schedule().
> > 
> > But you'll probably want to check with Mark here, this all might
> > generate crap code on arm64.
> 
> IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll generate
> at least a load, a conditional branch, and an indirect branch. That's not
> ideal, but I'd have to benchmark it to find out whether it's a significant
> overhead relative to the baseline of PREEMPT_DYNAMIC.
> 
> For arm64 it'd be a bit nicer to have another static key check, and a call to
> __klp_sched_try_switch(). That way the static key check gets turned into a NOP
> in the common case, and the call to __klp_sched_try_switch() can be a direct
> call (potentially a tail-call if we made it return 0).

Hm, it might be nice if our out-of-line static call implementation would
automatically do a static key check as part of static_call_cond() for
NULL-type static calls.

But the best answer is probably to just add inline static calls to
arm64.  Is the lack of objtool the only thing blocking that?

Objtool is now modular, so all the controversial CFG reverse engineering
is now optional, so it shouldn't be too hard to just enable objtool for
static call inlines.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-30 12:40               ` Peter Zijlstra
                                   ` (2 preceding siblings ...)
  2023-01-30 18:36                 ` Mark Rutland
@ 2023-01-30 19:59                 ` Josh Poimboeuf
  2023-01-31 10:02                   ` Peter Zijlstra
  3 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2023-01-30 19:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Joe Lawrence, kvm, Michael S. Tsirkin, netdev,
	Jiri Kosina, linux-kernel, virtualization,
	Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote:
> Right, I was thinking you'd do something like:
> 
> 	static_call_update(cond_resched, klp_cond_resched);
> 
> With:
> 
> static int klp_cond_resched(void)
> {
> 	klp_try_switch_task(current);
> 	return __cond_resched();
> }

Something like this?

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index cbe72bfd2f1f..424c0c939f57 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -363,8 +363,7 @@ static int vhost_worker(void *data)
 			kcov_remote_start_common(dev->kcov_handle);
 			work->fn(work);
 			kcov_remote_stop();
-			if (need_resched())
-				schedule();
+			cond_resched();
 		}
 	}
 	kthread_unuse_mm(dev->mm);
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 293e29960c6e..937816d0867c 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -14,6 +14,8 @@
 #include <linux/completion.h>
 #include <linux/list.h>
 
+#include <linux/livepatch_sched.h>
+
 #if IS_ENABLED(CONFIG_LIVEPATCH)
 
 /* task patch states */
diff --git a/include/linux/livepatch_sched.h b/include/linux/livepatch_sched.h
new file mode 100644
index 000000000000..3237bc6a5b01
--- /dev/null
+++ b/include/linux/livepatch_sched.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_LIVEPATCH_SCHED_H_
+#define _LINUX_LIVEPATCH_SCHED_H_
+
+#include <linux/static_call_types.h>
+
+#ifdef CONFIG_LIVEPATCH
+
+void __klp_sched_try_switch(void);
+DECLARE_STATIC_CALL(klp_sched_try_switch, __klp_sched_try_switch);
+
+static __always_inline void klp_sched_try_switch(void)
+{
+	//FIXME need static_call_cond_mod() ?
+	static_call_mod(klp_sched_try_switch)();
+}
+
+#else /* !CONFIG_LIVEPATCH */
+static inline void klp_sched_try_switch(void) {}
+#endif /* CONFIG_LIVEPATCH */
+
+#endif /* _LINUX_LIVEPATCH_SCHED_H_ */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4df2b3e76b30..a7acf9ae9b90 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -36,6 +36,7 @@
 #include <linux/seqlock.h>
 #include <linux/kcsan.h>
 #include <linux/rv.h>
+#include <linux/livepatch_sched.h>
 #include <asm/kmap_size.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
@@ -2077,11 +2078,15 @@ static __always_inline int _cond_resched(void)
 	return static_call_mod(cond_resched)();
 }
 
+void sched_dynamic_klp_enable(void);
+void sched_dynamic_klp_disable(void);
+
 #elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
 extern int dynamic_cond_resched(void);
 
 static __always_inline int _cond_resched(void)
 {
+	klp_sched_try_switch();
 	return dynamic_cond_resched();
 }
 
@@ -2089,6 +2094,7 @@ static __always_inline int _cond_resched(void)
 
 static inline int _cond_resched(void)
 {
+	klp_sched_try_switch();
 	return __cond_resched();
 }
 
@@ -2096,7 +2102,10 @@ static inline int _cond_resched(void)
 
 #else
 
-static inline int _cond_resched(void) { return 0; }
+static inline int _cond_resched(void) {
+	klp_sched_try_switch();
+	return 0;
+}
 
 #endif /* !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC) */
 
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index f1b25ec581e0..3cc4e0a24dc6 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@
 
 #include <linux/cpu.h>
 #include <linux/stacktrace.h>
+#include <linux/static_call.h>
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -24,6 +25,9 @@ static int klp_target_state = KLP_UNDEFINED;
 
 static unsigned int klp_signals_cnt;
 
+DEFINE_STATIC_CALL_NULL(klp_sched_try_switch, __klp_sched_try_switch);
+EXPORT_STATIC_CALL_TRAMP(klp_sched_try_switch);
+
 /*
  * This work can be performed periodically to finish patching or unpatching any
  * "straggler" tasks which failed to transition in the first attempt.
@@ -61,6 +65,28 @@ static void klp_synchronize_transition(void)
 	schedule_on_each_cpu(klp_sync);
 }
 
+/*
+ * Enable the klp hooks in cond_resched() while livepatching is in progress.
+ * This helps CPU-bound kthreads get patched.
+ */
+static void klp_sched_hook_enable(void)
+{
+#if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
+	sched_dynamic_klp_enable();
+#else
+	static_call_update(klp_sched_try_switch, __klp_sched_try_switch);
+#endif
+}
+
+static void klp_sched_hook_disable(void)
+{
+#if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
+	sched_dynamic_klp_disable();
+#else
+	static_call_update(klp_sched_try_switch, NULL);
+#endif
+}
+
 /*
  * The transition to the target patch state is complete.  Clean up the data
  * structures.
@@ -76,6 +102,8 @@ static void klp_complete_transition(void)
 		 klp_transition_patch->mod->name,
 		 klp_target_state == KLP_PATCHED ? "patching" : "unpatching");
 
+	klp_sched_hook_disable();
+
 	if (klp_transition_patch->replace && klp_target_state == KLP_PATCHED) {
 		klp_unpatch_replaced_patches(klp_transition_patch);
 		klp_discard_nops(klp_transition_patch);
@@ -307,7 +335,11 @@ static bool klp_try_switch_task(struct task_struct *task)
 	 * functions.  If all goes well, switch the task to the target patch
 	 * state.
 	 */
-	ret = task_call_func(task, klp_check_and_switch_task, &old_name);
+	if (task == current)
+		ret = klp_check_and_switch_task(current, &old_name);
+	else
+		ret = task_call_func(task, klp_check_and_switch_task, &old_name);
+
 	switch (ret) {
 	case 0:		/* success */
 		break;
@@ -334,6 +366,15 @@ static bool klp_try_switch_task(struct task_struct *task)
 	return !ret;
 }
 
+void __klp_sched_try_switch(void)
+{
+	if (likely(!klp_patch_pending(current)))
+		return;
+
+	//FIXME locking
+	klp_try_switch_task(current);
+}
+
 /*
  * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
  * Kthreads with TIF_PATCH_PENDING set are woken up.
@@ -492,6 +533,8 @@ void klp_start_transition(void)
 			set_tsk_thread_flag(task, TIF_PATCH_PENDING);
 	}
 
+	klp_sched_hook_enable();
+
 	klp_signals_cnt = 0;
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3a0ef2fefbd5..4fbf70b05576 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8648,13 +8648,16 @@ int sched_dynamic_mode(const char *str)
 #error "Unsupported PREEMPT_DYNAMIC mechanism"
 #endif
 
+static bool klp_override;
+
 void sched_dynamic_update(int mode)
 {
 	/*
 	 * Avoid {NONE,VOLUNTARY} -> FULL transitions from ever ending up in
 	 * the ZERO state, which is invalid.
 	 */
-	preempt_dynamic_enable(cond_resched);
+	if (!klp_override)
+		preempt_dynamic_enable(cond_resched);
 	preempt_dynamic_enable(might_resched);
 	preempt_dynamic_enable(preempt_schedule);
 	preempt_dynamic_enable(preempt_schedule_notrace);
@@ -8662,16 +8665,19 @@ void sched_dynamic_update(int mode)
 
 	switch (mode) {
 	case preempt_dynamic_none:
-		preempt_dynamic_enable(cond_resched);
+		if (!klp_override)
+			preempt_dynamic_enable(cond_resched);
 		preempt_dynamic_disable(might_resched);
 		preempt_dynamic_disable(preempt_schedule);
 		preempt_dynamic_disable(preempt_schedule_notrace);
 		preempt_dynamic_disable(irqentry_exit_cond_resched);
+		//FIXME avoid printk for klp restore
 		pr_info("Dynamic Preempt: none\n");
 		break;
 
 	case preempt_dynamic_voluntary:
-		preempt_dynamic_enable(cond_resched);
+		if (!klp_override)
+			preempt_dynamic_enable(cond_resched);
 		preempt_dynamic_enable(might_resched);
 		preempt_dynamic_disable(preempt_schedule);
 		preempt_dynamic_disable(preempt_schedule_notrace);
@@ -8680,7 +8686,8 @@ void sched_dynamic_update(int mode)
 		break;
 
 	case preempt_dynamic_full:
-		preempt_dynamic_disable(cond_resched);
+		if (!klp_override)
+			preempt_dynamic_disable(cond_resched);
 		preempt_dynamic_disable(might_resched);
 		preempt_dynamic_enable(preempt_schedule);
 		preempt_dynamic_enable(preempt_schedule_notrace);
@@ -8692,6 +8699,28 @@ void sched_dynamic_update(int mode)
 	preempt_dynamic_mode = mode;
 }
 
+#ifdef CONFIG_HAVE_PREEMPT_DYNAMIC_CALL
+static int klp_cond_resched(void)
+{
+	__klp_sched_try_switch();
+	return __cond_resched();
+}
+
+void sched_dynamic_klp_enable(void)
+{
+	//FIXME locking
+	klp_override = true;
+	static_call_update(cond_resched, klp_cond_resched);
+}
+
+void sched_dynamic_klp_disable(void)
+{
+	//FIXME locking
+	klp_override = false;
+	sched_dynamic_update(preempt_dynamic_mode);
+}
+#endif /* CONFIG_HAVE_PREEMPT_DYNAMIC_CALL*/
+
 static int __init setup_preempt_mode(char *str)
 {
 	int mode = sched_dynamic_mode(str);

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-30 19:48                   ` Josh Poimboeuf
@ 2023-01-31  1:53                     ` Song Liu
  2023-01-31 10:22                     ` Mark Rutland
  1 sibling, 0 replies; 36+ messages in thread
From: Song Liu @ 2023-01-31  1:53 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Mark Rutland, Peter Zijlstra, Petr Mladek, Joe Lawrence, kvm,
	Michael S. Tsirkin, netdev, Jiri Kosina, linux-kernel,
	virtualization, Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Mon, Jan 30, 2023 at 11:48 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Mon, Jan 30, 2023 at 06:36:32PM +0000, Mark Rutland wrote:
> > On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote:
> > > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
> > > >  static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
> > > >  int __sched dynamic_cond_resched(void)
> > > >  {
> > > > - if (!static_branch_unlikely(&sk_dynamic_cond_resched))
> > > > + if (!static_branch_unlikely(&sk_dynamic_cond_resched)) {
> > > > +         klp_sched_try_switch();
> > > >           return 0;
> > > > + }
> > > >   return __cond_resched();
> > > >  }
> > > >  EXPORT_SYMBOL(dynamic_cond_resched);
> > >
> > > I would make the klp_sched_try_switch() not depend on
> > > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed
> > > pass through __schedule().
> > >
> > > But you'll probably want to check with Mark here, this all might
> > > generate crap code on arm64.
> >
> > IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll generate
> > at least a load, a conditional branch, and an indirect branch. That's not
> > ideal, but I'd have to benchmark it to find out whether it's a significant
> > overhead relative to the baseline of PREEMPT_DYNAMIC.
> >
> > For arm64 it'd be a bit nicer to have another static key check, and a call to
> > __klp_sched_try_switch(). That way the static key check gets turned into a NOP
> > in the common case, and the call to __klp_sched_try_switch() can be a direct
> > call (potentially a tail-call if we made it return 0).
>
> Hm, it might be nice if our out-of-line static call implementation would
> automatically do a static key check as part of static_call_cond() for
> NULL-type static calls.
>
> But the best answer is probably to just add inline static calls to
> arm64.  Is the lack of objtool the only thing blocking that?
>
> Objtool is now modular, so all the controversial CFG reverse engineering
> is now optional, so it shouldn't be too hard to just enable objtool for
> static call inlines.

This might be a little off topic, and maybe I missed some threads:
How far are we from officially supporting livepatch on arm64?

IIUC, stable stack unwinding is the missing piece at the moment?

Thanks,
Song

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-30 19:59                 ` Josh Poimboeuf
@ 2023-01-31 10:02                   ` Peter Zijlstra
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2023-01-31 10:02 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Petr Mladek, Joe Lawrence, kvm, Michael S. Tsirkin, netdev,
	Jiri Kosina, linux-kernel, virtualization,
	Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Mon, Jan 30, 2023 at 11:59:30AM -0800, Josh Poimboeuf wrote:

> @@ -8662,16 +8665,19 @@ void sched_dynamic_update(int mode)
>  
>  	switch (mode) {
>  	case preempt_dynamic_none:
> -		preempt_dynamic_enable(cond_resched);
> +		if (!klp_override)
> +			preempt_dynamic_enable(cond_resched);
>  		preempt_dynamic_disable(might_resched);
>  		preempt_dynamic_disable(preempt_schedule);
>  		preempt_dynamic_disable(preempt_schedule_notrace);
>  		preempt_dynamic_disable(irqentry_exit_cond_resched);
> +		//FIXME avoid printk for klp restore

		if (mode != preempt_dynamic_mode)

>  		pr_info("Dynamic Preempt: none\n");
>  		break;
>  
>  	case preempt_dynamic_voluntary:
> -		preempt_dynamic_enable(cond_resched);
> +		if (!klp_override)
> +			preempt_dynamic_enable(cond_resched);
>  		preempt_dynamic_enable(might_resched);
>  		preempt_dynamic_disable(preempt_schedule);
>  		preempt_dynamic_disable(preempt_schedule_notrace);



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-30 19:48                   ` Josh Poimboeuf
  2023-01-31  1:53                     ` Song Liu
@ 2023-01-31 10:22                     ` Mark Rutland
  2023-01-31 16:38                       ` Josh Poimboeuf
  1 sibling, 1 reply; 36+ messages in thread
From: Mark Rutland @ 2023-01-31 10:22 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, Petr Mladek, Joe Lawrence, kvm,
	Michael S. Tsirkin, netdev, Jiri Kosina, linux-kernel,
	virtualization, Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Mon, Jan 30, 2023 at 11:48:23AM -0800, Josh Poimboeuf wrote:
> On Mon, Jan 30, 2023 at 06:36:32PM +0000, Mark Rutland wrote:
> > On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote:
> > > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
> > > >  static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
> > > >  int __sched dynamic_cond_resched(void)
> > > >  {
> > > > -	if (!static_branch_unlikely(&sk_dynamic_cond_resched))
> > > > +	if (!static_branch_unlikely(&sk_dynamic_cond_resched)) {
> > > > +		klp_sched_try_switch();
> > > >  		return 0;
> > > > +	}
> > > >  	return __cond_resched();
> > > >  }
> > > >  EXPORT_SYMBOL(dynamic_cond_resched);
> > > 
> > > I would make the klp_sched_try_switch() not depend on
> > > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed
> > > pass through __schedule().
> > > 
> > > But you'll probably want to check with Mark here, this all might
> > > generate crap code on arm64.
> > 
> > IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll generate
> > at least a load, a conditional branch, and an indirect branch. That's not
> > ideal, but I'd have to benchmark it to find out whether it's a significant
> > overhead relative to the baseline of PREEMPT_DYNAMIC.
> > 
> > For arm64 it'd be a bit nicer to have another static key check, and a call to
> > __klp_sched_try_switch(). That way the static key check gets turned into a NOP
> > in the common case, and the call to __klp_sched_try_switch() can be a direct
> > call (potentially a tail-call if we made it return 0).
> 
> Hm, it might be nice if our out-of-line static call implementation would
> automatically do a static key check as part of static_call_cond() for
> NULL-type static calls.
> 
> But the best answer is probably to just add inline static calls to
> arm64.  Is the lack of objtool the only thing blocking that?

The major issues were branch range limitations (and needing the linker to add
PLTs), and painful instruction patching requirements (e.g. the architecture's
"CMODX" rules for Concurrent MODification and eXecution of instructions). We
went with the static key scheme above because that was what our assembled code
generation would devolve to anyway.

If we knew each call-site would only call a particular function or skip the
call, then we could do better (and would probably need something like objtool
to NOP that out at compile time), but since we don't know the callee at build
time we can't ensure we have a PLT in range when necessary.

> Objtool is now modular, so all the controversial CFG reverse engineering
> is now optional, so it shouldn't be too hard to just enable objtool for
> static call inlines.

Funnily enough, I spent some time yesterday looking at enabling a trivial
objtool for arm64 as I wanted some basic ELF rewriting functionality (to
manipulate the mcount_loc table). So I'll likely be looking at that soon
regardless of static calls. :)

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-31 10:22                     ` Mark Rutland
@ 2023-01-31 16:38                       ` Josh Poimboeuf
  2023-02-01 11:10                         ` Mark Rutland
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2023-01-31 16:38 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Petr Mladek, Joe Lawrence, kvm, Michael S. Tsirkin,
	Peter Zijlstra, netdev, Jiri Kosina, linux-kernel,
	virtualization, Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Tue, Jan 31, 2023 at 10:22:09AM +0000, Mark Rutland wrote:
> > Hm, it might be nice if our out-of-line static call implementation would
> > automatically do a static key check as part of static_call_cond() for
> > NULL-type static calls.
> > 
> > But the best answer is probably to just add inline static calls to
> > arm64.  Is the lack of objtool the only thing blocking that?
> 
> The major issues were branch range limitations (and needing the linker to add
> PLTs),

Does the compiler do the right thing (e.g., force PLT) if the branch
target is outside the translation unit?  I'm wondering if we could for
example use objtool to help enforce such rules at the call site.

> and painful instruction patching requirements (e.g. the architecture's
> "CMODX" rules for Concurrent MODification and eXecution of instructions). We
> went with the static key scheme above because that was what our assembled code
> generation would devolve to anyway.
> 
> If we knew each call-site would only call a particular function or skip the
> call, then we could do better (and would probably need something like objtool
> to NOP that out at compile time), but since we don't know the callee at build
> time we can't ensure we have a PLT in range when necessary.

Unfortunately most static calls have multiple destinations.  And most
don't have the option of being NULL.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-01-31 16:38                       ` Josh Poimboeuf
@ 2023-02-01 11:10                         ` Mark Rutland
  2023-02-01 16:57                           ` Josh Poimboeuf
  0 siblings, 1 reply; 36+ messages in thread
From: Mark Rutland @ 2023-02-01 11:10 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Petr Mladek, Joe Lawrence, kvm, Michael S. Tsirkin,
	Peter Zijlstra, netdev, Jiri Kosina, linux-kernel,
	virtualization, Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Tue, Jan 31, 2023 at 08:38:32AM -0800, Josh Poimboeuf wrote:
> On Tue, Jan 31, 2023 at 10:22:09AM +0000, Mark Rutland wrote:
> > > Hm, it might be nice if our out-of-line static call implementation would
> > > automatically do a static key check as part of static_call_cond() for
> > > NULL-type static calls.
> > > 
> > > But the best answer is probably to just add inline static calls to
> > > arm64.  Is the lack of objtool the only thing blocking that?
> > 
> > The major issues were branch range limitations (and needing the linker to add
> > PLTs),
> 
> Does the compiler do the right thing (e.g., force PLT) if the branch
> target is outside the translation unit?  I'm wondering if we could for
> example use objtool to help enforce such rules at the call site.

It's the linker (rather than the compiler) that'll generate the PLT if the
caller and callee are out of range at link time. There are a few other issues
too (e.g. no guarnatee that the PLT isn't used by multiple distinct callers,
CMODX patching requirements), so we'd have to generate a pseudo-PLT ourselves
at build time with a patching-friendly code sequence. Ard had a prototype for
that:

  https://lore.kernel.org/linux-arm-kernel/20211105145917.2828911-1-ardb@kernel.org/

... but that was sufficiently painful that we went with the current static key
approach:

  https://lore.kernel.org/all/20211109172408.49641-1-mark.rutland@arm.com/

> > and painful instruction patching requirements (e.g. the architecture's
> > "CMODX" rules for Concurrent MODification and eXecution of instructions). We
> > went with the static key scheme above because that was what our assembled code
> > generation would devolve to anyway.
> > 
> > If we knew each call-site would only call a particular function or skip the
> > call, then we could do better (and would probably need something like objtool
> > to NOP that out at compile time), but since we don't know the callee at build
> > time we can't ensure we have a PLT in range when necessary.
> 
> Unfortunately most static calls have multiple destinations.

Sure, but here we're just enabling/disabling a call, which we could treat
differently, or wrap at a different level within the scheduler code. I'm happy
to take a look at that.

> And most don't have the option of being NULL.

Oh, I was under the impression that all could be disabled/skipped, which is
what a NULL target implied.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-02-01 11:10                         ` Mark Rutland
@ 2023-02-01 16:57                           ` Josh Poimboeuf
  2023-02-01 17:11                             ` Mark Rutland
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2023-02-01 16:57 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Petr Mladek, Joe Lawrence, kvm, Michael S. Tsirkin,
	Peter Zijlstra, netdev, Jiri Kosina, linux-kernel,
	virtualization, Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Wed, Feb 01, 2023 at 11:10:20AM +0000, Mark Rutland wrote:
> On Tue, Jan 31, 2023 at 08:38:32AM -0800, Josh Poimboeuf wrote:
> > On Tue, Jan 31, 2023 at 10:22:09AM +0000, Mark Rutland wrote:
> > > > Hm, it might be nice if our out-of-line static call implementation would
> > > > automatically do a static key check as part of static_call_cond() for
> > > > NULL-type static calls.
> > > > 
> > > > But the best answer is probably to just add inline static calls to
> > > > arm64.  Is the lack of objtool the only thing blocking that?
> > > 
> > > The major issues were branch range limitations (and needing the linker to add
> > > PLTs),
> > 
> > Does the compiler do the right thing (e.g., force PLT) if the branch
> > target is outside the translation unit?  I'm wondering if we could for
> > example use objtool to help enforce such rules at the call site.
> 
> It's the linker (rather than the compiler) that'll generate the PLT if the
> caller and callee are out of range at link time. There are a few other issues
> too (e.g. no guarnatee that the PLT isn't used by multiple distinct callers,
> CMODX patching requirements), so we'd have to generate a pseudo-PLT ourselves
> at build time with a patching-friendly code sequence. Ard had a prototype for
> that:
> 
>   https://lore.kernel.org/linux-arm-kernel/20211105145917.2828911-1-ardb@kernel.org/
> 
> ... but that was sufficiently painful that we went with the current static key
> approach:
> 
>   https://lore.kernel.org/all/20211109172408.49641-1-mark.rutland@arm.com/

Thanks for the background.

Was there a reason for putting it out-of-line rather than directly in
_cond_resched()?

If it were inline then it wouldn't be that much different from the
static called version and I wonder if we could simplify by just using
the static key for all PREEMPT_DYNAMIC configs.

> > > If we knew each call-site would only call a particular function or skip the
> > > call, then we could do better (and would probably need something like objtool
> > > to NOP that out at compile time), but since we don't know the callee at build
> > > time we can't ensure we have a PLT in range when necessary.
> > 
> > Unfortunately most static calls have multiple destinations.
> 
> Sure, but here we're just enabling/disabling a call, which we could treat
> differently, or wrap at a different level within the scheduler code. I'm happy
> to take a look at that.

I can try to emulate what you did for PREEMPT_DYNAMIC.  I'll Cc you on
my actual patch to come soon-ish.

> > And most don't have the option of being NULL.
> 
> Oh, I was under the impression that all could be disabled/skipped, which is
> what a NULL target implied.

I guess what I was trying to say is that if the target can be NULL, the
call site has to use static_call_cond() to not break the
!HAVE_STATIC_CALL case.  But most call sites use static_call().

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads
  2023-02-01 16:57                           ` Josh Poimboeuf
@ 2023-02-01 17:11                             ` Mark Rutland
  0 siblings, 0 replies; 36+ messages in thread
From: Mark Rutland @ 2023-02-01 17:11 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Petr Mladek, Joe Lawrence, kvm, Michael S. Tsirkin,
	Peter Zijlstra, netdev, Jiri Kosina, linux-kernel,
	virtualization, Seth Forshee (DigitalOcean),
	live-patching, Miroslav Benes

On Wed, Feb 01, 2023 at 08:57:27AM -0800, Josh Poimboeuf wrote:
> On Wed, Feb 01, 2023 at 11:10:20AM +0000, Mark Rutland wrote:
> > On Tue, Jan 31, 2023 at 08:38:32AM -0800, Josh Poimboeuf wrote:
> > > On Tue, Jan 31, 2023 at 10:22:09AM +0000, Mark Rutland wrote:
> > > > > Hm, it might be nice if our out-of-line static call implementation would
> > > > > automatically do a static key check as part of static_call_cond() for
> > > > > NULL-type static calls.
> > > > > 
> > > > > But the best answer is probably to just add inline static calls to
> > > > > arm64.  Is the lack of objtool the only thing blocking that?
> > > > 
> > > > The major issues were branch range limitations (and needing the linker to add
> > > > PLTs),
> > > 
> > > Does the compiler do the right thing (e.g., force PLT) if the branch
> > > target is outside the translation unit?  I'm wondering if we could for
> > > example use objtool to help enforce such rules at the call site.
> > 
> > It's the linker (rather than the compiler) that'll generate the PLT if the
> > caller and callee are out of range at link time. There are a few other issues
> > too (e.g. no guarnatee that the PLT isn't used by multiple distinct callers,
> > CMODX patching requirements), so we'd have to generate a pseudo-PLT ourselves
> > at build time with a patching-friendly code sequence. Ard had a prototype for
> > that:
> > 
> >   https://lore.kernel.org/linux-arm-kernel/20211105145917.2828911-1-ardb@kernel.org/
> > 
> > ... but that was sufficiently painful that we went with the current static key
> > approach:
> > 
> >   https://lore.kernel.org/all/20211109172408.49641-1-mark.rutland@arm.com/
> 
> Thanks for the background.
> 
> Was there a reason for putting it out-of-line rather than directly in
> _cond_resched()?

I think that's mostly a historical accident; I'm not aware of a reaason we
can't put that directly in _cond_resched(). 

Since we started from out-of-line static call trampolines, even the out-of-line
static key check looked nicer, and I think we just never considered moving the
static key check inline.

> If it were inline then it wouldn't be that much different from the
> static called version and I wonder if we could simplify by just using
> the static key for all PREEMPT_DYNAMIC configs.

That would be nice!

> > > > If we knew each call-site would only call a particular function or skip the
> > > > call, then we could do better (and would probably need something like objtool
> > > > to NOP that out at compile time), but since we don't know the callee at build
> > > > time we can't ensure we have a PLT in range when necessary.
> > > 
> > > Unfortunately most static calls have multiple destinations.
> > 
> > Sure, but here we're just enabling/disabling a call, which we could treat
> > differently, or wrap at a different level within the scheduler code. I'm happy
> > to take a look at that.
> 
> I can try to emulate what you did for PREEMPT_DYNAMIC.  I'll Cc you on
> my actual patch to come soon-ish.

I look forward to it! :)

> > > And most don't have the option of being NULL.
> > 
> > Oh, I was under the impression that all could be disabled/skipped, which is
> > what a NULL target implied.
> 
> I guess what I was trying to say is that if the target can be NULL, the
> call site has to use static_call_cond() to not break the
> !HAVE_STATIC_CALL case.  But most call sites use static_call().

Ah, sorry -- I had missed that we had distinct static_call_cond() and
static_call() helpers.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2023-02-01 17:12 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-20 22:12 [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads Seth Forshee (DigitalOcean)
2023-01-20 22:12 ` [PATCH 1/2] livepatch: add an interface for safely switching kthreads Seth Forshee (DigitalOcean)
2023-01-20 22:12 ` [PATCH 2/2] vhost: check for pending livepatches from vhost worker kthreads Seth Forshee (DigitalOcean)
2023-01-24 14:17   ` Petr Mladek
2023-01-24 17:21     ` Seth Forshee
2023-01-25 11:34       ` Petr Mladek
2023-01-25 16:57         ` Seth Forshee
2023-01-26 11:16           ` Petr Mladek
2023-01-26 11:49             ` Petr Mladek
2023-01-22  8:34 ` [PATCH 0/2] vhost: improve livepatch switching for heavily loaded " Michael S. Tsirkin
2023-01-26 17:03 ` Petr Mladek
2023-01-26 21:12   ` Seth Forshee (DigitalOcean)
2023-01-27  4:43     ` Josh Poimboeuf
2023-01-27 10:37       ` Peter Zijlstra
2023-01-27 12:09         ` Petr Mladek
2023-01-27 14:37           ` Seth Forshee
2023-01-27 16:52         ` Josh Poimboeuf
2023-01-27 17:09           ` Josh Poimboeuf
2023-01-27 22:11             ` Josh Poimboeuf
2023-01-30 12:40               ` Peter Zijlstra
2023-01-30 17:50                 ` Seth Forshee
2023-01-30 18:18                 ` Josh Poimboeuf
2023-01-30 18:36                 ` Mark Rutland
2023-01-30 19:48                   ` Josh Poimboeuf
2023-01-31  1:53                     ` Song Liu
2023-01-31 10:22                     ` Mark Rutland
2023-01-31 16:38                       ` Josh Poimboeuf
2023-02-01 11:10                         ` Mark Rutland
2023-02-01 16:57                           ` Josh Poimboeuf
2023-02-01 17:11                             ` Mark Rutland
2023-01-30 19:59                 ` Josh Poimboeuf
2023-01-31 10:02                   ` Peter Zijlstra
2023-01-27 20:02         ` Seth Forshee
2023-01-27 11:19     ` Petr Mladek
2023-01-27 14:57       ` Seth Forshee
2023-01-30  9:55         ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).