All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
@ 2022-05-03 17:49 Seth Forshee
  2022-05-03 17:53 ` Seth Forshee
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Seth Forshee @ 2022-05-03 17:49 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, Andy Lutomirski
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Paolo Bonzini, Sean Christopherson, linux-kernel, live-patching,
	kvm

A task can be livepatched only when it is sleeping or it exits to
userspace. This may happen infrequently for a heavily loaded vCPU task,
leading to livepatch transition failures.

Fake signals will be sent to tasks which fail patching via stack
checking. This will cause running vCPU tasks to exit guest mode, but
since no signal is pending they return to guest execution without
exiting to userspace. Fix this by treating a pending livepatch migration
like a pending signal, exiting to userspace with EINTR. This allows the
task to be patched, and userspace should re-excecute KVM_RUN to resume
guest execution.

In my testing, systems where livepatching would timeout after 60 seconds
were able to load livepatches within a couple of seconds with this
change.

Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
---
Changes in v2:
 - Added _TIF_SIGPENDING to XFER_TO_GUEST_MODE_WORK
 - Reworded commit message and comments to avoid confusion around the
   term "migrate"

 include/linux/entry-kvm.h | 4 ++--
 kernel/entry/kvm.c        | 7 ++++++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
index 6813171afccb..bf79e4cbb5a2 100644
--- a/include/linux/entry-kvm.h
+++ b/include/linux/entry-kvm.h
@@ -17,8 +17,8 @@
 #endif
 
 #define XFER_TO_GUEST_MODE_WORK						\
-	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL |	\
-	 _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
+	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |	\
+	 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
 
 struct kvm_vcpu;
 
diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index 9d09f489b60e..98439dfaa1a0 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -14,7 +14,12 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
 				task_work_run();
 		}
 
-		if (ti_work & _TIF_SIGPENDING) {
+		/*
+		 * When a livepatch is pending, force an exit to userspace
+		 * as though a signal is pending to allow the task to be
+		 * patched.
+		 */
+		if (ti_work & (_TIF_SIGPENDING | _TIF_PATCH_PENDING)) {
 			kvm_handle_signal_exit(vcpu);
 			return -EINTR;
 		}
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-03 17:49 [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending Seth Forshee
@ 2022-05-03 17:53 ` Seth Forshee
  2022-05-04  1:08 ` kernel test robot
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Seth Forshee @ 2022-05-03 17:53 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, Andy Lutomirski
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Paolo Bonzini, Sean Christopherson, linux-kernel, live-patching,
	kvm

On Tue, May 03, 2022 at 12:49:34PM -0500, Seth Forshee wrote:
> A task can be livepatched only when it is sleeping or it exits to
> userspace. This may happen infrequently for a heavily loaded vCPU task,
> leading to livepatch transition failures.
> 
> Fake signals will be sent to tasks which fail patching via stack
> checking. This will cause running vCPU tasks to exit guest mode, but
> since no signal is pending they return to guest execution without
> exiting to userspace. Fix this by treating a pending livepatch migration
> like a pending signal, exiting to userspace with EINTR. This allows the
> task to be patched, and userspace should re-excecute KVM_RUN to resume
> guest execution.
> 
> In my testing, systems where livepatching would timeout after 60 seconds
> were able to load livepatches within a couple of seconds with this
> change.
> 
> Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
> ---
> Changes in v2:
>  - Added _TIF_SIGPENDING to XFER_TO_GUEST_MODE_WORK

Clearly I meant _TIF_PATCH_PENDING here and not _TIF_SIGPENDING.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-03 17:49 [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending Seth Forshee
  2022-05-03 17:53 ` Seth Forshee
@ 2022-05-04  1:08 ` kernel test robot
  2022-05-04 12:44 ` Thomas Gleixner
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2022-05-04  1:08 UTC (permalink / raw)
  To: Seth Forshee, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski
  Cc: kbuild-all, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Paolo Bonzini, Sean Christopherson, linux-kernel,
	live-patching, kvm

Hi Seth,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.18-rc5]
[also build test ERROR on next-20220503]
[cannot apply to tip/core/entry]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Seth-Forshee/entry-kvm-Make-vCPU-tasks-exit-to-userspace-when-a-livepatch-is-pending/20220504-015159
base:    672c0c5173427e6b3e2a9bbb7be51ceeec78093a
config: arm64-randconfig-r034-20220502 (https://download.01.org/0day-ci/archive/20220504/202205040945.JZNNub2D-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/1c97c02f02b9f8e6b8e1f11657f950510f9c828e
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Seth-Forshee/entry-kvm-Make-vCPU-tasks-exit-to-userspace-when-a-livepatch-is-pending/20220504-015159
        git checkout 1c97c02f02b9f8e6b8e1f11657f950510f9c828e
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/arm64/kvm/arm.c:9:
   include/linux/entry-kvm.h: In function '__xfer_to_guest_mode_work_pending':
>> include/linux/entry-kvm.h:20:48: error: '_TIF_PATCH_PENDING' undeclared (first use in this function)
      20 |         (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |     \
         |                                                ^~~~~~~~~~~~~~~~~~
   include/linux/entry-kvm.h:80:29: note: in expansion of macro 'XFER_TO_GUEST_MODE_WORK'
      80 |         return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
         |                             ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/entry-kvm.h:20:48: note: each undeclared identifier is reported only once for each function it appears in
      20 |         (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |     \
         |                                                ^~~~~~~~~~~~~~~~~~
   include/linux/entry-kvm.h:80:29: note: in expansion of macro 'XFER_TO_GUEST_MODE_WORK'
      80 |         return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
         |                             ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/entry-kvm.h:81:1: error: control reaches end of non-void function [-Werror=return-type]
      81 | }
         | ^
   cc1: some warnings being treated as errors
--
   In file included from kernel/entry/kvm.c:3:
   include/linux/entry-kvm.h: In function '__xfer_to_guest_mode_work_pending':
>> include/linux/entry-kvm.h:20:48: error: '_TIF_PATCH_PENDING' undeclared (first use in this function)
      20 |         (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |     \
         |                                                ^~~~~~~~~~~~~~~~~~
   include/linux/entry-kvm.h:80:29: note: in expansion of macro 'XFER_TO_GUEST_MODE_WORK'
      80 |         return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
         |                             ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/entry-kvm.h:20:48: note: each undeclared identifier is reported only once for each function it appears in
      20 |         (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |     \
         |                                                ^~~~~~~~~~~~~~~~~~
   include/linux/entry-kvm.h:80:29: note: in expansion of macro 'XFER_TO_GUEST_MODE_WORK'
      80 |         return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
         |                             ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/entry/kvm.c: In function 'xfer_to_guest_mode_work':
>> kernel/entry/kvm.c:22:50: error: '_TIF_PATCH_PENDING' undeclared (first use in this function)
      22 |                 if (ti_work & (_TIF_SIGPENDING | _TIF_PATCH_PENDING)) {
         |                                                  ^~~~~~~~~~~~~~~~~~
   In file included from kernel/entry/kvm.c:3:
   kernel/entry/kvm.c: In function 'xfer_to_guest_mode_handle_work':
>> include/linux/entry-kvm.h:20:48: error: '_TIF_PATCH_PENDING' undeclared (first use in this function)
      20 |         (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |     \
         |                                                ^~~~~~~~~~~~~~~~~~
   kernel/entry/kvm.c:55:25: note: in expansion of macro 'XFER_TO_GUEST_MODE_WORK'
      55 |         if (!(ti_work & XFER_TO_GUEST_MODE_WORK))
         |                         ^~~~~~~~~~~~~~~~~~~~~~~


vim +/_TIF_PATCH_PENDING +20 include/linux/entry-kvm.h

    18	
    19	#define XFER_TO_GUEST_MODE_WORK						\
  > 20		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |	\
    21		 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
    22	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-03 17:49 [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending Seth Forshee
  2022-05-03 17:53 ` Seth Forshee
  2022-05-04  1:08 ` kernel test robot
@ 2022-05-04 12:44 ` Thomas Gleixner
  2022-05-04 13:07 ` Petr Mladek
  2022-05-04 15:01 ` kernel test robot
  4 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2022-05-04 12:44 UTC (permalink / raw)
  To: Seth Forshee, Peter Zijlstra, Andy Lutomirski
  Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Paolo Bonzini, Sean Christopherson, linux-kernel, live-patching,
	kvm

On Tue, May 03 2022 at 12:49, Seth Forshee wrote:
> diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
> index 6813171afccb..bf79e4cbb5a2 100644
> --- a/include/linux/entry-kvm.h
> +++ b/include/linux/entry-kvm.h
> @@ -17,8 +17,8 @@
>  #endif
>  
>  #define XFER_TO_GUEST_MODE_WORK						\
> -	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL |	\
> -	 _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> +	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |	\
> +	 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)

as the 0-day robot has demonstrated already, this cannot compile on
architectures which do not provide _TIF_PATCH_PENDING...

Something like the below is required.

Thanks,

        tglx
---
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -3,6 +3,7 @@
 #define __LINUX_ENTRYCOMMON_H
 
 #include <linux/static_call_types.h>
+#include <linux/entry-defs.h>
 #include <linux/ptrace.h>
 #include <linux/syscalls.h>
 #include <linux/seccomp.h>
@@ -11,18 +12,6 @@
 #include <asm/entry-common.h>
 
 /*
- * Define dummy _TIF work flags if not defined by the architecture or for
- * disabled functionality.
- */
-#ifndef _TIF_PATCH_PENDING
-# define _TIF_PATCH_PENDING		(0)
-#endif
-
-#ifndef _TIF_UPROBE
-# define _TIF_UPROBE			(0)
-#endif
-
-/*
  * SYSCALL_WORK flags handled in syscall_enter_from_user_mode()
  */
 #ifndef ARCH_SYSCALL_WORK_ENTER
--- /dev/null
+++ b/include/linux/entry-defs.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_ENTRYDEFS_H
+#define __LINUX_ENTRYDEFS_H
+
+#include <linux/thread_info.h>
+
+/*
+ * Define dummy _TIF work flags if not defined by the architecture or for
+ * disabled functionality.
+ */
+#ifndef _TIF_PATCH_PENDING
+# define _TIF_PATCH_PENDING		(0)
+#endif
+
+#ifndef _TIF_UPROBE
+# define _TIF_UPROBE			(0)
+#endif
+
+#endif
--- a/include/linux/entry-kvm.h
+++ b/include/linux/entry-kvm.h
@@ -4,6 +4,7 @@
 
 #include <linux/static_call_types.h>
 #include <linux/resume_user_mode.h>
+#include <linux/entry-defs.h>
 #include <linux/syscalls.h>
 #include <linux/seccomp.h>
 #include <linux/sched.h>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-03 17:49 [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending Seth Forshee
                   ` (2 preceding siblings ...)
  2022-05-04 12:44 ` Thomas Gleixner
@ 2022-05-04 13:07 ` Petr Mladek
  2022-05-04 13:50   ` Seth Forshee
  2022-05-04 14:16   ` Eric W. Biederman
  2022-05-04 15:01 ` kernel test robot
  4 siblings, 2 replies; 14+ messages in thread
From: Petr Mladek @ 2022-05-04 13:07 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Paolo Bonzini, Eric W. Biederman,
	Jens Axboe, Sean Christopherson, linux-kernel, live-patching,
	kvm

On Tue 2022-05-03 12:49:34, Seth Forshee wrote:
> A task can be livepatched only when it is sleeping or it exits to
> userspace. This may happen infrequently for a heavily loaded vCPU task,
> leading to livepatch transition failures.

This is misleading.

First, the problem is not a loaded CPU. The problem is that the
task might spend very long time in the kernel when handling
some syscall.

Second, there is no timeout for the transition in the kernel code.
It might take very long time but it will not fail.

> Fake signals will be sent to tasks which fail patching via stack
> checking. This will cause running vCPU tasks to exit guest mode, but
> since no signal is pending they return to guest execution without
> exiting to userspace. Fix this by treating a pending livepatch migration
> like a pending signal, exiting to userspace with EINTR. This allows the
> task to be patched, and userspace should re-excecute KVM_RUN to resume
> guest execution.

It seems that the patch works as expected but it is far from clear.
And the above description helps only partially. Let me try to
explain it for dummies like me ;-)

<explanation>
The problem was solved by sending a fake signal, see the commit
0b3d52790e1cfd6b80b826 ("livepatch: Remove signal sysfs attribute").
It was achieved by calling signal_wake_up(). It set TIF_SIGPENDING
and woke the task. It interrupted the syscall and the task was
transitioned when leaving to the userspace.

signal_wake_up() was later replaced by set_notify_signal(),
see the commit 8df1947c71ee53c7e21 ("livepatch: Replace
the fake signal sending with TIF_NOTIFY_SIGNAL infrastructure").
The difference is that set_notify_signal() uses TIF_NOTIFY_SIGNAL
instead of TIF_SIGPENDING.

The effect is the same when running on a real hardware. The syscall
gets interrupted and exit_to_user_mode_loop() is called where
the livepatch state is updated (task migrated).

But it works a different way in kvm where the task works are
called in the guest mode and the task does not return into
the user space in the host mode.
</explanation>

The solution provided by this patch is a bit weird, see below.


> In my testing, systems where livepatching would timeout after 60 seconds
> were able to load livepatches within a couple of seconds with this
> change.
> 
> Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
> ---
> Changes in v2:
>  - Added _TIF_SIGPENDING to XFER_TO_GUEST_MODE_WORK
>  - Reworded commit message and comments to avoid confusion around the
>    term "migrate"
> 
>  include/linux/entry-kvm.h | 4 ++--
>  kernel/entry/kvm.c        | 7 ++++++-
>  2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
> index 6813171afccb..bf79e4cbb5a2 100644
> --- a/include/linux/entry-kvm.h
> +++ b/include/linux/entry-kvm.h
> @@ -17,8 +17,8 @@
>  #endif
>  
>  #define XFER_TO_GUEST_MODE_WORK						\
> -	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL |	\
> -	 _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> +	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |	\
> +	 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
>  
>  struct kvm_vcpu;
>  
> diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
> index 9d09f489b60e..98439dfaa1a0 100644
> --- a/kernel/entry/kvm.c
> +++ b/kernel/entry/kvm.c
> @@ -14,7 +14,12 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
>  				task_work_run();
>  		}
>  
> -		if (ti_work & _TIF_SIGPENDING) {
> +		/*
> +		 * When a livepatch is pending, force an exit to userspace
> +		 * as though a signal is pending to allow the task to be
> +		 * patched.
> +		 */
> +		if (ti_work & (_TIF_SIGPENDING | _TIF_PATCH_PENDING)) {
>  			kvm_handle_signal_exit(vcpu);
>  			return -EINTR;
>  		}

This looks strange:

  + klp_send_signals() calls set_notify_signal(task) that sets
    TIF_NOTIFY_SIGNAL

  + xfer_to_guest_mode_work() handles TIF_NOTIFY_SIGNAL by calling
    task_work_run().

  + This patch calls kvm_handle_signal_exit(vcpu) when
    _TIF_PATCH_PENDING is set. It probably causes the guest
    to call exit_to_user_mode_loop() because TIF_PATCH_PENDING
    bit is set. But neither TIF_NOTIFY_SIGNAL not TIF_NOTIFY_SIGNAL
    is set so that it works different way than on the real hardware.


Question:

Does xfer_to_guest_mode_work() interrupts the syscall running
on the guest?

If "yes" then we do not need to call kvm_handle_signal_exit(vcpu).
It will be enough to call:

		if (ti_work & _TIF_PATCH_PENDING)
			klp_update_patch_state(current);

If "no" then I do not understand why TIF_NOTIFY_SIGNAL interrupts
the syscall on the real hardware and not in kvm.

Anyway, we either should make sure that TIF_NOTIFY_SIGNAL has the same
effect on the real hardware and in kvm. Or we need another interface
for the fake signal used by livepatching.

Adding Jens Axboe and Eric into Cc.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-04 13:07 ` Petr Mladek
@ 2022-05-04 13:50   ` Seth Forshee
  2022-05-04 14:28     ` Petr Mladek
  2022-05-04 14:16   ` Eric W. Biederman
  1 sibling, 1 reply; 14+ messages in thread
From: Seth Forshee @ 2022-05-04 13:50 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Paolo Bonzini, Eric W. Biederman,
	Jens Axboe, Sean Christopherson, linux-kernel, live-patching,
	kvm

On Wed, May 04, 2022 at 03:07:53PM +0200, Petr Mladek wrote:
> On Tue 2022-05-03 12:49:34, Seth Forshee wrote:
> > A task can be livepatched only when it is sleeping or it exits to
> > userspace. This may happen infrequently for a heavily loaded vCPU task,
> > leading to livepatch transition failures.
> 
> This is misleading.
> 
> First, the problem is not a loaded CPU. The problem is that the
> task might spend very long time in the kernel when handling
> some syscall.

It's a fully loaded vCPU, which yes to the host looks like spending a
very long time in the ioctl(KVM_RUN) syscall. I can reword to clarify.

> Second, there is no timeout for the transition in the kernel code.
> It might take very long time but it will not fail.

I suppose the timeout is in kpatch then. I didn't check what implemented
the timeout. I'll remove the statement about timing out.

> > Fake signals will be sent to tasks which fail patching via stack
> > checking. This will cause running vCPU tasks to exit guest mode, but
> > since no signal is pending they return to guest execution without
> > exiting to userspace. Fix this by treating a pending livepatch migration
> > like a pending signal, exiting to userspace with EINTR. This allows the
> > task to be patched, and userspace should re-excecute KVM_RUN to resume
> > guest execution.
> 
> It seems that the patch works as expected but it is far from clear.
> And the above description helps only partially. Let me try to
> explain it for dummies like me ;-)
> 
> <explanation>
> The problem was solved by sending a fake signal, see the commit
> 0b3d52790e1cfd6b80b826 ("livepatch: Remove signal sysfs attribute").
> It was achieved by calling signal_wake_up(). It set TIF_SIGPENDING
> and woke the task. It interrupted the syscall and the task was
> transitioned when leaving to the userspace.
> 
> signal_wake_up() was later replaced by set_notify_signal(),
> see the commit 8df1947c71ee53c7e21 ("livepatch: Replace
> the fake signal sending with TIF_NOTIFY_SIGNAL infrastructure").
> The difference is that set_notify_signal() uses TIF_NOTIFY_SIGNAL
> instead of TIF_SIGPENDING.
> 
> The effect is the same when running on a real hardware. The syscall
> gets interrupted and exit_to_user_mode_loop() is called where
> the livepatch state is updated (task migrated).
> 
> But it works a different way in kvm where the task works are
> called in the guest mode and the task does not return into
> the user space in the host mode.
> </explanation>

Thanks, I can update the commit message to include more of this
background.

> 
> The solution provided by this patch is a bit weird, see below.
> 
> 
> > In my testing, systems where livepatching would timeout after 60 seconds
> > were able to load livepatches within a couple of seconds with this
> > change.
> > 
> > Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
> > ---
> > Changes in v2:
> >  - Added _TIF_SIGPENDING to XFER_TO_GUEST_MODE_WORK
> >  - Reworded commit message and comments to avoid confusion around the
> >    term "migrate"
> > 
> >  include/linux/entry-kvm.h | 4 ++--
> >  kernel/entry/kvm.c        | 7 ++++++-
> >  2 files changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
> > index 6813171afccb..bf79e4cbb5a2 100644
> > --- a/include/linux/entry-kvm.h
> > +++ b/include/linux/entry-kvm.h
> > @@ -17,8 +17,8 @@
> >  #endif
> >  
> >  #define XFER_TO_GUEST_MODE_WORK						\
> > -	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL |	\
> > -	 _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> > +	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |	\
> > +	 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> >  
> >  struct kvm_vcpu;
> >  
> > diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
> > index 9d09f489b60e..98439dfaa1a0 100644
> > --- a/kernel/entry/kvm.c
> > +++ b/kernel/entry/kvm.c
> > @@ -14,7 +14,12 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> >  				task_work_run();
> >  		}
> >  
> > -		if (ti_work & _TIF_SIGPENDING) {
> > +		/*
> > +		 * When a livepatch is pending, force an exit to userspace
> > +		 * as though a signal is pending to allow the task to be
> > +		 * patched.
> > +		 */
> > +		if (ti_work & (_TIF_SIGPENDING | _TIF_PATCH_PENDING)) {
> >  			kvm_handle_signal_exit(vcpu);
> >  			return -EINTR;
> >  		}
> 
> This looks strange:
> 
>   + klp_send_signals() calls set_notify_signal(task) that sets
>     TIF_NOTIFY_SIGNAL
> 
>   + xfer_to_guest_mode_work() handles TIF_NOTIFY_SIGNAL by calling
>     task_work_run().
> 
>   + This patch calls kvm_handle_signal_exit(vcpu) when
>     _TIF_PATCH_PENDING is set. It probably causes the guest
>     to call exit_to_user_mode_loop() because TIF_PATCH_PENDING
>     bit is set. But neither TIF_NOTIFY_SIGNAL not TIF_NOTIFY_SIGNAL
>     is set so that it works different way than on the real hardware.
> 
> 
> Question:
> 
> Does xfer_to_guest_mode_work() interrupts the syscall running
> on the guest?

xfer_to_guest_mode_work() is called as part of a loop to execute kvm
guests (for example, on x86 see vcpu_run() in arch/x86/kvm/x86.c). When
guest execution is interrupted (in the livepatch case it is interrupted
when set_notify_signal() is called for the vCPU task)
xfer_to_guest_mode_work() is called if there is pending work, and if it
returns non-zero the loop does not immediately re-enter guest execution
but instead returns to userspace.

> If "yes" then we do not need to call kvm_handle_signal_exit(vcpu).
> It will be enough to call:
> 
> 		if (ti_work & _TIF_PATCH_PENDING)
> 			klp_update_patch_state(current);

What if the task's call stack contains a function being patched?

> 
> If "no" then I do not understand why TIF_NOTIFY_SIGNAL interrupts
> the syscall on the real hardware and not in kvm.

It does interrupt, but xfer_to_guest_mode_handle_work() concludes it's
not necessary to return to userspace and resumes guest execution.

Thanks,
Seth

> Anyway, we either should make sure that TIF_NOTIFY_SIGNAL has the same
> effect on the real hardware and in kvm. Or we need another interface
> for the fake signal used by livepatching.
> 
> Adding Jens Axboe and Eric into Cc.
> 
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-04 13:07 ` Petr Mladek
  2022-05-04 13:50   ` Seth Forshee
@ 2022-05-04 14:16   ` Eric W. Biederman
  2022-05-04 15:12     ` Petr Mladek
  1 sibling, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2022-05-04 14:16 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Seth Forshee, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Paolo Bonzini,
	Jens Axboe, Sean Christopherson, linux-kernel, live-patching,
	kvm

Petr Mladek <pmladek@suse.com> writes:

> On Tue 2022-05-03 12:49:34, Seth Forshee wrote:
>> A task can be livepatched only when it is sleeping or it exits to
>> userspace. This may happen infrequently for a heavily loaded vCPU task,
>> leading to livepatch transition failures.
>
> This is misleading.
>
> First, the problem is not a loaded CPU. The problem is that the
> task might spend very long time in the kernel when handling
> some syscall.
>
> Second, there is no timeout for the transition in the kernel code.
> It might take very long time but it will not fail.
>
>> Fake signals will be sent to tasks which fail patching via stack
>> checking. This will cause running vCPU tasks to exit guest mode, but
>> since no signal is pending they return to guest execution without
>> exiting to userspace. Fix this by treating a pending livepatch migration
>> like a pending signal, exiting to userspace with EINTR. This allows the
>> task to be patched, and userspace should re-excecute KVM_RUN to resume
>> guest execution.
>
> It seems that the patch works as expected but it is far from clear.
> And the above description helps only partially. Let me try to
> explain it for dummies like me ;-)
>
> <explanation>
> The problem was solved by sending a fake signal, see the commit
> 0b3d52790e1cfd6b80b826 ("livepatch: Remove signal sysfs attribute").
> It was achieved by calling signal_wake_up(). It set TIF_SIGPENDING
> and woke the task. It interrupted the syscall and the task was
> transitioned when leaving to the userspace.
>
> signal_wake_up() was later replaced by set_notify_signal(),
> see the commit 8df1947c71ee53c7e21 ("livepatch: Replace
> the fake signal sending with TIF_NOTIFY_SIGNAL infrastructure").
> The difference is that set_notify_signal() uses TIF_NOTIFY_SIGNAL
> instead of TIF_SIGPENDING.
>
> The effect is the same when running on a real hardware. The syscall
> gets interrupted and exit_to_user_mode_loop() is called where
> the livepatch state is updated (task migrated).
>
> But it works a different way in kvm where the task works are
> called in the guest mode and the task does not return into
> the user space in the host mode.
> </explanation>
>
> The solution provided by this patch is a bit weird, see below.
>
>
>> In my testing, systems where livepatching would timeout after 60 seconds
>> were able to load livepatches within a couple of seconds with this
>> change.
>> 
>> Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
>> ---
>> Changes in v2:
>>  - Added _TIF_SIGPENDING to XFER_TO_GUEST_MODE_WORK
>>  - Reworded commit message and comments to avoid confusion around the
>>    term "migrate"
>> 
>>  include/linux/entry-kvm.h | 4 ++--
>>  kernel/entry/kvm.c        | 7 ++++++-
>>  2 files changed, 8 insertions(+), 3 deletions(-)
>> 
>> diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
>> index 6813171afccb..bf79e4cbb5a2 100644
>> --- a/include/linux/entry-kvm.h
>> +++ b/include/linux/entry-kvm.h
>> @@ -17,8 +17,8 @@
>>  #endif
>>  
>>  #define XFER_TO_GUEST_MODE_WORK						\
>> -	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL |	\
>> -	 _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
>> +	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |	\
>> +	 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
>>  
>>  struct kvm_vcpu;
>>  
>> diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
>> index 9d09f489b60e..98439dfaa1a0 100644
>> --- a/kernel/entry/kvm.c
>> +++ b/kernel/entry/kvm.c
>> @@ -14,7 +14,12 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
>>  				task_work_run();
>>  		}
>>  
>> -		if (ti_work & _TIF_SIGPENDING) {
>> +		/*
>> +		 * When a livepatch is pending, force an exit to userspace
>> +		 * as though a signal is pending to allow the task to be
>> +		 * patched.
>> +		 */
>> +		if (ti_work & (_TIF_SIGPENDING | _TIF_PATCH_PENDING)) {
>>  			kvm_handle_signal_exit(vcpu);
>>  			return -EINTR;
>>  		}
>
> This looks strange:
>
>   + klp_send_signals() calls set_notify_signal(task) that sets
>     TIF_NOTIFY_SIGNAL
>
>   + xfer_to_guest_mode_work() handles TIF_NOTIFY_SIGNAL by calling
>     task_work_run().
>
>   + This patch calls kvm_handle_signal_exit(vcpu) when
>     _TIF_PATCH_PENDING is set. It probably causes the guest
>     to call exit_to_user_mode_loop() because TIF_PATCH_PENDING
>     bit is set. But neither TIF_NOTIFY_SIGNAL not TIF_NOTIFY_SIGNAL
>     is set so that it works different way than on the real hardware.
>
>
> Question:
>
> Does xfer_to_guest_mode_work() interrupts the syscall running
> on the guest?

It looks like xfer_to_guest_mode_work runs like exit_to_user_mode_loop,
just before guest mode is invoked.

> If "yes" then we do not need to call kvm_handle_signal_exit(vcpu).
> It will be enough to call:
>
> 		if (ti_work & _TIF_PATCH_PENDING)
> 			klp_update_patch_state(current);
>
> If "no" then I do not understand why TIF_NOTIFY_SIGNAL interrupts
> the syscall on the real hardware and not in kvm.
>
> Anyway, we either should make sure that TIF_NOTIFY_SIGNAL has the same
> effect on the real hardware and in kvm. Or we need another interface
> for the fake signal used by livepatching.

The point of TIF_NOTIFY_SIGNAL is to break out of interruptible kernel
loops.  Once out of the interruptible kernel loop the expectation is the
returns to user space and on it's way runs the exit_to_user_mode_loop or
is architecture specific equivalent.


> Adding Jens Axboe and Eric into Cc.


Yes.  This is interesting.

Reading through the history of kernel/entry/kvm.c I believe
I made ``conservative'' changes that has not helped this situation.

Long story short at one point it was thought that _TIF_SIGPENDING
and _TIF_NOTIFY_SIGNAL could be separated and they could not.
Unfortunately the work to separate their handling has not been
completely undone.

In this case it appears that the only reason xfer_to_guest_mode_work
touches task_work_run is because of the separation work done by Jens
Axboe.  I don't see any kvm specific reason for _TIF_NOTIFY_SIGNAL
and _TIF_SIGPENDING to be treated differently.  Meanwhile my cleanups
elsewhere have made the unnecessary _TIF_NOTIFY_SIGNAL special case
bigger in xfer_to_guest_mode_work.

I suspect the first step in fixing things really should be just handling
_TIF_SIGPENDING and _TIF_NOTIFY_SIGNAL the same.

static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
{
	do {
		int ret;

		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
			kvm_handle_signal_exit(vcpu);
			return -EINTR;
		}

		if (ti_work & _TIF_NEED_RESCHED)
			schedule();

		if (ti_work & _TIF_NOTIFY_RESUME)
			resume_user_mode_work(NULL);

		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
		if (ret)
			return ret;

		ti_work = read_thread_flags();
	} while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
	return 0;
}

That said I do expect adding support for the live patching into
xfer_to_guest_mode_work, like there is in exit_to_user_mode_loop, is
probably a good idea.  That should prevent the live patching code from
needing to set TIF_NOTIFY_SIGNAL.

Something like:

Thomas Gleixner's patch to make _TIF_PATCH_PENDING always available.

#define XFER_TO_GUEST_MODE_WORK						\
	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |	\
	 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)


static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
{
	do {
		int ret;

		if (ti_work & _TIF_PATCH_PENDING)
			klp_update_patch_state(current);

		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
			kvm_handle_signal_exit(vcpu);
			return -EINTR;
		}

		if (ti_work & _TIF_NEED_RESCHED)
			schedule();

		if (ti_work & _TIF_NOTIFY_RESUME)
			resume_user_mode_work(NULL);

		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
		if (ret)
			return ret;

		ti_work = read_thread_flags();
	} while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
	return 0;
}

If the kvm and the live patching folks could check my thinking that
would be great.

Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-04 13:50   ` Seth Forshee
@ 2022-05-04 14:28     ` Petr Mladek
  2022-05-04 14:44       ` Seth Forshee
  2022-05-04 14:53       ` Eric W. Biederman
  0 siblings, 2 replies; 14+ messages in thread
From: Petr Mladek @ 2022-05-04 14:28 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Paolo Bonzini, Eric W. Biederman,
	Jens Axboe, Sean Christopherson, linux-kernel, live-patching,
	kvm

On Wed 2022-05-04 08:50:22, Seth Forshee wrote:
> On Wed, May 04, 2022 at 03:07:53PM +0200, Petr Mladek wrote:
> > On Tue 2022-05-03 12:49:34, Seth Forshee wrote:
> > > A task can be livepatched only when it is sleeping or it exits to
> > > userspace. This may happen infrequently for a heavily loaded vCPU task,
> > > leading to livepatch transition failures.
> > 
> > The problem was solved by sending a fake signal, see the commit
> > 0b3d52790e1cfd6b80b826 ("livepatch: Remove signal sysfs attribute").
> > It was achieved by calling signal_wake_up(). It set TIF_SIGPENDING
> > and woke the task. It interrupted the syscall and the task was
> > transitioned when leaving to the userspace.
> > 
> > signal_wake_up() was later replaced by set_notify_signal(),
> > see the commit 8df1947c71ee53c7e21 ("livepatch: Replace
> > the fake signal sending with TIF_NOTIFY_SIGNAL infrastructure").
> > The difference is that set_notify_signal() uses TIF_NOTIFY_SIGNAL
> > instead of TIF_SIGPENDING.
> > 
> > The effect is the same when running on a real hardware. The syscall
> > gets interrupted and exit_to_user_mode_loop() is called where
> > the livepatch state is updated (task migrated).
> > 
> > But it works a different way in kvm where the task works are
> > called in the guest mode and the task does not return into
> > the user space in the host mode.
> 
> > > --- a/kernel/entry/kvm.c
> > > +++ b/kernel/entry/kvm.c
> > > @@ -14,7 +14,12 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> > >  				task_work_run();
> > >  		}
> > >  
> > > -		if (ti_work & _TIF_SIGPENDING) {
> > > +		/*
> > > +		 * When a livepatch is pending, force an exit to userspace
> > > +		 * as though a signal is pending to allow the task to be
> > > +		 * patched.
> > > +		 */
> > > +		if (ti_work & (_TIF_SIGPENDING | _TIF_PATCH_PENDING)) {
> > >  			kvm_handle_signal_exit(vcpu);

Another problem. Is it safe to call kvm_handle_signal_exit(vcpu)
for kthreads?

kthreads have _TIF_PATCH_PENDING when they need the livepatch transition.
But kthreads never leave kernel so we do not send the fake signal
signals to them.


> > >  			return -EINTR;
> > >  		}
> > 
> > Does xfer_to_guest_mode_work() interrupts the syscall running
> > on the guest?
> 
> xfer_to_guest_mode_work() is called as part of a loop to execute kvm
> guests (for example, on x86 see vcpu_run() in arch/x86/kvm/x86.c). When
> guest execution is interrupted (in the livepatch case it is interrupted
> when set_notify_signal() is called for the vCPU task)
> xfer_to_guest_mode_work() is called if there is pending work, and if it
> returns non-zero the loop does not immediately re-enter guest execution
> but instead returns to userspace.

Thanks for the detailed explanation.


> > If "yes" then we do not need to call kvm_handle_signal_exit(vcpu).
> > It will be enough to call:
> > 
> > 		if (ti_work & _TIF_PATCH_PENDING)
> > 			klp_update_patch_state(current);
> 
> What if the task's call stack contains a function being patched?

We do not need to check the stack when the syscall gets restarted.
The task might be transitioned only when the syscall gets restarted.


> > If "no" then I do not understand why TIF_NOTIFY_SIGNAL interrupts
> > the syscall on the real hardware and not in kvm.
> 
> It does interrupt, but xfer_to_guest_mode_handle_work() concludes it's
> not necessary to return to userspace and resumes guest execution.

In this case, we should revert the commit 8df1947c71ee53c7e21
("livepatch: Replace the fake signal sending with TIF_NOTIFY_SIGNAL
infrastructure"). The flag TIF_NOTIFY_SIGNAL clearly does not guarantee
restarting the syscall or exiting to the user space with -EINTR.

It should solve this problem. And it looks like a cleaner solution
to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-04 14:28     ` Petr Mladek
@ 2022-05-04 14:44       ` Seth Forshee
  2022-05-04 14:57         ` Petr Mladek
  2022-05-04 14:53       ` Eric W. Biederman
  1 sibling, 1 reply; 14+ messages in thread
From: Seth Forshee @ 2022-05-04 14:44 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Paolo Bonzini, Eric W. Biederman,
	Jens Axboe, Sean Christopherson, linux-kernel, live-patching,
	kvm

On Wed, May 04, 2022 at 04:28:09PM +0200, Petr Mladek wrote:
> On Wed 2022-05-04 08:50:22, Seth Forshee wrote:
> > On Wed, May 04, 2022 at 03:07:53PM +0200, Petr Mladek wrote:
> > > On Tue 2022-05-03 12:49:34, Seth Forshee wrote:
> > > > A task can be livepatched only when it is sleeping or it exits to
> > > > userspace. This may happen infrequently for a heavily loaded vCPU task,
> > > > leading to livepatch transition failures.
> > > 
> > > The problem was solved by sending a fake signal, see the commit
> > > 0b3d52790e1cfd6b80b826 ("livepatch: Remove signal sysfs attribute").
> > > It was achieved by calling signal_wake_up(). It set TIF_SIGPENDING
> > > and woke the task. It interrupted the syscall and the task was
> > > transitioned when leaving to the userspace.
> > > 
> > > signal_wake_up() was later replaced by set_notify_signal(),
> > > see the commit 8df1947c71ee53c7e21 ("livepatch: Replace
> > > the fake signal sending with TIF_NOTIFY_SIGNAL infrastructure").
> > > The difference is that set_notify_signal() uses TIF_NOTIFY_SIGNAL
> > > instead of TIF_SIGPENDING.
> > > 
> > > The effect is the same when running on a real hardware. The syscall
> > > gets interrupted and exit_to_user_mode_loop() is called where
> > > the livepatch state is updated (task migrated).
> > > 
> > > But it works a different way in kvm where the task works are
> > > called in the guest mode and the task does not return into
> > > the user space in the host mode.
> > 
> > > > --- a/kernel/entry/kvm.c
> > > > +++ b/kernel/entry/kvm.c
> > > > @@ -14,7 +14,12 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> > > >  				task_work_run();
> > > >  		}
> > > >  
> > > > -		if (ti_work & _TIF_SIGPENDING) {
> > > > +		/*
> > > > +		 * When a livepatch is pending, force an exit to userspace
> > > > +		 * as though a signal is pending to allow the task to be
> > > > +		 * patched.
> > > > +		 */
> > > > +		if (ti_work & (_TIF_SIGPENDING | _TIF_PATCH_PENDING)) {
> > > >  			kvm_handle_signal_exit(vcpu);
> 
> Another problem. Is it safe to call kvm_handle_signal_exit(vcpu)
> for kthreads?
> 
> kthreads have _TIF_PATCH_PENDING when they need the livepatch transition.
> But kthreads never leave kernel so we do not send the fake signal
> signals to them.

xfer_to_guest_mode_handle_work() should only be getting called on user
threads running ioctl(KVM_RUN).

> 
> > > >  			return -EINTR;
> > > >  		}
> > > 
> > > Does xfer_to_guest_mode_work() interrupts the syscall running
> > > on the guest?
> > 
> > xfer_to_guest_mode_work() is called as part of a loop to execute kvm
> > guests (for example, on x86 see vcpu_run() in arch/x86/kvm/x86.c). When
> > guest execution is interrupted (in the livepatch case it is interrupted
> > when set_notify_signal() is called for the vCPU task)
> > xfer_to_guest_mode_work() is called if there is pending work, and if it
> > returns non-zero the loop does not immediately re-enter guest execution
> > but instead returns to userspace.
> 
> Thanks for the detailed explanation.
> 
> 
> > > If "yes" then we do not need to call kvm_handle_signal_exit(vcpu).
> > > It will be enough to call:
> > > 
> > > 		if (ti_work & _TIF_PATCH_PENDING)
> > > 			klp_update_patch_state(current);
> > 
> > What if the task's call stack contains a function being patched?
> 
> We do not need to check the stack when the syscall gets restarted.
> The task might be transitioned only when the syscall gets restarted.

I see. Thanks!

> > > If "no" then I do not understand why TIF_NOTIFY_SIGNAL interrupts
> > > the syscall on the real hardware and not in kvm.
> > 
> > It does interrupt, but xfer_to_guest_mode_handle_work() concludes it's
> > not necessary to return to userspace and resumes guest execution.
> 
> In this case, we should revert the commit 8df1947c71ee53c7e21
> ("livepatch: Replace the fake signal sending with TIF_NOTIFY_SIGNAL
> infrastructure"). The flag TIF_NOTIFY_SIGNAL clearly does not guarantee
> restarting the syscall or exiting to the user space with -EINTR.
> 
> It should solve this problem. And it looks like a cleaner solution
> to me.

It looks like that should fix the issue. I'll test to confirm.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-04 14:28     ` Petr Mladek
  2022-05-04 14:44       ` Seth Forshee
@ 2022-05-04 14:53       ` Eric W. Biederman
  1 sibling, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2022-05-04 14:53 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Seth Forshee, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Paolo Bonzini,
	Jens Axboe, Sean Christopherson, linux-kernel, live-patching,
	kvm

Petr Mladek <pmladek@suse.com> writes:

> On Wed 2022-05-04 08:50:22, Seth Forshee wrote:
>> On Wed, May 04, 2022 at 03:07:53PM +0200, Petr Mladek wrote:

>> > If "no" then I do not understand why TIF_NOTIFY_SIGNAL interrupts
>> > the syscall on the real hardware and not in kvm.
>> 
>> It does interrupt, but xfer_to_guest_mode_handle_work() concludes it's
>> not necessary to return to userspace and resumes guest execution.
>
> In this case, we should revert the commit 8df1947c71ee53c7e21
> ("livepatch: Replace the fake signal sending with TIF_NOTIFY_SIGNAL
> infrastructure"). The flag TIF_NOTIFY_SIGNAL clearly does not guarantee
> restarting the syscall or exiting to the user space with -EINTR.
>
> It should solve this problem. And it looks like a cleaner solution
> to me.

Why not just?

diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index 9d09f489b60e..cbb192aec13a 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -8,13 +8,7 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
        do {
                int ret;
 
-               if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
-                       clear_notify_signal();
-                       if (task_work_pending(current))
-                               task_work_run();
-               }
-
-               if (ti_work & _TIF_SIGPENDING) {
+               if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
                        kvm_handle_signal_exit(vcpu);
                        return -EINTR;
                }

As far as I can tell the only reason _TIF_NOTIFY_SIGNAL was handled any
differently than _TIF_SIGPENDING in xfer_to_guest_mode_work is because
of historical confusion.

Eric

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-04 14:44       ` Seth Forshee
@ 2022-05-04 14:57         ` Petr Mladek
  0 siblings, 0 replies; 14+ messages in thread
From: Petr Mladek @ 2022-05-04 14:57 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Paolo Bonzini, Eric W. Biederman,
	Jens Axboe, Sean Christopherson, linux-kernel, live-patching,
	kvm

On Wed 2022-05-04 09:44:07, Seth Forshee wrote:
> On Wed, May 04, 2022 at 04:28:09PM +0200, Petr Mladek wrote:
> > On Wed 2022-05-04 08:50:22, Seth Forshee wrote:
> > > On Wed, May 04, 2022 at 03:07:53PM +0200, Petr Mladek wrote:
> > > > On Tue 2022-05-03 12:49:34, Seth Forshee wrote:
> > > > > A task can be livepatched only when it is sleeping or it exits to
> > > > > userspace. This may happen infrequently for a heavily loaded vCPU task,
> > > > > leading to livepatch transition failures.
> > > > 
> > > > The problem was solved by sending a fake signal, see the commit
> > > > 0b3d52790e1cfd6b80b826 ("livepatch: Remove signal sysfs attribute").
> > > > It was achieved by calling signal_wake_up(). It set TIF_SIGPENDING
> > > > and woke the task. It interrupted the syscall and the task was
> > > > transitioned when leaving to the userspace.
> > > > 
> > > > signal_wake_up() was later replaced by set_notify_signal(),
> > > > see the commit 8df1947c71ee53c7e21 ("livepatch: Replace
> > > > the fake signal sending with TIF_NOTIFY_SIGNAL infrastructure").
> > > > The difference is that set_notify_signal() uses TIF_NOTIFY_SIGNAL
> > > > instead of TIF_SIGPENDING.
> > > > 
> > > > The effect is the same when running on a real hardware. The syscall
> > > > gets interrupted and exit_to_user_mode_loop() is called where
> > > > the livepatch state is updated (task migrated).
> > > > 
> > > > But it works a different way in kvm where the task works are
> > > > called in the guest mode and the task does not return into
> > > > the user space in the host mode.
> > > 
> > > > > --- a/kernel/entry/kvm.c
> > > > > +++ b/kernel/entry/kvm.c
> > > > > @@ -14,7 +14,12 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> > > > >  				task_work_run();
> > > > >  		}
> > > > >  
> > > > > -		if (ti_work & _TIF_SIGPENDING) {
> > > > > +		/*
> > > > > +		 * When a livepatch is pending, force an exit to userspace
> > > > > +		 * as though a signal is pending to allow the task to be
> > > > > +		 * patched.
> > > > > +		 */
> > > > > +		if (ti_work & (_TIF_SIGPENDING | _TIF_PATCH_PENDING)) {
> > > > >  			kvm_handle_signal_exit(vcpu);
> > 
> > Another problem. Is it safe to call kvm_handle_signal_exit(vcpu)
> > for kthreads?
> > 
> > kthreads have _TIF_PATCH_PENDING when they need the livepatch transition.
> > But kthreads never leave kernel so we do not send the fake signal
> > signals to them.
> 
> xfer_to_guest_mode_handle_work() should only be getting called on user
> threads running ioctl(KVM_RUN).

Great!

> > In this case, we should revert the commit 8df1947c71ee53c7e21
> > ("livepatch: Replace the fake signal sending with TIF_NOTIFY_SIGNAL
> > infrastructure"). The flag TIF_NOTIFY_SIGNAL clearly does not guarantee
> > restarting the syscall or exiting to the user space with -EINTR.
> > 
> > It should solve this problem. And it looks like a cleaner solution
> > to me.
> 
> It looks like that should fix the issue. I'll test to confirm.

Even better solution would be what Eric suggested, see
https://lore.kernel.org/r/87r159fkmp.fsf@email.froward.int.ebiederm.org

But we need to make sure that the syscall really gets restarted
when the livepatch state is updated.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-03 17:49 [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending Seth Forshee
                   ` (3 preceding siblings ...)
  2022-05-04 13:07 ` Petr Mladek
@ 2022-05-04 15:01 ` kernel test robot
  4 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2022-05-04 15:01 UTC (permalink / raw)
  To: Seth Forshee, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski
  Cc: llvm, kbuild-all, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Paolo Bonzini, Sean Christopherson, linux-kernel,
	live-patching, kvm

Hi Seth,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.18-rc5]
[also build test ERROR on next-20220504]
[cannot apply to tip/core/entry]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Seth-Forshee/entry-kvm-Make-vCPU-tasks-exit-to-userspace-when-a-livepatch-is-pending/20220504-015159
base:    672c0c5173427e6b3e2a9bbb7be51ceeec78093a
config: arm64-randconfig-r003-20220501 (https://download.01.org/0day-ci/archive/20220504/202205042204.CiMJBtiY-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 363b3a645a1e30011cc8da624f13dac5fd915628)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/1c97c02f02b9f8e6b8e1f11657f950510f9c828e
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Seth-Forshee/entry-kvm-Make-vCPU-tasks-exit-to-userspace-when-a-livepatch-is-pending/20220504-015159
        git checkout 1c97c02f02b9f8e6b8e1f11657f950510f9c828e
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/arm64/kvm/arm.c:9:
>> include/linux/entry-kvm.h:80:22: error: use of undeclared identifier '_TIF_PATCH_PENDING'
           return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
                               ^
   include/linux/entry-kvm.h:20:41: note: expanded from macro 'XFER_TO_GUEST_MODE_WORK'
           (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |     \
                                                  ^
   In file included from arch/arm64/kvm/arm.c:17:
   include/linux/mman.h:158:9: warning: division by zero is undefined [-Wdivision-by-zero]
                  _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mman.h:136:21: note: expanded from macro '_calc_vm_trans'
      : ((x) & (bit1)) / ((bit1) / (bit2))))
                       ^ ~~~~~~~~~~~~~~~~~
   1 warning and 1 error generated.
--
   In file included from kernel/entry/kvm.c:3:
>> include/linux/entry-kvm.h:80:22: error: use of undeclared identifier '_TIF_PATCH_PENDING'
           return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
                               ^
   include/linux/entry-kvm.h:20:41: note: expanded from macro 'XFER_TO_GUEST_MODE_WORK'
           (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |     \
                                                  ^
>> kernel/entry/kvm.c:22:36: error: use of undeclared identifier '_TIF_PATCH_PENDING'
                   if (ti_work & (_TIF_SIGPENDING | _TIF_PATCH_PENDING)) {
                                                    ^
   kernel/entry/kvm.c:38:21: error: use of undeclared identifier '_TIF_PATCH_PENDING'
           } while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
                              ^
   include/linux/entry-kvm.h:20:41: note: expanded from macro 'XFER_TO_GUEST_MODE_WORK'
           (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |     \
                                                  ^
   kernel/entry/kvm.c:55:18: error: use of undeclared identifier '_TIF_PATCH_PENDING'
           if (!(ti_work & XFER_TO_GUEST_MODE_WORK))
                           ^
   include/linux/entry-kvm.h:20:41: note: expanded from macro 'XFER_TO_GUEST_MODE_WORK'
           (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |     \
                                                  ^
   4 errors generated.


vim +/_TIF_PATCH_PENDING +80 include/linux/entry-kvm.h

4ae7dc97f726ea Frederic Weisbecker 2021-02-01  67  
935ace2fb5cc49 Thomas Gleixner     2020-07-22  68  /**
935ace2fb5cc49 Thomas Gleixner     2020-07-22  69   * __xfer_to_guest_mode_work_pending - Check if work is pending
935ace2fb5cc49 Thomas Gleixner     2020-07-22  70   *
935ace2fb5cc49 Thomas Gleixner     2020-07-22  71   * Returns: True if work pending, False otherwise.
935ace2fb5cc49 Thomas Gleixner     2020-07-22  72   *
935ace2fb5cc49 Thomas Gleixner     2020-07-22  73   * Bare variant of xfer_to_guest_mode_work_pending(). Can be called from
935ace2fb5cc49 Thomas Gleixner     2020-07-22  74   * interrupt enabled code for racy quick checks with care.
935ace2fb5cc49 Thomas Gleixner     2020-07-22  75   */
935ace2fb5cc49 Thomas Gleixner     2020-07-22  76  static inline bool __xfer_to_guest_mode_work_pending(void)
935ace2fb5cc49 Thomas Gleixner     2020-07-22  77  {
6ce895128b3bff Mark Rutland        2021-11-29  78  	unsigned long ti_work = read_thread_flags();
935ace2fb5cc49 Thomas Gleixner     2020-07-22  79  
935ace2fb5cc49 Thomas Gleixner     2020-07-22 @80  	return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
935ace2fb5cc49 Thomas Gleixner     2020-07-22  81  }
935ace2fb5cc49 Thomas Gleixner     2020-07-22  82  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-04 14:16   ` Eric W. Biederman
@ 2022-05-04 15:12     ` Petr Mladek
  2022-05-04 17:37       ` Seth Forshee
  0 siblings, 1 reply; 14+ messages in thread
From: Petr Mladek @ 2022-05-04 15:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Seth Forshee, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski,
	Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Paolo Bonzini,
	Jens Axboe, Sean Christopherson, linux-kernel, live-patching,
	kvm

On Wed 2022-05-04 09:16:46, Eric W. Biederman wrote:
> Petr Mladek <pmladek@suse.com> writes:
> 
> > On Tue 2022-05-03 12:49:34, Seth Forshee wrote:
> >> A task can be livepatched only when it is sleeping or it exits to
> >> userspace. This may happen infrequently for a heavily loaded vCPU task,
> >> leading to livepatch transition failures.
> >
> > This is misleading.
> >
> > First, the problem is not a loaded CPU. The problem is that the
> > task might spend very long time in the kernel when handling
> > some syscall.
> >
> > Second, there is no timeout for the transition in the kernel code.
> > It might take very long time but it will not fail.
> >
> >> Fake signals will be sent to tasks which fail patching via stack
> >> checking. This will cause running vCPU tasks to exit guest mode, but
> >> since no signal is pending they return to guest execution without
> >> exiting to userspace. Fix this by treating a pending livepatch migration
> >> like a pending signal, exiting to userspace with EINTR. This allows the
> >> task to be patched, and userspace should re-excecute KVM_RUN to resume
> >> guest execution.
> >
> > It seems that the patch works as expected but it is far from clear.
> > And the above description helps only partially. Let me try to
> > explain it for dummies like me ;-)
> >
> > <explanation>
> > The problem was solved by sending a fake signal, see the commit
> > 0b3d52790e1cfd6b80b826 ("livepatch: Remove signal sysfs attribute").
> > It was achieved by calling signal_wake_up(). It set TIF_SIGPENDING
> > and woke the task. It interrupted the syscall and the task was
> > transitioned when leaving to the userspace.
> >
> > signal_wake_up() was later replaced by set_notify_signal(),
> > see the commit 8df1947c71ee53c7e21 ("livepatch: Replace
> > the fake signal sending with TIF_NOTIFY_SIGNAL infrastructure").
> > The difference is that set_notify_signal() uses TIF_NOTIFY_SIGNAL
> > instead of TIF_SIGPENDING.
> >
> > The effect is the same when running on a real hardware. The syscall
> > gets interrupted and exit_to_user_mode_loop() is called where
> > the livepatch state is updated (task migrated).
> >
> > But it works a different way in kvm where the task works are
> > called in the guest mode and the task does not return into
> > the user space in the host mode.
> > </explanation>
> >
> > The solution provided by this patch is a bit weird, see below.
> >
> >
> >> In my testing, systems where livepatching would timeout after 60 seconds
> >> were able to load livepatches within a couple of seconds with this
> >> change.
> >> 
> >> Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
> >> ---
> >> Changes in v2:
> >>  - Added _TIF_SIGPENDING to XFER_TO_GUEST_MODE_WORK
> >>  - Reworded commit message and comments to avoid confusion around the
> >>    term "migrate"
> >> 
> >>  include/linux/entry-kvm.h | 4 ++--
> >>  kernel/entry/kvm.c        | 7 ++++++-
> >>  2 files changed, 8 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
> >> index 6813171afccb..bf79e4cbb5a2 100644
> >> --- a/include/linux/entry-kvm.h
> >> +++ b/include/linux/entry-kvm.h
> >> @@ -17,8 +17,8 @@
> >>  #endif
> >>  
> >>  #define XFER_TO_GUEST_MODE_WORK						\
> >> -	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL |	\
> >> -	 _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> >> +	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |	\
> >> +	 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> >>  
> >>  struct kvm_vcpu;
> >>  
> >> diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
> >> index 9d09f489b60e..98439dfaa1a0 100644
> >> --- a/kernel/entry/kvm.c
> >> +++ b/kernel/entry/kvm.c
> >> @@ -14,7 +14,12 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> >>  				task_work_run();
> >>  		}
> >>  
> >> -		if (ti_work & _TIF_SIGPENDING) {
> >> +		/*
> >> +		 * When a livepatch is pending, force an exit to userspace
> >> +		 * as though a signal is pending to allow the task to be
> >> +		 * patched.
> >> +		 */
> >> +		if (ti_work & (_TIF_SIGPENDING | _TIF_PATCH_PENDING)) {
> >>  			kvm_handle_signal_exit(vcpu);
> >>  			return -EINTR;
> >>  		}
> >
> > Anyway, we either should make sure that TIF_NOTIFY_SIGNAL has the same
> > effect on the real hardware and in kvm. Or we need another interface
> > for the fake signal used by livepatching.
> 
> The point of TIF_NOTIFY_SIGNAL is to break out of interruptible kernel
> loops.  Once out of the interruptible kernel loop the expectation is the
> returns to user space and on it's way runs the exit_to_user_mode_loop or
> is architecture specific equivalent.

That would make sense. Thanks for explanation.

> Reading through the history of kernel/entry/kvm.c I believe
> I made ``conservative'' changes that has not helped this situation.
> 
> Long story short at one point it was thought that _TIF_SIGPENDING
> and _TIF_NOTIFY_SIGNAL could be separated and they could not.
> Unfortunately the work to separate their handling has not been
> completely undone.
> 
> In this case it appears that the only reason xfer_to_guest_mode_work
> touches task_work_run is because of the separation work done by Jens
> Axboe.  I don't see any kvm specific reason for _TIF_NOTIFY_SIGNAL
> and _TIF_SIGPENDING to be treated differently.  Meanwhile my cleanups
> elsewhere have made the unnecessary _TIF_NOTIFY_SIGNAL special case
> bigger in xfer_to_guest_mode_work.
> 
> I suspect the first step in fixing things really should be just handling
> _TIF_SIGPENDING and _TIF_NOTIFY_SIGNAL the same.
> 
> static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> {
> 	do {
> 		int ret;
> 
> 		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
> 			kvm_handle_signal_exit(vcpu);
> 			return -EINTR;
> 		}

This has the advantage that we will exit only when _TIF_NOTIFY_SIGNAL
is explicitly set by the livepatch code. It happens when
_TIF_PATCH_PENDING is not handled for few seconds.

_TIF_PATCH_PENDING is cleared when the task is transitioned.
It might happen when it is not running and there is no livepatched
function on the stack.


> 		if (ti_work & _TIF_NEED_RESCHED)
> 			schedule();
> 
> 		if (ti_work & _TIF_NOTIFY_RESUME)
> 			resume_user_mode_work(NULL);
> 
> 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
> 		if (ret)
> 			return ret;
> 
> 		ti_work = read_thread_flags();
> 	} while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
> 	return 0;
> }
> 
> That said I do expect adding support for the live patching into
> xfer_to_guest_mode_work, like there is in exit_to_user_mode_loop, is
> probably a good idea.  That should prevent the live patching code from
> needing to set TIF_NOTIFY_SIGNAL.
> 
> Something like:
> 
> Thomas Gleixner's patch to make _TIF_PATCH_PENDING always available.
> 
> #define XFER_TO_GUEST_MODE_WORK						\
> 	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |	\
> 	 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> 
> 
> static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> {
> 	do {
> 		int ret;
> 
> 		if (ti_work & _TIF_PATCH_PENDING)
> 			klp_update_patch_state(current);

We need to make sure that the syscall really gets restarted.
My understanding is that it will happen only when this function
returns a non-zero value.

We might need to do:

		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL | _TIF_PATCH_PENDING)) {
			kvm_handle_signal_exit(vcpu);
			return -EINTR;
		}

But it might be better to do not check _TIF_PATCH_PENDING here at all.
It will allow to apply the livepatch without restarting the syscall.
The syscall will get restarted only by the fake signal when the
transition is blocked for too long.

> 		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
> 			kvm_handle_signal_exit(vcpu);
> 			return -EINTR;
> 		}
> 
> 		if (ti_work & _TIF_NEED_RESCHED)
> 			schedule();
> 
> 		if (ti_work & _TIF_NOTIFY_RESUME)
> 			resume_user_mode_work(NULL);
> 
> 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
> 		if (ret)
> 			return ret;
> 
> 		ti_work = read_thread_flags();
> 	} while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
> 	return 0;
> }
> 
> If the kvm and the live patching folks could check my thinking that
> would be great.

Yeah, I am not sure about the kvm behavior either.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending
  2022-05-04 15:12     ` Petr Mladek
@ 2022-05-04 17:37       ` Seth Forshee
  0 siblings, 0 replies; 14+ messages in thread
From: Seth Forshee @ 2022-05-04 17:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Eric W. Biederman, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Paolo Bonzini, Jens Axboe, Sean Christopherson, linux-kernel,
	live-patching, kvm

On Wed, May 04, 2022 at 05:12:52PM +0200, Petr Mladek wrote:
> On Wed 2022-05-04 09:16:46, Eric W. Biederman wrote:
> > Petr Mladek <pmladek@suse.com> writes:
> > 
> > > On Tue 2022-05-03 12:49:34, Seth Forshee wrote:
> > >> A task can be livepatched only when it is sleeping or it exits to
> > >> userspace. This may happen infrequently for a heavily loaded vCPU task,
> > >> leading to livepatch transition failures.
> > >
> > > This is misleading.
> > >
> > > First, the problem is not a loaded CPU. The problem is that the
> > > task might spend very long time in the kernel when handling
> > > some syscall.
> > >
> > > Second, there is no timeout for the transition in the kernel code.
> > > It might take very long time but it will not fail.
> > >
> > >> Fake signals will be sent to tasks which fail patching via stack
> > >> checking. This will cause running vCPU tasks to exit guest mode, but
> > >> since no signal is pending they return to guest execution without
> > >> exiting to userspace. Fix this by treating a pending livepatch migration
> > >> like a pending signal, exiting to userspace with EINTR. This allows the
> > >> task to be patched, and userspace should re-excecute KVM_RUN to resume
> > >> guest execution.
> > >
> > > It seems that the patch works as expected but it is far from clear.
> > > And the above description helps only partially. Let me try to
> > > explain it for dummies like me ;-)
> > >
> > > <explanation>
> > > The problem was solved by sending a fake signal, see the commit
> > > 0b3d52790e1cfd6b80b826 ("livepatch: Remove signal sysfs attribute").
> > > It was achieved by calling signal_wake_up(). It set TIF_SIGPENDING
> > > and woke the task. It interrupted the syscall and the task was
> > > transitioned when leaving to the userspace.
> > >
> > > signal_wake_up() was later replaced by set_notify_signal(),
> > > see the commit 8df1947c71ee53c7e21 ("livepatch: Replace
> > > the fake signal sending with TIF_NOTIFY_SIGNAL infrastructure").
> > > The difference is that set_notify_signal() uses TIF_NOTIFY_SIGNAL
> > > instead of TIF_SIGPENDING.
> > >
> > > The effect is the same when running on a real hardware. The syscall
> > > gets interrupted and exit_to_user_mode_loop() is called where
> > > the livepatch state is updated (task migrated).
> > >
> > > But it works a different way in kvm where the task works are
> > > called in the guest mode and the task does not return into
> > > the user space in the host mode.
> > > </explanation>
> > >
> > > The solution provided by this patch is a bit weird, see below.
> > >
> > >
> > >> In my testing, systems where livepatching would timeout after 60 seconds
> > >> were able to load livepatches within a couple of seconds with this
> > >> change.
> > >> 
> > >> Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
> > >> ---
> > >> Changes in v2:
> > >>  - Added _TIF_SIGPENDING to XFER_TO_GUEST_MODE_WORK
> > >>  - Reworded commit message and comments to avoid confusion around the
> > >>    term "migrate"
> > >> 
> > >>  include/linux/entry-kvm.h | 4 ++--
> > >>  kernel/entry/kvm.c        | 7 ++++++-
> > >>  2 files changed, 8 insertions(+), 3 deletions(-)
> > >> 
> > >> diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
> > >> index 6813171afccb..bf79e4cbb5a2 100644
> > >> --- a/include/linux/entry-kvm.h
> > >> +++ b/include/linux/entry-kvm.h
> > >> @@ -17,8 +17,8 @@
> > >>  #endif
> > >>  
> > >>  #define XFER_TO_GUEST_MODE_WORK						\
> > >> -	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL |	\
> > >> -	 _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> > >> +	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |	\
> > >> +	 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> > >>  
> > >>  struct kvm_vcpu;
> > >>  
> > >> diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
> > >> index 9d09f489b60e..98439dfaa1a0 100644
> > >> --- a/kernel/entry/kvm.c
> > >> +++ b/kernel/entry/kvm.c
> > >> @@ -14,7 +14,12 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> > >>  				task_work_run();
> > >>  		}
> > >>  
> > >> -		if (ti_work & _TIF_SIGPENDING) {
> > >> +		/*
> > >> +		 * When a livepatch is pending, force an exit to userspace
> > >> +		 * as though a signal is pending to allow the task to be
> > >> +		 * patched.
> > >> +		 */
> > >> +		if (ti_work & (_TIF_SIGPENDING | _TIF_PATCH_PENDING)) {
> > >>  			kvm_handle_signal_exit(vcpu);
> > >>  			return -EINTR;
> > >>  		}
> > >
> > > Anyway, we either should make sure that TIF_NOTIFY_SIGNAL has the same
> > > effect on the real hardware and in kvm. Or we need another interface
> > > for the fake signal used by livepatching.
> > 
> > The point of TIF_NOTIFY_SIGNAL is to break out of interruptible kernel
> > loops.  Once out of the interruptible kernel loop the expectation is the
> > returns to user space and on it's way runs the exit_to_user_mode_loop or
> > is architecture specific equivalent.
> 
> That would make sense. Thanks for explanation.
> 
> > Reading through the history of kernel/entry/kvm.c I believe
> > I made ``conservative'' changes that has not helped this situation.
> > 
> > Long story short at one point it was thought that _TIF_SIGPENDING
> > and _TIF_NOTIFY_SIGNAL could be separated and they could not.
> > Unfortunately the work to separate their handling has not been
> > completely undone.
> > 
> > In this case it appears that the only reason xfer_to_guest_mode_work
> > touches task_work_run is because of the separation work done by Jens
> > Axboe.  I don't see any kvm specific reason for _TIF_NOTIFY_SIGNAL
> > and _TIF_SIGPENDING to be treated differently.  Meanwhile my cleanups
> > elsewhere have made the unnecessary _TIF_NOTIFY_SIGNAL special case
> > bigger in xfer_to_guest_mode_work.
> > 
> > I suspect the first step in fixing things really should be just handling
> > _TIF_SIGPENDING and _TIF_NOTIFY_SIGNAL the same.
> > 
> > static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> > {
> > 	do {
> > 		int ret;
> > 
> > 		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
> > 			kvm_handle_signal_exit(vcpu);
> > 			return -EINTR;
> > 		}
> 
> This has the advantage that we will exit only when _TIF_NOTIFY_SIGNAL
> is explicitly set by the livepatch code. It happens when
> _TIF_PATCH_PENDING is not handled for few seconds.

I agree. This maps better to the intended behavior of only interrupting
tasks which fail to transition after a period of time.

> _TIF_PATCH_PENDING is cleared when the task is transitioned.
> It might happen when it is not running and there is no livepatched
> function on the stack.
> 
> 
> > 		if (ti_work & _TIF_NEED_RESCHED)
> > 			schedule();
> > 
> > 		if (ti_work & _TIF_NOTIFY_RESUME)
> > 			resume_user_mode_work(NULL);
> > 
> > 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
> > 		if (ret)
> > 			return ret;
> > 
> > 		ti_work = read_thread_flags();
> > 	} while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
> > 	return 0;
> > }
> > 
> > That said I do expect adding support for the live patching into
> > xfer_to_guest_mode_work, like there is in exit_to_user_mode_loop, is
> > probably a good idea.  That should prevent the live patching code from
> > needing to set TIF_NOTIFY_SIGNAL.
> > 
> > Something like:
> > 
> > Thomas Gleixner's patch to make _TIF_PATCH_PENDING always available.
> > 
> > #define XFER_TO_GUEST_MODE_WORK						\
> > 	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_PATCH_PENDING |	\
> > 	 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> > 
> > 
> > static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> > {
> > 	do {
> > 		int ret;
> > 
> > 		if (ti_work & _TIF_PATCH_PENDING)
> > 			klp_update_patch_state(current);
> 
> We need to make sure that the syscall really gets restarted.
> My understanding is that it will happen only when this function
> returns a non-zero value.
> 
> We might need to do:
> 
> 		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL | _TIF_PATCH_PENDING)) {
> 			kvm_handle_signal_exit(vcpu);
> 			return -EINTR;
> 		}
> 
> But it might be better to do not check _TIF_PATCH_PENDING here at all.
> It will allow to apply the livepatch without restarting the syscall.
> The syscall will get restarted only by the fake signal when the
> transition is blocked for too long.

Yes, if we need to force the syscall to be restarted either way then I
don't see much of a point in preemptively calling
klp_update_patch_state(). It will be handled (indirectly) by
exit_to_user_mode_loop().

All we should need is to handle _TIF_NOTIFY_SIGNAL the same as
_TIF_SIGPENDING, then as you say vCPU tasks will only be interrupted and
forced to restart the syscall when the transition stalls for too long.
I'll send a patch for this shortly.

Thanks,
Seth

> 
> > 		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
> > 			kvm_handle_signal_exit(vcpu);
> > 			return -EINTR;
> > 		}
> > 
> > 		if (ti_work & _TIF_NEED_RESCHED)
> > 			schedule();
> > 
> > 		if (ti_work & _TIF_NOTIFY_RESUME)
> > 			resume_user_mode_work(NULL);
> > 
> > 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
> > 		if (ret)
> > 			return ret;
> > 
> > 		ti_work = read_thread_flags();
> > 	} while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
> > 	return 0;
> > }
> > 
> > If the kvm and the live patching folks could check my thinking that
> > would be great.
> 
> Yeah, I am not sure about the kvm behavior either.
> 
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-05-04 18:14 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-03 17:49 [PATCH v2] entry/kvm: Make vCPU tasks exit to userspace when a livepatch is pending Seth Forshee
2022-05-03 17:53 ` Seth Forshee
2022-05-04  1:08 ` kernel test robot
2022-05-04 12:44 ` Thomas Gleixner
2022-05-04 13:07 ` Petr Mladek
2022-05-04 13:50   ` Seth Forshee
2022-05-04 14:28     ` Petr Mladek
2022-05-04 14:44       ` Seth Forshee
2022-05-04 14:57         ` Petr Mladek
2022-05-04 14:53       ` Eric W. Biederman
2022-05-04 14:16   ` Eric W. Biederman
2022-05-04 15:12     ` Petr Mladek
2022-05-04 17:37       ` Seth Forshee
2022-05-04 15:01 ` kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.