linux-kernel.vger.kernel.org archive mirror
* [patch 0/5] posix-cpu-timers: Move expiry into task work context
@ 2019-08-01 14:32 Thomas Gleixner
  2019-08-01 14:32 ` [patch 1/5] tracehook: Provide TIF_NOTIFY_RESUME handling for KVM Thomas Gleixner
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 14:32 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, Oleg Nesterov, kvm,
	Radim Krcmar, Paolo Bonzini, John Stultz, Andy Lutomirski,
	Paul E. McKenney

Running posix cpu timers in hard interrupt context has a few downsides:

 - For PREEMPT_RT it cannot work as the expiry code needs to take sighand
   lock, which is a 'sleeping spinlock' in RT

 - For fine grained accounting it's just wrong to run this in the context of
   the timer interrupt because that way process-specific CPU time is
   accounted to the timer interrupt.

There is no real hard requirement to run the expiry code in hard interrupt
context. The posix CPU timers are an approximation anyway, so having them
expired and evaluated in task work context does not really make them worse.

That unearthed the fact that KVM fails to handle task work before entering
a VM, which delays pending task work until the vCPU thread goes all the way
back to user space (qemu).

The series implements the necessary handling for x86/KVM and switches the
posix cpu timer expiry into task work for X86. The posix timer modification
is conditional on a selectable config switch as this requires that
task work is handled in KVM.

The available tests pass and no problematic difference has been observed.

Thanks,

	tglx

8<--------------------
 arch/x86/kvm/x86.c             |    8 ++++-
 arch/x86/Kconfig               |    1 
 include/linux/sched.h          |    3 ++
 include/linux/tracehook.h      |   15 ++++++++++
 kernel/task_work.c             |   19 ++++++++++++
 kernel/time/Kconfig            |    5 +++
 kernel/time/posix-cpu-timers.c |   61 ++++++++++++++++++++++++++++++-----------
 7 files changed, 95 insertions(+), 17 deletions(-)




^ permalink raw reply	[flat|nested] 23+ messages in thread

* [patch 1/5] tracehook: Provide TIF_NOTIFY_RESUME handling for KVM
  2019-08-01 14:32 [patch 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
@ 2019-08-01 14:32 ` Thomas Gleixner
  2019-08-01 14:48   ` Peter Zijlstra
  2019-08-01 14:32 ` [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT Thomas Gleixner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 14:32 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, Oleg Nesterov, kvm,
	Radim Krcmar, Paolo Bonzini, John Stultz, Andy Lutomirski,
	Paul E. McKenney

TIF_NOTIFY_RESUME is evaluated on return to user space along with other TIF
flags.

From the kernel's point of view a VMENTER is more or less equivalent to a
return to user space, which means that at least a subset of TIF flags needs
to be evaluated and handled.

Currently KVM handles only TIF_SIGPENDING and TIF_NEED_RESCHED, but
TIF_NOTIFY_RESUME is ignored. So pending task_work etc. is completely
ignored until the vCPU thread actually goes all the way back into
userspace/qemu.

Provide notify_resume_pending() and tracehook_handle_notify_resume() so
this can be handled similarly to SIGPENDING.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/tracehook.h |   15 +++++++++++++++
 kernel/task_work.c        |   19 +++++++++++++++++++
 2 files changed, 34 insertions(+)

--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -163,6 +163,21 @@ static inline void set_notify_resume(str
 #endif
 }
 
+#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
+/**
+ * notify_resume_pending - Check whether current has TIF_NOTIFY_RESUME set
+ */
+static inline bool notify_resume_pending(void)
+{
+	return test_thread_flag(TIF_NOTIFY_RESUME);
+}
+
+void tracehook_handle_notify_resume(void);
+#else
+static inline bool notify_resume_pending(void) { return false; }
+static inline void tracehook_handle_notify_resume(void) { }
+#endif
+
 /**
  * tracehook_notify_resume - report when about to return to user mode
  * @regs:		user-mode registers of @current task
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -116,3 +116,22 @@ void task_work_run(void)
 		} while (work);
 	}
 }
+
+#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
+/**
+ * tracehook_handle_notify_resume - Notify resume handling for virt
+ *
+ * Called with interrupts and preemption enabled from VMENTER/EXIT.
+ */
+void tracehook_handle_notify_resume(void)
+{
+	local_irq_disable();
+	while (test_and_clear_thread_flag(TIF_NOTIFY_RESUME)) {
+		local_irq_enable();
+		tracehook_notify_resume(NULL);
+		local_irq_disable();
+	}
+	local_irq_enable();
+}
+EXPORT_SYMBOL_GPL(tracehook_handle_notify_resume);
+#endif




* [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT
  2019-08-01 14:32 [patch 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
  2019-08-01 14:32 ` [patch 1/5] tracehook: Provide TIF_NOTIFY_RESUME handling for KVM Thomas Gleixner
@ 2019-08-01 14:32 ` Thomas Gleixner
  2019-08-01 16:24   ` Oleg Nesterov
  2019-08-01 14:32 ` [patch 3/5] posix-cpu-timers: Split run_posix_cpu_timers() Thomas Gleixner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 14:32 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, kvm, Radim Krcmar,
	Paolo Bonzini, Oleg Nesterov, John Stultz, Andy Lutomirski,
	Paul E. McKenney

TIF_NOTIFY_RESUME is evaluated on return to user space along with other TIF
flags.

From the kernel's point of view a VMENTER is more or less equivalent to a
return to user space, which means that at least a subset of TIF flags needs
to be evaluated and handled.

Currently KVM handles only TIF_SIGPENDING and TIF_NEED_RESCHED, but
TIF_NOTIFY_RESUME is ignored. So pending task_work etc. is completely
ignored until the vCPU thread actually goes all the way back into
userspace/qemu.

Use the newly provided notify_resume_pending() and
tracehook_handle_notify_resume() to solve this similarly to the existing
handling of SIGPENDING.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -52,6 +52,7 @@
 #include <linux/irqbypass.h>
 #include <linux/sched/stat.h>
 #include <linux/sched/isolation.h>
+#include <linux/tracehook.h>
 #include <linux/mem_encrypt.h>
 
 #include <trace/events/kvm.h>
@@ -7972,7 +7973,8 @@ static int vcpu_enter_guest(struct kvm_v
 		kvm_x86_ops->sync_pir_to_irr(vcpu);
 
 	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
-	    || need_resched() || signal_pending(current)) {
+	    || need_resched() || signal_pending(current)
+	    || notify_resume_pending()) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		smp_wmb();
 		local_irq_enable();
@@ -8172,6 +8174,10 @@ static int vcpu_run(struct kvm_vcpu *vcp
 			++vcpu->stat.signal_exits;
 			break;
 		}
+
+		if (notify_resume_pending())
+			tracehook_handle_notify_resume();
+
 		if (need_resched()) {
 			srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
 			cond_resched();




* [patch 3/5] posix-cpu-timers: Split run_posix_cpu_timers()
  2019-08-01 14:32 [patch 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
  2019-08-01 14:32 ` [patch 1/5] tracehook: Provide TIF_NOTIFY_RESUME handling for KVM Thomas Gleixner
  2019-08-01 14:32 ` [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT Thomas Gleixner
@ 2019-08-01 14:32 ` Thomas Gleixner
  2019-08-01 14:32 ` [patch 4/5] posix-cpu-timers: Defer timer handling to task_work Thomas Gleixner
  2019-08-01 14:32 ` [patch 5/5] x86: Select POSIX_CPU_TIMERS_TASK_WORK Thomas Gleixner
  4 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 14:32 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, Oleg Nesterov, kvm,
	Radim Krcmar, Paolo Bonzini, John Stultz, Andy Lutomirski,
	Paul E. McKenney

Split run_posix_cpu_timers() into two pieces, the hard interrupt context
part and the actual timer evaluation/expiry part.

The hard interrupt context part contains only the lockless fast path check
and for now calls the expiry part as before.

No functional change. Preparatory change to move the expiry part into task
context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>

---
 kernel/time/posix-cpu-timers.c |   37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1127,25 +1127,11 @@ static inline int fastpath_timer_check(s
 	return 0;
 }
 
-/*
- * This is called from the timer interrupt handler.  The irq handler has
- * already updated our counts.  We need to check if any timers fire now.
- * Interrupts are disabled.
- */
-void run_posix_cpu_timers(struct task_struct *tsk)
+static void __run_posix_cpu_timers(struct task_struct *tsk)
 {
-	LIST_HEAD(firing);
 	struct k_itimer *timer, *next;
 	unsigned long flags;
-
-	lockdep_assert_irqs_disabled();
-
-	/*
-	 * The fast path checks that there are no expired thread or thread
-	 * group timers.  If that's so, just return.
-	 */
-	if (!fastpath_timer_check(tsk))
-		return;
+	LIST_HEAD(firing);
 
 	if (!lock_task_sighand(tsk, &flags))
 		return;
@@ -1193,6 +1179,25 @@ void run_posix_cpu_timers(struct task_st
 }
 
 /*
+ * This is called from the timer interrupt handler.  The irq handler has
+ * already updated our counts.  We need to check if any timers fire now.
+ * Interrupts are disabled.
+ */
+void run_posix_cpu_timers(struct task_struct *tsk)
+{
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * The fast path checks that there are no expired thread or thread
+	 * group timers.  If that's so, just return.
+	 */
+	if (!fastpath_timer_check(tsk))
+		return;
+
+	__run_posix_cpu_timers(tsk);
+}
+
+/*
  * Set one of the process-wide special case CPU timers or RLIMIT_CPU.
  * The tsk->sighand->siglock must be held by the caller.
  */




* [patch 4/5] posix-cpu-timers: Defer timer handling to task_work
  2019-08-01 14:32 [patch 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
                   ` (2 preceding siblings ...)
  2019-08-01 14:32 ` [patch 3/5] posix-cpu-timers: Split run_posix_cpu_timers() Thomas Gleixner
@ 2019-08-01 14:32 ` Thomas Gleixner
  2019-08-01 14:51   ` Peter Zijlstra
  2019-08-01 15:39   ` Oleg Nesterov
  2019-08-01 14:32 ` [patch 5/5] x86: Select POSIX_CPU_TIMERS_TASK_WORK Thomas Gleixner
  4 siblings, 2 replies; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 14:32 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, John Stultz, Andy Lutomirski,
	Paul E. McKenney, Oleg Nesterov, kvm, Radim Krcmar,
	Paolo Bonzini

Running posix cpu timers in hard interrupt context has a few downsides:

 - For PREEMPT_RT it cannot work as the expiry code needs to take sighand
   lock, which is a 'sleeping spinlock' in RT

 - For fine grained accounting it's just wrong to run this in the context of
   the timer interrupt because that way process-specific CPU time is
   accounted to the timer interrupt.

There is no real hard requirement to run the expiry code in hard interrupt
context. The posix CPU timers are an approximation anyway, so having them
expired and evaluated in task work context does not really make them worse.

Make it conditional on a selectable config switch as this requires that
task work is handled in KVM.

The available tests pass and no problematic difference has been observed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
---
 include/linux/sched.h          |    3 +++
 kernel/time/Kconfig            |    5 +++++
 kernel/time/posix-cpu-timers.c |   26 +++++++++++++++++++++++++-
 3 files changed, 33 insertions(+), 1 deletion(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -880,6 +880,9 @@ struct task_struct {
 	struct task_cputime		cputime_expires;
 	struct list_head		cpu_timers[3];
 #endif
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+	struct callback_head		cpu_timer_work;
+#endif
 
 	/* Process credentials: */
 
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -52,6 +52,11 @@ config GENERIC_CLOCKEVENTS_MIN_ADJUST
 config GENERIC_CMOS_UPDATE
 	bool
 
+# Select to handle posix CPU timers from task_work
+# and not from the timer interrupt context
+config POSIX_CPU_TIMERS_TASK_WORK
+	bool
+
 if GENERIC_CLOCKEVENTS
 menu "Timers subsystem"
 
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -14,6 +14,7 @@
 #include <linux/tick.h>
 #include <linux/workqueue.h>
 #include <linux/compat.h>
+#include <linux/task_work.h>
 #include <linux/sched/deadline.h>
 
 #include "posix-timers.h"
@@ -1127,7 +1128,7 @@ static inline int fastpath_timer_check(s
 	return 0;
 }
 
-static void __run_posix_cpu_timers(struct task_struct *tsk)
+static void handle_posix_cpu_timers(struct task_struct *tsk)
 {
 	struct k_itimer *timer, *next;
 	unsigned long flags;
@@ -1178,6 +1179,29 @@ static void __run_posix_cpu_timers(struc
 	}
 }
 
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+
+static void posix_cpu_timers_work(struct callback_head *work)
+{
+	handle_posix_cpu_timers(current);
+}
+
+static void __run_posix_cpu_timers(struct task_struct *tsk)
+{
+	/* FIXME: Init it proper in fork or such */
+	init_task_work(&tsk->cpu_timer_work, posix_cpu_timers_work);
+	task_work_add(tsk, &tsk->cpu_timer_work, true);
+}
+
+#else
+
+static void __run_posix_cpu_timers(struct task_struct *tsk)
+{
+	handle_posix_cpu_timers(tsk);
+}
+
+#endif
+
 /*
  * This is called from the timer interrupt handler.  The irq handler has
  * already updated our counts.  We need to check if any timers fire now.




* [patch 5/5] x86: Select POSIX_CPU_TIMERS_TASK_WORK
  2019-08-01 14:32 [patch 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
                   ` (3 preceding siblings ...)
  2019-08-01 14:32 ` [patch 4/5] posix-cpu-timers: Defer timer handling to task_work Thomas Gleixner
@ 2019-08-01 14:32 ` Thomas Gleixner
  4 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 14:32 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, Oleg Nesterov, kvm,
	Radim Krcmar, Paolo Bonzini, John Stultz, Andy Lutomirski,
	Paul E. McKenney

X86 now handles task work in KVM. Enable the delegation of posix CPU timer
expiry to task work.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -213,6 +213,7 @@ config X86
 	select NEED_SG_DMA_LENGTH
 	select PCI_DOMAINS			if PCI
 	select PCI_LOCKLESS_CONFIG		if PCI
+	select POSIX_CPU_TIMERS_TASK_WORK
 	select PERF_EVENTS
 	select RTC_LIB
 	select RTC_MC146818_LIB




* Re: [patch 1/5] tracehook: Provide TIF_NOTIFY_RESUME handling for KVM
  2019-08-01 14:32 ` [patch 1/5] tracehook: Provide TIF_NOTIFY_RESUME handling for KVM Thomas Gleixner
@ 2019-08-01 14:48   ` Peter Zijlstra
  2019-08-01 15:10     ` Thomas Gleixner
  2019-08-01 17:02     ` Andy Lutomirski
  0 siblings, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2019-08-01 14:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner,
	Steven Rostedt, Julia Cartwright, Paul McKenney,
	Frederic Weisbecker, Oleg Nesterov, kvm, Radim Krcmar,
	Paolo Bonzini, John Stultz, Andy Lutomirski, Paul E. McKenney

On Thu, Aug 01, 2019 at 04:32:51PM +0200, Thomas Gleixner wrote:
> +#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
> +/**
> + * tracehook_handle_notify_resume - Notify resume handling for virt
> + *
> + * Called with interrupts and preemption enabled from VMENTER/EXIT.
> + */
> +void tracehook_handle_notify_resume(void)
> +{
> +	local_irq_disable();
> +	while (test_and_clear_thread_flag(TIF_NOTIFY_RESUME)) {
> +		local_irq_enable();
> +		tracehook_notify_resume(NULL);
> +		local_irq_disable();
> +	}
> +	local_irq_enable();

I'm confused by the IRQ state swizzling here, what is it doing?

> +}
> +EXPORT_SYMBOL_GPL(tracehook_handle_notify_resume);
> +#endif
> 
> 


* Re: [patch 4/5] posix-cpu-timers: Defer timer handling to task_work
  2019-08-01 14:32 ` [patch 4/5] posix-cpu-timers: Defer timer handling to task_work Thomas Gleixner
@ 2019-08-01 14:51   ` Peter Zijlstra
  2019-08-01 15:10     ` Thomas Gleixner
  2019-08-01 15:39   ` Oleg Nesterov
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2019-08-01 14:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner,
	Steven Rostedt, Julia Cartwright, Paul McKenney,
	Frederic Weisbecker, John Stultz, Andy Lutomirski,
	Paul E. McKenney, Oleg Nesterov, kvm, Radim Krcmar,
	Paolo Bonzini

On Thu, Aug 01, 2019 at 04:32:54PM +0200, Thomas Gleixner wrote:
> --- a/kernel/time/Kconfig
> +++ b/kernel/time/Kconfig
> @@ -52,6 +52,11 @@ config GENERIC_CLOCKEVENTS_MIN_ADJUST
>  config GENERIC_CMOS_UPDATE
>  	bool
>  
> +# Select to handle posix CPU timers from task_work
> +# and not from the timer interrupt context
> +config POSIX_CPU_TIMERS_TASK_WORK
> +	bool
> +
>  if GENERIC_CLOCKEVENTS
>  menu "Timers subsystem"
>  


diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index deff97217496..76e37ad5bc31 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -58,6 +58,7 @@ config PREEMPT
 config PREEMPT_RT
 	bool "Fully Preemptible Kernel (Real-Time)"
 	depends on EXPERT && ARCH_SUPPORTS_RT
+	depends on POSIX_CPU_TIMERS_TASK_WORK
 	select PREEMPTION
 	help
 	  This option turns the kernel into a real-time kernel by replacing


* Re: [patch 1/5] tracehook: Provide TIF_NOTIFY_RESUME handling for KVM
  2019-08-01 14:48   ` Peter Zijlstra
@ 2019-08-01 15:10     ` Thomas Gleixner
  2019-08-01 17:02     ` Andy Lutomirski
  1 sibling, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 15:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner,
	Steven Rostedt, Julia Cartwright, Paul McKenney,
	Frederic Weisbecker, Oleg Nesterov, kvm, Radim Krcmar,
	Paolo Bonzini, John Stultz, Andy Lutomirski, Paul E. McKenney

On Thu, 1 Aug 2019, Peter Zijlstra wrote:
> On Thu, Aug 01, 2019 at 04:32:51PM +0200, Thomas Gleixner wrote:
> > +#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
> > +/**
> > + * tracehook_handle_notify_resume - Notify resume handling for virt
> > + *
> > + * Called with interrupts and preemption enabled from VMENTER/EXIT.
> > + */
> > +void tracehook_handle_notify_resume(void)
> > +{
> > +	local_irq_disable();
> > +	while (test_and_clear_thread_flag(TIF_NOTIFY_RESUME)) {
> > +		local_irq_enable();
> > +		tracehook_notify_resume(NULL);
> > +		local_irq_disable();
> > +	}
> > +	local_irq_enable();
> 
> I'm confused by the IRQ state swizzling here, what is it doing?

Hmm, right. It's not really needed. Modeled it after the user space return
code, but the KVM case is different because it evaluates the TIF bit again
before entering the VM with interrupts disabled anyway.

I'll remove the brainfart in V2.

Thanks,

	tglx


* Re: [patch 4/5] posix-cpu-timers: Defer timer handling to task_work
  2019-08-01 14:51   ` Peter Zijlstra
@ 2019-08-01 15:10     ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 15:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner,
	Steven Rostedt, Julia Cartwright, Paul McKenney,
	Frederic Weisbecker, John Stultz, Andy Lutomirski,
	Paul E. McKenney, Oleg Nesterov, kvm, Radim Krcmar,
	Paolo Bonzini

On Thu, 1 Aug 2019, Peter Zijlstra wrote:
> On Thu, Aug 01, 2019 at 04:32:54PM +0200, Thomas Gleixner wrote:
> > --- a/kernel/time/Kconfig
> > +++ b/kernel/time/Kconfig
> > @@ -52,6 +52,11 @@ config GENERIC_CLOCKEVENTS_MIN_ADJUST
> >  config GENERIC_CMOS_UPDATE
> >  	bool
> >  
> > +# Select to handle posix CPU timers from task_work
> > +# and not from the timer interrupt context
> > +config POSIX_CPU_TIMERS_TASK_WORK
> > +	bool
> > +
> >  if GENERIC_CLOCKEVENTS
> >  menu "Timers subsystem"
> >  
> 
> 
> diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
> index deff97217496..76e37ad5bc31 100644
> --- a/kernel/Kconfig.preempt
> +++ b/kernel/Kconfig.preempt
> @@ -58,6 +58,7 @@ config PREEMPT
>  config PREEMPT_RT
>  	bool "Fully Preemptible Kernel (Real-Time)"
>  	depends on EXPERT && ARCH_SUPPORTS_RT
> +	depends on POSIX_CPU_TIMERS_TASK_WORK

Indeed.


* Re: [patch 4/5] posix-cpu-timers: Defer timer handling to task_work
  2019-08-01 14:32 ` [patch 4/5] posix-cpu-timers: Defer timer handling to task_work Thomas Gleixner
  2019-08-01 14:51   ` Peter Zijlstra
@ 2019-08-01 15:39   ` Oleg Nesterov
  2019-08-01 18:41     ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2019-08-01 15:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, John Stultz, Andy Lutomirski,
	Paul E. McKenney, kvm, Radim Krcmar, Paolo Bonzini

On 08/01, Thomas Gleixner wrote:
>
> +static void __run_posix_cpu_timers(struct task_struct *tsk)
> +{
> +	/* FIXME: Init it proper in fork or such */
> +	init_task_work(&tsk->cpu_timer_work, posix_cpu_timers_work);
> +	task_work_add(tsk, &tsk->cpu_timer_work, true);
> +}

What if update_process_times/run_posix_cpu_timers is called again before
this task does task_work_run() ?

somehow it should check that ->cpu_timer_work is not already queued...

Or suppose that this is called when task_work_run() executes this
cpu_timer_work. Looks like you need another flag checked by
__run_posix_cpu_timers() and cleared in posix_cpu_timers_work() ?

Oleg.



* Re: [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT
  2019-08-01 14:32 ` [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT Thomas Gleixner
@ 2019-08-01 16:24   ` Oleg Nesterov
  2019-08-01 18:34     ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2019-08-01 16:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, kvm, Radim Krcmar,
	Paolo Bonzini, John Stultz, Andy Lutomirski, Paul E. McKenney

On 08/01, Thomas Gleixner wrote:
>
> @@ -8172,6 +8174,10 @@ static int vcpu_run(struct kvm_vcpu *vcp
>  			++vcpu->stat.signal_exits;
>  			break;
>  		}
> +
> +		if (notify_resume_pending())
> +			tracehook_handle_notify_resume();

shouldn't you drop kvm->srcu before tracehook_handle_notify_resume() ?

I don't understand this code at all, but vcpu_run() does this even before
cond_resched().

Oleg.



* Re: [patch 1/5] tracehook: Provide TIF_NOTIFY_RESUME handling for KVM
  2019-08-01 14:48   ` Peter Zijlstra
  2019-08-01 15:10     ` Thomas Gleixner
@ 2019-08-01 17:02     ` Andy Lutomirski
  1 sibling, 0 replies; 23+ messages in thread
From: Andy Lutomirski @ 2019-08-01 17:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, x86, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, Oleg Nesterov, kvm,
	Radim Krcmar, Paolo Bonzini, John Stultz, Andy Lutomirski,
	Paul E. McKenney


> On Aug 1, 2019, at 7:48 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Thu, Aug 01, 2019 at 04:32:51PM +0200, Thomas Gleixner wrote:
>> +#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
>> +/**
>> + * tracehook_handle_notify_resume - Notify resume handling for virt
>> + *
>> + * Called with interrupts and preemption enabled from VMENTER/EXIT.
>> + */
>> +void tracehook_handle_notify_resume(void)
>> +{
>> +    local_irq_disable();
>> +    while (test_and_clear_thread_flag(TIF_NOTIFY_RESUME)) {
>> +        local_irq_enable();
>> +        tracehook_notify_resume(NULL);
>> +        local_irq_disable();
>> +    }
>> +    local_irq_enable();
> 
> I'm confused by the IRQ state swizzling here, what is it doing?

Me too. Also, why is a loop needed?

> 
>> +}
>> +EXPORT_SYMBOL_GPL(tracehook_handle_notify_resume);
>> +#endif
>> 
>> 


* Re: [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT
  2019-08-01 16:24   ` Oleg Nesterov
@ 2019-08-01 18:34     ` Thomas Gleixner
  2019-08-01 21:35       ` Sean Christopherson
  2019-08-02 12:04       ` Oleg Nesterov
  0 siblings, 2 replies; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 18:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKML, x86, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, kvm, Radim Krcmar,
	Paolo Bonzini, John Stultz, Andy Lutomirski, Paul E. McKenney

On Thu, 1 Aug 2019, Oleg Nesterov wrote:
> On 08/01, Thomas Gleixner wrote:
> >
> > @@ -8172,6 +8174,10 @@ static int vcpu_run(struct kvm_vcpu *vcp
> >  			++vcpu->stat.signal_exits;
> >  			break;
> >  		}
> > +
> > +		if (notify_resume_pending())
> > +			tracehook_handle_notify_resume();
> 
> shouldn't you drop kvm->srcu before tracehook_handle_notify_resume() ?
> 
> I don't understand this code at all, but vcpu_run() does this even before
> cond_resched().

Yeah, I noticed that it's dropped around cond_resched().

My understanding is that for voluntary giving up the CPU via cond_resched()
it needs to be dropped.

For involuntary preemption (CONFIG_PREEMPT=y) it's not required as the
whole code section after preempt_enable() is fully preemptible.

Now the 1Mio$ question is whether any of the notify functions invokes
cond_resched() and whether that really matters. Paolo?

Thanks,

	tglx


* Re: [patch 4/5] posix-cpu-timers: Defer timer handling to task_work
  2019-08-01 15:39   ` Oleg Nesterov
@ 2019-08-01 18:41     ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 18:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKML, x86, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, John Stultz, Andy Lutomirski,
	Paul E. McKenney, kvm, Radim Krcmar, Paolo Bonzini

On Thu, 1 Aug 2019, Oleg Nesterov wrote:
> On 08/01, Thomas Gleixner wrote:
> >
> > +static void __run_posix_cpu_timers(struct task_struct *tsk)
> > +{
> > +	/* FIXME: Init it proper in fork or such */
> > +	init_task_work(&tsk->cpu_timer_work, posix_cpu_timers_work);
> > +	task_work_add(tsk, &tsk->cpu_timer_work, true);
> > +}
> 
> What if update_process_times/run_posix_cpu_timers is called again before
> this task does task_work_run() ?
> 
> somehow it should check that ->cpu_timer_work is not already queued...

Right.

> Or suppose that this is called when task_work_run() executes this
> cpu_timer_work. Looks like you need another flag checked by
> __run_posix_cpu_timers() and cleare in posix_cpu_timers_work() ?

That's a non issue. The only thing which can happen is that it runs through
the task_work once more to figure out there is nothing to do.

Thanks,

	tglx


* Re: [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT
  2019-08-01 18:34     ` Thomas Gleixner
@ 2019-08-01 21:35       ` Sean Christopherson
  2019-08-01 21:44         ` Peter Zijlstra
  2019-08-01 21:44         ` Thomas Gleixner
  2019-08-02 12:04       ` Oleg Nesterov
  1 sibling, 2 replies; 23+ messages in thread
From: Sean Christopherson @ 2019-08-01 21:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Oleg Nesterov, LKML, x86, Peter Zijlstra, Ingo Molnar,
	Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt,
	Julia Cartwright, Paul McKenney, Frederic Weisbecker, kvm,
	Radim Krcmar, Paolo Bonzini, John Stultz, Andy Lutomirski,
	Paul E. McKenney

On Thu, Aug 01, 2019 at 08:34:53PM +0200, Thomas Gleixner wrote:
> On Thu, 1 Aug 2019, Oleg Nesterov wrote:
> > On 08/01, Thomas Gleixner wrote:
> > >
> > > @@ -8172,6 +8174,10 @@ static int vcpu_run(struct kvm_vcpu *vcp
> > >  			++vcpu->stat.signal_exits;
> > >  			break;
> > >  		}
> > > +
> > > +		if (notify_resume_pending())
> > > +			tracehook_handle_notify_resume();
> > 
> > shouldn't you drop kvm->srcu before tracehook_handle_notify_resume() ?
> > 
> > I don't understand this code at all, but vcpu_run() does this even before
> > cond_resched().
> 
> Yeah, I noticed that it's dropped around cond_resched().
> 
> My understanding is that for voluntary giving up the CPU via cond_resched()
> it needs to be dropped.
> 
> For involuntary preemption (CONFIG_PREEMPT=y) it's not required as the
> whole code section after preempt_enable() is fully preemptible.
> 
> Now the 1Mio$ question is whether any of the notify functions invokes
> cond_resched() and whether that really matters. Paolo?

cond_resched() is called via tracehook_notify_resume()->task_work_run(),
and "kernel code can only call cond_resched() in places where it ...
cannot hold references to any RCU-protected data structures" according to
https://lwn.net/Articles/603252/.


* Re: [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT
  2019-08-01 21:35       ` Sean Christopherson
@ 2019-08-01 21:44         ` Peter Zijlstra
  2019-08-01 21:44         ` Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2019-08-01 21:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Thomas Gleixner, Oleg Nesterov, LKML, x86, Ingo Molnar,
	Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt,
	Julia Cartwright, Paul McKenney, Frederic Weisbecker, kvm,
	Radim Krcmar, Paolo Bonzini, John Stultz, Andy Lutomirski,
	Paul E. McKenney

On Thu, Aug 01, 2019 at 02:35:50PM -0700, Sean Christopherson wrote:
> On Thu, Aug 01, 2019 at 08:34:53PM +0200, Thomas Gleixner wrote:
> > On Thu, 1 Aug 2019, Oleg Nesterov wrote:
> > > On 08/01, Thomas Gleixner wrote:
> > > >
> > > > @@ -8172,6 +8174,10 @@ static int vcpu_run(struct kvm_vcpu *vcp
> > > >  			++vcpu->stat.signal_exits;
> > > >  			break;
> > > >  		}
> > > > +
> > > > +		if (notify_resume_pending())
> > > > +			tracehook_handle_notify_resume();
> > > 
> > > shouldn't you drop kvm->srcu before tracehook_handle_notify_resume() ?
> > > 
> > > I don't understand this code at all, but vcpu_run() does this even before
> > > cond_resched().
> > 
> > Yeah, I noticed that it's dropped around cond_resched().
> > 
> > My understanding is that for voluntarily giving up the CPU via cond_resched()
> > it needs to be dropped.
> > 
> > For involuntary preemption (CONFIG_PREEMPT=y) it's not required as the
> > whole code section after preempt_enable() is fully preemptible.
> > 
> > Now the 1Mio$ question is whether any of the notify functions invokes
> > cond_resched() and whether that really matters. Paolo?
> 
> cond_resched() is called via tracehook_notify_resume()->task_work_run(),
> and "kernel code can only call cond_resched() in places where it ...
> cannot hold references to any RCU-protected data structures" according to
> https://lwn.net/Articles/603252/.

This is SRCU, you can reschedule while holding that just fine. It will
just delay some kvm operations, like the memslot stuff. I don't think it
is terrible to keep it, but people more versed in KVM might know of a
good reason to drop it anyway.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT
  2019-08-01 21:35       ` Sean Christopherson
  2019-08-01 21:44         ` Peter Zijlstra
@ 2019-08-01 21:44         ` Thomas Gleixner
  2019-08-01 21:47           ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 21:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Oleg Nesterov, LKML, x86, Peter Zijlstra, Ingo Molnar,
	Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt,
	Julia Cartwright, Paul McKenney, Frederic Weisbecker, kvm,
	Radim Krcmar, Paolo Bonzini, John Stultz, Andy Lutomirski,
	Paul E. McKenney

On Thu, 1 Aug 2019, Sean Christopherson wrote:
> On Thu, Aug 01, 2019 at 08:34:53PM +0200, Thomas Gleixner wrote:
> > On Thu, 1 Aug 2019, Oleg Nesterov wrote:
> > > On 08/01, Thomas Gleixner wrote:
> > > >
> > > > @@ -8172,6 +8174,10 @@ static int vcpu_run(struct kvm_vcpu *vcp
> > > >  			++vcpu->stat.signal_exits;
> > > >  			break;
> > > >  		}
> > > > +
> > > > +		if (notify_resume_pending())
> > > > +			tracehook_handle_notify_resume();
> > > 
> > > shouldn't you drop kvm->srcu before tracehook_handle_notify_resume() ?
> > > 
> > > I don't understand this code at all, but vcpu_run() does this even before
> > > cond_resched().
> > 
> > Yeah, I noticed that it's dropped around cond_resched().
> > 
> > My understanding is that for voluntarily giving up the CPU via cond_resched()
> > it needs to be dropped.
> > 
> > For involuntary preemption (CONFIG_PREEMPT=y) it's not required as the
> > whole code section after preempt_enable() is fully preemptible.
> > 
> > Now the 1Mio$ question is whether any of the notify functions invokes
> > cond_resched() and whether that really matters. Paolo?
> 
> cond_resched() is called via tracehook_notify_resume()->task_work_run(),
> and "kernel code can only call cond_resched() in places where it ...
> cannot hold references to any RCU-protected data structures" according to
> https://lwn.net/Articles/603252/.

Right you are.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT
  2019-08-01 21:44         ` Thomas Gleixner
@ 2019-08-01 21:47           ` Thomas Gleixner
  2019-08-02 21:35             ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-01 21:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Oleg Nesterov, LKML, x86, Peter Zijlstra, Ingo Molnar,
	Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt,
	Julia Cartwright, Paul McKenney, Frederic Weisbecker, kvm,
	Radim Krcmar, Paolo Bonzini, John Stultz, Andy Lutomirski,
	Paul E. McKenney

On Thu, 1 Aug 2019, Thomas Gleixner wrote:

> On Thu, 1 Aug 2019, Sean Christopherson wrote:
> > On Thu, Aug 01, 2019 at 08:34:53PM +0200, Thomas Gleixner wrote:
> > > On Thu, 1 Aug 2019, Oleg Nesterov wrote:
> > > > On 08/01, Thomas Gleixner wrote:
> > > > >
> > > > > @@ -8172,6 +8174,10 @@ static int vcpu_run(struct kvm_vcpu *vcp
> > > > >  			++vcpu->stat.signal_exits;
> > > > >  			break;
> > > > >  		}
> > > > > +
> > > > > +		if (notify_resume_pending())
> > > > > +			tracehook_handle_notify_resume();
> > > > 
> > > > shouldn't you drop kvm->srcu before tracehook_handle_notify_resume() ?
> > > > 
> > > > I don't understand this code at all, but vcpu_run() does this even before
> > > > cond_resched().
> > > 
> > > Yeah, I noticed that it's dropped around cond_resched().
> > > 
> > > My understanding is that for voluntarily giving up the CPU via cond_resched()
> > > it needs to be dropped.
> > > 
> > > For involuntary preemption (CONFIG_PREEMPT=y) it's not required as the
> > > whole code section after preempt_enable() is fully preemptible.
> > > 
> > > Now the 1Mio$ question is whether any of the notify functions invokes
> > > cond_resched() and whether that really matters. Paolo?
> > 
> > cond_resched() is called via tracehook_notify_resume()->task_work_run(),
> > and "kernel code can only call cond_resched() in places where it ...
> > cannot hold references to any RCU-protected data structures" according to
> > https://lwn.net/Articles/603252/.
> 
> Right you are.

Bah. Hit send too fast.

Right you are about cond_resched() being called, but for SRCU this does not
matter unless there is some way to do a synchronize operation on that SRCU
entity. It might have some other performance side effect though.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT
  2019-08-01 18:34     ` Thomas Gleixner
  2019-08-01 21:35       ` Sean Christopherson
@ 2019-08-02 12:04       ` Oleg Nesterov
  1 sibling, 0 replies; 23+ messages in thread
From: Oleg Nesterov @ 2019-08-02 12:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, kvm, Radim Krcmar,
	Paolo Bonzini, John Stultz, Andy Lutomirski, Paul E. McKenney

On 08/01, Thomas Gleixner wrote:
>
> On Thu, 1 Aug 2019, Oleg Nesterov wrote:
> > On 08/01, Thomas Gleixner wrote:
> > >
> > > @@ -8172,6 +8174,10 @@ static int vcpu_run(struct kvm_vcpu *vcp
> > >  			++vcpu->stat.signal_exits;
> > >  			break;
> > >  		}
> > > +
> > > +		if (notify_resume_pending())
> > > +			tracehook_handle_notify_resume();
> >
> > shouldn't you drop kvm->srcu before tracehook_handle_notify_resume() ?
> >
> > I don't understand this code at all, but vcpu_run() does this even before
> > cond_resched().
>
> Yeah, I noticed that it's dropped around cond_resched().
>
> My understanding is that for voluntarily giving up the CPU via cond_resched()
> it needs to be dropped.

I am not sure it really needs to be, but this doesn't matter.

tracehook_handle_notify_resume() can do "anything"; for example, it can run
the work items queued by systemtap. I don't think it should delay
synchronize_srcu(). And maybe this is simply unsafe, even if I don't think
a task_work can ever call synchronize_srcu(kvm->srcu) directly.

Oleg.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT
  2019-08-01 21:47           ` Thomas Gleixner
@ 2019-08-02 21:35             ` Paolo Bonzini
  2019-08-02 22:22               ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2019-08-02 21:35 UTC (permalink / raw)
  To: Thomas Gleixner, Sean Christopherson
  Cc: Oleg Nesterov, LKML, x86, Peter Zijlstra, Ingo Molnar,
	Sebastian Siewior, Anna-Maria Gleixner, Steven Rostedt,
	Julia Cartwright, Paul McKenney, Frederic Weisbecker, kvm,
	Radim Krcmar, John Stultz, Andy Lutomirski, Paul E. McKenney

On 01/08/19 23:47, Thomas Gleixner wrote:
> Right you are about cond_resched() being called, but for SRCU this does not
> matter unless there is some way to do a synchronize operation on that SRCU
> entity. It might have some other performance side effect though.

I would use srcu_read_unlock/lock around the call.

However, I'm wondering if the API can be improved because basically we
have six functions for three checks of TIF flags.  Does it make sense to
have something like task_has_request_flags and task_do_requests (names
are horrible I know) that can be used like

	if (task_has_request_flags()) {
		int err;
		...srcu_read_unlock...
		// return -EINTR if signal_pending
		err = task_do_requests();
		if (err < 0)
			goto exit_no_srcu_read_unlock;
		...srcu_read_lock...
	}

taking care of all three cases with a single hook?  This is especially
important because all these checks are done by all KVM architectures in
slightly different ways, and a unified API would be a good reason to
make all architectures look the same.

(Of course I could also define this unified API in virt/kvm/kvm_main.c,
so this is not blocking the series in any way!).

Paolo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT
  2019-08-02 21:35             ` Paolo Bonzini
@ 2019-08-02 22:22               ` Thomas Gleixner
  2019-08-02 22:39                 ` Andy Lutomirski
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2019-08-02 22:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Oleg Nesterov, LKML, x86, Peter Zijlstra,
	Ingo Molnar, Sebastian Siewior, Anna-Maria Gleixner,
	Steven Rostedt, Julia Cartwright, Paul McKenney,
	Frederic Weisbecker, kvm, Radim Krcmar, John Stultz,
	Andy Lutomirski, Paul E. McKenney

On Fri, 2 Aug 2019, Paolo Bonzini wrote:
> On 01/08/19 23:47, Thomas Gleixner wrote:
> > Right you are about cond_resched() being called, but for SRCU this does not
> > matter unless there is some way to do a synchronize operation on that SRCU
> > entity. It might have some other performance side effect though.
> 
> I would use srcu_read_unlock/lock around the call.
> 
> However, I'm wondering if the API can be improved because basically we
> have six functions for three checks of TIF flags.  Does it make sense to
> have something like task_has_request_flags and task_do_requests (names
> are horrible I know) that can be used like
> 
> 	if (task_has_request_flags()) {
> 		int err;
> 		...srcu_read_unlock...
> 		// return -EINTR if signal_pending
> 		err = task_do_requests();
> 		if (err < 0)
> 			goto exit_no_srcu_read_unlock;
> 		...srcu_read_lock...
> 	}
> 
> taking care of all three cases with a single hook?  This is important
> especially because all these checks are done by all KVM architectures in
> slightly different ways, and a unified API would be a good reason to
> make all architectures look the same.
> 
> (Of course I could also define this unified API in virt/kvm/kvm_main.c,
> so this is not blocking the series in any way!).

You're not holding anything up. Having a common function for this is
definitely the right approach.

As this is virt specific, because it only checks non-arch-specific bits
(TIF_NOTIFY_RESUME should be available for all KVM archs), and because the
TIF bits involved are a subset of the available TIF bits (none of the
others make any sense there), this really should be a common function for
KVM. That way all the other archs, which obviously lack a TIF_NOTIFY_RESUME
check, can be fixed up and consolidated. If we add another TIF check later,
then we only have to do it in one place.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT
  2019-08-02 22:22               ` Thomas Gleixner
@ 2019-08-02 22:39                 ` Andy Lutomirski
  0 siblings, 0 replies; 23+ messages in thread
From: Andy Lutomirski @ 2019-08-02 22:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paolo Bonzini, Sean Christopherson, Oleg Nesterov, LKML, x86,
	Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Anna-Maria Gleixner, Steven Rostedt, Julia Cartwright,
	Paul McKenney, Frederic Weisbecker, kvm, Radim Krcmar,
	John Stultz, Andy Lutomirski, Paul E. McKenney



> On Aug 2, 2019, at 3:22 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>> On Fri, 2 Aug 2019, Paolo Bonzini wrote:
>>> On 01/08/19 23:47, Thomas Gleixner wrote:
>>> Right you are about cond_resched() being called, but for SRCU this does not
>>> matter unless there is some way to do a synchronize operation on that SRCU
>>> entity. It might have some other performance side effect though.
>> 
>> I would use srcu_read_unlock/lock around the call.
>> 
>> However, I'm wondering if the API can be improved because basically we
>> have six functions for three checks of TIF flags.  Does it make sense to
>> have something like task_has_request_flags and task_do_requests (names
>> are horrible I know) that can be used like
>> 
>>    if (task_has_request_flags()) {
>>        int err;
>>        ...srcu_read_unlock...
>>        // return -EINTR if signal_pending
>>        err = task_do_requests();
>>        if (err < 0)
>>            goto exit_no_srcu_read_unlock;
>>        ...srcu_read_lock...
>>    }
>> 
>> taking care of all three cases with a single hook?  This is important
>> especially because all these checks are done by all KVM architectures in
>> slightly different ways, and a unified API would be a good reason to
>> make all architectures look the same.
>> 
>> (Of course I could also define this unified API in virt/kvm/kvm_main.c,
>> so this is not blocking the series in any way!).
> 
> You're not holding up something. Having a common function for this is
> definitely the right approach.
> 
> As this is virt specific, because it only checks non-arch-specific bits
> (TIF_NOTIFY_RESUME should be available for all KVM archs), and because the
> TIF bits involved are a subset of the available TIF bits (none of the
> others make any sense there), this really should be a common function for
> KVM. That way all the other archs, which obviously lack a TIF_NOTIFY_RESUME
> check, can be fixed up and consolidated. If we add another TIF check later,
> then we only have to do it in one place.
> 
> 

If we add a real API for this, can we make it, or a very similar API, work for exit_to_usermode_loop() too?  Maybe:

bool usermode_work_pending();
bool guestmode_work_pending();

void do_usermode_work();
void do_guestmode_work();

The first two are called with IRQs off.  The latter two are called with IRQs on.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-08-02 22:39 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-01 14:32 [patch 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
2019-08-01 14:32 ` [patch 1/5] tracehook: Provide TIF_NOTIFY_RESUME handling for KVM Thomas Gleixner
2019-08-01 14:48   ` Peter Zijlstra
2019-08-01 15:10     ` Thomas Gleixner
2019-08-01 17:02     ` Andy Lutomirski
2019-08-01 14:32 ` [patch 2/5] x86/kvm: Handle task_work on VMENTER/EXIT Thomas Gleixner
2019-08-01 16:24   ` Oleg Nesterov
2019-08-01 18:34     ` Thomas Gleixner
2019-08-01 21:35       ` Sean Christopherson
2019-08-01 21:44         ` Peter Zijlstra
2019-08-01 21:44         ` Thomas Gleixner
2019-08-01 21:47           ` Thomas Gleixner
2019-08-02 21:35             ` Paolo Bonzini
2019-08-02 22:22               ` Thomas Gleixner
2019-08-02 22:39                 ` Andy Lutomirski
2019-08-02 12:04       ` Oleg Nesterov
2019-08-01 14:32 ` [patch 3/5] posix-cpu-timers: Split run_posix_cpu_timers() Thomas Gleixner
2019-08-01 14:32 ` [patch 4/5] posix-cpu-timers: Defer timer handling to task_work Thomas Gleixner
2019-08-01 14:51   ` Peter Zijlstra
2019-08-01 15:10     ` Thomas Gleixner
2019-08-01 15:39   ` Oleg Nesterov
2019-08-01 18:41     ` Thomas Gleixner
2019-08-01 14:32 ` [patch 5/5] x86: Select POSIX_CPU_TIMERS_TASK_WORK Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).