linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Possible nohz-full/RCU issue in arm64 KVM
@ 2021-12-17 11:51 Nicolas Saenz Julienne
  2021-12-17 13:21 ` Mark Rutland
  0 siblings, 1 reply; 22+ messages in thread
From: Nicolas Saenz Julienne @ 2021-12-17 11:51 UTC (permalink / raw)
  To: Mark Rutland, maz
  Cc: Will Deacon, paulmck, linux-arm-kernel, rcu, Thomas Gleixner,
	frederic, kvmarm, linux-kernel

Hi All,
arm64's guest entry code does the following:

int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
{
	[...]

	guest_enter_irqoff();

	ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);

	[...]

	local_irq_enable();

	/*
	 * We do local_irq_enable() before calling guest_exit() so
	 * that if a timer interrupt hits while running the guest we
	 * account that tick as being spent in the guest.  We enable
	 * preemption after calling guest_exit() so that if we get
	 * preempted we make sure ticks after that is not counted as
	 * guest time.
	 */
	guest_exit();
	[...]
}


On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
state (EQS). Any interrupt happening between local_irq_enable() and
guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt handlers do
the right thing if trggered in this context, but el1's won't. Is it possible to
hit an el1 handler (for example __el1_irq()) there?

Thanks,

-- 
Nicolás Sáenz


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 11:51 Possible nohz-full/RCU issue in arm64 KVM Nicolas Saenz Julienne
@ 2021-12-17 13:21 ` Mark Rutland
  2021-12-17 14:15   ` Nicolas Saenz Julienne
                     ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Mark Rutland @ 2021-12-17 13:21 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: maz, Will Deacon, paulmck, linux-arm-kernel, rcu,
	Thomas Gleixner, frederic, kvmarm, linux-kernel

On Fri, Dec 17, 2021 at 12:51:57PM +0100, Nicolas Saenz Julienne wrote:
> Hi All,

Hi,

> arm64's guest entry code does the following:
> 
> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> {
> 	[...]
> 
> 	guest_enter_irqoff();
> 
> 	ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> 
> 	[...]
> 
> 	local_irq_enable();
> 
> 	/*
> 	 * We do local_irq_enable() before calling guest_exit() so
> 	 * that if a timer interrupt hits while running the guest we
> 	 * account that tick as being spent in the guest.  We enable
> 	 * preemption after calling guest_exit() so that if we get
> 	 * preempted we make sure ticks after that is not counted as
> 	 * guest time.
> 	 */
> 	guest_exit();
> 	[...]
> }
> 
> 
> On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
> state (EQS). Any interrupt happening between local_irq_enable() and
> guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt handlers
> do the right thing if trggered in this context, but el1's won't. Is it
> possible to hit an el1 handler (for example __el1_irq()) there?

I think you're right that the EL1 handlers can trigger here and won't exit the
EQS.

I'm not immediately sure what we *should* do here. What does x86 do for an IRQ
taken from a guest mode? I couldn't spot any handling of that case, but I'm not
familiar enough with the x86 exception model to know if I'm looking in the
right place.

Note that the EL0 handlers *cannot* trigger for an exception taken from a
guest. We use separate vectors while running a guest (for both VHE and nVHE
modes), and from the main kernel's PoV we return from kvm_call_hyp_ret(). We
can ony take IRQ from EL1 *after* that returns.

We *might* need to audit the KVM vector handlers to make sure they're not
dependent on RCU protection (I assume they're not, but it's possible something
has leaked into the VHE code).

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 13:21 ` Mark Rutland
@ 2021-12-17 14:15   ` Nicolas Saenz Julienne
  2021-12-17 14:38     ` Mark Rutland
  2021-12-17 14:51   ` Paolo Bonzini
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Nicolas Saenz Julienne @ 2021-12-17 14:15 UTC (permalink / raw)
  To: Mark Rutland
  Cc: maz, Will Deacon, paulmck, linux-arm-kernel, rcu,
	Thomas Gleixner, frederic, kvmarm, linux-kernel

On Fri, 2021-12-17 at 13:21 +0000, Mark Rutland wrote:
> On Fri, Dec 17, 2021 at 12:51:57PM +0100, Nicolas Saenz Julienne wrote:
> > Hi All,
> 
> Hi,
> 
> > arm64's guest entry code does the following:
> > 
> > int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > {
> > 	[...]
> > 
> > 	guest_enter_irqoff();
> > 
> > 	ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> > 
> > 	[...]
> > 
> > 	local_irq_enable();
> > 
> > 	/*
> > 	 * We do local_irq_enable() before calling guest_exit() so
> > 	 * that if a timer interrupt hits while running the guest we
> > 	 * account that tick as being spent in the guest.  We enable
> > 	 * preemption after calling guest_exit() so that if we get
> > 	 * preempted we make sure ticks after that is not counted as
> > 	 * guest time.
> > 	 */
> > 	guest_exit();
> > 	[...]
> > }
> > 
> > 
> > On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
> > state (EQS). Any interrupt happening between local_irq_enable() and
> > guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt handlers
> > do the right thing if trggered in this context, but el1's won't. Is it
> > possible to hit an el1 handler (for example __el1_irq()) there?
> 
> I think you're right that the EL1 handlers can trigger here and won't exit the
> EQS.
> 
> I'm not immediately sure what we *should* do here. What does x86 do for an IRQ
> taken from a guest mode? I couldn't spot any handling of that case, but I'm not
> familiar enough with the x86 exception model to know if I'm looking in the
> right place.

Well x86 has its own private KVM guest context exit function
'kvm_guest_exit_irqoff()', which allows it to do the right thing (simplifying
things):

	local_irq_disable();
	kvm_guest_enter_irqoff() // Inform CT, enter EQS
	__vmx_kvm_run()
	kvm_guest_exit_irqoff() // Inform CT, exit EQS, task still marked with PF_VCPU

	/*
	 * Consume any pending interrupts, including the possible source of
	 * VM-Exit on SVM and any ticks that occur between VM-Exit and now.
	 * An instruction is required after local_irq_enable() to fully unblock
	 * interrupts on processors that implement an interrupt shadow, the
	 * stat.exits increment will do nicely.
	 */
	local_irq_enable();
	++vcpu->stat.exits;
	local_irq_disable();

	/*
	 * Wait until after servicing IRQs to account guest time so that any
	 * ticks that occurred while running the guest are properly accounted
	 * to the guest.  Waiting until IRQs are enabled degrades the accuracy
	 * of accounting via context tracking, but the loss of accuracy is
	 * acceptable for all known use cases.
	 */
	vtime_account_guest_exit(); // current->flags &= ~PF_VCPU

So I guess we should convert to x86's scheme, and maybe create another generic
guest_{enter,exit}() flavor for virtualization schemes that run with interrupts
disabled.

> Note that the EL0 handlers *cannot* trigger for an exception taken from a
> guest. We use separate vectors while running a guest (for both VHE and nVHE
> modes), and from the main kernel's PoV we return from kvm_call_hyp_ret(). We
> can ony take IRQ from EL1 *after* that returns.
> 
> We *might* need to audit the KVM vector handlers to make sure they're not
> dependent on RCU protection (I assume they're not, but it's possible something
> has leaked into the VHE code).

IIUC in the window between local_irq_enable() and guest_exit() any driver
interrupt might trigger, isn't it?

Regards,

-- 
Nicolás Sáenz


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 14:15   ` Nicolas Saenz Julienne
@ 2021-12-17 14:38     ` Mark Rutland
  2021-12-17 15:54       ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Mark Rutland @ 2021-12-17 14:38 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: maz, Will Deacon, paulmck, linux-arm-kernel, rcu,
	Thomas Gleixner, frederic, kvmarm, linux-kernel

On Fri, Dec 17, 2021 at 03:15:29PM +0100, Nicolas Saenz Julienne wrote:
> On Fri, 2021-12-17 at 13:21 +0000, Mark Rutland wrote:
> > On Fri, Dec 17, 2021 at 12:51:57PM +0100, Nicolas Saenz Julienne wrote:
> > > Hi All,
> > 
> > Hi,
> > 
> > > arm64's guest entry code does the following:
> > > 
> > > int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > > {
> > > 	[...]
> > > 
> > > 	guest_enter_irqoff();
> > > 
> > > 	ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> > > 
> > > 	[...]
> > > 
> > > 	local_irq_enable();
> > > 
> > > 	/*
> > > 	 * We do local_irq_enable() before calling guest_exit() so
> > > 	 * that if a timer interrupt hits while running the guest we
> > > 	 * account that tick as being spent in the guest.  We enable
> > > 	 * preemption after calling guest_exit() so that if we get
> > > 	 * preempted we make sure ticks after that is not counted as
> > > 	 * guest time.
> > > 	 */
> > > 	guest_exit();
> > > 	[...]
> > > }
> > > 
> > > 
> > > On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
> > > state (EQS). Any interrupt happening between local_irq_enable() and
> > > guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt handlers
> > > do the right thing if trggered in this context, but el1's won't. Is it
> > > possible to hit an el1 handler (for example __el1_irq()) there?
> > 
> > I think you're right that the EL1 handlers can trigger here and won't exit the
> > EQS.
> > 
> > I'm not immediately sure what we *should* do here. What does x86 do for an IRQ
> > taken from a guest mode? I couldn't spot any handling of that case, but I'm not
> > familiar enough with the x86 exception model to know if I'm looking in the
> > right place.
> 
> Well x86 has its own private KVM guest context exit function
> 'kvm_guest_exit_irqoff()', which allows it to do the right thing (simplifying
> things):
> 
> 	local_irq_disable();
> 	kvm_guest_enter_irqoff() // Inform CT, enter EQS
> 	__vmx_kvm_run()
> 	kvm_guest_exit_irqoff() // Inform CT, exit EQS, task still marked with PF_VCPU
> 
> 	/*
> 	 * Consume any pending interrupts, including the possible source of
> 	 * VM-Exit on SVM and any ticks that occur between VM-Exit and now.
> 	 * An instruction is required after local_irq_enable() to fully unblock
> 	 * interrupts on processors that implement an interrupt shadow, the
> 	 * stat.exits increment will do nicely.
> 	 */
> 	local_irq_enable();
> 	++vcpu->stat.exits;
> 	local_irq_disable();
> 
> 	/*
> 	 * Wait until after servicing IRQs to account guest time so that any
> 	 * ticks that occurred while running the guest are properly accounted
> 	 * to the guest.  Waiting until IRQs are enabled degrades the accuracy
> 	 * of accounting via context tracking, but the loss of accuracy is
> 	 * acceptable for all known use cases.
> 	 */
> 	vtime_account_guest_exit(); // current->flags &= ~PF_VCPU

I see.

The abstraction's really messy here on x86, and the enter/exit sides aren't
clearly balanced.

For example kvm_guest_enter_irqoff() calls guest_enter_irq_off() which calls
vtime_account_guest_enter(), but kvm_guest_exit_irqoff() doesn't call
guest_exit_irq_off() and the call to vtime_account_guest_exit() is open-coded
elsewhere. Also, guest_enter_irq_off() conditionally calls
rcu_virt_note_context_switch(), but I can't immediately spot anything on the
exit side that corresponded with that, which looks suspicious.

> So I guess we should convert to x86's scheme, and maybe create another generic
> guest_{enter,exit}() flavor for virtualization schemes that run with interrupts
> disabled.

I think we might need to do some preparatory refactoring here so that this is
all clearly balanced even on x86, e.g. splitting the enter/exit steps into
multiple phases.

> > Note that the EL0 handlers *cannot* trigger for an exception taken from a
> > guest. We use separate vectors while running a guest (for both VHE and nVHE
> > modes), and from the main kernel's PoV we return from kvm_call_hyp_ret(). We
> > can ony take IRQ from EL1 *after* that returns.
> > 
> > We *might* need to audit the KVM vector handlers to make sure they're not
> > dependent on RCU protection (I assume they're not, but it's possible something
> > has leaked into the VHE code).
> 
> IIUC in the window between local_irq_enable() and guest_exit() any driver
> interrupt might trigger, isn't it?

Yes, via the EL1 interrupt vectors, which I assume we'll fix in one go above.

Here I was trying to point out that there's another potential issue here if we
do anything in the context of the KVM exception vectors, as those can run C
code in a shallow exeption context, and can either return back into the guest
OR return to the caller of kvm_call_hyp_ret(__kvm_vcpu_run, vcpu).

So even if we fix kvm_arch_vcpu_ioctl_run() we might need to also rework
handlers that run in that shallow exception context, if they rely on RCU for
something.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 13:21 ` Mark Rutland
  2021-12-17 14:15   ` Nicolas Saenz Julienne
@ 2021-12-17 14:51   ` Paolo Bonzini
  2021-12-20 14:28   ` Marc Zyngier
  2021-12-20 16:10   ` Frederic Weisbecker
  3 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2021-12-17 14:51 UTC (permalink / raw)
  To: Mark Rutland, Nicolas Saenz Julienne
  Cc: paulmck, maz, frederic, linux-kernel, rcu, Thomas Gleixner,
	Will Deacon, kvmarm, linux-arm-kernel, kvm-riscv, Anup Patel

On 12/17/21 14:21, Mark Rutland wrote:
> I'm not immediately sure what we*should*  do here. What does x86 do for an IRQ
> taken from a guest mode? I couldn't spot any handling of that case, but I'm not
> familiar enough with the x86 exception model to know if I'm looking in the
> right place.

ARM is missing something like commit 160457140187 ("KVM: x86: Defer 
vtime accounting 'til after IRQ handling", 2021-05-05).

With that change, it would be possible to move guest_exit() in the 
irq-disabled region without breaking time accounting.

RISC-V has the same issue and it would be fixed in the same way, so 
let's Cc Anup too.

Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 14:38     ` Mark Rutland
@ 2021-12-17 15:54       ` Paolo Bonzini
  2021-12-17 16:07         ` Paul E. McKenney
  2022-01-04 16:39         ` Mark Rutland
  0 siblings, 2 replies; 22+ messages in thread
From: Paolo Bonzini @ 2021-12-17 15:54 UTC (permalink / raw)
  To: Mark Rutland, Nicolas Saenz Julienne
  Cc: paulmck, maz, frederic, linux-kernel, rcu, Thomas Gleixner,
	Will Deacon, kvmarm, linux-arm-kernel

On 12/17/21 15:38, Mark Rutland wrote:
> For example kvm_guest_enter_irqoff() calls guest_enter_irq_off() which calls
> vtime_account_guest_enter(), but kvm_guest_exit_irqoff() doesn't call
> guest_exit_irq_off() and the call to vtime_account_guest_exit() is open-coded
> elsewhere. Also, guest_enter_irq_off() conditionally calls
> rcu_virt_note_context_switch(), but I can't immediately spot anything on the
> exit side that corresponded with that, which looks suspicious.

rcu_note_context_switch() is a point-in-time notification; it's not 
strictly necessary, but it may improve performance a bit by avoiding 
unnecessary IPIs from the RCU subsystem.

There's no benefit from doing it when you're back from the guest, 
because at that point the CPU is just running normal kernel code.

Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 15:54       ` Paolo Bonzini
@ 2021-12-17 16:07         ` Paul E. McKenney
  2021-12-17 16:20           ` Nicolas Saenz Julienne
  2021-12-17 16:34           ` Paolo Bonzini
  2022-01-04 16:39         ` Mark Rutland
  1 sibling, 2 replies; 22+ messages in thread
From: Paul E. McKenney @ 2021-12-17 16:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Mark Rutland, Nicolas Saenz Julienne, maz, frederic,
	linux-kernel, rcu, Thomas Gleixner, Will Deacon, kvmarm,
	linux-arm-kernel

On Fri, Dec 17, 2021 at 04:54:22PM +0100, Paolo Bonzini wrote:
> On 12/17/21 15:38, Mark Rutland wrote:
> > For example kvm_guest_enter_irqoff() calls guest_enter_irq_off() which calls
> > vtime_account_guest_enter(), but kvm_guest_exit_irqoff() doesn't call
> > guest_exit_irq_off() and the call to vtime_account_guest_exit() is open-coded
> > elsewhere. Also, guest_enter_irq_off() conditionally calls
> > rcu_virt_note_context_switch(), but I can't immediately spot anything on the
> > exit side that corresponded with that, which looks suspicious.
> 
> rcu_note_context_switch() is a point-in-time notification; it's not strictly
> necessary, but it may improve performance a bit by avoiding unnecessary IPIs
> from the RCU subsystem.
> 
> There's no benefit from doing it when you're back from the guest, because at
> that point the CPU is just running normal kernel code.

Do scheduling-clock interrupts from guest mode have the "user" parameter
set?  If so, that would keep RCU happy.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 16:07         ` Paul E. McKenney
@ 2021-12-17 16:20           ` Nicolas Saenz Julienne
  2021-12-17 16:43             ` Paul E. McKenney
  2021-12-17 16:34           ` Paolo Bonzini
  1 sibling, 1 reply; 22+ messages in thread
From: Nicolas Saenz Julienne @ 2021-12-17 16:20 UTC (permalink / raw)
  To: paulmck, Paolo Bonzini
  Cc: Mark Rutland, maz, frederic, linux-kernel, rcu, Thomas Gleixner,
	Will Deacon, kvmarm, linux-arm-kernel

Hi Paul,

On Fri, 2021-12-17 at 08:07 -0800, Paul E. McKenney wrote:
> On Fri, Dec 17, 2021 at 04:54:22PM +0100, Paolo Bonzini wrote:
> > On 12/17/21 15:38, Mark Rutland wrote:
> > > For example kvm_guest_enter_irqoff() calls guest_enter_irq_off() which calls
> > > vtime_account_guest_enter(), but kvm_guest_exit_irqoff() doesn't call
> > > guest_exit_irq_off() and the call to vtime_account_guest_exit() is open-coded
> > > elsewhere. Also, guest_enter_irq_off() conditionally calls
> > > rcu_virt_note_context_switch(), but I can't immediately spot anything on the
> > > exit side that corresponded with that, which looks suspicious.
> > 
> > rcu_note_context_switch() is a point-in-time notification; it's not strictly
> > necessary, but it may improve performance a bit by avoiding unnecessary IPIs
> > from the RCU subsystem.
> > 
> > There's no benefit from doing it when you're back from the guest, because at
> > that point the CPU is just running normal kernel code.
> 
> Do scheduling-clock interrupts from guest mode have the "user" parameter
> set?  If so, that would keep RCU happy.

Are you referring to the user_mode() check in irqentry_enter()? If so I don't
think it'll help, arm64 doesn't use that function. It directly calls
enter_from_{user,kernel}_mode() through its custom entry/exit routines.

Regards,

-- 
Nicolás Sáenz


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 16:07         ` Paul E. McKenney
  2021-12-17 16:20           ` Nicolas Saenz Julienne
@ 2021-12-17 16:34           ` Paolo Bonzini
  2021-12-17 16:45             ` Paul E. McKenney
  1 sibling, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-12-17 16:34 UTC (permalink / raw)
  To: paulmck
  Cc: Mark Rutland, Nicolas Saenz Julienne, maz, frederic,
	linux-kernel, rcu, Thomas Gleixner, Will Deacon, kvmarm,
	linux-arm-kernel

On 12/17/21 17:07, Paul E. McKenney wrote:
>> rcu_note_context_switch() is a point-in-time notification; it's not strictly
>> necessary, but it may improve performance a bit by avoiding unnecessary IPIs
>> from the RCU subsystem.
>>
>> There's no benefit from doing it when you're back from the guest, because at
>> that point the CPU is just running normal kernel code.
>
> Do scheduling-clock interrupts from guest mode have the "user" parameter
> set?  If so, that would keep RCU happy.

No, thread is in supervisor mode.  But after every interrupt (timer tick 
or anything), one of three things can happen:

* KVM will go around the execution loop and invoke 
rcu_note_context_switch() again

* or KVM will go back to user space

* or the thread will be preempted

and either will keep RCU happy as far as I understand.

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 16:20           ` Nicolas Saenz Julienne
@ 2021-12-17 16:43             ` Paul E. McKenney
  0 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2021-12-17 16:43 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: Paolo Bonzini, Mark Rutland, maz, frederic, linux-kernel, rcu,
	Thomas Gleixner, Will Deacon, kvmarm, linux-arm-kernel

On Fri, Dec 17, 2021 at 05:20:21PM +0100, Nicolas Saenz Julienne wrote:
> Hi Paul,
> 
> On Fri, 2021-12-17 at 08:07 -0800, Paul E. McKenney wrote:
> > On Fri, Dec 17, 2021 at 04:54:22PM +0100, Paolo Bonzini wrote:
> > > On 12/17/21 15:38, Mark Rutland wrote:
> > > > For example kvm_guest_enter_irqoff() calls guest_enter_irq_off() which calls
> > > > vtime_account_guest_enter(), but kvm_guest_exit_irqoff() doesn't call
> > > > guest_exit_irq_off() and the call to vtime_account_guest_exit() is open-coded
> > > > elsewhere. Also, guest_enter_irq_off() conditionally calls
> > > > rcu_virt_note_context_switch(), but I can't immediately spot anything on the
> > > > exit side that corresponded with that, which looks suspicious.
> > > 
> > > rcu_note_context_switch() is a point-in-time notification; it's not strictly
> > > necessary, but it may improve performance a bit by avoiding unnecessary IPIs
> > > from the RCU subsystem.
> > > 
> > > There's no benefit from doing it when you're back from the guest, because at
> > > that point the CPU is just running normal kernel code.
> > 
> > Do scheduling-clock interrupts from guest mode have the "user" parameter
> > set?  If so, that would keep RCU happy.
> 
> Are you referring to the user_mode() check in irqentry_enter()? If so I don't
> think it'll help, arm64 doesn't use that function. It directly calls
> enter_from_{user,kernel}_mode() through its custom entry/exit routines.

I am talking about rcu_sched_clock_irq(), which is called from
update_process_times(), which is called from various places depending
on .config.  These call sites pass in either user_mode(regs) or
user_mode(get_irq_regs()).

Huh.  Maybe I should be looking into using user_mode(get_irq_regs())
in other places within RCU.  Except that I bet the possibility of RCU
being invoked from NMI handlers makes this a bit tricky.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 16:34           ` Paolo Bonzini
@ 2021-12-17 16:45             ` Paul E. McKenney
  2021-12-17 17:02               ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2021-12-17 16:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Mark Rutland, Nicolas Saenz Julienne, maz, frederic,
	linux-kernel, rcu, Thomas Gleixner, Will Deacon, kvmarm,
	linux-arm-kernel

On Fri, Dec 17, 2021 at 05:34:04PM +0100, Paolo Bonzini wrote:
> On 12/17/21 17:07, Paul E. McKenney wrote:
> > > rcu_note_context_switch() is a point-in-time notification; it's not strictly
> > > necessary, but it may improve performance a bit by avoiding unnecessary IPIs
> > > from the RCU subsystem.
> > > 
> > > There's no benefit from doing it when you're back from the guest, because at
> > > that point the CPU is just running normal kernel code.
> > 
> > Do scheduling-clock interrupts from guest mode have the "user" parameter
> > set?  If so, that would keep RCU happy.
> 
> No, thread is in supervisor mode.  But after every interrupt (timer tick or
> anything), one of three things can happen:
> 
> * KVM will go around the execution loop and invoke rcu_note_context_switch()
> again
> 
> * or KVM will go back to user space

Here "user space" is a user process as opposed to a guest OS?

> * or the thread will be preempted
> 
> and either will keep RCU happy as far as I understand.

Regardless of the answer to my question above, yes, these will keep
RCU happy.  ;-)

						Thanx, Paul

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 16:45             ` Paul E. McKenney
@ 2021-12-17 17:02               ` Paolo Bonzini
  2021-12-17 17:12                 ` Paul E. McKenney
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-12-17 17:02 UTC (permalink / raw)
  To: paulmck
  Cc: Mark Rutland, Nicolas Saenz Julienne, maz, frederic,
	linux-kernel, rcu, Thomas Gleixner, Will Deacon, kvmarm,
	linux-arm-kernel

On 12/17/21 17:45, Paul E. McKenney wrote:
> On Fri, Dec 17, 2021 at 05:34:04PM +0100, Paolo Bonzini wrote:
>> On 12/17/21 17:07, Paul E. McKenney wrote:
>>>> rcu_note_context_switch() is a point-in-time notification; it's not strictly
>>>> necessary, but it may improve performance a bit by avoiding unnecessary IPIs
>>>> from the RCU subsystem.
>>>>
>>>> There's no benefit from doing it when you're back from the guest, because at
>>>> that point the CPU is just running normal kernel code.
>>>
>>> Do scheduling-clock interrupts from guest mode have the "user" parameter
>>> set?  If so, that would keep RCU happy.
>>
>> No, thread is in supervisor mode.  But after every interrupt (timer tick or
>> anything), one of three things can happen:
>>
>> * KVM will go around the execution loop and invoke rcu_note_context_switch()
>> again
>>
>> * or KVM will go back to user space
> 
> Here "user space" is a user process as opposed to a guest OS?

Yes, that code runs from ioctl(KVM_RUN) and the ioctl will return to the 
calling process.

Paolo

>> * or the thread will be preempted
>>
>> and either will keep RCU happy as far as I understand.
> 
> Regardless of the answer to my question above, yes, these will keep
> RCU happy.  ;-)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 17:02               ` Paolo Bonzini
@ 2021-12-17 17:12                 ` Paul E. McKenney
  2021-12-17 17:23                   ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2021-12-17 17:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Mark Rutland, Nicolas Saenz Julienne, maz, frederic,
	linux-kernel, rcu, Thomas Gleixner, Will Deacon, kvmarm,
	linux-arm-kernel

On Fri, Dec 17, 2021 at 06:02:23PM +0100, Paolo Bonzini wrote:
> On 12/17/21 17:45, Paul E. McKenney wrote:
> > On Fri, Dec 17, 2021 at 05:34:04PM +0100, Paolo Bonzini wrote:
> > > On 12/17/21 17:07, Paul E. McKenney wrote:
> > > > > rcu_note_context_switch() is a point-in-time notification; it's not strictly
> > > > > necessary, but it may improve performance a bit by avoiding unnecessary IPIs
> > > > > from the RCU subsystem.
> > > > > 
> > > > > There's no benefit from doing it when you're back from the guest, because at
> > > > > that point the CPU is just running normal kernel code.
> > > > 
> > > > Do scheduling-clock interrupts from guest mode have the "user" parameter
> > > > set?  If so, that would keep RCU happy.
> > > 
> > > No, thread is in supervisor mode.  But after every interrupt (timer tick or
> > > anything), one of three things can happen:
> > > 
> > > * KVM will go around the execution loop and invoke rcu_note_context_switch()
> > > again
> > > 
> > > * or KVM will go back to user space
> > 
> > Here "user space" is a user process as opposed to a guest OS?
> 
> Yes, that code runs from ioctl(KVM_RUN) and the ioctl will return to the
> calling process.

Intriguing.  A user process within the guest OS or a user process outside
of any guest OS, that is, within the host?

							Thanx, Paul

> Paolo
> 
> > > * or the thread will be preempted
> > > 
> > > and either will keep RCU happy as far as I understand.
> > 
> > Regardless of the answer to my question above, yes, these will keep
> > RCU happy.  ;-)
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 17:12                 ` Paul E. McKenney
@ 2021-12-17 17:23                   ` Paolo Bonzini
  2021-12-17 17:47                     ` Paul E. McKenney
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-12-17 17:23 UTC (permalink / raw)
  To: paulmck
  Cc: Mark Rutland, Nicolas Saenz Julienne, maz, frederic,
	linux-kernel, rcu, Thomas Gleixner, Will Deacon, kvmarm,
	linux-arm-kernel

On 12/17/21 18:12, Paul E. McKenney wrote:
> On Fri, Dec 17, 2021 at 06:02:23PM +0100, Paolo Bonzini wrote:
>> On 12/17/21 17:45, Paul E. McKenney wrote:
>>> On Fri, Dec 17, 2021 at 05:34:04PM +0100, Paolo Bonzini wrote:
>>>> On 12/17/21 17:07, Paul E. McKenney wrote:
>>>>>> rcu_note_context_switch() is a point-in-time notification; it's not strictly
>>>>>> necessary, but it may improve performance a bit by avoiding unnecessary IPIs
>>>>>> from the RCU subsystem.
>>>>>>
>>>>>> There's no benefit from doing it when you're back from the guest, because at
>>>>>> that point the CPU is just running normal kernel code.
>>>>>
>>>>> Do scheduling-clock interrupts from guest mode have the "user" parameter
>>>>> set?  If so, that would keep RCU happy.
>>>>
>>>> No, thread is in supervisor mode.  But after every interrupt (timer tick or
>>>> anything), one of three things can happen:
>>>>
>>>> * KVM will go around the execution loop and invoke rcu_note_context_switch()
>>>> again
>>>>
>>>> * or KVM will go back to user space
>>>
>>> Here "user space" is a user process as opposed to a guest OS?
>>
>> Yes, that code runs from ioctl(KVM_RUN) and the ioctl will return to the
>> calling process.
> 
> Intriguing.  A user process within the guest OS or a user process outside
> of any guest OS, that is, within the host?

A user process on the host.  The guest vCPU is nothing special: it's 
just a user thread that occasionally lets the guest run by invoking the 
KVM_RUN ioctl.  Hopefully, KVM_RUN will be where that user thread will 
spend most of the time so the guest runs at full steam.  KVM_RUN is the 
place where you have the code that Nicolas and Mark were discussing.

 From the point of view of the kernel however the thread is always in 
kernel mode when it runs the guest, because any interrupt will be 
recognized while still in the ioctl.

(I'll add that from the point of view of the scheduler, there's no 
difference between a CPU-bound guest and a "normal" CPU-bound process on 
the host, e.g. wasting time with "for(;;)" or calculating digits of PI 
is the same no matter if you're doing it in the guest or in the host. 
Likewise for I/O-bound guests; e.g. doing "hlt" or "wfi" constantly in 
the guest looks exactly the same to the scheduler as a process that 
spends its time in the poll() system call).

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 17:23                   ` Paolo Bonzini
@ 2021-12-17 17:47                     ` Paul E. McKenney
  0 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2021-12-17 17:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Mark Rutland, Nicolas Saenz Julienne, maz, frederic,
	linux-kernel, rcu, Thomas Gleixner, Will Deacon, kvmarm,
	linux-arm-kernel

On Fri, Dec 17, 2021 at 06:23:32PM +0100, Paolo Bonzini wrote:
> On 12/17/21 18:12, Paul E. McKenney wrote:
> > On Fri, Dec 17, 2021 at 06:02:23PM +0100, Paolo Bonzini wrote:
> > > On 12/17/21 17:45, Paul E. McKenney wrote:
> > > > On Fri, Dec 17, 2021 at 05:34:04PM +0100, Paolo Bonzini wrote:
> > > > > On 12/17/21 17:07, Paul E. McKenney wrote:
> > > > > > > rcu_note_context_switch() is a point-in-time notification; it's not strictly
> > > > > > > necessary, but it may improve performance a bit by avoiding unnecessary IPIs
> > > > > > > from the RCU subsystem.
> > > > > > > 
> > > > > > > There's no benefit from doing it when you're back from the guest, because at
> > > > > > > that point the CPU is just running normal kernel code.
> > > > > > 
> > > > > > Do scheduling-clock interrupts from guest mode have the "user" parameter
> > > > > > set?  If so, that would keep RCU happy.
> > > > > 
> > > > > No, thread is in supervisor mode.  But after every interrupt (timer tick or
> > > > > anything), one of three things can happen:
> > > > > 
> > > > > * KVM will go around the execution loop and invoke rcu_note_context_switch()
> > > > > again
> > > > > 
> > > > > * or KVM will go back to user space
> > > > 
> > > > Here "user space" is a user process as opposed to a guest OS?
> > > 
> > > Yes, that code runs from ioctl(KVM_RUN) and the ioctl will return to the
> > > calling process.
> > 
> > Intriguing.  A user process within the guest OS or a user process outside
> > of any guest OS, that is, within the host?
> 
> A user process on the host.  The guest vCPU is nothing special: it's just a
> user thread that occasionally lets the guest run by invoking the KVM_RUN
> ioctl.  Hopefully, KVM_RUN will be where that user thread will spend most of
> the time so the guest runs at full steam.  KVM_RUN is the place where you
> have the code that Nicolas and Mark were discussing.
> 
> From the point of view of the kernel however the thread is always in kernel
> mode when it runs the guest, because any interrupt will be recognized while
> still in the ioctl.
> 
> (I'll add that from the point of view of the scheduler, there's no
> difference between a CPU-bound guest and a "normal" CPU-bound process on the
> host, e.g. wasting time with "for(;;)" or calculating digits of PI is the
> same no matter if you're doing it in the guest or in the host. Likewise for
> I/O-bound guests; e.g. doing "hlt" or "wfi" constantly in the guest looks
> exactly the same to the scheduler as a process that spends its time in the
> poll() system call).

Thank you for the explanation!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 13:21 ` Mark Rutland
  2021-12-17 14:15   ` Nicolas Saenz Julienne
  2021-12-17 14:51   ` Paolo Bonzini
@ 2021-12-20 14:28   ` Marc Zyngier
  2021-12-20 16:10   ` Frederic Weisbecker
  3 siblings, 0 replies; 22+ messages in thread
From: Marc Zyngier @ 2021-12-20 14:28 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Nicolas Saenz Julienne, Will Deacon, paulmck, linux-arm-kernel,
	rcu, Thomas Gleixner, frederic, kvmarm, linux-kernel

On Fri, 17 Dec 2021 13:21:39 +0000,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Fri, Dec 17, 2021 at 12:51:57PM +0100, Nicolas Saenz Julienne wrote:
> > Hi All,
> 
> Hi,
> 
> > arm64's guest entry code does the following:
> > 
> > int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > {
> > 	[...]
> > 
> > 	guest_enter_irqoff();
> > 
> > 	ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> > 
> > 	[...]
> > 
> > 	local_irq_enable();
> > 
> > 	/*
> > 	 * We do local_irq_enable() before calling guest_exit() so
> > 	 * that if a timer interrupt hits while running the guest we
> > 	 * account that tick as being spent in the guest.  We enable
> > 	 * preemption after calling guest_exit() so that if we get
> > 	 * preempted we make sure ticks after that is not counted as
> > 	 * guest time.
> > 	 */
> > 	guest_exit();
> > 	[...]
> > }
> > 
> > 
> > On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
> > state (EQS). Any interrupt happening between local_irq_enable() and
> > guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt handlers
> > do the right thing if trggered in this context, but el1's won't. Is it
> > possible to hit an el1 handler (for example __el1_irq()) there?
> 
> I think you're right that the EL1 handlers can trigger here and
> won't exit the EQS.
> 
> I'm not immediately sure what we *should* do here. What does x86 do
> for an IRQ taken from a guest mode? I couldn't spot any handling of
> that case, but I'm not familiar enough with the x86 exception model
> to know if I'm looking in the right place.
> 
> Note that the EL0 handlers *cannot* trigger for an exception taken
> from a guest. We use separate vectors while running a guest (for
> both VHE and nVHE modes), and from the main kernel's PoV we return
> from kvm_call_hyp_ret(). We can ony take IRQ from EL1 *after* that
> returns.
> 
> We *might* need to audit the KVM vector handlers to make sure they're not
> dependent on RCU protection (I assume they're not, but it's possible something
> has leaked into the VHE code).

The *intent* certainly is that whatever is used in the VHE code to
handle exceptions arising whilst running in guest context must be
independent from RCU, if only because we share a bunch with the !VHE
code, and RCU is, unfortunately, not a thing there.

My most immediate concern is that the VHE/nVHE split now allows all
sort of instrumentation in VHE, which may rely on RCU. At the very
least, we should make most of the VHE switch code noinstr.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 13:21 ` Mark Rutland
                     ` (2 preceding siblings ...)
  2021-12-20 14:28   ` Marc Zyngier
@ 2021-12-20 16:10   ` Frederic Weisbecker
  2022-01-04 13:24     ` Mark Rutland
  3 siblings, 1 reply; 22+ messages in thread
From: Frederic Weisbecker @ 2021-12-20 16:10 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Nicolas Saenz Julienne, maz, Will Deacon, paulmck,
	linux-arm-kernel, rcu, Thomas Gleixner, kvmarm, linux-kernel

On Fri, Dec 17, 2021 at 01:21:39PM +0000, Mark Rutland wrote:
> On Fri, Dec 17, 2021 at 12:51:57PM +0100, Nicolas Saenz Julienne wrote:
> > Hi All,
> 
> Hi,
> 
> > arm64's guest entry code does the following:
> > 
> > int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > {
> > 	[...]
> > 
> > 	guest_enter_irqoff();
> > 
> > 	ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> > 
> > 	[...]
> > 
> > 	local_irq_enable();
> > 
> > 	/*
> > 	 * We do local_irq_enable() before calling guest_exit() so
> > 	 * that if a timer interrupt hits while running the guest we
> > 	 * account that tick as being spent in the guest.  We enable
> > 	 * preemption after calling guest_exit() so that if we get
> > 	 * preempted we make sure ticks after that is not counted as
> > 	 * guest time.
> > 	 */
> > 	guest_exit();
> > 	[...]
> > }
> > 
> > 
> > On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
> > state (EQS). Any interrupt happening between local_irq_enable() and
> > guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt handlers
> > do the right thing if trggered in this context, but el1's won't. Is it
> > possible to hit an el1 handler (for example __el1_irq()) there?
> 
> I think you're right that the EL1 handlers can trigger here and won't exit the
> EQS.
> 
> I'm not immediately sure what we *should* do here. What does x86 do for an IRQ
> taken from a guest mode? I couldn't spot any handling of that case, but I'm not
> familiar enough with the x86 exception model to know if I'm looking in the
> right place.

This is one of the purposes of rcu_irq_enter(). el1 handlers don't call irq_enter()?

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-20 16:10   ` Frederic Weisbecker
@ 2022-01-04 13:24     ` Mark Rutland
  0 siblings, 0 replies; 22+ messages in thread
From: Mark Rutland @ 2022-01-04 13:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Nicolas Saenz Julienne, maz, Will Deacon, paulmck,
	linux-arm-kernel, rcu, Thomas Gleixner, kvmarm, linux-kernel

On Mon, Dec 20, 2021 at 05:10:14PM +0100, Frederic Weisbecker wrote:
> On Fri, Dec 17, 2021 at 01:21:39PM +0000, Mark Rutland wrote:
> > On Fri, Dec 17, 2021 at 12:51:57PM +0100, Nicolas Saenz Julienne wrote:
> > > Hi All,
> > 
> > Hi,
> > 
> > > arm64's guest entry code does the following:
> > > 
> > > int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > > {
> > > 	[...]
> > > 
> > > 	guest_enter_irqoff();
> > > 
> > > 	ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> > > 
> > > 	[...]
> > > 
> > > 	local_irq_enable();
> > > 
> > > 	/*
> > > 	 * We do local_irq_enable() before calling guest_exit() so
> > > 	 * that if a timer interrupt hits while running the guest we
> > > 	 * account that tick as being spent in the guest.  We enable
> > > 	 * preemption after calling guest_exit() so that if we get
> > > 	 * preempted we make sure ticks after that is not counted as
> > > 	 * guest time.
> > > 	 */
> > > 	guest_exit();
> > > 	[...]
> > > }
> > > 
> > > 
> > > On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
> > > state (EQS). Any interrupt happening between local_irq_enable() and
> > > guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt handlers
> > > do the right thing if trggered in this context, but el1's won't. Is it
> > > possible to hit an el1 handler (for example __el1_irq()) there?
> > 
> > I think you're right that the EL1 handlers can trigger here and won't exit the
> > EQS.
> > 
> > I'm not immediately sure what we *should* do here. What does x86 do for an IRQ
> > taken from a guest mode? I couldn't spot any handling of that case, but I'm not
> > familiar enough with the x86 exception model to know if I'm looking in the
> > right place.
> 
> This is one of the purposes of rcu_irq_enter(). el1 handlers don't call irq_enter()?

Due to lockep/tracing/etc ordering, we don't use irq_enter() directly and
instead call rcu_irq_enter() and irq_enter_rcu() separately. Critically we only
call rcu_irq_enter() for IRQs taken from the idle thread, as this was
previously thought to be the only place where we could take an IRQ from an EL1
EQS.

See __el1_irq(), __enter_from_kernel_mode(), and __exit_to_kernel_mode() in
arch/arm64/kernel/entry-common.c. The latter two are largely analogous to the
common irqentry_enter9) and irqentry_exit() helpers in kernel/entry/common.c.

We need to either rework the KVM code or that entry code. I'll dig into this a
bit more...

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2021-12-17 15:54       ` Paolo Bonzini
  2021-12-17 16:07         ` Paul E. McKenney
@ 2022-01-04 16:39         ` Mark Rutland
  2022-01-04 17:07           ` Paolo Bonzini
  2022-01-11 11:32           ` Nicolas Saenz Julienne
  1 sibling, 2 replies; 22+ messages in thread
From: Mark Rutland @ 2022-01-04 16:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Nicolas Saenz Julienne, paulmck, maz, frederic, linux-kernel,
	rcu, Thomas Gleixner, Will Deacon, kvmarm, linux-arm-kernel,
	Anup Patel

On Fri, Dec 17, 2021 at 04:54:22PM +0100, Paolo Bonzini wrote:
> On 12/17/21 15:38, Mark Rutland wrote:
> > For example kvm_guest_enter_irqoff() calls guest_enter_irq_off() which calls
> > vtime_account_guest_enter(), but kvm_guest_exit_irqoff() doesn't call
> > guest_exit_irq_off() and the call to vtime_account_guest_exit() is open-coded
> > elsewhere. Also, guest_enter_irq_off() conditionally calls
> > rcu_virt_note_context_switch(), but I can't immediately spot anything on the
> > exit side that corresponded with that, which looks suspicious.
> 
> rcu_note_context_switch() is a point-in-time notification; it's not strictly
> necessary, but it may improve performance a bit by avoiding unnecessary IPIs
> from the RCU subsystem.
> 
> There's no benefit from doing it when you're back from the guest, because at
> that point the CPU is just running normal kernel code.

I see.

My main issue here was just that it's really difficult to see how the
entry/exit logic is balanced, and I reckon we can solve that by splitting
guest_{enter,exit}_irqoff() into helper functions to handle the vtime
accounting separately from the context tracking, so that arch code can do
something like:

  guest_timing_enter_irqoff();
  
  guest_eqs_enter_irqoff();
  < actually run vCPU here >
  guest_eqs_exit_irqoff();
  
  < handle pending IRQs here >
  
  guest_timing_exit_irqoff();

... which I hope should work for RISC-V too.

I've had a go, and I've pushed out a WIP to:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/kvm/rcu

I also see we'll need to add some lockdep/irq-tracing management to arm64, and
it probably makes sense to fold that into common helpers, so I'll have a play
with that tomorrow.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2022-01-04 16:39         ` Mark Rutland
@ 2022-01-04 17:07           ` Paolo Bonzini
  2022-01-11 11:32           ` Nicolas Saenz Julienne
  1 sibling, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-01-04 17:07 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Nicolas Saenz Julienne, paulmck, maz, frederic, linux-kernel,
	rcu, Thomas Gleixner, Will Deacon, kvmarm, linux-arm-kernel,
	Anup Patel

On 1/4/22 17:39, Mark Rutland wrote:
> My main issue here was just that it's really difficult to see how the
> entry/exit logic is balanced, and I reckon we can solve that by splitting
> guest_{enter,exit}_irqoff() into helper functions to handle the vtime
> accounting separately from the context tracking, so that arch code can do
> something like:
> 
>    guest_timing_enter_irqoff();
>    
>    guest_eqs_enter_irqoff();
>    < actually run vCPU here >
>    guest_eqs_exit_irqoff();
>    
>    < handle pending IRQs here >
>    
>    guest_timing_exit_irqoff();
> 
> ... which I hope should work for RISC-V too.
> 
> I've had a go, and I've pushed out a WIP to:
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/kvm/rcu

Yes, you have a point and it makes sense for x86 too.  You can send me a 
topic branch once you get all the acks.  Thanks!

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2022-01-04 16:39         ` Mark Rutland
  2022-01-04 17:07           ` Paolo Bonzini
@ 2022-01-11 11:32           ` Nicolas Saenz Julienne
  2022-01-11 12:23             ` Mark Rutland
  1 sibling, 1 reply; 22+ messages in thread
From: Nicolas Saenz Julienne @ 2022-01-11 11:32 UTC (permalink / raw)
  To: Mark Rutland, Paolo Bonzini
  Cc: paulmck, maz, frederic, linux-kernel, rcu, Thomas Gleixner,
	Will Deacon, kvmarm, linux-arm-kernel, Anup Patel

Hi Mark,

On Tue, 2022-01-04 at 16:39 +0000, Mark Rutland wrote:
> On Fri, Dec 17, 2021 at 04:54:22PM +0100, Paolo Bonzini wrote:
> > On 12/17/21 15:38, Mark Rutland wrote:
> > > For example kvm_guest_enter_irqoff() calls guest_enter_irq_off() which calls
> > > vtime_account_guest_enter(), but kvm_guest_exit_irqoff() doesn't call
> > > guest_exit_irq_off() and the call to vtime_account_guest_exit() is open-coded
> > > elsewhere. Also, guest_enter_irq_off() conditionally calls
> > > rcu_virt_note_context_switch(), but I can't immediately spot anything on the
> > > exit side that corresponded with that, which looks suspicious.
> > 
> > rcu_note_context_switch() is a point-in-time notification; it's not strictly
> > necessary, but it may improve performance a bit by avoiding unnecessary IPIs
> > from the RCU subsystem.
> > 
> > There's no benefit from doing it when you're back from the guest, because at
> > that point the CPU is just running normal kernel code.
> 
> I see.
> 
> My main issue here was just that it's really difficult to see how the
> entry/exit logic is balanced, and I reckon we can solve that by splitting
> guest_{enter,exit}_irqoff() into helper functions to handle the vtime
> accounting separately from the context tracking, so that arch code can do
> something like:
> 
>   guest_timing_enter_irqoff();
>   
>   guest_eqs_enter_irqoff();
>   < actually run vCPU here >
>   guest_eqs_exit_irqoff();
>   
>   < handle pending IRQs here >
>   
>   guest_timing_exit_irqoff();
> 
> ... which I hope should work for RISC-V too.
> 
> I've had a go, and I've pushed out a WIP to:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/kvm/rcu

Had a look at the patches and they seeem OK to me.

Thanks!

-- 
Nicolás Sáenz


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Possible nohz-full/RCU issue in arm64 KVM
  2022-01-11 11:32           ` Nicolas Saenz Julienne
@ 2022-01-11 12:23             ` Mark Rutland
  0 siblings, 0 replies; 22+ messages in thread
From: Mark Rutland @ 2022-01-11 12:23 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: Paolo Bonzini, paulmck, maz, frederic, linux-kernel, rcu,
	Thomas Gleixner, Will Deacon, kvmarm, linux-arm-kernel,
	Anup Patel

On Tue, Jan 11, 2022 at 12:32:38PM +0100, Nicolas Saenz Julienne wrote:
> Hi Mark,
> 
> On Tue, 2022-01-04 at 16:39 +0000, Mark Rutland wrote:
> > On Fri, Dec 17, 2021 at 04:54:22PM +0100, Paolo Bonzini wrote:
> > > On 12/17/21 15:38, Mark Rutland wrote:
> > > > For example kvm_guest_enter_irqoff() calls guest_enter_irq_off() which calls
> > > > vtime_account_guest_enter(), but kvm_guest_exit_irqoff() doesn't call
> > > > guest_exit_irq_off() and the call to vtime_account_guest_exit() is open-coded
> > > > elsewhere. Also, guest_enter_irq_off() conditionally calls
> > > > rcu_virt_note_context_switch(), but I can't immediately spot anything on the
> > > > exit side that corresponded with that, which looks suspicious.
> > > 
> > > rcu_note_context_switch() is a point-in-time notification; it's not strictly
> > > necessary, but it may improve performance a bit by avoiding unnecessary IPIs
> > > from the RCU subsystem.
> > > 
> > > There's no benefit from doing it when you're back from the guest, because at
> > > that point the CPU is just running normal kernel code.
> > 
> > I see.
> > 
> > My main issue here was just that it's really difficult to see how the
> > entry/exit logic is balanced, and I reckon we can solve that by splitting
> > guest_{enter,exit}_irqoff() into helper functions to handle the vtime
> > accounting separately from the context tracking, so that arch code can do
> > something like:
> > 
> >   guest_timing_enter_irqoff();
> >   
> >   guest_eqs_enter_irqoff();
> >   < actually run vCPU here >
> >   guest_eqs_exit_irqoff();
> >   
> >   < handle pending IRQs here >
> >   
> >   guest_timing_exit_irqoff();
> > 
> > ... which I hope should work for RISC-V too.
> > 
> > I've had a go, and I've pushed out a WIP to:
> > 
> >   https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/kvm/rcu
> 
> Had a look at the patches and they seeem OK to me.
> 
> Thanks!

Cool.

FWIW I have an updated version at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=kvm/entry-rework

... which is largely the same approach, but the helpers got renamed, the
lockdep/tracing bits got fixed, and I've aligned mips, riscv, and x86 on the
same approach.

Once I get a free hour or so I intend to rebase that atop v5.16 and post that
out. I'll start a new thread with that, and rope in the relevant arch
maintainers (since e.g. I'm not sure what to do for ppc and s390).

Thanks,
Mark.

> 
> -- 
> Nicolás Sáenz
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-01-11 12:23 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-17 11:51 Possible nohz-full/RCU issue in arm64 KVM Nicolas Saenz Julienne
2021-12-17 13:21 ` Mark Rutland
2021-12-17 14:15   ` Nicolas Saenz Julienne
2021-12-17 14:38     ` Mark Rutland
2021-12-17 15:54       ` Paolo Bonzini
2021-12-17 16:07         ` Paul E. McKenney
2021-12-17 16:20           ` Nicolas Saenz Julienne
2021-12-17 16:43             ` Paul E. McKenney
2021-12-17 16:34           ` Paolo Bonzini
2021-12-17 16:45             ` Paul E. McKenney
2021-12-17 17:02               ` Paolo Bonzini
2021-12-17 17:12                 ` Paul E. McKenney
2021-12-17 17:23                   ` Paolo Bonzini
2021-12-17 17:47                     ` Paul E. McKenney
2022-01-04 16:39         ` Mark Rutland
2022-01-04 17:07           ` Paolo Bonzini
2022-01-11 11:32           ` Nicolas Saenz Julienne
2022-01-11 12:23             ` Mark Rutland
2021-12-17 14:51   ` Paolo Bonzini
2021-12-20 14:28   ` Marc Zyngier
2021-12-20 16:10   ` Frederic Weisbecker
2022-01-04 13:24     ` Mark Rutland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).