From: Nicolas Saenz Julienne <nsaenzju@redhat.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: maz <maz@kernel.org>, Will Deacon <will@kernel.org>,
	paulmck <paulmck@kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	rcu <rcu@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
	frederic <frederic@kernel.org>,
	kvmarm@lists.cs.columbia.edu,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: Possible nohz-full/RCU issue in arm64 KVM
Date: Fri, 17 Dec 2021 15:15:29 +0100
Message-ID: <70f112072d9496d21901946ea82832d3ed3a8cb2.camel@redhat.com>
In-Reply-To: <YbyO40zDW/kvUHEE@FVFF77S0Q05N>

On Fri, 2021-12-17 at 13:21 +0000, Mark Rutland wrote:
> On Fri, Dec 17, 2021 at 12:51:57PM +0100, Nicolas Saenz Julienne wrote:
> > Hi All,
> 
> Hi,
> 
> > arm64's guest entry code does the following:
> > 
> > int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > {
> > 	[...]
> > 
> > 	guest_enter_irqoff();
> > 
> > 	ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> > 
> > 	[...]
> > 
> > 	local_irq_enable();
> > 
> > 	/*
> > 	 * We do local_irq_enable() before calling guest_exit() so
> > 	 * that if a timer interrupt hits while running the guest we
> > 	 * account that tick as being spent in the guest.  We enable
> > 	 * preemption after calling guest_exit() so that if we get
> > 	 * preempted we make sure ticks after that is not counted as
> > 	 * guest time.
> > 	 */
> > 	guest_exit();
> > 	[...]
> > }
> > 
> > 
> > On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
> > state (EQS). Any interrupt happening between local_irq_enable() and
> > guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt handlers
> > do the right thing if triggered in this context, but el1's won't. Is it
> > possible to hit an el1 handler (for example __el1_irq()) there?
> 
> I think you're right that the EL1 handlers can trigger here and won't exit the
> EQS.
> 
> I'm not immediately sure what we *should* do here. What does x86 do for an IRQ
> taken from a guest mode? I couldn't spot any handling of that case, but I'm not
> familiar enough with the x86 exception model to know if I'm looking in the
> right place.

Well x86 has its own private KVM guest context exit function
'kvm_guest_exit_irqoff()', which allows it to do the right thing (simplifying
things):

	local_irq_disable();
	kvm_guest_enter_irqoff(); // Inform CT, enter EQS
	__vmx_vcpu_run();
	kvm_guest_exit_irqoff(); // Inform CT, exit EQS, task still marked with PF_VCPU

	/*
	 * Consume any pending interrupts, including the possible source of
	 * VM-Exit on SVM and any ticks that occur between VM-Exit and now.
	 * An instruction is required after local_irq_enable() to fully unblock
	 * interrupts on processors that implement an interrupt shadow, the
	 * stat.exits increment will do nicely.
	 */
	local_irq_enable();
	++vcpu->stat.exits;
	local_irq_disable();

	/*
	 * Wait until after servicing IRQs to account guest time so that any
	 * ticks that occurred while running the guest are properly accounted
	 * to the guest.  Waiting until IRQs are enabled degrades the accuracy
	 * of accounting via context tracking, but the loss of accuracy is
	 * acceptable for all known use cases.
	 */
	vtime_account_guest_exit(); // current->flags &= ~PF_VCPU

So I guess we should convert to x86's scheme, and maybe create another generic
guest_{enter,exit}() flavor for virtualization schemes that run with interrupts
disabled.

> Note that the EL0 handlers *cannot* trigger for an exception taken from a
> guest. We use separate vectors while running a guest (for both VHE and nVHE
> modes), and from the main kernel's PoV we return from kvm_call_hyp_ret(). We
> can only take an IRQ from EL1 *after* that returns.
> 
> We *might* need to audit the KVM vector handlers to make sure they're not
> dependent on RCU protection (I assume they're not, but it's possible something
> has leaked into the VHE code).

IIUC, in the window between local_irq_enable() and guest_exit() any driver
interrupt might trigger, right?

Regards,

-- 
Nicolás Sáenz

