Re: [patch 2/2] x86/kvm: Sanitize kvm_async_pf_task_wait()

From: Andy Lutomirski <luto@kernel.org>
To: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>, KVM <kvm@vger.kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [patch 2/2] x86/kvm: Sanitize kvm_async_pf_task_wait()
Date: Sat, 7 Mar 2020 07:51:08 -0800
Message-ID: <CALCETrX4p+++nS6N_yW2CnvMGUxngQBua65x9A9T-PB740LY0A@mail.gmail.com>
In-Reply-To: <CALCETrWc0wM1x-mAcKCPRUiGtzONtXiNVMFgWZwkRD3v3K3jsA@mail.gmail.com>

On Sat, Mar 7, 2020 at 7:10 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Sat, Mar 7, 2020 at 2:01 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Andy Lutomirski <luto@kernel.org> writes:

> Now I'm confused again.  Your patch is very careful not to schedule if
> we're in an RCU read-side critical section, but the regular preemption
> code (preempt_schedule_irq, etc) seems to be willing to schedule
> inside an RCU read-side critical section.  Why is the latter okay but
> not the async pf case?

I read more docs.  I guess the relevant situation is
CONFIG_PREEMPT_RCU, in which case it is legal to preempt an RCU
read-side critical section and obviously legal to put the whole CPU to
sleep, but it's illegal to explicitly block in an RCU read-side
critical section.  So I have a question for Paul: is it, in fact,
entirely illegal to block or merely illegal to block for an
excessively long time, e.g. waiting for user space or network traffic?
In this situation, we cannot make progress until the host says we
can, so we are, in effect, blocking until the host tells us to stop
blocking.  Regardless, I agree that turning IRQs on is reasonable, and
allowing those IRQs to preempt us is reasonable.

As it stands in your patch, the situation is rather odd: we'll run
another task if that task *preempts* us (e.g. we block long enough to
run out of our time slice), but we won't run another task if we aren't
preempted.  This seems bizarre.

>
> Ignoring that, this still seems racy:
>
> STI
> nested #PF telling us to wake up
> #PF returns
> HLT
>
> doesn't this result in putting the CPU asleep for no good reason until
> the next interrupt hits?

I think this issue still stands and is actually a fairly easy race to hit.

STI
IRQ happens and we get preempted
another task runs and gets the #PF "async pf wakeup" event
reschedule, back to original task
HLT

The only particularly unusual thing here is that an IRQ (timer or
otherwise) needs to be queued up between when the #PF "async pf sleep"
event happens and STI so that it gets delivered before HLT.

ISTM the way to fully address this is to make the logic something like:

if (preemptible) {
  /* Actually go to sleep; do not HLT.  Do this even in an RCU
     read-side critical section. */
} else {
  /* ok, we have to wait, but it's still legal to handle IRQs. */
  if (choice A) {
    /* Keep IRQs off.  Spin until we wake up. */
  } else {
    while (still need to sleep) {
      HLT (with IRQs off!)
      local_irq_enable();
      /* if an interrupt was queued, handle it. */
      local_irq_disable();
    }
  }
}
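
For illustration only, here is a toy user-space model of the STI ... HLT
window described earlier in this mail.  It is a sketch, not kernel code:
every name in it (wait_model, async_pf_wakeup, hlt, sti_then_hlt,
recheck_then_hlt) is made up for the example, and it assumes that the
wakeup event cannot be delivered once IRQs are off again, which the real
async pf delivery path may or may not guarantee.

```c
#include <stdbool.h>

/* Toy model; all names are hypothetical, none is a kernel API. */
struct wait_model {
	bool need_sleep;  /* true from "async pf sleep" until "async pf wakeup" */
	bool irqs_on;     /* models the interrupt flag */
	int stale_halts;  /* HLTs executed after the wakeup already arrived */
};

/* The wakeup event, possibly consumed by another task while we were preempted. */
void async_pf_wakeup(struct wait_model *m)
{
	m->need_sleep = false;
}

/* Models HLT: count it as stale if there is nothing left to wait for. */
void hlt(struct wait_model *m)
{
	if (!m->need_sleep)
		m->stale_halts++;
}

/* The sequence from the mail: STI, optional preemption window, then HLT. */
int sti_then_hlt(bool wakeup_in_window)
{
	struct wait_model m = { .need_sleep = true, .irqs_on = false, .stale_halts = 0 };

	m.irqs_on = true;              /* STI */
	if (wakeup_in_window)
		async_pf_wakeup(&m);   /* IRQ, preempt, wakeup elsewhere, reschedule */
	hlt(&m);                       /* no re-check before halting */
	return m.stale_halts;
}

/* Same window, but the condition is re-tested with IRQs off before HLT. */
int recheck_then_hlt(bool wakeup_in_window)
{
	struct wait_model m = { .need_sleep = true, .irqs_on = false, .stale_halts = 0 };

	m.irqs_on = true;              /* STI */
	if (wakeup_in_window)
		async_pf_wakeup(&m);
	m.irqs_on = false;             /* close the window before testing */
	if (m.need_sleep)
		hlt(&m);               /* only halt if we still have to wait */
	return m.stale_halts;
}
```

In this model sti_then_hlt(true) counts one stale HLT, while
recheck_then_hlt() counts none in either case.  That is the same property
the "while (still need to sleep)" loop above relies on, under the stated
assumption.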