All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolas Saenz Julienne <nsaenz@amazon.com>
To: Sean Christopherson <seanjc@google.com>
Cc: <kvm@vger.kernel.org>, <pbonzini@redhat.com>,
	<linux-kernel@vger.kernel.org>, <graf@amazon.de>,
	<dwmw2@infradead.org>, <fgriffo@amazon.com>, <anelkz@amazon.de>,
	<peterz@infradead.org>
Subject: Re: [RFC] KVM: Allow polling vCPUs for events
Date: Thu, 5 Oct 2023 16:27:54 +0000	[thread overview]
Message-ID: <CW0NB512KP2E.2ZZD07F49LND3@amazon.com> (raw)
In-Reply-To: <ZR35gq1NICwhOUAS@google.com>

Hi Sean,
Thanks for taking the time to look at this.

On Wed Oct 4, 2023 at 11:47 PM UTC, Sean Christopherson wrote:
[...]
> > @@ -3996,6 +4002,39 @@ static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
> >       return 0;
> >  }
> >
> > +static __poll_t kvm_vcpu_poll(struct file *file, poll_table *wait)
> > +{
> > +     struct kvm_vcpu *vcpu = file->private_data;
> > +
> > +     if (!vcpu->poll_mask)
> > +             return EPOLLERR;
> > +
> > +     switch (READ_ONCE(vcpu->mode)) {
> > +     case OUTSIDE_GUEST_MODE:
> > +             /*
> > +              * Make sure writes to vcpu->request are visible before the
> > +              * mode changes.
> > +              */
>
> Huh?  There are no writes to vcpu->request anywhere in here.

My thinking was the vcpu->requests load below could've been speculated
ahead of vcpu->mode's store, this will miss events when first entering
poll().

Since you pointed this out, I thought about it further. There is still
room for a race with the code as is, we need read vcpu->requests only
after poll_wait() returns, so as to make sure concurrent
kvm_make_request()/kvm_vcpu_kick() either wake up poll, or are visible
through the vcpu->requests check that precedes sleeping.

[...]

> > +             WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
>
> This does not look remotely safe on multiple fronts.  For starters, I don't see
> anything in the .poll() infrastructure that provides serialization, e.g. if there
> are multiple tasks polling then this will be "interesting".

Would allowing only one poller be acceptable?

> And there is zero chance this is race-free, e.g. nothing prevents the vCPU task
> itself from changing vcpu->mode from POLLING_FOR_EVENTS to something else.
>
> Why on earth is this mucking with vcpu->mode?  Ignoring for the moment that using
> vcpu->requests as the poll source is never going to happen, there's zero reason

IIUC accessing vcpu->requests in the kvm_vcpu_poll() is out of the
question? Aren't we're forced to do so in order to avoid the race I
mention above.

> to write vcpu->mode.  From a correctness perspective, AFAICT there's no need for
> any shenanigans at all, i.e. kvm_make_vcpu_request() could blindly and unconditionally
> call wake_up_interruptible().

I was fixated with the halt/vtl_return use-cases, where we're either
running the vCPU or polling, and it seemed a decent way to policy
whether calling wake_up_interruptible() is needed. Clearly not the case,
I'll get rid of all the vcpu->mode mucking. :)

> I suspect what you want is a fast way to track if there *may* be pollers.  Keying
> off and *writing* vcpu->mode makes no sense to me.
>
> I think what you want is something like this, where kvm_vcpu_poll() could use
> atomic_fetch_or() and atomic_fetch_andnot() to manipulate vcpu->poll_mask.
> Or if we only want to support a single poller at a time, it could be a vanilla
> u64.  I suspect getting the poll_mask manipulation correct for multiple pollers
> would be tricky, e.g. to avoid false negatives and leave a poller hanging.

I'll have a go at the multiple poller approach.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 486800a7024b..5a260fb3b248 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -259,6 +259,14 @@ static inline bool kvm_kick_many_cpus(struct cpumask *cpus, bool wait)
>         return true;
>  }
>
> +static inline bool kvm_request_is_being_polled(struct kvm_vcpu *vcpu,
> +                                              unsigned int req)
> +{
> +       u32 poll_mask = kvm_request_to_poll_mask(req);
> +
> +       return (atomic_read(vcpu->poll_mask) & poll_mask)
> +}
> +
>  static void kvm_make_vcpu_request(struct kvm_vcpu *vcpu, unsigned int req,
>                                   struct cpumask *tmp, int current_cpu)
>  {
> @@ -285,6 +293,9 @@ static void kvm_make_vcpu_request(struct kvm_vcpu *vcpu, unsigned int req,
>                 if (cpu != -1 && cpu != current_cpu)
>                         __cpumask_set_cpu(cpu, tmp);
>         }
> +
> +       if (kvm_request_is_being_polled(vcpu, req))
> +               wake_up_interruptible(...);
>  }
>
>  bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,

I'll use this approach.

So since we have to provide a proper uAPI, do you have anything against
having user-space set the polling mask through an ioctl? Also any
suggestions on how kvm_request_to_poll_mask() should look like. For ex.
VSM mostly cares for regular interrupts/timers, so mapping

  KVM_REQ_UNBLOCK, KVM_REQ_HV_STIMER, KVM_REQ_EVENT, KVM_REQ_SMI,
  KVM_REQ_NMI

to a KVM_POLL_INTERRUPTS_FLAG would work. We can then have ad-hoc flags
for async-pf, kvmclock updates, dirty logging, etc...

Nicolas

  reply	other threads:[~2023-10-05 16:33 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-01 11:13 [RFC] KVM: Allow polling vCPUs for events Nicolas Saenz Julienne
2023-10-04 23:47 ` Sean Christopherson
2023-10-05 16:27   ` Nicolas Saenz Julienne [this message]
2023-10-06  1:16     ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CW0NB512KP2E.2ZZD07F49LND3@amazon.com \
    --to=nsaenz@amazon.com \
    --cc=anelkz@amazon.de \
    --cc=dwmw2@infradead.org \
    --cc=fgriffo@amazon.com \
    --cc=graf@amazon.de \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.