Re: [PATCH 5/8] kvm: Add cap/kvm_run field for memory fault exits

From: Anish Moorthy <amoorthy@google.com>
To: Sean Christopherson <seanjc@google.com>,
	Oliver Upton <oliver.upton@linux.dev>
Cc: Marc Zyngier <maz@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	James Houghton <jthoughton@google.com>,
	Ben Gardon <bgardon@google.com>,
	David Matlack <dmatlack@google.com>,
	Ricardo Koller <ricarkol@google.com>,
	Chao Peng <chao.p.peng@linux.intel.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	kvm@vger.kernel.org, kvmarm@lists.linux.dev, peterx@redhat.com
Subject: Re: [PATCH 5/8] kvm: Add cap/kvm_run field for memory fault exits
Date: Fri, 17 Feb 2023 11:14:37 -0800	[thread overview]
Message-ID: <CAF7b7mqeXcHdFHewX3enn-vxf6y7CUWjXjB3TXithZ_PnzVLQQ@mail.gmail.com> (raw)
In-Reply-To: <Y+6iX6a22+GEuH1b@google.com>

On Thu, Feb 16, 2023 at 10:53 AM Anish Moorthy <amoorthy@google.com> wrote:
>
> On Wed, Feb 15, 2023 at 12:59 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > How is userspace expected to differentiate the gup_fast() failed exit
> > from the guest-private memory exit? I don't think flags are a good idea
> > for this, as it comes with the illusion that both events can happen on a
> > single exit. In reality, these are mutually exclusive.
> >
> > A fault type/code would be better here, with the option to add flags at
> > a later date that could be used to further describe the exit (if
> > needed).
>
> Agreed. Something like this, then?
>
> +    struct {
> +        __u32 fault_code;
> +        __u64 reserved;
> +        __u64 gpa;
> +        __u64 size;
> +    } memory_fault;
>
> The "reserved" field is meant to be the placeholder for a future "flags" field.
> Let me know if there's a better/more conventional way to achieve this.

On Thu, Feb 16, 2023 at 10:53 AM Anish Moorthy <amoorthy@google.com> wrote:
>
> 1. As Oliver touches on earlier, we'll probably want to use this same field for
>    different classes of memory fault in the future (such as the ones which Chao
>    is introducing in [1]): so it does make sense to add "code" and "flags"
>    fields which can be used to communicate more information to the user (and
>    which can just be set to MEM_FAULT_NOWAIT/0 in this series).

Let me walk back my responses here: I took a closer look at Chao's series, and
it doesn't seem that I should be trying to share KVM_EXIT_MEMORY_FAULT with it
in the first place. As far as I can understand (not that much, to be clear :)
we're signaling unrelated things, so it makes more sense to use different exits
(ie, rename mine to KVM_EXIT_MEMORY_FAULT_NOWAIT). That would prevent any
potential confusion about mutual exclusivity.

That also removes the need for a "fault_code" field in "memory_fault", which I
could rename to something more general like "guest_memory_range". As for the
"reserved" field, we could keep it around if we care about reusing
"guest_memory_range" between exits, or discard it if we don't.

On Thu, Feb 16, 2023 at 1:38 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Feb 16, 2023, Anish Moorthy wrote:
> > On Wed, Feb 15, 2023 at 12:59 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > > index 109b18e2789c4..9352e7f8480fb 100644
> > > > > --- a/include/linux/kvm_host.h
> > > > > +++ b/include/linux/kvm_host.h
> > > > > @@ -801,6 +801,9 @@ struct kvm {
> > > > >     bool vm_bugged;
> > > > >     bool vm_dead;
> > > > >
> > > > > +   rwlock_t mem_fault_nowait_lock;
> > > > > +   bool mem_fault_nowait;
> > > >
> > > > A full-fat rwlock to protect a single bool? What benefits do you
> > > > expect from a rwlock? Why is it preferable to an atomic access, or a
> > > > simple bitop?
> > >
> > > There's no need to have any kind off dedicated atomicity.  The only readers are
> > > in vCPU context, just disallow KVM_CAP_MEM_FAULT_NOWAIT after vCPUs are created.
> >
> > I think we do need atomicity here.
>
> Atomicity, yes.  Mutually exclusivity, no.  AFAICT, nothing will break if userspace
> has multiple in-flight calls to toggled the flag.  And if we do want to guarantee
> there's only one writer, then kvm->lock or kvm->slots_lock will suffice.
>
> > Since we want to support this without having to pause vCPUs, there's an
> > atomicity requirement.
>
> Ensuring that vCPUs "see" the new value and not corrupting memory are two very
> different things.  Making the flag an atomic, wrapping with a rwlock, etc... do
> nothing to ensure vCPUs observe the new value.  And for non-crazy usage of bools,
> they're not even necessary to avoid memory corruption...

Oh, that's news to me- I've learned to treat any unprotected concurrent accesses
to memory as undefined behavior: guess there's always more to learn.

> I think what you really want to achieve is that vCPUs observe the NOWAIT flag
> before KVM returns to userspace.  There are a variety of ways to make that happen,
> but since this all about accessing guest memory, the simplest is likely to
> "protect" the flag with kvm->srcu, i.e. require SRCU be held by readers and then
> do a synchronize_srcu() to ensure all vCPUs have picked up the new value.
>
> Speaking of SRCU (which protect memslots), why not make this a memslot flag?  If
> the goal is to be able to turn the behavior on/off dynamically, wouldn't it be
> beneficial to turn off the NOWAIT behavior when a memslot is fully transfered?

Thanks for the suggestions, I never considered making this a memslot flag
actually. so I'll go look into that.