From: Christoffer Dall
Subject: Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
Date: Tue, 4 Apr 2017 17:24:03 +0200
Message-ID: <20170404152403.GK11752@cbox>
References: <20170331160658.4331-1-drjones@redhat.com> <20170331160658.4331-3-drjones@redhat.com>
In-Reply-To: <20170331160658.4331-3-drjones@redhat.com>
Cc: kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, marc.zyngier@arm.com, pbonzini@redhat.com, rkrcmar@redhat.com
To: Andrew Jones

Hi Drew,

On Fri, Mar 31, 2017 at 06:06:51PM +0200, Andrew Jones wrote:
> Signed-off-by: Andrew Jones
> ---
>  Documentation/virtual/kvm/vcpu-requests.rst | 114 ++++++++++++++++++++++++++++
>  1 file changed, 114 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
>
> diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
> new file mode 100644
> index 000000000000..ea4a966d5c8a
> --- /dev/null
> +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> @@ -0,0 +1,114 @@
> +=================
> +KVM VCPU Requests
> +=================
> +
> +Overview
> +========
> +
> +KVM supports an internal API enabling threads to request a VCPU thread to
> +perform some activity. For example, a thread may request a VCPU to flush
> +its TLB with a VCPU request. The API consists of only four calls::
> +
> +  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
> +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /* Check if any requests are pending for VCPU @vcpu. */
> +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> +
> +  /* Make request @req of VCPU @vcpu. */
> +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> +
> +Typically a requester wants the VCPU to perform the activity as soon
> +as possible after making the request. This means most requests,
> +kvm_make_request() calls, are followed by a call to kvm_vcpu_kick(),
> +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> +into it.
> +
> +VCPU Kicks
> +----------
> +
> +A VCPU kick does one of three things:
> +
> + 1) wakes a sleeping VCPU (which sleeps outside guest mode).

You could clarify this to say that a sleeping VCPU is a VCPU thread which
is not runnable and placed on a waitqueue, and waking it makes the thread
runnable again.

> + 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
> +    out.
> + 3) nothing, when the VCPU is already outside guest mode and not sleeping.
> +
> +VCPU Request Internals
> +======================
> +
> +VCPU requests are simply bit indices of the vcpu->requests bitmap. This
> +means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
> +may also be used. The first 8 bits are reserved for architecture
> +independent requests, all additional bits are available for architecture
> +dependent requests.
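It might also help readers to see how a requester and the VCPU thread
typically pair these four calls up. A minimal sketch, perhaps, where
KVM_REQ_FOO and handle_foo() are made-up names, purely for illustration:

  /* Requesting thread: publish the request, then kick the VCPU. */
  kvm_make_request(KVM_REQ_FOO, vcpu);
  kvm_vcpu_kick(vcpu);

  /* VCPU thread: in its run loop, before entering the guest. */
  if (kvm_request_pending(vcpu)) {
          /* kvm_check_request() also clears the bit, as noted above. */
          if (kvm_check_request(KVM_REQ_FOO, vcpu))
                  handle_foo(vcpu);
  }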
Should we explain the ones that are generically defined and how they're
supposed to be used? For example, we don't use them on ARM, and I don't
think I understand why another thread would ever make a PENDING_TIMER
request on a vcpu?

> +
> +VCPU Requests with Associated State
> +===================================
> +
> +Requesters that want the requested VCPU to handle new state need to ensure
> +the state is observable to the requested VCPU thread's CPU at the time the

nit: need to ensure that the newly written state is observable ... by the
time it observes the request.

> +CPU observes the request. This means a write memory barrier should be
                                                               ^^^
                                                               must
> +insert between the preparation of the state and the write of the VCPU
   ^^^
   inserted

I would rephrase this as: '... after writing the new state to memory and
before setting the VCPU request bit.'

> +request bitmap. Additionally, on the requested VCPU thread's side, a
> +corresponding read barrier should be issued after reading the request bit
                              ^^^       ^^^
                              must      inserted (for consistency)
> +and before proceeding to use the state associated with it. See the kernel
                            ^^^     ^
                            read    new
> +memory barrier documentation [2].

I think it would be great if this document explained whether this is
currently taken care of by the API described above, or whether there are
cases where people have to insert these barriers explicitly and, in that
case, which barriers they should use (if we know at this point already).

> +
> +VCPU Requests and Guest Mode
> +============================
> +

I feel like an intro about the overall goal here is missing. How about
something like this:

  When making requests to VCPUs, we want to avoid the receiving VCPU
  executing inside the guest for an arbitrarily long time without
  handling the request. The way we prevent this from happening is by
  keeping track of when a VCPU is running, and by sending an IPI to the
  physical CPU running the VCPU when that is the case. However, each
  architecture implementation of KVM must take great care to ensure
  that requests are not missed when a VCPU stops running at the same
  time as a request is received.

Also, I'm not sure what the semantics are with kvm_vcpu_block(). Is it OK
to send a request to a VCPU and then have the VCPU block and go to sleep
forever even though there are pending requests? kvm_vcpu_check_block()
doesn't seem to check vcpu->requests, which would indicate that this is
the case, but maybe architectures that actually do use requests implement
something else themselves?

> +As long as the guest is either in guest mode, in which case it gets an IPI

guest is in guest mode? Perhaps this could be more clearly written as:

  As long as the VCPU is running, it is marked as having
  vcpu->mode = IN_GUEST_MODE. A requesting thread observing
  IN_GUEST_MODE will send an IPI to the CPU running the VCPU thread. On
  the other hand, when a requesting thread observes
  vcpu->mode == OUTSIDE_GUEST_MODE, it will not send any IPIs, but will
  simply set the request bit, and the VCPU thread will be able to check
  the requests before running the VCPU again. However, the transition...

> +and will definitely see the request, or is outside guest mode, but has yet
> +to do its final request check, and therefore when it does, it will see the
> +request, then things will work. However, the transition from outside to
> +inside guest mode, after the last request check has been made, opens a
> +window where a request could be made, but the VCPU would not see until it
> +exits guest mode some time later. See the table below.
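Since this window is the crux of the whole problem, it might also be
worth spelling it out as a concrete interleaving next to the table.
Something like this, perhaps, where enter_guest() is just a made-up
stand-in for the architecture's world switch:

  VCPU thread                          Requesting thread
  ---------------------------------------------------------------------
  if (kvm_request_pending(vcpu))
          ...; /* nothing pending */
                                       kvm_make_request(REQ, vcpu);
                                       if (vcpu->mode == IN_GUEST_MODE)
                                               send_IPI(vcpu->cpu);
                                       /* mode was OUTSIDE_GUEST_MODE,
                                          so no IPI was sent */
  vcpu->mode = IN_GUEST_MODE;
  enter_guest(); /* the request is not seen until the next exit */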
This text, and the table below, only deals with the details of entering
the guest. Should we talk about kvm_vcpu_exiting_guest_mode() and
anything related to exiting the guest?

> +
> ++------------------+-----------------+----------------+--------------+
> +| vcpu->mode       | done last check | kick sends IPI | request seen |
> ++==================+=================+================+==============+
> +| IN_GUEST_MODE    | N/A             | YES            | YES          |
> ++------------------+-----------------+----------------+--------------+
> +| !IN_GUEST_MODE   | NO              | NO             | YES          |
> ++------------------+-----------------+----------------+--------------+
> +| !IN_GUEST_MODE   | YES             | NO             | NO           |
> ++------------------+-----------------+----------------+--------------+
> +
> +To ensure the third scenario shown in the table above cannot happen, we
> +need to ensure the VCPU's mode change is observable by all CPUs prior to
> +its final request check and that a requester's request is observable by
> +the requested VCPU prior to the kick. To do that we need general memory
> +barriers between each pair of operations involving mode and requests, i.e.
> +
> +  CPU_i                                CPU_j
> +-------------------------------------------------------------------------
> +  vcpu->mode = IN_GUEST_MODE;          kvm_make_request(REQ, vcpu);
> +  smp_mb();                            smp_mb();
> +  if (kvm_request_pending(vcpu))       if (vcpu->mode == IN_GUEST_MODE)
> +      handle_requests();                   send_IPI(vcpu->cpu);
> +
> +Whether explicit barriers are needed, or reliance on implicit barriers is
> +sufficient, is architecture dependent. Alternatively, an architecture may
> +choose to just always send the IPI, as not sending it, when it's not
> +necessary, is just an optimization.

Is this universally true? This is certainly true on ARM, because we
disable interrupts before doing all this, so the IPI remains pending and
causes an immediate exit, but if any of the above is done with interrupts
enabled, just sending an IPI does nothing to ensure the request is
observed. Perhaps this is not a case we should care about.

> +
> +Additionally, the error prone third scenario described above also exhibits
> +why a request-less VCPU kick is almost never correct. Without the
> +assurance that a non-IPI generating kick will still result in an action by
> +the requested VCPU, as the final kvm_request_pending() check does, then
> +the kick may not initiate anything useful at all. If, for instance, a
> +request-less kick was made to a VCPU that was just about to set its mode
> +to IN_GUEST_MODE, meaning no IPI is sent, then the VCPU may continue its
> +entry without actually having done whatever it was the kick was meant to
> +initiate.

Indeed.

> +
> +References
> +==========
> +
> +[1] Documentation/core-api/atomic_ops.rst
> +[2] Documentation/memory-barriers.txt
> --
> 2.9.3
>

This is a great writeup! I enjoyed reading it and it made me think more
carefully about a number of things, so I definitely think we should merge
this.

Thanks,
-Christoffer