Re: [PATCH 4/5] KVM: add __kvm_request_needs_mb

From: Christian Borntraeger <borntraeger@de.ibm.com>
To: "David Hildenbrand" <david@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Andrew Jones <drjones@redhat.com>,
	Marc Zyngier <marc.zyngier@arm.com>,
	Cornelia Huck <cornelia.huck@de.ibm.com>,
	James Hogan <james.hogan@imgtec.com>,
	Paul Mackerras <paulus@ozlabs.org>,
	Christoffer Dall <christoffer.dall@linaro.org>
Subject: Re: [PATCH 4/5] KVM: add __kvm_request_needs_mb
Date: Fri, 17 Feb 2017 12:28:09 +0100
Message-ID: <af7c9d95-95c3-6220-8e1c-c496941ba052@de.ibm.com>
In-Reply-To: <7f521412-1e8f-e519-1274-5db3ec7d36b8@redhat.com>

On 02/17/2017 11:13 AM, David Hildenbrand wrote:
> 
>>> This is really complicated stuff, and the basic reason for it (if I
>>> remember correctly) is that s390x does reenable all interrupts when
>>> entering the sie (see kvm-s390.c:__vcpu_run()). So the fancy smp-based
>>> kicks don't work (as it is otherwise just racy), and if I remember
>>> correctly, SMP reschedule signals (s390x external calls) would be
>>> slower. (Christian, please correct me if I'm wrong)
>>
>> No, the reason was that there are some requests that need to be handled
>> outside of running SIE. For example, one reason was the guest prefix page.
>> This must be mapped read/write ALL THE TIME when a guest is running,
>> otherwise the host might crash. So we have to exit SIE and make sure that
>> it does not reenter, therefore we use the RELOAD_MMU request from a notifier
>> that is called from page table functions, whenever memory management decides
>> to unmap/write protect (dirty pages tracking, reference tracking, page migration
>> or compaction...)
>>
>> SMP-based requests will kick out the guest, but for something like the
>> one above it will be too late.
> 
> While what you said is 100% correct, I had something else in mind that
> hindered using vcpu_kick() and especially kvm_make_all_cpus_request().
> And I remember that being related to how preemption and
> OUTSIDE_GUEST_MODE is handled. I think this boils down to what would
> have to be implemented in kvm_arch_vcpu_should_kick().
> 
> x86 can track the guest state using vcpu->mode, because they can be sure
> that the guest can't reschedule while in the critical guest entry/exit
> section. This is not true for s390x, as preemption is enabled. That's
> why vcpu->mode cannot be used in its current form to track if a VCPU is
> in/outside/exiting guest mode. And kvm_make_all_cpus_request() currently
> relies on this setting.
> 
> For now, calling vcpu_kick() on s390x will result in a BUG().
> 
> 
> On s390x, there are 3 use cases I see for requests:
> 
> 1. Remote requests that need a sync
> 
> Make a request, wait until SIE has been left and make sure the request
> will be processed before re-entering the SIE. E.g. the KVM_REQ_MMU_RELOAD
> request from the mmu notifier you mentioned. KVM_REQ_DISABLE_IBS is also
> a candidate.
> 
> 2. Remote requests that don't need a sync
> 
> E.g. KVM_REQ_ENABLE_IBS doesn't strictly need it, while
> KVM_REQ_DISABLE_IBS does.
> 
> 3. local requests
> 
> E.g. KVM_REQ_TLB_FLUSH from kvm_s390_set_prefix()
> 
> 
> Of course, having a unified interface would be better.
> 
> /* set the request and kick the CPU out of guest mode */
> kvm_set_request(req, vcpu);
> 
> /* set the request, kick the CPU out of guest mode, wait until guest
> mode has been left and make sure the request will be handled before
> reentering guest mode */
> kvm_set_sync_request(req, vcpu);
> 
> 
> Same maybe even for multiple VCPUs (as there are then ways to speed it
> up, e.g. first kick all, then wait for all)
> 
> This would require arch specific callbacks to
> 1. pre announce the request (e.g. set PROG_REQUEST on s390x)
> 2. kick the cpu (e.g. CPUSTAT_STOP_INT and later
> kvm_s390_vsie_kick(vcpu) on s390x)
> 3. check if still executing the guest (e.g. PROG_IN_SIE on s390x)
> 
> This would only make sense if there are other use cases for sync
> requests. At least I remember that Power also has a faster way for
> kicking VCPUs, not involving SMP rescheds. I can't judge if this is a
> s390x-only thing and is better left as is :)
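To make the three callbacks above concrete, here is a minimal user-space sketch in C. Every name in it (toy_vcpu, toy_arch_ops, toy_set_request, toy_set_sync_request, ...) is hypothetical and not an existing KVM interface; the flags only model PROG_REQUEST, CPUSTAT_STOP_INT and PROG_IN_SIE, and the wait is a plain busy loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a VCPU; none of these names exist in KVM. */
struct toy_vcpu {
	atomic_ulong requests;	/* pending request bits */
	atomic_bool announced;	/* models PROG_REQUEST: blocks (re)entry */
	atomic_bool kicked;	/* models CPUSTAT_STOP_INT */
	atomic_bool in_guest;	/* models PROG_IN_SIE */
};

/* The three arch-specific callbacks from the list above. */
struct toy_arch_ops {
	void (*announce)(struct toy_vcpu *v);	/* 1. pre-announce */
	void (*kick)(struct toy_vcpu *v);	/* 2. kick the cpu */
	bool (*in_guest)(struct toy_vcpu *v);	/* 3. still in the guest? */
};

static void toy_announce(struct toy_vcpu *v) { atomic_store(&v->announced, true); }
static void toy_kick(struct toy_vcpu *v) { atomic_store(&v->kicked, true); }
static bool toy_in_guest(struct toy_vcpu *v) { return atomic_load(&v->in_guest); }

static const struct toy_arch_ops toy_ops = {
	.announce = toy_announce,
	.kick = toy_kick,
	.in_guest = toy_in_guest,
};

/* Set the request and kick the VCPU; do not wait. */
static void toy_set_request(const struct toy_arch_ops *ops,
			    struct toy_vcpu *v, unsigned int req)
{
	atomic_fetch_or(&v->requests, 1UL << req);
	ops->kick(v);
}

/* Set the request, block re-entry, kick, and wait until the guest is left. */
static void toy_set_sync_request(const struct toy_arch_ops *ops,
				 struct toy_vcpu *v, unsigned int req)
{
	atomic_fetch_or(&v->requests, 1UL << req);
	ops->announce(v);
	ops->kick(v);
	while (ops->in_guest(v))
		;	/* busy wait; kernel code would use cpu_relax() */
}

/* VCPU side: entry is refused while a request is announced. */
static bool toy_try_enter_guest(struct toy_vcpu *v)
{
	atomic_store(&v->in_guest, true);
	if (atomic_load(&v->announced)) {
		atomic_store(&v->in_guest, false);
		return false;
	}
	return true;
}

/* VCPU side: lift the entry block first, then collect the pending bits. */
static unsigned long toy_handle_requests(struct toy_vcpu *v)
{
	atomic_store(&v->announced, false);
	atomic_store(&v->kicked, false);
	return atomic_exchange(&v->requests, 0);
}
```

The point of the announce step is that the requester can return as soon as in_guest is clear: the VCPU cannot slip back into the guest until it has gone through toy_handle_requests().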

Hmmm, maybe we should simply enable kvm_vcpu_kick and kvm_vcpu_wake_up in
common code. They should work for the cases from common code. We must still
keep the s390 specific functions and we will call those from within s390 code
when necessary. Back then you tried to replace our functions and that had
issues (functionality wise and speed wise) - maybe just keeping those
is the easiest solution. 

For kvm_make_all_cpus_request we already have the slow path of waking all CPUs,
so that would be ok. So why not have vcpu->mode = IN_GUEST_MODE for the whole
inner loop of s390?
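
As a rough user-space sketch of that idea (all toy_* names are made up for illustration, this is not the patch): the mode is set once around the whole inner loop, and a requester only kicks if it manages to flip IN_GUEST_MODE to EXITING_GUEST_MODE, in the style of the cmpxchg in kvm_vcpu_exiting_guest_mode().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-ins for the vcpu->mode values; names are hypothetical. */
enum toy_mode {
	TOY_OUTSIDE_GUEST_MODE,
	TOY_IN_GUEST_MODE,
	TOY_EXITING_GUEST_MODE,
};

struct toy_mode_vcpu {
	_Atomic int mode;
};

/* Called once before the inner loop, not around each guest entry. */
static void toy_inner_loop_enter(struct toy_mode_vcpu *v)
{
	atomic_store(&v->mode, TOY_IN_GUEST_MODE);
}

/* Called once after the inner loop has been left. */
static void toy_inner_loop_exit(struct toy_mode_vcpu *v)
{
	atomic_store(&v->mode, TOY_OUTSIDE_GUEST_MODE);
}

/*
 * Requester side: kick only if the VCPU was IN_GUEST_MODE. The
 * compare-and-swap makes sure that only one requester sends the kick.
 */
static bool toy_should_kick(struct toy_mode_vcpu *v)
{
	int expected = TOY_IN_GUEST_MODE;

	return atomic_compare_exchange_strong(&v->mode, &expected,
					      TOY_EXITING_GUEST_MODE);
}
```

With the mode covering the whole loop, a VCPU that gets preempted inside the loop still looks like IN_GUEST_MODE to a requester, so the worst case is a kick that was not strictly needed.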

I will try to come up with a patch.

Christian