Re: [PATCH] arm64: KVM: Only skip MMIO insn once

From: Andrew Jones <drjones@redhat.com>
To: Marc Zyngier <maz@kernel.org>
Cc: kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, mark.rutland@arm.com
Subject: Re: [PATCH] arm64: KVM: Only skip MMIO insn once
Date: Thu, 22 Aug 2019 12:42:01 +0200	[thread overview]
Message-ID: <20190822104201.czgbse2ffww6vzvl@kamzik.brq.redhat.com> (raw)
In-Reply-To: <bcf8fd9c-3784-5c03-bb34-d8e7fdcd9a06@kernel.org>

On Thu, Aug 22, 2019 at 10:38:52AM +0100, Marc Zyngier wrote:
> On 22/08/2019 10:25, Andrew Jones wrote:
> > On Thu, Aug 22, 2019 at 09:30:44AM +0100, Marc Zyngier wrote:
> >> Hi Drew,
> >>
> >> On 21/08/2019 20:50, Andrew Jones wrote:
> >>> If after an MMIO exit to userspace a VCPU is immediately run with an
> >>> immediate_exit request, such as when a signal is delivered or an MMIO
> >>> emulation completion is needed, then the VCPU completes the MMIO
> >>> emulation and immediately returns to userspace. As the exit_reason
> >>> does not get changed from KVM_EXIT_MMIO in these cases we have to
> >>> be careful not to complete the MMIO emulation again, when the VCPU is
> >>> eventually run again, because the emulation does an instruction skip
> >>> (and doing too many skips would be a waste of guest code :-) We need
> >>> to use additional VCPU state to track if the emulation is complete.
> >>> As luck would have it, we already have 'mmio_needed', which even
> >>> appears to be used in this way by other architectures already.
> >>>
> >>> Fixes: 0d640732dbeb ("arm64: KVM: Skip MMIO insn after emulation")
> >>> Signed-off-by: Andrew Jones <drjones@redhat.com>
> >>> ---
> >>>  virt/kvm/arm/arm.c  | 3 ++-
> >>>  virt/kvm/arm/mmio.c | 1 +
> >>>  2 files changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> >>> index 35a069815baf..322cf9030bbe 100644
> >>> --- a/virt/kvm/arm/arm.c
> >>> +++ b/virt/kvm/arm/arm.c
> >>> @@ -669,7 +669,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  	if (ret)
> >>>  		return ret;
> >>>  
> >>> -	if (run->exit_reason == KVM_EXIT_MMIO) {
> >>> +	if (vcpu->mmio_needed) {
> >>> +		vcpu->mmio_needed = 0;
> >>>  		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
> >>>  		if (ret)
> >>>  			return ret;
> >>> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> >>> index a8a6a0c883f1..2d9b5e064ae0 100644
> >>> --- a/virt/kvm/arm/mmio.c
> >>> +++ b/virt/kvm/arm/mmio.c
> >>> @@ -201,6 +201,7 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >>>  	if (is_write)
> >>>  		memcpy(run->mmio.data, data_buf, len);
> >>>  	vcpu->stat.mmio_exit_user++;
> >>> +	vcpu->mmio_needed	= 1;
> >>>  	run->exit_reason	= KVM_EXIT_MMIO;
> >>>  	return 0;
> >>>  }
> >>>
> >>
> >> Thanks for this. That's quite embarrassing. Out of curiosity,
> >> how was this spotted?
> > 
> > avocado has a guest execution state snapshotting feature. The feature
> > simply periodically uses QEMU's 'info registers' monitor command while
> > a guest is running. The monitor command kicks the vcpu to userspace with
> > a signal, and since avocado's snapshot rate was set relatively high that
> > increased the probability of causing a noticeable (weird things / guest
> > crashes) event during guest boot (when MMIO activity is also high). The
> > signals correlated with guest crashes lead me to this code.
> 
> Nice one. I guess I could try and reproduce it with the kvmtool debug
> feature that does a similar thing.

Since we don't even need the signals, we can just use kselftests to send a
few KVM_RUN ioctls. Here's one

From: Andrew Jones <drjones@redhat.com>
Date: Thu, 22 Aug 2019 06:38:26 -0400
Subject: [PATCH] KVM: selftests: Demonstrate AArch64 extra instruction skip

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 tools/testing/selftests/kvm/Makefile          |  1 +
 .../testing/selftests/kvm/aarch64/vcpu-skip.c | 40 +++++++++++++++++++
 2 files changed, 41 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/aarch64/vcpu-skip.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index ba7849751989..1b90b99bc351 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -28,6 +28,7 @@ TEST_GEN_PROGS_x86_64 += clear_dirty_log_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
 TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
 
+TEST_GEN_PROGS_aarch64 += aarch64/vcpu-skip
 TEST_GEN_PROGS_aarch64 += clear_dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
diff --git a/tools/testing/selftests/kvm/aarch64/vcpu-skip.c b/tools/testing/selftests/kvm/aarch64/vcpu-skip.c
new file mode 100644
index 000000000000..f3a747739d36
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/vcpu-skip.c
@@ -0,0 +1,40 @@
+#include "kvm_util.h"
+#include "../lib/kvm_util_internal.h"
+
+#define MMIO 0xb000000
+
+static uint64_t result;
+
+static void guest_code(void)
+{
+	asm volatile(
+		"mov	x0, #1\n"
+		"str	x0, [%0]\n"
+		"str	x0, [%1]\n"
+		"str	x0, [%0]\n"
+	: : "r" (MMIO), "r" (&result) : "x0");
+}
+
+int main(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_run *run;
+	int fd;
+
+	vm = vm_create_default(1, 0, guest_code);
+	virt_pg_map(vm, MMIO, MMIO, 0);
+	run = vcpu_state(vm, 1);
+	fd = vcpu_find(vm, 1)->fd;
+
+	ioctl(fd, KVM_RUN, NULL);
+
+	run->immediate_exit = 1;
+	ioctl(fd, KVM_RUN, NULL);
+
+	run->immediate_exit = 0;
+	ioctl(fd, KVM_RUN, NULL);
+
+	sync_global_from_guest(vm, result);
+	TEST_ASSERT(result, "Skipped instruction");
+	return 0;
+}
-- 
2.18.1

> 
> >> Patch wise, I'd have a small preference for the following (untested)
> >> patch, as it keeps the mmio_needed accesses close together, making
> >> it easier to read (at least for me). What do you think?
> >>
> >> 	M.
> >>
> >> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> >> index a8a6a0c883f1..6af5c91337f2 100644
> >> --- a/virt/kvm/arm/mmio.c
> >> +++ b/virt/kvm/arm/mmio.c
> >> @@ -86,6 +86,12 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>  	unsigned int len;
> >>  	int mask;
> >>  
> >> +	/* Detect an already handled MMIO return */
> >> +	if (unlikely(!vcpu->mmio_needed))
> >> +		return 0;
> >> +
> >> +	vcpu->mmio_needed = 0;
> >> +
> >>  	if (!run->mmio.is_write) {
> >>  		len = run->mmio.len;
> >>  		if (len > sizeof(unsigned long))
> >> @@ -188,6 +194,7 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >>  	run->mmio.is_write	= is_write;
> >>  	run->mmio.phys_addr	= fault_ipa;
> >>  	run->mmio.len		= len;
> >> +	vcpu->mmio_needed	= 1;
> >>  
> >>  	if (!ret) {
> >>  		/* We handled the access successfully in the kernel. */
> > 
> > That looks good to me. Should I repost?
> 
> Yes please. I'll try to get Paolo to pick it as quickly as possible.

On it's way

Thanks,
drew