From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B51AC35242 for ; Tue, 11 Feb 2020 23:12:32 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EA2E6206DB for ; Tue, 11 Feb 2020 23:12:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="X/5roPCc" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EA2E6206DB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:58244 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j1ehj-0007hx-2z for qemu-devel@archiver.kernel.org; Tue, 11 Feb 2020 18:12:31 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:60167) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j1egV-0006Qw-J8 for qemu-devel@nongnu.org; Tue, 11 Feb 2020 18:11:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j1egS-0005SX-Ok for qemu-devel@nongnu.org; Tue, 11 Feb 2020 18:11:15 -0500 Received: from mail-io1-xd42.google.com ([2607:f8b0:4864:20::d42]:40995) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1j1egS-0005Rk-Gx for qemu-devel@nongnu.org; Tue, 11 Feb 2020 18:11:12 -0500 Received: by mail-io1-xd42.google.com with SMTP id m25so7322ioo.8 for ; Tue, 11 Feb 2020 15:11:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=UaJqnlppmmo5Euf5kNzvGUdR8aDw1HJr8Rhq4ma/w6A=; b=X/5roPCcT84wBD5huDg0ZNJBk2XZoMXWbJQaOIExytH4PmWNVlrNZw7/eoxIpMYhD3 sXEMZqgvsRIstpYcXAOwREEJoOwBXkW/VFx/WNCLy7wQyZw8O6bM5MbR6uauuVyxz5/c PlzGqiyozt+jPc7v51mXBEgLiS6AENwgnfdHxFgjh0sscmfTvAbyu02Ggmt4mIuCtg98 7vUBQunQ88lDKuN9OzLKZyA20twj1oTBgU+h6JpvUQ8Zwjc08NZA1dxx7IHtAUco7uMF XBn/dSmi+OgVAxRCIHYvY8XMzv57vzVo9SHTOb756B8vJa4J7vxAl/699wbuT7XypQEy s+Mw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=UaJqnlppmmo5Euf5kNzvGUdR8aDw1HJr8Rhq4ma/w6A=; b=ikeaQ0jn8f5yD5rBz5qbPsXpwaJNaiVnfVK+PpgN8jqrq1Ya9ob7aYpVAW5uKf+Euq F3X3MQXFnWsRIdD4r5Xzlf7eO1nHlRySU/gZahGmauucis1Fbi5mPKTBZGyGwxtaXaAJ it3OJ3fCV+82Ykbj9bqtvkxJMqSZXwA1mb+KOstltE77ccEkCgHKJJyjwDBrFPbzXz5l jOP1cvOE6Zx5MhrWXb01GbtTvafqnxxmiXtMAOAq5vq8cdq5eo23Wwj3O5QEqp/HWyFp yHABPXWfBMThgfozn0sHThcpzrGSDQ5le/wK6uparQKgIv6klCFuJ4rTIg3Kn2U46zfV dsYQ== X-Gm-Message-State: APjAAAVzdwQi+awpBzkCbS6S3aFMLO8Y2qf2V71pcHkDN9jmXH5ib+k5 hThksJApXmVIH/iBKgS9Z8XpUul3VSo8u+8Kfh6U+w== X-Google-Smtp-Source: APXvYqwVTzuQL8eauDDOUd895nP3R94cc9Ppz2BTB7izsqKyO5GCmo8ZtahKQMLv9vLMHGWqR1LbinAzrH9rDny1va8= X-Received: by 2002:a02:c78f:: with SMTP id n15mr16717359jao.100.1581462670740; Tue, 11 Feb 2020 15:11:10 -0800 (PST) MIME-Version: 1.0 References: <20200129202441.12745-1-beata.michalska@linaro.org> <20200129202441.12745-3-beata.michalska@linaro.org> <20200205165739.2jkklbpmy3yrdq3q@kamzik.brq.redhat.com> <20200207081950.4gxxlzwgjgfr6zur@kamzik.brq.redhat.com> In-Reply-To: <20200207081950.4gxxlzwgjgfr6zur@kamzik.brq.redhat.com> From: Beata Michalska Date: Tue, 11 Feb 2020 23:10:58 +0000 Message-ID: Subject: Re: [PATCH v2 2/2] target/arm: kvm: Handle DABT with no valid ISS To: Andrew Jones Content-Type: text/plain; charset="UTF-8" X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::d42 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell , Christoffer Dall , QEMU Developers , qemu-arm , Paolo Bonzini , kvmarm@lists.cs.columbia.edu Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" On Fri, 7 Feb 2020 at 08:20, Andrew Jones wrote: > > On Thu, Feb 06, 2020 at 09:48:05PM +0000, Beata Michalska wrote: > > On Wed, 5 Feb 2020 at 16:57, Andrew Jones wrote: > > > > > > On Wed, Jan 29, 2020 at 08:24:41PM +0000, Beata Michalska wrote: > > > > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort > > > > exception with no valid ISS info to be decoded. The lack of decode info > > > > makes it at least tricky to emulate those instruction which is one of the > > > > (many) reasons why KVM will not even try to do so. > > > > > > > > Add support for handling those by requesting KVM to inject external > > > > dabt into the quest. > > > > > > > > Signed-off-by: Beata Michalska > > > > --- > > > > target/arm/cpu.h | 2 ++ > > > > target/arm/kvm.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > target/arm/kvm32.c | 3 ++ > > > > target/arm/kvm64.c | 3 ++ > > > > target/arm/kvm_arm.h | 19 +++++++++++ > > > > 5 files changed, 123 insertions(+) > > > > > > > > diff --git a/target/arm/cpu.h b/target/arm/cpu.h > > > > index c1aedbe..e04a8d3 100644 > > > > --- a/target/arm/cpu.h > > > > +++ b/target/arm/cpu.h > > > > @@ -558,6 +558,8 @@ typedef struct CPUARMState { > > > > uint8_t has_esr; > > > > uint64_t esr; > > > > } serror; > > > > + /* Status field for pending external dabt */ > > > > + uint8_t ext_dabt_pending; > > > > > > > > /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */ > > > > uint32_t irq_line_state; > > > > diff --git a/target/arm/kvm.c b/target/arm/kvm.c > > > > index 8d82889..e7bc9b7 100644 > > > > --- a/target/arm/kvm.c > > > > +++ b/target/arm/kvm.c > > > > @@ -37,6 +37,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = { > > > > > > > > static bool cap_has_mp_state; > > > > static bool cap_has_inject_serror_esr; > > > > +static bool cap_has_inject_ext_dabt; /* KVM_CAP_ARM_INJECT_EXT_DABT */ > > > > > > nit: the KVM_CAP_ARM_INJECT_EXT_DABT comment is unnecessary > > > > Might be - I just find it handy when looking for related details. > > I will remove that one though. > > > > > > > > > > > > > static ARMHostCPUFeatures arm_host_cpu_features; > > > > > > > > @@ -62,6 +63,12 @@ void kvm_arm_init_serror_injection(CPUState *cs) > > > > KVM_CAP_ARM_INJECT_SERROR_ESR); > > > > } > > > > > > > > +void kvm_arm_init_ext_dabt_injection(CPUState *cs) > > > > +{ > > > > + cap_has_inject_ext_dabt = kvm_check_extension(cs->kvm_state, > > > > + KVM_CAP_ARM_INJECT_EXT_DABT); > > > > +} > > > > + > > > > bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try, > > > > int *fdarray, > > > > struct kvm_vcpu_init *init) > > > > @@ -216,6 +223,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > > > > ret = -EINVAL; > > > > } > > > > > > > > + if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) > > > > + if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) { > > > > + warn_report("Failed to enable DABT NISV cap"); > > > > + } > > > > + > > > > > > Missing {} around the outer block. > > > > Checkpatch didn't complain ... > > Will fix that. > > > > > > > > As KVM_CAP_ARM_INJECT_EXT_DABT is a VM capability then I think we should > > > set cap_has_inject_ext_dabt here, like cap_has_mp_state gets set. I see > > > you've followed the pattern used for cap_has_inject_serror_esr, but that > > > looks wrong too since KVM_CAP_ARM_INJECT_SERROR_ESR is also a VM > > > capability. The way it is now we just keep setting > > > cap_has_inject_serror_esr to the same value, NR_VCPUS times. > > > > > You are totally right - I have completely missed that point! Thanks. > > > > > > return ret; > > > > } > > > > > > > > @@ -598,6 +610,10 @@ int kvm_put_vcpu_events(ARMCPU *cpu) > > > > events.exception.serror_esr = env->serror.esr; > > > > } > > > > > > > > + if (cap_has_inject_ext_dabt) { > > > > + events.exception.ext_dabt_pending = env->ext_dabt_pending; > > > > + } > > > > + > > > > ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events); > > > > if (ret) { > > > > error_report("failed to put vcpu events"); > > > > @@ -627,6 +643,8 @@ int kvm_get_vcpu_events(ARMCPU *cpu) > > > > env->serror.has_esr = events.exception.serror_has_esr; > > > > env->serror.esr = events.exception.serror_esr; > > > > > > > > + env->ext_dabt_pending = events.exception.ext_dabt_pending; > > > > + > > > > > > afaict from Documentation/virt/kvm/api.txt and the KVM code you cannot > > > get this state. Therefore the above line (and extra stray blank line) > > > should be dropped. > > > > > That's true, though this is a lightweight way of resetting the vcpu state. > > We would have to do that otherwise to mark that this case has been handled > > and that the abort is no longer pending. > > There's no reason to wait until get-vcpu-events time to reset this. > According to the KVM documentation (and the code) the event is > immediately delivered at vcpu_ioctl(KVM_SET_VCPU_EVENTS) time. > So I think the kvm_put_vcpu_events() patch should be > > if (env->ext_dabt_pending) { > events.exception.ext_dabt_pending = env->ext_dabt_pending; > env->ext_dabt_pending = 0; > } > > (env->ext_dabt_pending will only be non-zero if we have the capability) > I would expect this then to be reset only after successful ioctl call. Will update it. > > > > > > return 0; > > > > } > > > > > > > > @@ -634,6 +652,7 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run) > > > > { > > > > } > > > > > > > > + > > > > > > stray blank line > > > > > > > MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run) > > > > { > > > > ARMCPU *cpu; > > > > @@ -699,6 +718,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run) > > > > ret = EXCP_DEBUG; > > > > } /* otherwise return to guest */ > > > > break; > > > > + case KVM_EXIT_ARM_NISV: > > > > + /* External DABT with no valid iss to decode */ > > > > + ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss, > > > > + run->arm_nisv.fault_ipa); > > > > + break; > > > > default: > > > > qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n", > > > > __func__, run->exit_reason); > > > > @@ -833,3 +857,75 @@ int kvm_arch_msi_data_to_gsi(uint32_t data) > > > > { > > > > return (data - 32) & 0xffff; > > > > } > > > > + > > > > +int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss, > > > > + uint64_t fault_ipa) > > > > +{ > > > > + ARMCPU *cpu = ARM_CPU(cs); > > > > + CPUARMState *env = &cpu->env; > > > > + uint32_t ins, ins_fetched; > > > > > > ins_fetched is a bool > > > > Actually that's int (as per cpu_memory_rw_debug) > > But that's not the way you're using it here. It you really want to > ensure the return value is converted to a bool, then you could do > > ins_fetched = !!cpu_memory_rw_debug() > > > > > > > > + > > > > + /* > > > > + * Hacky workaround for kernels that for aarch32 guests, instead of expected > > > > + * external data abort, inject the IMPLEMENTATION DEFINED exception with the > > > > + * lock-down. This is actually handled by the guest which results in > > > > + * re-running the faulting instruction. > > > > + * This intends to break the vicious cycle. > > > > + */ > > > > > > This doesn't seem like the right thing to do. What happens on real > > > hardware in this case? If an OS would get into a "vicious cycle" on > > > real hardware then it should get into one on KVM too. > > > > > That's the problem. that would not happen on a real hardware. > > We might end up here due to a KVM bug (which has been fixed recently) > > when the injected > > abort is not the one expected (as of not the one that would be > > triggered by hw in this > > particular case). > > Details in : https://patchwork.kernel.org/patch/11358083/ > > If KVM can be fixed (and in this case already is fixed - 21aecdbd7f3a), > then doesn't it make more sense to just fix KVM, than to add a workaround > to QEMU? > This is to cover the cases (hopefully rare but still) when the host kernel won't get updated, leaving the issue exposed to the guest. Not sure what or if there is any policy regarding that in Qemu (?) There is also no mechanism within KVM to allow the user space to verify whether the fix is available or not. Capabilities are not designed for that (if I got things right) unless user-space interface is broken ... which is not entirely the case here. > > > > > > + if (!is_a64(env)) { > > > > + static uint8_t setback; > > > > + > > > > + /* > > > > + * The state has not been synchronized yet, so if this is re-occurrence > > > > + * of the same abort triggered by guest, the status for pending external > > > > + * abort should not get cleared yet > > > > + */ > > > > + if (unlikely(env->ext_dabt_pending)) { > > > > + if (setback) { > > > > + error_report("Most probably triggered kernel issue with" > > > > + " injecting external data abort."); > > > > + error_printf("Giving up trying ...\n"); > > > > + /* Here is where the fun comes to an end */ > > > > + return -EINVAL; > > > > + } > > > > + } > > > > + setback = env->ext_dabt_pending; > > > > + } > > > > + > > > > + kvm_cpu_synchronize_state(cs); > > > > + /* > > > > + * ISS [23:14] is invalid so there is a limited info > > > > + * on what has just happened so the only *useful* thing that can > > > > + * be retrieved from ISS is WnR & DFSC (though in some cases WnR > > > > + * might be less of a value as well) > > > > + */ > > > > + > > > > + /* > > > > + * Get current PC before it will get updated to exception vector entry > > > > + */ > > > > + target_ulong ins_addr = is_a64(env) ? env->pc : env->regs[15]; > > > > > > ins_addr should be declared above > > > > > > But what are we doing? pc is a guest virtual address. Oh, I see... > > > > > > > + > > > > + /* > > > > + * Get the faulting instruction > > > > + * Instructions have a fixed length of 32 bits > > > > + * and are always little-endian > > > > + */ > > > > + ins_fetched = !cpu_memory_rw_debug(cs, ins_addr, (uint8_t *) &ins, > > > > + sizeof(uint32_t), 0); > > > > > > ... we're trying to actual walk the KVM guest's page tables. That > > > seems a bit odd, and the "_debug" function name used for it makes > > > it seem even more odd. > > > > > > > + > > > > + error_report("Data abort exception with no valid ISS generated by " > > > > + "guest memory access at physical address: 0x" TARGET_FMT_lx, > > > > + (target_ulong)fault_ipa); > > > > + > > > > + error_printf(ins_fetched ? "%s : 0x%" PRIx32 " (ins encoded)\n" : "%s\n", > > > > + "KVM unable to emulate faulting instruction", > > > > + (!ins_fetched ? 0 : ldl_le_p(&ins))); > > > > + error_printf("Injecting a data abort exception into guest.\n"); > > > > > > The guest shouldn't be able to generate three lines of errors on the host > > > whenever it wants. That's a security bug. One trace point here seems like > > > the most we should do. Or, after these three lines we should kill the > > > guest. > > > > > You have a point here, this can indeed be exploited by the malicious > > guest. Not sure if killing it is necessary here. > > I could limit the logging though that gets tricky for aborts triggered by > > user-space processes which would probably leave the guest running > > without much of an issue there. > > I can y get rid of logging the additional info and keep the single > > statement of an instruction not being emulated but that still leaves an > > open door for potential exploits... > > > > > Actually, I don't think we should attempt to get the instruction at all. > > > We should do what the KVM documenation suggests we do when we get > > > this exit. We should either do a user selected action: one of suspend, > > > dump, restart, or inject a dabt (as is done below). The last choice hopes > > > that the guest will handle it in some graceful way, or that it'll crash > > > with all the information needed for a post-mortem crash investigation. > > > > > > And I don't think we should do the "very brave" option of trying to > > > emulate the instruction, even if we identify it as a valid MMIO address. > > > That just sounds like a huge can of worms. > > > > > > Does QEMU already have a way for users to select a > > > don't-know-how-to-handle-guest-exception behavior? > > > > > > > The function is attempting to inject the external data abort. The rest is for > > the aftermath analysis to easy figuring out what has happened > > which will not be so easy in case of user-space process triggering > > the DABT with no valid ISS. There is no attempt of emulating the instruction > > just simply saying the guest tried to access this address with this instruction > > which couldn't be handled. Might be excessive in some cases > > (like misbehaving kernel) but it also might be handy on those rare > > occasions when there is not much to analyze. > > But as per the issue you have raised above I will rework that. > > > > Not sure if there is a mechanism to specify the 'preferred' behavior though. > > I think a mechanism for a preferred behavior would be nice, because > all options make sense, depending on the use case. I'm not sure what > the right default is, though. Maybe this dabt injection, but there's > a good chance it'll just loop here. So maybe QEMU should just abort. > Aborting should be the safest default (from the host's PoV). I can still play around with this for a bit to make it more .. sensible, keeping in mind the potential risk with the current solution you have pointed out. I would still like to keep the abort injection as that is the main reason behind having this change. It might be a bit tricky, (time-wise) to add the 'def behaviour' interface though. Is there anything else that could potentially benefit from it ? BR Beata > > Thanks, > drew > > > > > > BR > > > > > > Beata > > > > > > > > + /* > > > > + * Set pending ext dabt and trigger SET_EVENTS so that > > > > + * KVM can inject the abort > > > > + */ > > > > + env->ext_dabt_pending = 1; > > > > + > > > > + return 0; > > > > +} > > > > > > Thanks, > > > drew > > > > > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.6 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7C616C35242 for ; Tue, 11 Feb 2020 23:11:18 +0000 (UTC) Received: from mm01.cs.columbia.edu (mm01.cs.columbia.edu [128.59.11.253]) by mail.kernel.org (Postfix) with ESMTP id EF9F8206DB for ; Tue, 11 Feb 2020 23:11:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="X/5roPCc" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EF9F8206DB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvmarm-bounces@lists.cs.columbia.edu Received: from localhost (localhost [127.0.0.1]) by mm01.cs.columbia.edu (Postfix) with ESMTP id 731D74A94D; Tue, 11 Feb 2020 18:11:16 -0500 (EST) X-Virus-Scanned: at lists.cs.columbia.edu Authentication-Results: mm01.cs.columbia.edu (amavisd-new); dkim=softfail (fail, message has been altered) header.i=@linaro.org Received: from mm01.cs.columbia.edu ([127.0.0.1]) by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id bOxEC+flmt2Q; Tue, 11 Feb 2020 18:11:14 -0500 (EST) Received: from mm01.cs.columbia.edu (localhost [127.0.0.1]) by mm01.cs.columbia.edu (Postfix) with ESMTP id B77374AE96; Tue, 11 Feb 2020 18:11:14 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by mm01.cs.columbia.edu (Postfix) with ESMTP id 4AAB34A94D for ; Tue, 11 Feb 2020 18:11:13 -0500 (EST) X-Virus-Scanned: at lists.cs.columbia.edu Received: from mm01.cs.columbia.edu ([127.0.0.1]) by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8W47OG6ZO0bn for ; Tue, 11 Feb 2020 18:11:11 -0500 (EST) Received: from mail-io1-f66.google.com (mail-io1-f66.google.com [209.85.166.66]) by mm01.cs.columbia.edu (Postfix) with ESMTPS id 9A0064A65C for ; Tue, 11 Feb 2020 18:11:11 -0500 (EST) Received: by mail-io1-f66.google.com with SMTP id z193so45497iof.1 for ; Tue, 11 Feb 2020 15:11:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=UaJqnlppmmo5Euf5kNzvGUdR8aDw1HJr8Rhq4ma/w6A=; b=X/5roPCcT84wBD5huDg0ZNJBk2XZoMXWbJQaOIExytH4PmWNVlrNZw7/eoxIpMYhD3 sXEMZqgvsRIstpYcXAOwREEJoOwBXkW/VFx/WNCLy7wQyZw8O6bM5MbR6uauuVyxz5/c PlzGqiyozt+jPc7v51mXBEgLiS6AENwgnfdHxFgjh0sscmfTvAbyu02Ggmt4mIuCtg98 7vUBQunQ88lDKuN9OzLKZyA20twj1oTBgU+h6JpvUQ8Zwjc08NZA1dxx7IHtAUco7uMF XBn/dSmi+OgVAxRCIHYvY8XMzv57vzVo9SHTOb756B8vJa4J7vxAl/699wbuT7XypQEy s+Mw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=UaJqnlppmmo5Euf5kNzvGUdR8aDw1HJr8Rhq4ma/w6A=; b=OJN///zh4bQQ5Fhwgq4vJka8zM4EcG1VNO/krYYaz9PjNU7QxZBNauEfjoZLManXPz Lonp8APkvphnyR+x9p9cnGwlOb9rRPz+44hLeHioOvXE4LjWii4RwbCsQASNPkS/hS5J IgfdDJOQXqppiNVgDjksR0dH6afCShrsbw7qYEoujbZ2q7Vi4YR1wH+oXIlGCU4pt+cQ BMd2kA2kUfsscBoJSQRKHrY22J13czrqKPdyA3IZb9ni4P3iA1vurhjT/zbsAVGhq297 wy+9ZboMLOi+XB8XG3kCpe9S8vHbWp3eDvmzu0VhSR2RphWZRfIHk+l/4HyENhAviXET A7IQ== X-Gm-Message-State: APjAAAVXRkwSpu1mbQu+eH2uiWs7HBCs7f7pnEecotPZvrBoGpmYWs+D wlKXcIa3wnf6sFvxEJkZNJvJ86NwXWzAukPrzjy6mw== X-Google-Smtp-Source: APXvYqwVTzuQL8eauDDOUd895nP3R94cc9Ppz2BTB7izsqKyO5GCmo8ZtahKQMLv9vLMHGWqR1LbinAzrH9rDny1va8= X-Received: by 2002:a02:c78f:: with SMTP id n15mr16717359jao.100.1581462670740; Tue, 11 Feb 2020 15:11:10 -0800 (PST) MIME-Version: 1.0 References: <20200129202441.12745-1-beata.michalska@linaro.org> <20200129202441.12745-3-beata.michalska@linaro.org> <20200205165739.2jkklbpmy3yrdq3q@kamzik.brq.redhat.com> <20200207081950.4gxxlzwgjgfr6zur@kamzik.brq.redhat.com> In-Reply-To: <20200207081950.4gxxlzwgjgfr6zur@kamzik.brq.redhat.com> From: Beata Michalska Date: Tue, 11 Feb 2020 23:10:58 +0000 Message-ID: Subject: Re: [PATCH v2 2/2] target/arm: kvm: Handle DABT with no valid ISS To: Andrew Jones Cc: QEMU Developers , qemu-arm , Paolo Bonzini , kvmarm@lists.cs.columbia.edu X-BeenThere: kvmarm@lists.cs.columbia.edu X-Mailman-Version: 2.1.14 Precedence: list List-Id: Where KVM/ARM decisions are made List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: kvmarm-bounces@lists.cs.columbia.edu Sender: kvmarm-bounces@lists.cs.columbia.edu On Fri, 7 Feb 2020 at 08:20, Andrew Jones wrote: > > On Thu, Feb 06, 2020 at 09:48:05PM +0000, Beata Michalska wrote: > > On Wed, 5 Feb 2020 at 16:57, Andrew Jones wrote: > > > > > > On Wed, Jan 29, 2020 at 08:24:41PM +0000, Beata Michalska wrote: > > > > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort > > > > exception with no valid ISS info to be decoded. The lack of decode info > > > > makes it at least tricky to emulate those instruction which is one of the > > > > (many) reasons why KVM will not even try to do so. > > > > > > > > Add support for handling those by requesting KVM to inject external > > > > dabt into the quest. > > > > > > > > Signed-off-by: Beata Michalska > > > > --- > > > > target/arm/cpu.h | 2 ++ > > > > target/arm/kvm.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > target/arm/kvm32.c | 3 ++ > > > > target/arm/kvm64.c | 3 ++ > > > > target/arm/kvm_arm.h | 19 +++++++++++ > > > > 5 files changed, 123 insertions(+) > > > > > > > > diff --git a/target/arm/cpu.h b/target/arm/cpu.h > > > > index c1aedbe..e04a8d3 100644 > > > > --- a/target/arm/cpu.h > > > > +++ b/target/arm/cpu.h > > > > @@ -558,6 +558,8 @@ typedef struct CPUARMState { > > > > uint8_t has_esr; > > > > uint64_t esr; > > > > } serror; > > > > + /* Status field for pending external dabt */ > > > > + uint8_t ext_dabt_pending; > > > > > > > > /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */ > > > > uint32_t irq_line_state; > > > > diff --git a/target/arm/kvm.c b/target/arm/kvm.c > > > > index 8d82889..e7bc9b7 100644 > > > > --- a/target/arm/kvm.c > > > > +++ b/target/arm/kvm.c > > > > @@ -37,6 +37,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = { > > > > > > > > static bool cap_has_mp_state; > > > > static bool cap_has_inject_serror_esr; > > > > +static bool cap_has_inject_ext_dabt; /* KVM_CAP_ARM_INJECT_EXT_DABT */ > > > > > > nit: the KVM_CAP_ARM_INJECT_EXT_DABT comment is unnecessary > > > > Might be - I just find it handy when looking for related details. > > I will remove that one though. > > > > > > > > > > > > > static ARMHostCPUFeatures arm_host_cpu_features; > > > > > > > > @@ -62,6 +63,12 @@ void kvm_arm_init_serror_injection(CPUState *cs) > > > > KVM_CAP_ARM_INJECT_SERROR_ESR); > > > > } > > > > > > > > +void kvm_arm_init_ext_dabt_injection(CPUState *cs) > > > > +{ > > > > + cap_has_inject_ext_dabt = kvm_check_extension(cs->kvm_state, > > > > + KVM_CAP_ARM_INJECT_EXT_DABT); > > > > +} > > > > + > > > > bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try, > > > > int *fdarray, > > > > struct kvm_vcpu_init *init) > > > > @@ -216,6 +223,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > > > > ret = -EINVAL; > > > > } > > > > > > > > + if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER)) > > > > + if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) { > > > > + warn_report("Failed to enable DABT NISV cap"); > > > > + } > > > > + > > > > > > Missing {} around the outer block. > > > > Checkpatch didn't complain ... > > Will fix that. > > > > > > > > As KVM_CAP_ARM_INJECT_EXT_DABT is a VM capability then I think we should > > > set cap_has_inject_ext_dabt here, like cap_has_mp_state gets set. I see > > > you've followed the pattern used for cap_has_inject_serror_esr, but that > > > looks wrong too since KVM_CAP_ARM_INJECT_SERROR_ESR is also a VM > > > capability. The way it is now we just keep setting > > > cap_has_inject_serror_esr to the same value, NR_VCPUS times. > > > > > You are totally right - I have completely missed that point! Thanks. > > > > > > return ret; > > > > } > > > > > > > > @@ -598,6 +610,10 @@ int kvm_put_vcpu_events(ARMCPU *cpu) > > > > events.exception.serror_esr = env->serror.esr; > > > > } > > > > > > > > + if (cap_has_inject_ext_dabt) { > > > > + events.exception.ext_dabt_pending = env->ext_dabt_pending; > > > > + } > > > > + > > > > ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events); > > > > if (ret) { > > > > error_report("failed to put vcpu events"); > > > > @@ -627,6 +643,8 @@ int kvm_get_vcpu_events(ARMCPU *cpu) > > > > env->serror.has_esr = events.exception.serror_has_esr; > > > > env->serror.esr = events.exception.serror_esr; > > > > > > > > + env->ext_dabt_pending = events.exception.ext_dabt_pending; > > > > + > > > > > > afaict from Documentation/virt/kvm/api.txt and the KVM code you cannot > > > get this state. Therefore the above line (and extra stray blank line) > > > should be dropped. > > > > > That's true, though this is a lightweight way of resetting the vcpu state. > > We would have to do that otherwise to mark that this case has been handled > > and that the abort is no longer pending. > > There's no reason to wait until get-vcpu-events time to reset this. > According to the KVM documentation (and the code) the event is > immediately delivered at vcpu_ioctl(KVM_SET_VCPU_EVENTS) time. > So I think the kvm_put_vcpu_events() patch should be > > if (env->ext_dabt_pending) { > events.exception.ext_dabt_pending = env->ext_dabt_pending; > env->ext_dabt_pending = 0; > } > > (env->ext_dabt_pending will only be non-zero if we have the capability) > I would expect this then to be reset only after successful ioctl call. Will update it. > > > > > > return 0; > > > > } > > > > > > > > @@ -634,6 +652,7 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run) > > > > { > > > > } > > > > > > > > + > > > > > > stray blank line > > > > > > > MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run) > > > > { > > > > ARMCPU *cpu; > > > > @@ -699,6 +718,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run) > > > > ret = EXCP_DEBUG; > > > > } /* otherwise return to guest */ > > > > break; > > > > + case KVM_EXIT_ARM_NISV: > > > > + /* External DABT with no valid iss to decode */ > > > > + ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss, > > > > + run->arm_nisv.fault_ipa); > > > > + break; > > > > default: > > > > qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n", > > > > __func__, run->exit_reason); > > > > @@ -833,3 +857,75 @@ int kvm_arch_msi_data_to_gsi(uint32_t data) > > > > { > > > > return (data - 32) & 0xffff; > > > > } > > > > + > > > > +int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss, > > > > + uint64_t fault_ipa) > > > > +{ > > > > + ARMCPU *cpu = ARM_CPU(cs); > > > > + CPUARMState *env = &cpu->env; > > > > + uint32_t ins, ins_fetched; > > > > > > ins_fetched is a bool > > > > Actually that's int (as per cpu_memory_rw_debug) > > But that's not the way you're using it here. It you really want to > ensure the return value is converted to a bool, then you could do > > ins_fetched = !!cpu_memory_rw_debug() > > > > > > > > + > > > > + /* > > > > + * Hacky workaround for kernels that for aarch32 guests, instead of expected > > > > + * external data abort, inject the IMPLEMENTATION DEFINED exception with the > > > > + * lock-down. This is actually handled by the guest which results in > > > > + * re-running the faulting instruction. > > > > + * This intends to break the vicious cycle. > > > > + */ > > > > > > This doesn't seem like the right thing to do. What happens on real > > > hardware in this case? If an OS would get into a "vicious cycle" on > > > real hardware then it should get into one on KVM too. > > > > > That's the problem. that would not happen on a real hardware. > > We might end up here due to a KVM bug (which has been fixed recently) > > when the injected > > abort is not the one expected (as of not the one that would be > > triggered by hw in this > > particular case). > > Details in : https://patchwork.kernel.org/patch/11358083/ > > If KVM can be fixed (and in this case already is fixed - 21aecdbd7f3a), > then doesn't it make more sense to just fix KVM, than to add a workaround > to QEMU? > This is to cover the cases (hopefully rare but still) when the host kernel won't get updated, leaving the issue exposed to the guest. Not sure what or if there is any policy regarding that in Qemu (?) There is also no mechanism within KVM to allow the user space to verify whether the fix is available or not. Capabilities are not designed for that (if I got things right) unless user-space interface is broken ... which is not entirely the case here. > > > > > > + if (!is_a64(env)) { > > > > + static uint8_t setback; > > > > + > > > > + /* > > > > + * The state has not been synchronized yet, so if this is re-occurrence > > > > + * of the same abort triggered by guest, the status for pending external > > > > + * abort should not get cleared yet > > > > + */ > > > > + if (unlikely(env->ext_dabt_pending)) { > > > > + if (setback) { > > > > + error_report("Most probably triggered kernel issue with" > > > > + " injecting external data abort."); > > > > + error_printf("Giving up trying ...\n"); > > > > + /* Here is where the fun comes to an end */ > > > > + return -EINVAL; > > > > + } > > > > + } > > > > + setback = env->ext_dabt_pending; > > > > + } > > > > + > > > > + kvm_cpu_synchronize_state(cs); > > > > + /* > > > > + * ISS [23:14] is invalid so there is a limited info > > > > + * on what has just happened so the only *useful* thing that can > > > > + * be retrieved from ISS is WnR & DFSC (though in some cases WnR > > > > + * might be less of a value as well) > > > > + */ > > > > + > > > > + /* > > > > + * Get current PC before it will get updated to exception vector entry > > > > + */ > > > > + target_ulong ins_addr = is_a64(env) ? env->pc : env->regs[15]; > > > > > > ins_addr should be declared above > > > > > > But what are we doing? pc is a guest virtual address. Oh, I see... > > > > > > > + > > > > + /* > > > > + * Get the faulting instruction > > > > + * Instructions have a fixed length of 32 bits > > > > + * and are always little-endian > > > > + */ > > > > + ins_fetched = !cpu_memory_rw_debug(cs, ins_addr, (uint8_t *) &ins, > > > > + sizeof(uint32_t), 0); > > > > > > ... we're trying to actual walk the KVM guest's page tables. That > > > seems a bit odd, and the "_debug" function name used for it makes > > > it seem even more odd. > > > > > > > + > > > > + error_report("Data abort exception with no valid ISS generated by " > > > > + "guest memory access at physical address: 0x" TARGET_FMT_lx, > > > > + (target_ulong)fault_ipa); > > > > + > > > > + error_printf(ins_fetched ? "%s : 0x%" PRIx32 " (ins encoded)\n" : "%s\n", > > > > + "KVM unable to emulate faulting instruction", > > > > + (!ins_fetched ? 0 : ldl_le_p(&ins))); > > > > + error_printf("Injecting a data abort exception into guest.\n"); > > > > > > The guest shouldn't be able to generate three lines of errors on the host > > > whenever it wants. That's a security bug. One trace point here seems like > > > the most we should do. Or, after these three lines we should kill the > > > guest. > > > > > You have a point here, this can indeed be exploited by the malicious > > guest. Not sure if killing it is necessary here. > > I could limit the logging though that gets tricky for aborts triggered by > > user-space processes which would probably leave the guest running > > without much of an issue there. > > I can y get rid of logging the additional info and keep the single > > statement of an instruction not being emulated but that still leaves an > > open door for potential exploits... > > > > > Actually, I don't think we should attempt to get the instruction at all. > > > We should do what the KVM documenation suggests we do when we get > > > this exit. We should either do a user selected action: one of suspend, > > > dump, restart, or inject a dabt (as is done below). The last choice hopes > > > that the guest will handle it in some graceful way, or that it'll crash > > > with all the information needed for a post-mortem crash investigation. > > > > > > And I don't think we should do the "very brave" option of trying to > > > emulate the instruction, even if we identify it as a valid MMIO address. > > > That just sounds like a huge can of worms. > > > > > > Does QEMU already have a way for users to select a > > > don't-know-how-to-handle-guest-exception behavior? > > > > > > > The function is attempting to inject the external data abort. The rest is for > > the aftermath analysis to easy figuring out what has happened > > which will not be so easy in case of user-space process triggering > > the DABT with no valid ISS. There is no attempt of emulating the instruction > > just simply saying the guest tried to access this address with this instruction > > which couldn't be handled. Might be excessive in some cases > > (like misbehaving kernel) but it also might be handy on those rare > > occasions when there is not much to analyze. > > But as per the issue you have raised above I will rework that. > > > > Not sure if there is a mechanism to specify the 'preferred' behavior though. > > I think a mechanism for a preferred behavior would be nice, because > all options make sense, depending on the use case. I'm not sure what > the right default is, though. Maybe this dabt injection, but there's > a good chance it'll just loop here. So maybe QEMU should just abort. > Aborting should be the safest default (from the host's PoV). I can still play around with this for a bit to make it more .. sensible, keeping in mind the potential risk with the current solution you have pointed out. I would still like to keep the abort injection as that is the main reason behind having this change. It might be a bit tricky, (time-wise) to add the 'def behaviour' interface though. Is there anything else that could potentially benefit from it ? BR Beata > > Thanks, > drew > > > > > > BR > > > > > > Beata > > > > > > > > + /* > > > > + * Set pending ext dabt and trigger SET_EVENTS so that > > > > + * KVM can inject the abort > > > > + */ > > > > + env->ext_dabt_pending = 1; > > > > + > > > > + return 0; > > > > +} > > > > > > Thanks, > > > drew > > > > > > _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm