Date: Wed, 12 Feb 2020 14:59:34 +0100
From: Joerg Roedel
To: Andy Lutomirski
Cc: x86@kernel.org, hpa@zytor.com, Andy Lutomirski, Dave Hansen,
    Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
    Tom Lendacky, Juergen Gross, Kees Cook, linux-kernel@vger.kernel.org,
    kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
    Joerg Roedel
Subject: Re: [RFC PATCH 00/62] Linux as SEV-ES Guest Support
Message-ID: <20200212135934.GL20066@8bytes.org>
References: <20200211135256.24617-1-joro@8bytes.org>

On Tue, Feb 11, 2020 at 07:48:12PM -0800, Andy Lutomirski wrote:
> > On Feb 11, 2020, at 5:53 AM, Joerg Roedel wrote:
> >
> > * Putting some NMI-load on the guest will make it crash usually
> >   within a minute
>
> Suppose you do CPUID or some MMIO and get #VC. You fill in the GHCB to
> ask for help. Some time between when you start filling it out and when
> you do VMGEXIT, you get NMI. If the NMI does its own GHCB access [0],
> it will clobber the outer #VC's state, resulting in a failure when
> VMGEXIT happens. There's a related failure mode if the NMI is after
> the VMGEXIT but before the result is read.
>
> I suspect you can fix this by saving the GHCB at the beginning of
> do_nmi and restoring it at the end. This has the major caveat that it
> will not work if do_nmi comes from user mode and schedules, but I
> don't believe this can happen.
>
> [0] Due to the NMI_COMPLETE catastrophe, there is a 100% chance that
> this happens.

Very true, thank you! You probably saved me a few hours of debugging
this further :)

I will implement better handling for nested #VC exceptions, which
hopefully solves the NMI crashes.

Thanks again,

	Joerg