From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.2 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DC2F3C433B4 for ; Mon, 3 May 2021 14:23:05 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 85E26611AB for ; Mon, 3 May 2021 14:23:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 85E26611AB Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from list by lists.xenproject.org with outflank-mailman.121638.229411 (Exim 4.92) (envelope-from ) id 1ldZT9-0002fR-IK; Mon, 03 May 2021 14:22:43 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 121638.229411; Mon, 03 May 2021 14:22:43 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1ldZT9-0002fK-FO; Mon, 03 May 2021 14:22:43 +0000 Received: by outflank-mailman (input) for mailman id 121638; Mon, 03 May 2021 14:22:42 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1ldZT8-0002fF-2A for xen-devel@lists.xenproject.org; Mon, 03 May 2021 14:22:42 +0000 Received: from mx2.suse.de (unknown [195.135.220.15]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 9e3e2dcf-7981-472b-a806-d40f7047c150; Mon, 03 May 2021 14:22:40 +0000 (UTC) Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 47EE2B2E3; Mon, 3 May 2021 14:22:39 +0000 (UTC) X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 9e3e2dcf-7981-472b-a806-d40f7047c150 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1620051759; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=54ugadqo6jk1DJ6iuEFaNypD8iMhvQy14YPAp7q2tG4=; b=W03zWk2Z6oM2XlFVtzLa2FvpQ9Qa1Fegf37WwojQ8rklIP1reaW8mmjDHK/NHoYPQM4HwM 5MdNoj5oqeUkoZ768YRHCkGR4er7+leh2X1GlqOveHYyhboz2pvOpNATFwEsbmof9S+yU8 8v9tQHCRJMC+8J42KZgr6F/VDc9mwpc= Subject: Re: [PATCH v3 03/22] x86/xstate: re-size save area when CPUID policy changes To: Andrew Cooper Cc: George Dunlap , Ian Jackson , Julien Grall , Stefano Stabellini , Wei Liu , =?UTF-8?Q?Roger_Pau_Monn=c3=a9?= , "xen-devel@lists.xenproject.org" References: <322de6db-e01f-0b57-5777-5d94a13c441a@suse.com> <8ba8f016-0aed-277b-bbea-80022d057791@suse.com> <5a954be8-e213-36d8-27da-4c51243dc280@citrix.com> From: Jan Beulich Message-ID: Date: Mon, 3 May 2021 16:22:38 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: <5a954be8-e213-36d8-27da-4c51243dc280@citrix.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit On 03.05.2021 15:57, Andrew Cooper wrote: > On 22/04/2021 15:44, Jan Beulich wrote: >> vCPU-s get maximum size areas allocated initially. Hidden (and in >> particular default-off) features may allow for a smaller size area to >> suffice. >> >> Suggested-by: Andrew Cooper >> Signed-off-by: Jan Beulich >> --- >> v2: Use 1ul instead of 1ull. Re-base. >> --- >> This could be further shrunk if we used XSAVEC / if we really used >> XSAVES, as then we don't need to also cover the holes. But since we >> currently use neither of the two in reality, this would require more >> work than just adding the alternative size calculation here. >> >> Seeing that both vcpu_init_fpu() and cpuid_policy_updated() get called >> from arch_vcpu_create(), I'm not sure we really need this two-stage >> approach - the slightly longer period of time during which >> v->arch.xsave_area would remain NULL doesn't look all that problematic. >> But since xstate_alloc_save_area() gets called for idle vCPU-s, it has >> to stay anyway in some form, so the extra code churn may not be worth >> it. >> >> Instead of cpuid_policy_xcr0_max(), cpuid_policy_xstates() may be the >> interface to use here. But it remains to be determined whether the >> xcr0_accum field is meant to be inclusive of XSS (in which case it would >> better be renamed) or exclusive. Right now there's no difference as we >> don't support any XSS-controlled features. > > I've been figuring out what we need to for supervisors states.  The > current code is not in a good shape, but I also think some of the > changes in this series take us in an unhelpful direction. >From reading through the rest your reply I'm not sure I see what you mean. ORing in host_xss at certain points shouldn't be a big deal. > I've got a cleanup series which I will post shortly.  It interacts > texturally although not fundamentally with this series, but does fix > several issues. > > For supervisor states, we need use XSAVES unilaterally, even for PV.  > This is because XSS_CET_S needs to be the HVM kernel's context, or Xen's > in PV context (specifically, MSR_PL0_SSP which is the shstk equivalent > of TSS.RSP0). > > > A consequence is that Xen's data handling shall use the compressed > format, and include supervisor states.  (While in principle we could > manage CET_S, CET_U, and potentially PT when vmtrace gets expanded, each > WRMSR there is a similar order of magnitude to an XSAVES/XRSTORS > instruction.) I agree. > I'm planning a host xss setting, similar to mmu_cr4_features, which > shall be the setting in context for everything other than HVM vcpus > (which need the guest setting in context, and/or the VT-x bodge to > support host-only states).  Amongst other things, all context switch > paths, including from-HVM, need to step XSS up to the host setting to > let XSAVES function correctly. > > However, a consequence of this is that the size of the xsave area needs > deriving from host, as well as guest-max state.  i.e. even if some VMs > aren't using CET, we still need space in the xsave areas to function > correctly when a single VM is using CET. Right - as said above, taking this into consideration here shouldn't be overly problematic. > Another consequence is that we need to rethink our hypercall behaviour.  > There is no such thing as supervisor states in an uncompressed XSAVE > image, which means we can't continue with that being the ABI. I don't think the hypercall input / output blob needs to follow any specific hardware layout. > I've also found some substantial issues with how we handle > xcr0/xcr0_accum and plan to address these.  There is no such thing as > xcr0 without the bottom bit set, ever, and xcr0_accum needs to default > to X87|SSE seeing as that's how we use it anyway.  However, in a context > switch, I expect we'll still be using xcr0_accum | host_xss when it > comes to the context switch path. Right, and to avoid confusion I think we also want to move from xcr0_accum to e.g. xstate_accum, covering both XCR0 and XSS parts all in one go. > In terms of actual context switching, we want to be using XSAVES/XRSTORS > whenever it is available, even if we're not using supervisor states.  > XSAVES has both the inuse and modified optimisations, without the broken > consequence of XSAVEOPT (which is firmly in the "don't ever use this" > bucket now). The XSAVEOPT anomaly is affecting user mode only, isn't it? Or are you talking of something I have forgot about? > There's no point ever using XSAVEC.  There is no hardware where it > exists in the absence of XSAVES, and can't even in theoretical > circumstances due to (perhaps unintentional) linkage of the CPUID data.  > XSAVEC also doesn't use the modified optimisation, and is therefore > strictly worse than XSAVES, even when MSR_XSS is 0. > > Therefore, our choice of instruction wants to be XSAVES, or XSAVE, or > FXSAVE, depending on hardware capability. Makes sense to me (perhaps - see above - minus your omission of XSAVEOPT here). Jan