From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-17.3 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 92049C433ED
	for ; Thu, 22 Apr 2021 14:49:15 +0000 (UTC)
Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 1EF96613E6
	for ; Thu, 22 Apr 2021 14:49:15 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1EF96613E6
Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=suse.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=xen-devel-bounces@lists.xenproject.org
Received: from list by lists.xenproject.org with outflank-mailman.115499.220380 (Exim 4.92)
	(envelope-from )
	id 1lZadZ-00047j-CK; Thu, 22 Apr 2021 14:49:01 +0000
X-Outflank-Mailman: Message body and most headers restored to incoming version
Received: by outflank-mailman (output) from mailman id 115499.220380; Thu, 22 Apr 2021 14:49:01 +0000
Received: from localhost ([127.0.0.1] helo=lists.xenproject.org)
	by lists.xenproject.org with esmtp (Exim 4.92)
	(envelope-from )
	id 1lZadZ-00047c-8F; Thu, 22 Apr 2021 14:49:01 +0000
Received: by outflank-mailman (input) for mailman id 115499;
	Thu, 22 Apr 2021 14:48:59 +0000
Received: from us1-rack-iad1.inumbo.com ([172.99.69.81])
	by lists.xenproject.org with esmtp (Exim 4.92)
	(envelope-from )
	id 1lZadX-00047W-C6
	for xen-devel@lists.xenproject.org; Thu, 22 Apr 2021 14:48:59 +0000
Received: from mx2.suse.de (unknown [195.135.220.15])
	by us1-rack-iad1.inumbo.com (Halon) with ESMTPS
	id bfa7b65c-c3e7-45be-9a66-a31a63cbdbdf;
	Thu, 22 Apr 2021 14:48:58 +0000 (UTC)
Received: from relay2.suse.de (unknown [195.135.221.27])
	by mx2.suse.de (Postfix) with ESMTP id 76379B16E;
	Thu, 22 Apr 2021 14:48:57 +0000 (UTC)
X-BeenThere: xen-devel@lists.xenproject.org
List-Id: Xen developer discussion
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: xen-devel-bounces@lists.xenproject.org
Precedence: list
Sender: "Xen-devel"
X-Inumbo-ID: bfa7b65c-c3e7-45be-9a66-a31a63cbdbdf
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1;
	t=1619102937;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=whV4digUq1DW2nHxtRw8sF8wf/NZv487D6zdQFusjPM=;
	b=X3NfmOp8L/htldif072mA7Zp7Wbak0BirqDpv3UbwamSUvZZgtip5n9ZlgHQ2+d3wk7Z3l
	 7gZSTZjr/rseWEB0XbxmiVq5CzORG3pgw3wHoLK3EYj7Obt+K+arkMbD1ZFWkIPEyRvPcC
	 d88IsqLLC3Cbe4ptTbwFm5ihPmVxFYA=
Subject: [PATCH v3 09/22] x86/xstate: enable AMX components
From: Jan Beulich
To: "xen-devel@lists.xenproject.org"
Cc: Andrew Cooper , George Dunlap , Ian Jackson , Wei Liu ,
	=?UTF-8?Q?Roger_Pau_Monn=c3=a9?= , Anthony Perard
References: <322de6db-e01f-0b57-5777-5d94a13c441a@suse.com>
Message-ID: <0dbeab9e-087d-c7f5-3d79-f507e8ddeb0d@suse.com>
Date: Thu, 22 Apr 2021 16:48:57 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
	Thunderbird/78.10.0
MIME-Version: 1.0
In-Reply-To: <322de6db-e01f-0b57-5777-5d94a13c441a@suse.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit

These being controlled by XCR0, enabling support is relatively
straightforward. Note however that there won't be any use of them until
their dependent ISA extension CPUID flags get exposed, not the least due
to recalculate_xstate() handling the dependencies in kind of a reverse
manner.

Signed-off-by: Jan Beulich
---
v3: Add new states to XSTATE_NONLAZY.
v2: New.

--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -221,6 +221,9 @@ int libxl_cpuid_parse_config(libxl_cpuid
         {"md-clear", 0x00000007, 0, CPUID_REG_EDX, 10, 1},
         {"serialize", 0x00000007, 0, CPUID_REG_EDX, 14, 1},
         {"cet-ibt", 0x00000007, 0, CPUID_REG_EDX, 20, 1},
+        {"amx-bf16", 0x00000007, 0, CPUID_REG_EDX, 22, 1},
+        {"amx-tile", 0x00000007, 0, CPUID_REG_EDX, 24, 1},
+        {"amx-int8", 0x00000007, 0, CPUID_REG_EDX, 25, 1},
         {"ibrsb", 0x00000007, 0, CPUID_REG_EDX, 26, 1},
         {"stibp", 0x00000007, 0, CPUID_REG_EDX, 27, 1},
         {"l1d-flush", 0x00000007, 0, CPUID_REG_EDX, 28, 1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -168,7 +168,8 @@ static const char *const str_7d0[32] =
 
     [18] = "pconfig",
     [20] = "cet-ibt",
-
+    [22] = "amx-bf16",
+    [24] = "amx-tile",      [25] = "amx-int8",
     [26] = "ibrsb",         [27] = "stibp",
     [28] = "l1d-flush",     [29] = "arch-caps",
     [30] = "core-caps",     [31] = "ssbd",
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -198,6 +198,14 @@ static void recalculate_xstate(struct cp
                           xstate_size(X86_XCR0_PKRU_POS));
     }
 
+    if ( p->feat.amx_tile )
+    {
+        xstates |= X86_XCR0_TILECFG | X86_XCR0_TILEDATA;
+        xstate_size = max(xstate_size,
+                          xstate_offset(X86_XCR0_TILEDATA_POS) +
+                          xstate_size(X86_XCR0_TILEDATA_POS));
+    }
+
     p->xstate.max_size = xstate_size;
     p->xstate.xcr0_low = xstates & ~XSTATE_XSAVES_ONLY;
     p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32;
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -640,6 +640,10 @@ static bool valid_xcr0(uint64_t xcr0)
     if ( !(xcr0 & X86_XCR0_BNDREGS) != !(xcr0 & X86_XCR0_BNDCSR) )
         return false;
 
+    /* TILECFG and TILEDATA must be the same. */
+    if ( !(xcr0 & X86_XCR0_TILECFG) != !(xcr0 & X86_XCR0_TILEDATA) )
+        return false;
+
     return true;
 }
 
--- a/xen/include/asm-x86/x86-defns.h
+++ b/xen/include/asm-x86/x86-defns.h
@@ -96,6 +96,10 @@
 #define X86_XCR0_HI_ZMM           (1ULL << X86_XCR0_HI_ZMM_POS)
 #define X86_XCR0_PKRU_POS         9
 #define X86_XCR0_PKRU             (1ULL << X86_XCR0_PKRU_POS)
+#define X86_XCR0_TILECFG_POS      17
+#define X86_XCR0_TILECFG          (1ULL << X86_XCR0_TILECFG_POS)
+#define X86_XCR0_TILEDATA_POS     18
+#define X86_XCR0_TILEDATA         (1ULL << X86_XCR0_TILEDATA_POS)
 #define X86_XCR0_LWP_POS          62
 #define X86_XCR0_LWP              (1ULL << X86_XCR0_LWP_POS)
 
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -32,7 +32,8 @@ extern uint32_t mxcsr_mask;
 
 #define XSTATE_FP_SSE  (X86_XCR0_FP | X86_XCR0_SSE)
 #define XSTATE_ALL     (~(1ULL << 63))
-#define XSTATE_NONLAZY (X86_XCR0_BNDREGS | X86_XCR0_BNDCSR | X86_XCR0_PKRU)
+#define XSTATE_NONLAZY (X86_XCR0_BNDREGS | X86_XCR0_BNDCSR | X86_XCR0_PKRU | \
+                        X86_XCR0_TILECFG | X86_XCR0_TILEDATA)
 #define XSTATE_LAZY    (XSTATE_ALL & ~XSTATE_NONLAZY)
 #define XSTATE_XSAVES_ONLY         0
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -268,6 +268,9 @@ XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(SERIALIZE, 9*32+14) /*a SERIALIZE insn */
 XEN_CPUFEATURE(CET_IBT, 9*32+20) /* CET - Indirect Branch Tracking */
+XEN_CPUFEATURE(AMX_BF16, 9*32+22) /* AMX BFloat16 instructions */
+XEN_CPUFEATURE(AMX_TILE, 9*32+24) /* AMX tile architecture */
+XEN_CPUFEATURE(AMX_INT8, 9*32+25) /* AMX 8-bit integer instructions */
 XEN_CPUFEATURE(IBRSB, 9*32+26) /*A IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP, 9*32+27) /*A STIBP */
 XEN_CPUFEATURE(L1D_FLUSH, 9*32+28) /*S MSR_FLUSH_CMD and L1D flush. */
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -222,7 +222,7 @@ def crunch_numbers(state):
         # instruction groups which are specified to require XSAVE for state
         # management.
         XSAVE: [XSAVEOPT, XSAVEC, XGETBV1, XSAVES,
-                AVX, MPX, PKU, LWP],
+                AVX, MPX, PKU, AMX_TILE, LWP],
 
         # AVX is taken to mean hardware support for 256bit registers (which in
         # practice depends on the VEX prefix to encode), and the instructions
@@ -290,6 +290,11 @@ def crunch_numbers(state):
 
         # In principle the TSXLDTRK insns could also be considered independent.
         RTM: [TSXLDTRK],
+
+        # AMX-TILE means hardware support for tile registers and general non-
+        # computational instructions. All further AMX features are built on top
+        # of AMX-TILE.
+        AMX_TILE: [AMX_BF16, AMX_INT8],
     }
 
     deep_features = tuple(sorted(deps.keys()))
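
Illustrative note, not part of the patch: the pairing rule added to
valid_xcr0() can be tried outside of Xen with the standalone sketch below.
The bit positions (17 and 18) are taken from the x86-defns.h hunk; the
helper name amx_bits_consistent() is hypothetical and exists only for this
example. It is a minimal sketch of the one check, not the full
valid_xcr0() logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 bit positions as defined in the x86-defns.h hunk of the patch. */
#define X86_XCR0_TILECFG_POS   17
#define X86_XCR0_TILECFG       (1ULL << X86_XCR0_TILECFG_POS)
#define X86_XCR0_TILEDATA_POS  18
#define X86_XCR0_TILEDATA      (1ULL << X86_XCR0_TILEDATA_POS)

/*
 * Hypothetical helper (name invented for this sketch): returns true when
 * TILECFG and TILEDATA are either both set or both clear in xcr0.  The
 * patch expresses the same condition inverted, returning false from
 * valid_xcr0() when the two bits differ.
 */
static bool amx_bits_consistent(uint64_t xcr0)
{
    return !(xcr0 & X86_XCR0_TILECFG) == !(xcr0 & X86_XCR0_TILEDATA);
}
```

The double negation normalises each masked value to 0 or 1 before the
comparison, so the result does not depend on which bit position is being
tested; this mirrors the existing BNDREGS/BNDCSR check in the same function.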