From: Andy Lutomirski
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
Date: Wed, 31 Mar 2021 09:53:50 -0700
To: Len Brown
Cc: David Laight, Dave Hansen, Andy Lutomirski, Greg KH, "Bae, Chang Seok",
    X86 ML, LKML, libc-alpha, Florian Weimer, Rich Felker, Kyle Huey,
    Keno Fischer, Linux API
X-Mailing-List: linux-kernel@vger.kernel.org

> On Mar 31, 2021, at 9:31 AM, Len Brown wrote:
>
> On Tue, Mar 30, 2021 at 6:01 PM David Laight wrote:
>
>>> Can we leave it in live registers?  That would be the speed-of-light
>>> signal handler approach.
>>> But we'd need to teach the signal handler to
>>> not clobber it.  Perhaps that could be part of the contract that a
>>> fast signal handler signs?  INIT=0 AMX state could simply sit
>>> patiently in the AMX registers for the duration of the signal handler.
>>> You can't get any faster than doing nothing :-)
>>>
>>> Of course part of the contract for the fast signal handler is that it
>>> knows that it can't possibly use XRSTOR of the stuff on the stack to
>>> necessarily get back to the state of the signaled thread (assuming we
>>> even used XSTATE format on the fast signal handler stack, it would
>>> forget the contents of the AMX registers, in this example)
>>
>> gcc will just use the AVX registers for 'normal' code within
>> the signal handler.
>> So it has to have its own copy of all the registers.
>> (Well, maybe you could make the TMX instructions fault,
>> but that would need a nested signal delivered.)
>
> This is true, by default, but it doesn't have to be true.
>
> Today, gcc has an annotation for user-level interrupts
> https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes
>
> An analogous annotation could be created for fast signals.
> gcc can be told exactly what registers and instructions it can use for
> that routine.
>
> Of course, this begs the question about what routines that handler calls,
> and that would need to be constrained too.
>
> Today signal-safety(7) advises programmers to limit what legacy signal handlers
> can call.  There is no reason that a fast-signal-safety(7) could not be created
> for the fast path.
>
>> There is also the register save buffer that you need in order
>> to long-jump out of a signal handler.
>> Unfortunately that is required to work.
>> I'm pretty sure the original setjmp/longjmp just saved the stack
>> pointer - but that really doesn't work any more.
>>
>> OTOH most signal handlers don't care - but there isn't a flag
>> to sigset() (etc) to ask for a specific register layout.
>
> Right, the idea is to optimize for *most* signal handlers,
> since making any changes to *all* signal handlers is intractable.
>
> So the idea is that opting-in to a fast signal handler would opt-out
> of some legacy signal capabilities.  Complete state is one of them,
> and thus long-jump is not supported, because the complete state
> may not automatically be available.

Long jump is probably the easiest problem of all: sigsetjmp() is a
*function*, following ABI, so sigsetjmp() is expected to clobber most
or all of the extended state.

But this whole annotation thing will require serious compiler support.
We already have problems with compilers inlining functions and getting
confused about attributes.

An API like:

    if (get_amx()) {
        use AMX;
    } else {
        don’t;
    }

avoids this problem.  And making XCR0 dynamic, for all its faults, at
least helps force a degree of discipline on user code.

>
> thanks,
> Len Brown, Intel Open Source Technology Center
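
For illustration only, the "if (get_amx())" pattern above could be sketched in C roughly as follows. Everything here is an assumption made for the example: get_amx() is just the name used in this mail and does not exist in libc or the kernel headers, it is stubbed to always decline, and dot_amx() is a placeholder that contains no AMX instructions. The point is only that the opt-in is an ordinary run-time branch at the call site, so it does not depend on the compiler carrying a function attribute correctly through inlining.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * HYPOTHETICAL: get_amx() is the name used in the mail; no such function
 * exists in libc or the kernel headers. This stub always declines, as if
 * the request to use AMX state had been refused.
 */
static bool get_amx(void)
{
    return false;
}

/* Portable fallback: plain scalar dot product, touches no AMX state. */
static long dot_scalar(const int *a, const int *b, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/*
 * Placeholder for a tile-based implementation. A real version would use
 * AMX instructions; this one reuses the scalar code so the sketch runs
 * on any machine.
 */
static long dot_amx(const int *a, const int *b, size_t n)
{
    return dot_scalar(a, b, n);
}

/* The opt-in decision is a single run-time branch at the call site. */
long dot(const int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);

    if (get_amx())
        return dot_amx(a, b, n);
    return dot_scalar(a, b, n);
}
```

A caller simply invokes dot(a, b, n); code that never asks for AMX never reaches the AMX path, which is the discipline the explicit check is meant to encourage.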