From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3AC11C433B4 for ; Mon, 17 May 2021 13:49:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1BD8A61029 for ; Mon, 17 May 2021 13:49:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237527AbhEQNua (ORCPT ); Mon, 17 May 2021 09:50:30 -0400 Received: from mga02.intel.com ([134.134.136.20]:55610 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233718AbhEQNu3 (ORCPT ); Mon, 17 May 2021 09:50:29 -0400 IronPort-SDR: o+jqUplEFiIVZjbhtEGAz4ywUONgRkxo4mkQN8+LEnf8AZbK196Gx60Yu7QJ349ze61R4WVNGT Q3uw+5dvyeVQ== X-IronPort-AV: E=McAfee;i="6200,9189,9986"; a="187589768" X-IronPort-AV: E=Sophos;i="5.82,307,1613462400"; d="scan'208";a="187589768" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 May 2021 06:49:12 -0700 IronPort-SDR: ZOL+ry6wZLlVgKgJuPhe6rOnNHAxySLURxlnWDEg+BqMR/8lJ6U9SbGED9w9WIx/bHsCeHGUg+ fbOg+Oz7rQAg== X-IronPort-AV: E=Sophos;i="5.82,307,1613462400"; d="scan'208";a="410832083" Received: from avandeve-mobl.amr.corp.intel.com (HELO [10.212.212.163]) ([10.212.212.163]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 May 2021 06:49:12 -0700 Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features To: Thomas Gleixner , Len Brown , Borislav Petkov Cc: Willy Tarreau , Andy Lutomirski , Florian Weimer , "Bae, Chang Seok" , Dave Hansen , X86 ML , LKML , linux-api@vger.kernel.org, "libc-alpha@sourceware.org" , Rich Felker , Kyle Huey , Keno Fischer References: <20210415044258.GA6318@zn.tnic> <20210415052938.GA2325@1wt.eu> <20210415054713.GB6318@zn.tnic> <20210419141454.GE9093@zn.tnic> <20210419191539.GH9093@zn.tnic> <20210419215809.GJ9093@zn.tnic> <874kf11yoz.ffs@nanos.tec.linutronix.de> From: Arjan van de Ven Message-ID: Date: Mon, 17 May 2021 06:49:11 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: <874kf11yoz.ffs@nanos.tec.linutronix.de> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > Having a proper interface (syscall, prctl) which user space can use to > ask for permission and allocation of the necessary buffer(s) is clearly > avoiding the downsides and provides the necessary mechanisms for proper > control and failure handling. this would need to be a "get / put" interface, so a refcount; that way things nest nicely. For API symmetry I'd want to have the put there, even if we may decide to be infinitely lazy in cleaning up the state. it also would want it to take an arguement that's a bitmask, so that this can be applied to future state as well. Eh actually I'd start with also adding AVX512 to this. Even though for obvious compat reasons that one is on by default (so at process start we might need to start with a count of 1) it's interesting to fold that into this same framework. (and who knows, dropping AVX512 state if you don't need it might improve context switches) Syscalls are relatively cheap (and I can imagine the C library doing a TLS cache of the count if it becomes an issue) so can be done on a relatively finegrained level. I've worked on OpenBLAS before, and that library basically has a global initialization function that ends up getting called on the first big math op (it may spawn threads as well etc) but which "stays around" for consecutive math functions; a get/put model would work quite well for such math library (since it's based on BLAS like almost all such math libraries, I expect this to be the common pattern)