From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=BAYES_00,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0A4B2C433E2 for ; Tue, 23 Mar 2021 21:03:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DC5D8619D3 for ; Tue, 23 Mar 2021 21:03:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233578AbhCWVCg (ORCPT ); Tue, 23 Mar 2021 17:02:36 -0400 Received: from mail-ej1-f52.google.com ([209.85.218.52]:41594 "EHLO mail-ej1-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233437AbhCWVCK (ORCPT ); Tue, 23 Mar 2021 17:02:10 -0400 Received: by mail-ej1-f52.google.com with SMTP id u5so29378167ejn.8 for ; Tue, 23 Mar 2021 14:02:09 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=piHhFa85E0Z3Zzr7hilVx8SqiHOmrEsubdVllNbeaBA=; b=KGPT9DNoqn4YLVsfyY7YSiXA8VfnpmLlRxLS3oXy7rMSq76c/8pgUJQqtPFI14WIbZ eF0OLOP2n7qctpAdNIkfKH4RfSX0xW51IeHcI2zMwK2ExK8x3JDBTqgjF3zBt+x5i3pT PWUKxnzsthcxa0Yr4SsUTD6yBfHRQ1KMwQUBnSb8sGb6zVYdUJrQgnYW/raW6WfuMN4t Z1AML0hwg7tW+Q4+soRlZ64tEvyjGOsvLVn1PeKkis6ndsB2Ry2IM7kAzduTmdOh4bDx T9IXW1b+MEo4+sX6RV9HegGfWxKpup7R+XeVuvBvD/5uJFcngpj0eFnXfD8FYlBpzUYw ahCw== X-Gm-Message-State: AOAM532YNQ++9VXx/Ewukhqhr1CJvzVvccc4idDLbXudjcqAVX9AT2nA nmV9kg0A8ANVukqpwx6x46Kl/4J3ftSyScEzrvk= X-Google-Smtp-Source: ABdhPJx2bwST4v/J9Mt5/AUPh8ItMB7D1mIZ9dp+GaBqZS3BYP3o0NQx4i3QgDhom/jkaMyBoZNMimGdmZa09F0fK1s= X-Received: by 2002:a17:907:761c:: with SMTP id jx28mr95779ejc.417.1616533329256; Tue, 23 Mar 2021 14:02:09 -0700 (PDT) MIME-Version: 1.0 References: <20210221185637.19281-1-chang.seok.bae@intel.com> <20210221185637.19281-15-chang.seok.bae@intel.com> <87o8fda2ye.fsf@nanos.tec.linutronix.de> In-Reply-To: From: Len Brown Date: Tue, 23 Mar 2021 17:01:58 -0400 Message-ID: Subject: Re: [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the first use of dynamic user state To: Andy Lutomirski Cc: Thomas Gleixner , Borislav Petkov , Ingo Molnar , X86 ML , Len Brown , Dave Hansen , "Liu, Jing2" , "Ravi V. Shankar" , LKML , "Bae, Chang Seok" Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > I have an obnoxious question: do we really want to use the XFD mechanism? Obnoxious questions are often the most valuable! :-) > Right now, glibc, and hence most user space code, blindly uses > whatever random CPU features are present for no particularly good > reason, which means that all these features get stuck in the XINUSE=1 > state, even if there is no code whatsoever in the process that > benefits. AVX512 is bad enough as we're seeing right now. AMX will > be much worse if this happens. > > We *could* instead use XCR0 and require an actual syscall to enable > it. We could even then play games like requiring whomever enables the > feature to allocate memory for the state save area for signals, and > signal delivery could save the state and disable the feature, this > preventing the signal frame from blowing up to 8 or 12 or who knows > how many kB. This approach would have some challenges. Enumeration today is two parts. 1. CPUID tells you if the feature exists in the HW 2. xgetbv/XCR0 tells you if the OS supports that feature Since #2 would be missing, you are right, there would need to be a new API enabling the user to request the OS to enable support for that task. If that new API is not invoked before the user touches the feature, they die with a #UD. And so there would need to be some assurance that the API is successfully called before any library might use the feature. Is there a practical way to guarantee that, given that the feature may be used (or not) only by a dynamically linked library? If a library spawns threads and queries the size of XSAVE before the API is called, it may be confused when that size changes after the API is called. So a simple question, "who calls the API, and when?" isn't so simple. Finally, note that XCR0 updates cause a VMEXIT, while XFD updates do not. So context switching XCR0 is possible, but is problematic. The other combination is XFD + API rather than XCR0 + API. With XFD, the context switching is faster, and the faulting (#NM and the new MSR with #NM cause) is viable. We have the bit set in XCR0, so no state size advantage. Still have issues with API logistics. So we didn't see that the API adds any value, only pain, over transparent 1st use enabling with XFD and no API. cheers, Len Brown, Intel Open Source Technology Center ps. I agree that un-necessary XINUSE=1 is possible. Notwithstanding the issues initially deploying AVX512, I am skeptical that it is common today. IMO, the problem with AVX512 state is that we guaranteed it will be zero for XINUSE=0. That means we have to write 0's on saves. It would be better to be able to skip the write -- even if we can't save the space we can save the data transfer. (This is what we did for AMX). pps. your idea of requiring the user to allocate their own signal stack is interesting. It isn't really about allocating the stack, though -- the stack of the task that uses the feature is generally fine already. The opportunity is to allow tasks that do *not* use the new feature to get away with minimal data transfer and stack size. As we don't have the 0's guarantee for AMX, we bought the important part of that back.