All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Len Brown <lenb@kernel.org>, Andy Lutomirski <luto@kernel.org>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>,
	Borislav Petkov <bp@suse.de>, Ingo Molnar <mingo@kernel.org>,
	X86 ML <x86@kernel.org>, "Brown\, Len" <len.brown@intel.com>,
	Dave Hansen <dave.hansen@intel.com>, "Liu\,
	Jing2" <jing2.liu@intel.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Documentation List <linux-doc@vger.kernel.org>
Subject: Re: [PATCH v4 22/22] x86/fpu/xstate: Introduce boot-parameters to control state component support
Date: Sat, 27 Mar 2021 23:20:27 +0100	[thread overview]
Message-ID: <87r1k0ck7o.ffs@nanos.tec.linutronix.de> (raw)
In-Reply-To: <CAJvTdKmdMfD4BddMJs4iwvHWRSv4PV7Dh2vxjM57UJ3pw5UJDQ@mail.gmail.com>

Len,

On Sat, Mar 27 2021 at 00:53, Len Brown wrote:
>> 3.3 RECOMMENDATIONS FOR SYSTEM SOFTWARE
>>
>> System software may disable use of Intel AMX by clearing XCR0[18:17],
>> by clearing CR4.OSXSAVE, or by setting
>> IA32_XFD[18]. It is recommended that system software initialize AMX
>> state (e.g., by executing TILERELEASE)
>> before doing so. This is because maintaining AMX state in a
>> non-initialized state may have negative power and
>> performance implications.
>
> I agree that the wording here about disabling AMX is ominous.

Which is what I pointed out 7 days ago already, but that got lost in the
ABI and command line noise... Thanks Andy for bringing it back!

> The hardware initializes with AMX disabled.
> The kernel probes AMX, enables it in XCR0, and keeps it enabled.
>
> Initially, XFD is "armed" for all tasks.
> When a task accesses AMX state, #NM fires, we allocate a context
> switch buffer, and we "disarm" XFD for that task.
> As we have that buffer in-hand for the lifetime of the task, we never
> "arm" XFD for that task again.
>
> XFD is context switched, and so the next time it is set, is when we
> are restoring some other task's state.
>
> n.b. I'm describing the Linux flow.  The VMM scenario is a little different.
>
>> Since you reviewed the patch set, I assume you are familiar with how
>> Linux manages XSTATE.  Linux does *not* eagerly load XSTATE on context
>> switch.  Instead, Linux loads XSTATE when the kernel needs it loaded
>> or before executing user code.  This means that the kernel can (and
>> does, and it's a performance win) execute kernel thread code and/or go
>> idle, *including long-term deep idle*, with user XSTATE loaded.
>
> Yes, this scenario is clear.
>
> There are several cases.
>
> 1. Since TMM registers are volatile, a routine using TMM that wants
> them to persist across a call must save them,
>     and will TILERELEASE before invoking that call.  That is the
> calling convention,
>     and I expect that if it is not followed, debugging (of tools) will
> occur until it is.
>
>     The only way for a user program's XSTATE to be present during the
> kernel's call to idle
>     is if it sleep via a system call when no other task wants to run
> on that CPU.
>
>     Since system calls are calls, in this case, AMX INIT=1 during
>     idle.

What is the guarantee for that? A calling convention?

That's uninteresting because that's only the recommended and desired
state and not the guaranteed state.

>     All deep C-state are enabled, the idle CPU is able to contribute
> it's maximum turbo buget to its peers.
>
> 2. A correct program with live TMM registers takes an interrupt, and
> we enter the kernel AMX INIT=0.
>     Yes, we will enter the syscall at the frequency of the app (like
> we always do).

That's about interrupts not syscalls and I assume this should be all
s/syscall/interrupt/ for the whole #2 including 2a

>     Yes, turbo frequency may be limited by the activity of this
> processor and its peers (like it always is)
>
>    2a. If we return to the same program, then depending on how long
> the syscall runs, we may execute
>          the program and the system call code at a frequency lower
> than we might if AMX INIT=1 at time of interrupt.

So the frequency effect is relevant for the duration of the interrupt
and the eventually appended soft interrupt, right?

The program state is uninteresting because even if the kernel would
do XSAVES, TILERELEASE on interrupt entry then it would restore the
state before returning and then the program would have the same
conditions as before the interrupt.

>    2b. If we context switch to a task that has AMX INIT=1, then any
> AMX-imposed limits on turbo
>          are immediately gone.

Immediately on context switch? Definitely not.

      switch_to(prev, next)
        XSAVES(prev)
        eventually set XFD[18]

The point where AMX INIT=1 of 'next' becomes relevant is on return to
user space where XRSTORS happens. Up to that point AMX INIT=0 stays in
effect.

Now what guarantees that 'next' is returning to user space immediately?

Nothing.

If it's a user task this can be a wakeup for whatever which might cause
another wait depending on the callchain that task is in. It can be
preempted before reaching XRSTORS which is the point that matters to
flip the AMX INIT state back to 1.

It can be a kernel task or a chain of kernel tasks with arbitrary
runtime.

As a consequence the scheduler might migrate 'prev' from CPU_A to CPU_L
and what happens to that state on CPU_A? Does it magically move along
with 'prev' to CPU_L? I can't see how, but what do I know about magic.

So now the chain of kernel tasks finishes and there is nothing to do,
CPU_A goes idle with AMX INIT=0, which prevents the CPU from going deep,
drains power, can't contribute to the turbo state or whatever undesired
side effects that has.

You can get the same effect not only by device interrupts but also by
regular task migration, ptrace, breakpoints, any form of traps,
exception the task triggers in user space, user space freezing, kill -9
and .....

> 3. A buggy or purposely bogus program is fully empowered to violate
> the programming conventions.
>     Say such a program called a long sleep, and nothing else wanted to
> run on that CPU, so the kernel
>     went idle with AMX INIT=0.  Indeed, this could retard the core
> from getting into the deepest available
>     C-state, which could impact the turbo budget of neighboring cores.
> However, if that were some kind
>     of DOS, it would be simpler and more effective to simply hog a CPU
> by running code.  Also, as soon
>     as another thread switches in with INIT=1, there is no concept of
> AMX frequency caps. (see note for 2b)

It's irrelevant whether this is intentionally buggy or not. It's equally
irrelevant whether this is a stupid attempt of DOS or not.

What's relevant is that this has undesired side effects of various
sorts.

> I do not see a situation where the kernel needs to issue TILERELEASE
> (though a VMM likely would).

So #3 does not qualify for you? Interesting POV.

> What did I miss?

See #2.b

What's the actual downside of issuing TILERELEASE conditionally
depending on prev->AMX INIT=0? Is it slooooow or what's the real
problem here?

Thanks,

        tglx

  reply	other threads:[~2021-03-27 22:21 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-21 18:56 [PATCH v4 00/22] x86: Support Intel Advanced Matrix Extensions Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 01/22] x86/fpu/xstate: Modify the initialization helper to handle both static and dynamic buffers Chang S. Bae
2021-03-10 13:40   ` Borislav Petkov
2021-02-21 18:56 ` [PATCH v4 02/22] x86/fpu/xstate: Modify state copy helpers " Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 03/22] x86/fpu/xstate: Modify address finders " Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 04/22] x86/fpu/xstate: Modify the context restore helper " Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 05/22] x86/fpu/xstate: Add a new variable to indicate dynamic user states Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 06/22] x86/fpu/xstate: Add new variables to indicate dynamic xstate buffer size Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 07/22] x86/fpu/xstate: Calculate and remember dynamic xstate buffer sizes Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 08/22] x86/fpu/xstate: Convert the struct fpu 'state' field to a pointer Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 09/22] x86/fpu/xstate: Introduce helpers to manage the xstate buffer dynamically Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 10/22] x86/fpu/xstate: Define the scope of the initial xstate data Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 11/22] x86/fpu/xstate: Update the xstate save function to support dynamic states Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 12/22] x86/fpu/xstate: Update the xstate buffer address finder " Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 13/22] x86/fpu/xstate: Update the xstate context copy function " Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the first use of dynamic user state Chang S. Bae
2021-03-20 22:13   ` Thomas Gleixner
2021-03-20 22:21     ` Andy Lutomirski
2021-03-23 21:01       ` Len Brown
2021-03-24  3:14         ` Liu, Jing2
2021-03-24 21:09           ` Len Brown
2021-03-24 21:26             ` Andy Lutomirski
2021-03-24 21:30               ` Dave Hansen
2021-03-24 21:42                 ` Andy Lutomirski
2021-03-24 21:58                   ` Dave Hansen
2021-03-24 22:12                     ` Andy Lutomirski
2021-03-25  5:12             ` Liu, Jing2
2021-03-25  6:59               ` Bae, Chang Seok
2021-03-25  7:26                 ` Liu, Jing2
2021-03-23 21:52     ` Bae, Chang Seok
2021-03-24 14:24       ` Dave Hansen
2021-03-29 13:14     ` Len Brown
2021-03-29 13:33       ` Thomas Gleixner
2021-03-29 15:43         ` Len Brown
2021-03-29 16:06           ` Len Brown
2021-03-29 17:43             ` Andy Lutomirski
2021-03-29 18:57               ` Len Brown
2021-03-29 18:49           ` Thomas Gleixner
2021-03-29 22:16             ` Len Brown
2021-03-30  8:28               ` Thomas Gleixner
2021-03-30 16:38                 ` Len Brown
2021-03-26 16:34   ` Jann Horn
2021-03-29 18:14     ` Bae, Chang Seok
2021-02-21 18:56 ` [PATCH v4 15/22] x86/fpu/xstate: Support ptracer-induced xstate buffer expansion Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 16/22] x86/fpu/xstate: Extend the table to map state components with features Chang S. Bae
2021-03-20 21:25   ` Thomas Gleixner
2021-03-23 21:52     ` Bae, Chang Seok
2021-02-21 18:56 ` [PATCH v4 17/22] x86/cpufeatures/amx: Enumerate Advanced Matrix Extension (AMX) feature bits Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 18/22] x86/fpu/amx: Define AMX state components and have it used for boot-time checks Chang S. Bae
2021-03-20 21:31   ` Thomas Gleixner
2021-03-23 21:52     ` Bae, Chang Seok
2021-02-21 18:56 ` [PATCH v4 19/22] x86/fpu/amx: Enable the AMX feature in 64-bit mode Chang S. Bae
2021-03-20 21:26   ` Thomas Gleixner
2021-03-23 21:51     ` Bae, Chang Seok
2021-02-21 18:56 ` [PATCH v4 20/22] selftest/x86/amx: Include test cases for the AMX state management Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 21/22] x86/fpu/xstate: Support dynamic user state in the signal handling path Chang S. Bae
2021-02-21 18:56 ` [PATCH v4 22/22] x86/fpu/xstate: Introduce boot-parameters to control state component support Chang S. Bae
2021-02-21 19:30   ` Randy Dunlap
2021-02-21 20:10     ` Bae, Chang Seok
2021-02-21 20:37       ` Randy Dunlap
2021-03-20 20:56   ` Thomas Gleixner
2021-03-25 22:59     ` Len Brown
2021-03-25 23:10       ` Dave Hansen
2021-03-26 15:27         ` Len Brown
2021-03-26 19:22           ` Thomas Gleixner
2021-03-26  1:41       ` Andy Lutomirski
2021-03-26 15:33         ` Len Brown
2021-03-26 15:48           ` Andy Lutomirski
2021-03-26 17:53             ` Len Brown
2021-03-26 18:12               ` Andy Lutomirski
2021-03-27  4:53                 ` Len Brown
2021-03-27 22:20                   ` Thomas Gleixner [this message]
2021-03-29 13:31                     ` Len Brown
2021-03-29 14:10                       ` Thomas Gleixner
2021-03-26 18:17               ` Borislav Petkov
2021-03-27  4:41                 ` Len Brown
2021-03-26  1:50       ` Thomas Gleixner
2021-03-26 15:36         ` Len Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r1k0ck7o.ffs@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=bp@suse.de \
    --cc=chang.seok.bae@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=jing2.liu@intel.com \
    --cc=len.brown@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.