Re: [PATCH v4 1/4] Documentation/x86: Explain the purpose for dynamic features

From: "Chang S. Bae" <chang.seok.bae@intel.com>
To: Dave Hansen <dave.hansen@intel.com>, <x86@kernel.org>,
	<tglx@linutronix.de>, <mingo@redhat.com>, <bp@alien8.de>,
	<dave.hansen@linux.intel.com>
Cc: <hpa@zytor.com>, <corbet@lwn.net>, <bagasdotme@gmail.com>,
	<tony.luck@intel.com>, <yang.zhong@intel.com>,
	<linux-doc@vger.kernel.org>, <linux-man@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 1/4] Documentation/x86: Explain the purpose for dynamic features
Date: Tue, 13 Sep 2022 22:25:35 -0700	[thread overview]
Message-ID: <ccd567b5-78cb-53bc-9428-d4720ee6fc9c@intel.com> (raw)
In-Reply-To: <129981cd-32b7-0228-f932-4367f6c79316@intel.com>

On 9/9/2022 2:36 PM, Dave Hansen wrote:
> On 9/9/22 13:15, Chang S. Bae wrote:
>> +The purpose for dynamic features
>> +-------------------------------- >> +
>> + - Legacy userspace libraries have hard-coded sizes for an alternate signal
>> +   stack. With the arch_prctl() options, the signal frame beyond AVX-512
>> +   and PKRU will not be written by old programs as they are prevented from
>> +   using dynamic features. Then, the small signal stack will be compatible
>> +   on systems that support dynamic features.
> 
> This doesn't really ever broach the _problem_ that dynamic features solve.
> 
> 	Legacy userspace libraries often have hard-coded, static sizes
> 	for alternate signal stacks, often using MINSIGSTKSZ which is
> 	typically 2k.  That stack must be able to store at *least*
> 	the signal frame that the kernel sets up before jumping into
> 	the signal handler.  That signal frame must include an XSAVE
> 	buffer defined by the CPU. >
> 	However, that means that the size of signal stacks is dynamic,
> 	not static, because different CPUs have differently-sized XSAVE
> 	buffers.  

Yes, it was missing some points like:
* The buffer size is dynamic.
* And it depends on the CPU.

 >	Those old <=2k buffers are now too small for new CPU
 > 	features like AVX-512, which is causing stack overflows at
 > 	signal entry.

FWIW, some details are worth to be noted:
* Today's kernel prevents the overflow with commit 2beb4a53fc3f
   ("x86/signal: Detect and prevent an alternate signal stack overflow").
* On sigaltstack(), it also rejects a too-small altstack with
   the CONFIG_STRICT_SIGALTSTACK_SIZE option [2].

But, then I think AVX-512 is kind of unrelated here as it is not a 
dynamic feature. Maybe, something like:

     A compiled-in size of 2KB with existing applications is too small
     for new CPU features like AMX. Instead of universally requiring
     larger stack, this dynamic enabling can selectively enforce programs
     to have properly-sized altstacks.

>> + - Modern server systems are consolidating more applications to share the
>> +   CPU resource.
> 
> I'm not sure what this means.  Are you saying that CPU time is more
> overcommitted?  Or that different users are more likely to be sharing
> the same CPU core?  Or, is this trying to allude to the frequency
> penalties that cores (and even packages) pay for using features like
> AVX-512?

Sorry, this point looks to be too sketchy. But, clarifying the problem, 
may help but it is hardly related to the solution to one of them.

The AVX-512 use was proliferated especially in userspace libraries. Then 
notable side effect like the frequency drop was observed. But, it is 
unclear how this dynamic enabling can prevent the library code from 
enabling those features.

>> The risk of applications interfering with each other is
>> +   growing. The controllability on the resource trends to be more
>> +   warranted. Thus, this permission mechanism will be useful for that.
> 
> Should this be something more like:
> 
> Historically, a CPU shared very few resources with its neighbors outside
> of caches.  A CPU could execute whatever instructions it wanted without
> impacting other CPUs.  Also, there were minimal long-lasting temporal
> effects; an application that preceded yours running on a CPU would not
> impact how your application runs.
> 
> That model has been eroding, first with SMT where multiple logical CPUs
> share a core's resources.  Then, with features like AVX-512 that have a
> frequency and thermal impact which can last even after AVX-512 use
> ceases and have an impact wider than a single core.
> 
> In other words, it has become easier to be a "noisy neighbor".
> 
> Dynamic features allow the kernel limit applications' ability to become
> noisy neighbors in the first place.

Yeah, but it looks to be less relevant than the coscheduling mechanism 
as the solution for this. Maybe I'm missing something here.

I'd step back from this second point until finding a case that it solves 
any other problem.

Thanks,
Chang