All of lore.kernel.org
* x86 CPU features detection for applications (and AMX)
@ 2021-06-23 15:04 Florian Weimer
  2021-06-23 15:32 ` Dave Hansen
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Florian Weimer @ 2021-06-23 15:04 UTC (permalink / raw)
  To: libc-alpha, linux-api, x86, linux-arch; +Cc: H.J. Lu

We have an interface in glibc to query CPU features:

  X86-specific Facilities
  <https://www.gnu.org/software/libc/manual/html_node/X86.html>

CPU_FEATURE_USABLE means that all preconditions for a feature are met;
HAS_CPU_FEATURE means it's in silicon but possibly dormant.
CPU_FEATURE_USABLE is supposed to look at XCR0, AT_HWCAP2 etc. before
enabling the relevant bit (so it cannot pass through any unknown bits).

It turns out we screwed up in the glibc 2.33 release: the absolutely
required headers weren't actually installed:

  [PATCH] x86: Install <bits/platform/x86.h> [BZ #27958]
  <https://sourceware.org/pipermail/libc-alpha/2021-June/127215.html>

Given that the magic constants aren't available in any other way, this
feature was completely unusable, so we can perhaps revisit it and switch
to a different approach.

Previously kernel developers have expressed dismay that we didn't
coordinate the interface with them.  This is why I want to raise this now.

When we designed this glibc interface, we assumed that bits would be
static during the life-time of the process, initialized at process
start.  That follows the model of previous x86 CPU feature enablement.
In the background, CPU_FEATURE_USABLE/HAS_CPU_FEATURE calls a function
which returns a pointer to eight 32-bit words, based on the index passed
to the function (out-of-range indices return a pointer to zeros,
enabling forward compatibility).  The macros then use a magic constant
that encodes the lookup index and which of those 256 bits to extract to
find that bit, plus the feature/usable choice.  This means that we
*could* keep this interface unchanged if the kernel gives us a way to
read up-to-date feature state from a 256 bit area (or at least 32 bit
word) in thread-specific data.  Similar to what we have with
set_robust_list and rseq today.
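The mechanism described above can be sketched roughly like this; all names
and sizes here are made up for illustration, not glibc's actual internals:

```c
#include <stdint.h>

/* Each index selects a record of eight 32-bit words (256 bits);
   out-of-range indices return a pointer to zeros, which keeps old
   binaries forward compatible with feature bits they do not know. */
struct cpu_feature_rec { uint32_t word[8]; };

static const struct cpu_feature_rec zeros;   /* all-zero fallback      */
static struct cpu_feature_rec features[4];   /* filled at startup      */

static const struct cpu_feature_rec *feature_rec(unsigned idx)
{
    return idx < 4 ? &features[idx] : &zeros;
}

/* A "magic constant" packs: record index, word within the record,
   and bit within the word. */
#define FEATURE_BIT(idx, word, bit) (((idx) << 8) | ((word) << 5) | (bit))

static int feature_usable(unsigned magic)
{
    unsigned idx  = magic >> 8;
    unsigned word = (magic >> 5) & 7;
    unsigned bit  = magic & 31;
    return (feature_rec(idx)->word[word] >> bit) & 1;
}
```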

This still wouldn't cover the enable/disable side, but at least it would
work for CPU features which are modal and come and go.  The fact that we
tell GCC to cache the returned pointer from that internal function, but
not that the data is immutable, works to our advantage here.

On the other hand, maybe there is a way to give users a better
interface.  Obviously we want to avoid a syscall for a simple CPU
feature check.  And we also need something to enable/disable CPU
features.

Thanks,
Florian

PS: Is it true that there is no public mailing list for Linux
discussions specific to x86?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-23 15:04 x86 CPU features detection for applications (and AMX) Florian Weimer
@ 2021-06-23 15:32 ` Dave Hansen
  2021-07-08  6:05   ` Florian Weimer
  2021-06-25 23:31 ` Thiago Macieira
  2021-07-08 17:56 ` Mark Brown
  2 siblings, 1 reply; 35+ messages in thread
From: Dave Hansen @ 2021-06-23 15:32 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha, linux-api, x86, linux-arch; +Cc: H.J. Lu

On 6/23/21 8:04 AM, Florian Weimer wrote:
> https://www.gnu.org/software/libc/manual/html_node/X86.html
...
> Previously kernel developers have expressed dismay that we didn't
> coordinate the interface with them.  This is why I want to raise this now.

This looks basically like someone dumped a bunch of CPUID bit values and
exposed them to applications without considering whether applications
would ever need them.  For instance, why would an app ever care about:

	PKS – Protection keys for supervisor-mode pages.

And how could glibc ever give applications accurate information about
whether PKS "is supported by the operating system"?  It just plain
doesn't know, or at least only knows from a really weak ABI like
/proc/cpuinfo.

It also doesn't seem to tell applications what they want to know, which
is: "can I, the application, *use* this feature?"

> PS: Is it true that there is no public mailing list for Linux
> discussions specific to x86?

Yes.  I've asked recently for something x86-related, but folks were
concerned that what I was asking for was too specific; what I wanted was
more of a brainstorming place to put x86-specific RFCs.

	https://subspace.kernel.org/lists.linux.dev.html


* Re: x86 CPU features detection for applications (and AMX)
  2021-06-23 15:04 x86 CPU features detection for applications (and AMX) Florian Weimer
  2021-06-23 15:32 ` Dave Hansen
@ 2021-06-25 23:31 ` Thiago Macieira
  2021-06-28 12:40   ` Enrico Weigelt, metux IT consult
  2021-07-08  7:08   ` Florian Weimer
  2021-07-08 17:56 ` Mark Brown
  2 siblings, 2 replies; 35+ messages in thread
From: Thiago Macieira @ 2021-06-25 23:31 UTC (permalink / raw)
  To: fweimer; +Cc: hjl.tools, libc-alpha, linux-api, linux-arch, x86

On 23 Jun 2021 17:04:27 +0200, Florian Weimer wrote:
> We have an interface in glibc to query CPU features:
> X86-specific Facilities
> <https://www.gnu.org/software/libc/manual/html_node/X86.html>
>
> CPU_FEATURE_USABLE means that all preconditions for a feature are met;
> HAS_CPU_FEATURE means it's in silicon but possibly dormant.
> CPU_FEATURE_USABLE is supposed to look at XCR0, AT_HWCAP2 etc. before
> enabling the relevant bit (so it cannot pass through any unknown bits).

It's a nice initiative, but it doesn't help libraries and applications that
need to be either cross-platform or backwards compatible.

The first problem is the cross-platformness need. Because we library and 
application developers need to support other OSes, we'll need to deploy our 
own CPUID-based detection. It's far better to use common code everywhere, 
where one developer working on Linux can fix bugs in FreeBSD, macOS or Windows 
or any of the permutations. Every platform-specific deviation adds to 
maintenance requirements and is a source of potential latent bugs, now or in 
the future due to refactoring. That is why doing everything in the form of 
instructions would be far better and easier, rather than system calls.

[Unless said system calls were standardised and actually deployed. Making this 
a cross-platform library that is not part of libc would be a major step in 
that direction]

The second problem is going to be backwards compatibility. Applications and 
libraries may want to ship precompiled binaries that make use of the new CPU 
features, whether they are open source or not. It comes as no surprise to 
anyone that we CPU makers will have made software that uses those features and 
want to have it ready on Day 1 of the HW being available for the market (if 
we're doing our jobs right). That often involves precompiling because everyone 
who installed their compilers more than one year ago will not have the 
necessary tools to build. That runs counter to the need to use a libc 
interface that didn't exist until recently.

And by "recently", I mean "anything since the glibc that came with Red Hat 
Enterprise Linux 7" (2.17).

So no, application and library developers will not use libc functions they 
don't need to, especially if it adds to their problems, unless there's no way 
around it.

> Previously kernel developers have expressed dismay that we didn't
> coordinate the interface with them.  This is why I want to raise this now.

You also need to coordinate with your users.

A platform-specific API for a problem that is already solved is met with 
"knock yourself out, we're not going to use this." So my first suggestion is 
to remove the "platform-specific" part and make this a cross-platform solution.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




* Re: x86 CPU features detection for applications (and AMX)
  2021-06-25 23:31 ` Thiago Macieira
@ 2021-06-28 12:40   ` Enrico Weigelt, metux IT consult
  2021-06-28 13:20     ` Peter Zijlstra
  2021-06-28 15:08     ` Thiago Macieira
  2021-07-08  7:08   ` Florian Weimer
  1 sibling, 2 replies; 35+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-06-28 12:40 UTC (permalink / raw)
  To: Thiago Macieira, fweimer
  Cc: hjl.tools, libc-alpha, linux-api, linux-arch, x86

On 26.06.21 01:31, Thiago Macieira wrote:

Hi folks,

> The first problem is the cross-platformness need. Because we library and
> application developers need to support other OSes, we'll need to deploy our
> own CPUID-based detection. It's far better to use common code everywhere,
> where one developer working on Linux can fix bugs in FreeBSD, macOS or Windows
> or any of the permutations. Every platform-specific deviation adds to
> maintenance requirements and is a source of potential latent bugs, now or in
> the future due to refactoring. That is why doing everything in the form of
> instructions would be far better and easier, rather than system calls.

hmm, maybe some libcpuid ?

Personally, I believe that glibc is already too big; too many things in
it belong in a separate library.

> The second problem is going to be backwards compatibility. Applications and
> libraries may want to ship precompiled binaries that make use of the new CPU
> features, whether they are open source or not. 

Since we're talking about GNU libc here, binary-only stuff is probably
out of scope. OTOH, using different libc versions in those special
cases isn't such a big deal.

OTOH, depending on the programming language, those things might be
better put into the compiler. (Of course that needs good coordination
between different compiler vendors.)

Why do I believe so?

a) If you really want optimal performance, it's not just about whether
    you can use a certain opcode (and have a fallback if the opcode is
    missing) - the order also matters.

    For example, back in P4 times, an old friend of mine (raytracing
    expert) did in-depth measurement of the P4's timing behaviour
    depending on opcode ordering, and he managed to achieve a 5..10x (!)
    performance improvement on raytracing workloads. But this was only
    possible with hand-written assembler code; a C compiler wouldn't be
    able to do that (we actually designed a DSL for that, so a special
    compiler could generate the optimal CPU-specific code).

b) In recent years we're moving more and more to higher level (but still
    compiling to machine code) languages, where writing such machine
    specific code is either very hard or practically impossible - it's
    simply the compiler's duty to handle such things.

> It comes as no surprise to anyone that we CPU makers will have made
> software that uses those features and want to have it ready on Day 1 of
> the HW being available for the market (if we're doing our jobs right).
> That often involves precompiling because everyone who installed their
> compilers more than one year ago will not have the necessary tools to
> build.

Actually, you should talk to the compiler folks much earlier, at the
point where you know what those features look like.

Of course, there are cases where folks can't upgrade their compilers
easily due to certain formal processes. But those are just special cases;
they've got tons of other problems for the same reason, and quite
frankly their processes are just broken to begin with. I've done several
projects in regulated areas (medical, automotive, etc) where folks use
ancient stuff with the excuse of too much paperwork and the silly idea
that a certain sheet of certification paper for some old piece of code
gives any clue about actual quality or helps with their own
qualification processes (which, when reading the regulations carefully,
quickly turns out to be nonsense).

For using certain new CPU specific features, the need for a compiler
upgrade really should be no excuse. And at least for the vast majority
of cases, a proper compiler could do it much better than the average
programmer.

> And by "recently", I mean "anything since the glibc that came with Red Hat
> Enterprise Linux 7" (2.17).

Uh, that's really ancient. Nobody can seriously expect modern features
on such an ancient distro. If people really insist on spending so much
money to run such old code, instead of just doing a dist upgrade,
then I can only reply with "not our problem".

In that case it's the duty of Red Hat to do their job right for once
and provide newer libc and gcc versions (and yes, it really isn't a big
deal to have multiple versions of them installed concurrently).

I'm really not a fan of running bleeding edge software, but doing a dist
upgrade every 1..2 years really isn't asking too much (except perhaps
for some very exceptional cases). Especially not for big organisations
with enough money to spend on such vendors. (Or are their pockets empty
because they already pay so much for RHEL?)

> A platform-specific API for a problem that is already solved is met with
> "knock yourself out, we're not going to use this." So my first suggestion
> is to remove the "platform-specific" part and make this a cross-platform
> solution.

Speaking of platform specific problems: that's where I need to tell you
that I'm *very* unhappy with you CPU vendor folks. Actually it's you who
created those - not just platform- but CPU-model-specific - problems.
The whole topic is anything but new; there could have been a long term
solution decades ago, at the silicon level. (I could say a lot about the
IMHO really weird opcode layout, but that'd be too far out of scope.)

What we SW engineers need is an easy and fast method to act depending on
whether some CPU supports some feature (e.g. a new opcode). Things like
cpuinfo are only a tiny piece of that. What we could really use is a
conditional jump/call based on whether feature X is supported - without
any kernel intervention. Then the machine code could easily be laid out
to support both cases, with or without some feature X. Alternatively we
could have fast trapping in userland - a hw-generated call already would
be a big help.

If we had something in that direction, we wouldn't have to have this
kind of discussion here anymore - it would be entirely up to compiler and
library folks, no need for any kernel support at all.

Going back to AMX - I just had a quick look at the spec (*1). Sorry, but
this thing is really weird and horrible to use. Come on, these chips
already have billions of transistors; it really can't hurt so much to
spend a few more to provide a clean and easy to use machine code
interface. Grmmpf! (This is a general problem we've got with so many
HW folks: why can't they just talk to us SW folks first, so we can find
a good solution for both sides before it goes into the field?)

And one point that immediately jumps into my mind (w/o looking deeper
into it): it introduces completely new registers - do we now need extra
code for task switching etc.?

Since this stuff seems to be for tensor operations, I really wonder why
that has to be inline with classic operations. Maybe I'm just lacking
imagination, but I don't see many use cases where the application would
not want to do those operations in larger batches - and we already have
auxiliary processors like TPUs or GPUs for that. What is the intended use
case? Who would actually benefit from it?


--mtx


*1) 
https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287


* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 12:40   ` Enrico Weigelt, metux IT consult
@ 2021-06-28 13:20     ` Peter Zijlstra
  2021-06-30 12:50       ` Enrico Weigelt, metux IT consult
  2021-06-28 15:08     ` Thiago Macieira
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2021-06-28 13:20 UTC (permalink / raw)
  To: Enrico Weigelt, metux IT consult
  Cc: Thiago Macieira, fweimer, hjl.tools, libc-alpha, linux-api,
	linux-arch, x86

On Mon, Jun 28, 2021 at 02:40:32PM +0200, Enrico Weigelt, metux IT consult wrote:

> Going back to AMX - just had a quick look at the spec (*1). Sorry, but
> this thing is really weird and horrible to use. Come on, these chips
> already have billions of transistors, it really can't hurt so much
> spending a few more to provide a clean and easy to use machine code
> interface. Grmmpf! (This is a general problem we've got with so many
> HW folks, why can't them just talk to us SW folks first so we can find
> a good solution for both sides, before that goes into the field ?)
> 
> And one point that immediately jumps into my mind (w/o looking deeper
> into it): it introduces completely new registers - do we now need extra
> code for tasks switching etc ?

No, but because it's register state and part of XSAVE, it has immediate
impact on the ABI. In particular, the signal stack layout includes XSAVE
state (as does ptrace()).

At the same time, 'legacy' applications (up until _very_ recently) had a
minimum signal stack size of 2K, which is already violated by the
addition of AVX512 (there's actual breakage due to that).

Adding the insane AMX state (8k+) into that is a complete trainwreck
waiting to happen. Not to mention that having !INIT AMX state has direct
consequences for P-state selection and thus performance.

For these reasons, we OS folks will mandate that you do a prctl() to
request/release AMX (and we get to say: no). If you use AMX without
this, the instruction will fault (because not set in XCR0) and we'll
SIGBUS or something.

Userspace will have to do something like:

 - check CPUID, if !AMX -> fail
 - issue prctl(), if error -> fail
 - issue XGETBV and check that the AMX bit is set, if not -> fail
 - request the signal stack size / spawn threads
 - use AMX

Spawning threads prior to enabling AMX will result in using the wrong
signal stack size and cause malfunction; you get to keep the pieces.
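The sequence above can be sketched in C. The prctl Peter describes was later
merged in Linux 5.16 as arch_prctl(ARCH_REQ_XCOMP_PERM); the constants below
come from that interface and are defined locally in case the installed
headers predate it. This is a sketch under those assumptions, not a
definitive implementation:

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef ARCH_REQ_XCOMP_PERM
#define ARCH_REQ_XCOMP_PERM 0x1023   /* from <asm/prctl.h>, Linux >= 5.16 */
#endif
#define XFEATURE_XTILEDATA 18        /* AMX tile data state component */

/* Step 1: CPUID.(EAX=7,ECX=0):EDX bit 24 = AMX-TILE in silicon. */
static int cpu_has_amx(void)
{
#if defined(__x86_64__)
    uint32_t eax, ebx, ecx, edx;
    __asm__ ("cpuid" : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                     : "a"(7), "c"(0));
    return (edx >> 24) & 1;
#else
    return 0;
#endif
}

/* Step 3: XGETBV(0) reads XCR0; bit 18 = XTILEDATA enabled. */
static uint64_t read_xcr0(void)
{
#if defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}

/* Returns 1 only when every step of the sequence succeeded. */
int request_amx(void)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
    if (!cpu_has_amx())
        return 0;                                  /* no AMX in silicon */
    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA))
        return 0;                                  /* kernel said no */
    if (!(read_xcr0() & (1ULL << XFEATURE_XTILEDATA)))
        return 0;                                  /* still not enabled */
    /* Only now: size signal stacks, spawn threads, then use AMX. */
    return 1;
#else
    return 0;
#endif
}
```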




* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 12:40   ` Enrico Weigelt, metux IT consult
  2021-06-28 13:20     ` Peter Zijlstra
@ 2021-06-28 15:08     ` Thiago Macieira
  2021-06-28 15:27       ` Peter Zijlstra
  2021-06-30 14:32       ` Enrico Weigelt, metux IT consult
  1 sibling, 2 replies; 35+ messages in thread
From: Thiago Macieira @ 2021-06-28 15:08 UTC (permalink / raw)
  To: fweimer, Enrico Weigelt, metux IT consult
  Cc: hjl.tools, libc-alpha, linux-api, linux-arch, x86

On Monday, 28 June 2021 05:40:32 PDT Enrico Weigelt, metux IT consult wrote:
> > The first problem is the cross-platformness need. Because we library and
> > application developers need to support other OSes, we'll need to deploy
> > our own CPUID-based detection. It's far better to use common code
> > everywhere, where one developer working on Linux can fix bugs in FreeBSD,
> > macOS or Windows or any of the permutations. Every platform-specific
> > deviation adds to maintenance requirements and is a source of potential
> > latent bugs, now or in the future due to refactoring. That is why doing
> > everything in the form of instructions would be far better and easier,
> > rather than system calls.
> hmm, maybe some libcpuid ?

Indeed. I'm querying inside Intel to see if I can get buy-in to create such a 
library.

> > The second problem is going to be backwards compatibility. Applications
> > and libraries may want to ship precompiled binaries that make use of the
> > new CPU features, whether they are open source or not.
> 
> Since we're talking about GNU libc here, binary-only stuff is probably
> out of scope here. OTOH, using differnt libc versions in those special
> cases isn't such a big deal.

Shipping a libc is not trivial, either technically or due to licensing 
requirements. Most applications want to link against whatever libc the system 
already provides, if that's possible.

> > It comes as no surprise to anyone that we CPU makers will have made
> > software that uses those features and want to have it ready on Day 1 of
> > the HW being available for the market (if we're doing our jobs right).
> > That often involves precompiling because everyone who installed their
> > compilers more than one year ago will not have the necessary tools to
> > build.
> 
> Actually, you should talk to the compiler folks much earlier, at the
> point where you know what those features look like.

We do, but it's not enough.

GCC releases once a year, so it's easy to miss the feature freeze. Then there 
are Linux distros that do LTS every 2 years or so. Worse, those two are 
usually out of phase. For example, if you're using the current Ubuntu LTS 
today (almost July 2021), you're using 20.04, which was released one month 
before the GCC 10 release. So you're using GCC 9, released May 2019, which 
means its features were frozen on December 2018. That's an incredibly long 
lead time.

As a consequence, you will see precompiled binaries.

> For using certain new CPU specific features, the need for a compiler
> upgrade really should be no excuse. And at least for vast majority of
> cases, a proper compiler could do it much better than the average
> programmer.

To compile the software that uses those instructions, undoubtedly. But what
if I did that for you, and you could simply download the binary for the
library and/or plugins, such that you could slot them into your existing
systems and CI? This could make the difference between adoption or not.

> > And by "recently", I mean "anything since the glibc that came with Red Hat
> > Enterprise Linux 7" (2.17).
> 
> Uh, that's really ancient. Nobody can seriously expect modern features
> on such an ancient distro. If people really insist spending so much
> money for running such old code, instead of just doing a dist upgrade,
> then I can only reply with "not our problem".

Yes and no.

Red Hat has been incredibly successful in backporting kernel features to the 
old 3.10 that came with RHEL 7. Whether they will do that for AMX state saving 
and the system call that we're discussing here, I can't say. AFAIU, they did 
backport the AVX512 state-saving to that 3.10, so they may.

Even if they don't, the *software* that people deploy may be the same build 
for RHEL 7 and for a modern distro that will have a 5.14 kernel. That software 
may have non-AVX, AVX2, AVX512 and AMX-specific code paths and would do 
runtime detection of which one is best to use. If a system call is needed, the 
system call needs to be issued even on that 3.10 and if it responds with 
-ENOSYS or -EINVAL, then it will fall back to the next best option.

So my point is: this shouldn't be in glibc because the glibc will not have the 
new system call wrappers or TLS fields.

> What we SW engineers need is an easy and fast method to act depending on
> whether some CPU supports some feature (e.g. a new opcode). Things like
> cpuinfo are only a tiny piece of that. What we could really use is a
> conditional jump/call based on whether feature X is supported - without
> any kernel intervention. Then the machine code could easily be laid out
> to support both cases, with or without some feature X. Alternatively we
> could have fast trapping in userland - a hw-generated call already would
> be a big help.

That's what cpuid is for. With GCC function multi-versioning or equivalent 
manually-rolled solutions, you can get exactly what you're asking for.
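GCC's function multi-versioning can be as simple as the sketch below:
target_clones asks the compiler to emit both variants and pick one at load
time via an ifunc resolver, so the caller never tests anything explicitly
(a sketch; the function itself is just an arbitrary example kernel):

```c
/* GCC emits an AVX2 clone and a default clone of this function and
   selects between them at program load, based on the running CPU. */
__attribute__((target_clones("avx2", "default")))
int dot(const int *a, const int *b, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)   /* the AVX2 clone may auto-vectorize */
        s += a[i] * b[i];
    return s;
}
```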

Yes, the checking became far more complex with the need to check XCR0 after
AVX came along, but since the instruction itself is slow and serialising,
any library will just cache the results. And as a result, the level of CPU
features is not expected to change. It never has in the past, so this hasn't
been an issue.

> If we had something in that direction, we wouldn't have to have this
> kind discussion here anymore - it would be entirely up to compiler and
> library folks, no need for any kernel support at all.

For most features, there isn't. You don't see us discussing 
AVX512VP2INTERSECT, for example. This discussion only exists because AMX 
requires more state to be saved during context switches and signal delivery. 
See Peter's email.

> And one point that immediately jumps into my mind (w/o looking deeper
> into it): it introduces completely new registers - do we now need extra
> code for tasks switching etc ?

Yes, this is the crux of this discussion.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 15:08     ` Thiago Macieira
@ 2021-06-28 15:27       ` Peter Zijlstra
  2021-06-28 16:13         ` Thiago Macieira
  2021-06-30 14:32       ` Enrico Weigelt, metux IT consult
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2021-06-28 15:27 UTC (permalink / raw)
  To: Thiago Macieira
  Cc: fweimer, Enrico Weigelt, metux IT consult, hjl.tools, libc-alpha,
	linux-api, linux-arch, x86

On Mon, Jun 28, 2021 at 08:08:41AM -0700, Thiago Macieira wrote:
> On Monday, 28 June 2021 05:40:32 PDT Enrico Weigelt, metux IT consult wrote:

> > What we SW engineers need is an easy and fast method to act depending on
> > whether some CPU supports some feature (e.g. a new opcode). Things like
> > cpuinfo are only a tiny piece of that. What we could really use is a
> > conditional jump/call based on whether feature X is supported - without
> > any kernel intervention. Then the machine code could easily be laid out
> > to support both cases, with or without some feature X. Alternatively we
> > could have fast trapping in userland - a hw-generated call already would
> > be a big help.
> 
> That's what cpuid is for. With GCC function multi-versioning or equivalent 
> manually-rolled solutions, you can get exactly what you're asking for.

Right, lots of self-modifying code solutions there, some of which can be
linker driven, some not. In the kernel we use alternative() to replace
short code sequences depending on CPUID.

Userspace *could* do the same, rewriting code before first execution is
fairly straight forward.

> Yes, the checking became far more complex with the need to check XCR0 after
> AVX came along, but since the instruction itself is slow and serialising,
> any library will just cache the results. And as a result, the level of CPU
> features is not expected to change. It never has in the past, so this hasn't
> been an issue.

Arguably you should be checking XCR0 for any feature there, including
SSE/AVX/AVX512 and now AMX.

Ideally we'd do a prctl() for AVX512 too, except it's too late :-(
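The check described here, sketched for AVX: the CPUID feature bit alone is
not enough; the OS must have enabled XSAVE (OSXSAVE) and the XMM+YMM state
components in XCR0. A minimal x86-64 sketch:

```c
#include <stdint.h>

static int avx_usable(void)
{
#if defined(__x86_64__)
    uint32_t eax, ebx, ecx, edx;
    __asm__ ("cpuid" : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                     : "a"(1), "c"(0));
    int osxsave = (ecx >> 27) & 1;   /* OS uses XSAVE/XRSTOR */
    int avx     = (ecx >> 28) & 1;   /* AVX present in silicon */
    if (!osxsave || !avx)
        return 0;
    uint32_t lo, hi;
    __asm__ ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));   /* read XCR0 */
    return (lo & 0x6) == 0x6;        /* XCR0: SSE (bit 1) and AVX (bit 2) */
#else
    return 0;
#endif
}
```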


* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 15:27       ` Peter Zijlstra
@ 2021-06-28 16:13         ` Thiago Macieira
  2021-06-28 17:11           ` Peter Zijlstra
  2021-06-28 17:43           ` Peter Zijlstra
  0 siblings, 2 replies; 35+ messages in thread
From: Thiago Macieira @ 2021-06-28 16:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: fweimer, Enrico Weigelt, metux IT consult, hjl.tools, libc-alpha,
	linux-api, linux-arch, x86

On Monday, 28 June 2021 08:27:24 PDT Peter Zijlstra wrote:
> > That's what cpuid is for. With GCC function multi-versioning or equivalent
> > manually-rolled solutions, you can get exactly what you're asking for.
> 
> Right, lots of self-modifying code solutions there, some of which can be
> linker driven, some not. In the kernel we use alternative() to replace
> short code sequences depending on CPUID.
> 
> Userspace *could* do the same, rewriting code before first execution is
> fairly straight forward.

Userspace shouldn't do SMC. It's bad enough that JITs without caching exist, 
but having pure paged code is better. Pure pages are shared as needed by the 
kernel.

All you need is a simple bit test. You can then either branch to different 
code paths or write to a function pointer so it'll go there directly the next 
time. You can also choose to load different plugins depending on what CPU 
features were found.
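The pattern described above can be sketched like this: the initial function
pointer targets a resolver that tests the CPU once and then overwrites the
pointer, so later calls go straight to the right code path (names and the
example workload are made up for illustration):

```c
#include <stddef.h>

static long sum_generic(const long *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

__attribute__((target("avx2")))      /* only called if AVX2 is usable */
static long sum_avx2(const long *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)   /* GCC may auto-vectorize this */
        s += v[i];
    return s;
}

static long sum_resolver(const long *v, size_t n);
static long (*sum)(const long *, size_t) = sum_resolver;

static long sum_resolver(const long *v, size_t n)
{
    __builtin_cpu_init();
    sum = __builtin_cpu_supports("avx2") ? sum_avx2 : sum_generic;
    return sum(v, n);                /* only the first call pays for the test */
}
```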

Consequence: CPU feature checking is done *very* early, often before main().

> Arguably you should be checking XCR0 for any feature there, including
> SSE/AVX/AVX512 and now AMX.
> 
> Ideally we'd do a prctl() for AVX512 too, except it's too late :-(

Right.

But speaking of which, this library would deal with Apple having done the 
allocate-state-on-demand feature for AVX512 without XFD. See
https://github.com/qt/qtbase/blob/dev/src/corelib/global/qsimd.cpp#L346-L369

Anyway, what's the current thinking on what the arch_prctl() should be? Is 
that a per-thread state or will it affect the entire process group? And is it 
a sticky functionality, or are we talking about ref/deref?

Maybe in order to answer that, we need to understand what the worst-case
scenarios we need to support are. What are they?

1) alt-stack signal handlers, usually for crashing signals (to catch a stack 
overflow)

2) cooperative user-space task schedulers, e.g. coroutines

3) preemptive user-space task schedulers (I don't know if such software exists 
or even if it is possible)

4) combination of 1 and 3

5) #4, in which each part comes from a separate library with no knowledge
of the others, and is initialised concurrently in different threads

I'd *assume* that any user-space task scheduler is aware of XSAVE at the very 
least and will know how to allocate context-saving buffers of sufficient size 
for each task. I think this is a safe assumption because AVX is over 10 years 
old now and XSAVE is a required feature for enabling the AVX state. That is, 
any library that knows to save AVX state (the upper 128-bits of the YMM 
registers) is aware of XSAVE and the fact that the state size is dynamic.

Crash handlers are another story. Speaking from experience, my first attempt
at writing one simply used a global char array of MINSIGSTKSZ bytes, and the
signal failed to get delivered (note that such code will now fail to compile,
because MINSIGSTKSZ is no longer a constant expression). My code was
attempting to launch gdb on itself, so it wasn't even a SA_SIGINFO signal,
and therefore the failure was baffling. I had to read the kernel source code
to figure out that, regardless of SA_SIGINFO, the state is saved on the stack
anyway and therefore the stack needs to be big enough. So I simply increased
the global variable's size until delivery succeeded on my AVX512 machine. And
because it is no longer using MINSIGSTKSZ, it will not fail to compile after
the glibc upgrade, but it will fail to deliver with AMX state enabled.

[I've since learned to check the XSAVE state size in order to create the
alt-stack.]

How much do we need to worry about these crash handlers?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 16:13         ` Thiago Macieira
@ 2021-06-28 17:11           ` Peter Zijlstra
  2021-06-28 17:23             ` Thiago Macieira
  2021-06-28 17:43           ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2021-06-28 17:11 UTC (permalink / raw)
  To: Thiago Macieira
  Cc: fweimer, Enrico Weigelt, metux IT consult, hjl.tools, libc-alpha,
	linux-api, linux-arch, x86

On Mon, Jun 28, 2021 at 09:13:29AM -0700, Thiago Macieira wrote:
> On Monday, 28 June 2021 08:27:24 PDT Peter Zijlstra wrote:
> > > That's what cpuid is for. With GCC function multi-versioning or equivalent
> > > manually-rolled solutions, you can get exactly what you're asking for.
> > 
> > Right, lots of self-modifying code solutions there, some of which can be
> > linker driven, some not. In the kernel we use alternative() to replace
> > short code sequences depending on CPUID.
> > 
> > Userspace *could* do the same, rewriting code before first execution is
> > fairly straight forward.
> 
> Userspace shouldn't do SMC. It's bad enough that JITs without caching exist, 
> but having pure paged code is better. Pure pages are shared as needed by the 
> kernel.

I don't feel that strongly; if SMC gets you measurable performance
gains, go for it. If you're short on memory, buy more.

> All you need is a simple bit test. You can then either branch to different 
> code paths or write to a function pointer so it'll go there directly the next 
> time. You can also choose to load different plugins depending on what CPU 
> features were found.

Both bit tests and indirect function calls suffer the extra memory load,
which is not free.

> Consequence: CPU feature checking is done *very* early, often before main().

For the linker based ones, yes. IIRC the ifunc() attribute is
particularly useful here.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 17:11           ` Peter Zijlstra
@ 2021-06-28 17:23             ` Thiago Macieira
  2021-06-28 19:08               ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Thiago Macieira @ 2021-06-28 17:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: fweimer, Enrico Weigelt, metux IT consult, hjl.tools, libc-alpha,
	linux-api, linux-arch, x86

On Monday, 28 June 2021 10:11:16 PDT Peter Zijlstra wrote:
> > Consequence: CPU feature checking is done *very* early, often before
> > main().
> For the linker based ones, yes. IIRC the ifunc() attribute is
> particularly useful here.

Exactly. ifunc was designed for this exact purpose, and hence CPUID 
initialisation is done very, very early.

Anyway, if the AMX state is a sticky "set once per process", it's likely going 
to get set early for every process that *may* use AMX. And this is assuming we 
do the library right and only set it if it has AMX code at all, instead of all 
the time.

On the other hand, if it's not set once and for all, we'll have to contend 
with the size changing. TBH, this is a lot more complicated to deal with. Take 
the hypothetical example of a preemptive user-space task scheduler that 
interrupts an AMX routine (let's say for the sake of the argument that it is 
an on-stack signal; I don't see why a scheduler would need to be alt-stack). 
It will record the state and then transition to another routine. And this 
routine may be resumed in another thread of the same process.

Will the kernel understand that the new routine does not need the AMX state? 
Will it understand that the *other* routine, in the other thread will? If this 
is not done automatically by the kernel, then the task scheduler will need to 
know to ask the kernel what the reference count for the AMX state is and will 
need a syscall to set it (not just increment/decrement, though one could 
implement that with a loop).

This applies differently in the case of cooperative scheduling. The SysV ABI 
will probably say that the AMX state is caller-save, so the function call from 
the AMX-using routine implies all its state has been saved somewhere. But what 
about the kernel-side AMX refcount? Is that part of the ABI?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 16:13         ` Thiago Macieira
  2021-06-28 17:11           ` Peter Zijlstra
@ 2021-06-28 17:43           ` Peter Zijlstra
  2021-06-28 19:05             ` Thiago Macieira
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2021-06-28 17:43 UTC (permalink / raw)
  To: Thiago Macieira
  Cc: fweimer, Enrico Weigelt, metux IT consult, hjl.tools, libc-alpha,
	linux-api, linux-arch, x86

On Mon, Jun 28, 2021 at 09:13:29AM -0700, Thiago Macieira wrote:

> Anyway, what's the current thinking on what the arch_prctl() should be? Is 
> that a per-thread state or will it affect the entire process group? And is it 
> a sticky functionality, or are we talking about ref/deref?

So I didn't follow the initial discussion too well; so I might be
getting this wrong. In which case I'm hoping Thomas and/or Andy will
correct me.

But I think the proposal was per process. Having this per thread would
be really unfortunate IMO.

> Maybe in order to answer that, we need to understand what the worst case 
> scenario we need to support is. What are they?
> 
> 1) alt-stack signal handlers, usually for crashing signals (to catch a stack 
> overflow)
> 
> 2) cooperative user-space task schedulers, e.g. coroutines
> 
> 3) preemptive user-space task schedulers (I don't know if such software exists 
> or even if it is possible)

I think it's been done; use sigsetmask()/pthread_sigmask() as 'IRQ'
disable, and run a preemption tick off of SIGALRM or something.

> 4) combination of 1 and 3

None of those I think. The worst case is old executables using
MINSIGSTKSZ and not using the magic signal context at all, just regular
old signals. If you run them on an AVX512 enabled machine, they overflow
their stack and cause memory corruption.

AFAICT the only feasible way forward with that is some sysctl which
default disables AVX512 and add the prctl() and have some unsafe wrapper
that enables AVX512 for select 'legacy' programs for as long as they
exist :/

That is, binaries/libraries compiled against a static (and small)
MINSIGSTKSZ are the enemy. Which brings us to:

> 5) #4, in which each part comes from a separate library with no knowledge 
> of each other, and initialised concurrently in different threads

That's terrible... library init should *NEVER* spawn threads (I know,
don't start).

Anything that does this is basically unfixable, because we can't
guarantee the AMX prctl() gets done before the first thread.

So yes, worst case I suppose...

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 17:43           ` Peter Zijlstra
@ 2021-06-28 19:05             ` Thiago Macieira
  0 siblings, 0 replies; 35+ messages in thread
From: Thiago Macieira @ 2021-06-28 19:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: fweimer, Enrico Weigelt, metux IT consult, hjl.tools, libc-alpha,
	linux-api, linux-arch, x86

On Monday, 28 June 2021 10:43:47 PDT Peter Zijlstra wrote:
> None of those I think. The worst case is old executables using
> MINSIGSTKSZ and not using the magic signal context at all, just regular
> old signals. If you run them on an AVX512 enabled machine, they overflow
> their stack and cause memory corruption.

Indeed, they are already broken today. Retroactively fixing them 5 years later 
can be an additional goal, but it shouldn't stop us from having an AMX 
solution.

BTW, an alt-stack of exactly MINSIGSTKSZ on an SKX overflows inside the kernel, 
so there's no 
memory corruption in userspace. The signal handler is not even invoked and the 
process is killed by a second signal delivery. But somewhere between 
MINSIGSTKSZ and SIGSTKSZ, it will invoke the signal handler and will overflow 
inside the user code. If that alt-stack wasn't designed with guard pages, it 
will corrupt memory, as you said.

> AFAICT the only feasible way forward with that is some sysctl which
> default disables AVX512 and add the prctl() and have some unsafe wrapper
> that enables AVX512 for select 'legacy' programs for as long as they
> exist :/
> 
> That is, binaries/libraries compiled against a static (and small)
> MINSIGSTKSZ are the enemy. Which brings us to:

> > 5) #4, in which each part comes from a separate library with no
> > knowledge of each other, and initialised concurrently in different
> > threads
> 
> That's terrible... library init should *NEVER* spawn threads (I know,
> don't start).

Indeed, but that wasn't even what I was suggesting. I was thinking that the 
application would have started threads and the library init was run on 
separate threads. This may happen with lazy initialisation on first use.

But threads aren't required for the problem to happen. If the crash-handling 
case runs first, before AMX state is enabled, it might decide that it doesn't 
need to allocate sufficient alt-stack space for AMX. I think the 
recommendation here for userspace is clear: allocate the maximum that XSAVE 
tells you that you'll need, regardless of what the ambient enabled feature set 
is.

[And pretty please set up guard pages. Given that the XSAVE state area for SPR 
appears to be too close to 12 kB (see below), I'd say they should mmap() 20 kB 
and then mprotect() the lowest page to PROT_NONE.]

> Anything that does this is basically unfixable, because we can't
> guarantee the AMX prctl() gets done before the first thread.
> 
> So yes, worst case I suppose...

To wit: the worst case is a static, small alt-stack *without* guard pages 
(e.g., a malloc()ed or even static buffer) of sufficient size to let the 
kernel transition back to userspace but not for the userspace routine to run.

Any code using the old MINSIGSTKSZ (2048) will fail to run for AVX512 in the 
first place. There will be no data corruption, the crash handler will not run 
and the application will simply crash. An out-of-process core dump handler (if 
any, like systemd-coredump) will still get run.

Code using SIGSTKSZ (8192) will run for AVX512 and there'll be about 5 kB left 
to the user routine. So if it doesn't have too deep a call stack, it will not 
corrupt memory. And this code will not run for AMX state, because it's smaller 
than the XSAVE state (see below), making that the same case as the MINSIGSTKSZ 
for AVX512 above.

It's possible someone would use something between those two values, but why? I 
expect that alt-stack handlers that use a constant value use either of the two 
constants or maybe some multiple of SIGSTKSZ, but nothing in-between them.

$ /opt/intel/sde-external-8.63.0-2021-01-18-lin/sde64 -spr -- cpuid -1 -l 0xd        
CPU:
   XSAVE features (0xd/0):
      XCR0 lower 32 bits valid bit field mask = 0x000600ff
      XCR0 upper 32 bits valid bit field mask = 0x00000000
         XCR0 supported: x87 state            = true
         XCR0 supported: SSE state            = true
         XCR0 supported: AVX state            = true
         XCR0 supported: MPX BNDREGS          = true
         XCR0 supported: MPX BNDCSR           = true
         XCR0 supported: AVX-512 opmask       = true
         XCR0 supported: AVX-512 ZMM_Hi256    = true
         XCR0 supported: AVX-512 Hi16_ZMM     = true
         IA32_XSS supported: PT state         = false
         XCR0 supported: PKRU state           = false
         XCR0 supported: CET_U state          = false
         XCR0 supported: CET_S state          = false
         IA32_XSS supported: HDC state        = false
         IA32_XSS supported: UINTR state      = false
         LBR supported                        = false
         IA32_XSS supported: HWP state        = false
         XTILECFG supported                   = true
         XTILEDATA supported                  = true
      bytes required by fields in XCR0        = 0x00002b00 (11008)
      bytes required by XSAVE/XRSTOR area     = 0x00002b00 (11008)

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 17:23             ` Thiago Macieira
@ 2021-06-28 19:08               ` Peter Zijlstra
  2021-06-28 19:26                 ` Thiago Macieira
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2021-06-28 19:08 UTC (permalink / raw)
  To: Thiago Macieira
  Cc: fweimer, Enrico Weigelt, metux IT consult, hjl.tools, libc-alpha,
	linux-api, linux-arch, x86

On Mon, Jun 28, 2021 at 10:23:47AM -0700, Thiago Macieira wrote:
> On Monday, 28 June 2021 10:11:16 PDT Peter Zijlstra wrote:
> > > Consequence: CPU feature checking is done *very* early, often before
> > > main().
> > For the linker based ones, yes. IIRC the ifunc() attribute is
> > particularly useful here.
> 
> Exactly. ifunc was designed for this exact purpose, and hence CPUID 
> initialisation is done very, very early.
> 
> Anyway, if the AMX state is a sticky "set once per process", it's likely going 
> to get set early for every process that *may* use AMX. And this is assuming we 
> do the library right and only set it if has AMX code at all, instead of all 
> the time.

This, AFAIU. If the ifunc() resolver finds we haz AMX it can do the
prctl() and on success pick the AMX routine.

Assuming of course, that if a program links with a library that supports
AMX, it will actually end up using it.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 19:08               ` Peter Zijlstra
@ 2021-06-28 19:26                 ` Thiago Macieira
  0 siblings, 0 replies; 35+ messages in thread
From: Thiago Macieira @ 2021-06-28 19:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: fweimer, Enrico Weigelt, metux IT consult, hjl.tools, libc-alpha,
	linux-api, linux-arch, x86

On Monday, 28 June 2021 12:08:16 PDT Peter Zijlstra wrote:
> > Anyway, if the AMX state is a sticky "set once per process", it's likely
> > going to get set early for every process that *may* use AMX. And this is
> > assuming we do the library right and only set it if it has AMX code at all,
> > instead of all the time.
> 
> This, AFAIU. If the ifunc() resolver finds we haz AMX it can do the
> prctl() and on success pick the AMX routine.
> 
> Assuming of course, that if a program links with a library that supports
> AMX, it will actually end up using it.

That's what I meant and I agree. If it has an AMX function for *anything*, it 
will do the arch_prctl() and enable the state, even if said function is never 
called.

This is the good case. The bad case is that it does the arch_prctl() before it 
sees whether there is any AMX function.

Do we expect that the dynamic loader will have this code? It currently 
searches the multiple ABI levels (up to x86-64-v4 to include AVX512) and HW 
capabilities. I can readily see AMX being one of the capabilities, if not an 
ABI level. Though it should be trivial for it to call the arch_prctl() if and 
only if it is about to load an ELF module that declares use of AMX and also 
*not* load it if the syscall fails.

$ LD_DEBUG=libs /lib64/ld-linux-x86-64.so.2 --inhibit-cache /bin/ls 
      1620:     find library=librt.so.1 [0]; searching
      1620:      search path=.....
      1620:       trying file=/usr/lib64/glibc-hwcaps/x86-64-v4/librt.so.1
      1620:       trying file=/usr/lib64/glibc-hwcaps/x86-64-v3/librt.so.1
      1620:       trying file=/usr/lib64/glibc-hwcaps/x86-64-v2/librt.so.1
      1620:       trying file=/usr/lib64/tls/haswell/avx512_1/x86_64/librt.so.1
      1620:       trying file=/usr/lib64/tls/haswell/avx512_1/librt.so.1
      1620:       trying file=/usr/lib64/tls/haswell/x86_64/librt.so.1
      1620:       trying file=/usr/lib64/tls/haswell/librt.so.1
      1620:       trying file=/usr/lib64/tls/avx512_1/x86_64/librt.so.1
      1620:       trying file=/usr/lib64/tls/avx512_1/librt.so.1
      1620:       trying file=/usr/lib64/tls/x86_64/librt.so.1
      1620:       trying file=/usr/lib64/tls/librt.so.1
      1620:       trying file=/usr/lib64/haswell/avx512_1/x86_64/librt.so.1
      1620:       trying file=/usr/lib64/haswell/avx512_1/librt.so.1
      1620:       trying file=/usr/lib64/haswell/x86_64/librt.so.1
      1620:       trying file=/usr/lib64/haswell/librt.so.1
      1620:       trying file=/usr/lib64/avx512_1/x86_64/librt.so.1
      1620:       trying file=/usr/lib64/avx512_1/librt.so.1
      1620:       trying file=/usr/lib64/x86_64/librt.so.1
      1620:       trying file=/usr/lib64/librt.so.1

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 13:20     ` Peter Zijlstra
@ 2021-06-30 12:50       ` Enrico Weigelt, metux IT consult
  2021-06-30 15:36         ` Thiago Macieira
  0 siblings, 1 reply; 35+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-06-30 12:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thiago Macieira, fweimer, hjl.tools, libc-alpha, linux-api,
	linux-arch, x86

On 28.06.21 15:20, Peter Zijlstra wrote:

>> And one point that immediately jumps into my mind (w/o looking deeper
>> into it): it introduces completely new registers - do we now need extra
>> code for tasks switching etc ?
> 
> No, but because it's register state and part of XSAVE, it has immediate
> impact in ABI. In particular, the signal stack layout includes XSAVE (as
> does ptrace()).

OMGs, I've already suspected such sickness. I don't even dare thinking
about consequences for compilers and library ABIs.

Does anyone here know why they designed this as inline operations ? This
thing seems to be pretty much what typical TPUs are doing (or a subset
of it). Why not just add a TPU next to the CPU on the same chip ?

We already have the same w/ GPUs, and I guess nobody seriously wants to
put GPU functionality directly into CPU.

> At the same time, 'legacy' applications (up until _very_ recently) had a
> minimum signal stack size of 2K, which is already violated by the
> addition of AVX512 (there's actual breakage due to that).

grmpf!

> Adding the insane AMX state (8k+) into that is a complete trainwreck
> waiting to happen. Not to mention that having !INIT AMX state has direct
> consequences for P-state selection and thus performance.

Uh, are those new registers retained in certain sleep states or do they
need to be saved somewhere ?

> For these reasons, us OS folks, will mandate you get to do a prctl() to
> request/release AMX (and we get to say: no). If you use AMX without
> this, the instruction will fault (because not set in XCR0) and we'll
> SIGBUS or something.
> 
> Userspace will have to do something like:
> 
>   - check CPUID, if !AMX -> fail
>   - issue prctl(), if error -> fail
>   - issue XGETBV and check the AMX bit is set, if not -> fail

Can't we do this just by a prctl() call ?
IOW: ask the kernel, who's gonna say yes or no.

Are there any situations where kernel says yes, but process still can't
use it ? Why so ?

>   - request the signal stack size / spawn threads

Signal stack is separate from the usual stack, right ?
Why can't this all be done in one shot ?

>   - use AMX
> 
> Spawning threads prior to enabling AMX will result in using the wrong
> signal stack size and result in malfunction, you get to keep the pieces.

No way of adjusting this once the threads are running ?
Or could we even do that on per-thread basis ?

A thread here always has a corresponding kernel task, correct ?

--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-28 15:08     ` Thiago Macieira
  2021-06-28 15:27       ` Peter Zijlstra
@ 2021-06-30 14:32       ` Enrico Weigelt, metux IT consult
  2021-06-30 14:34         ` Florian Weimer
  2021-06-30 15:29         ` Thiago Macieira
  1 sibling, 2 replies; 35+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-06-30 14:32 UTC (permalink / raw)
  To: Thiago Macieira, fweimer
  Cc: hjl.tools, libc-alpha, linux-api, linux-arch, x86

On 28.06.21 17:08, Thiago Macieira wrote:

>> hmm, maybe some libcpuid ?
> 
> Indeed. I'm querying inside Intel to see if I can get buy-in to create such a
> library.

What does "buy-in" mean in that context ? Some other departement ? Some
external developers ?

I tend to believe that those things should better be done by some
independent party, maybe GNU, FSF, etc, and cpu vendors should just
sponsor this work and provide necessary specs.

>> Since we're talking about GNU libc here, binary-only stuff is probably
>> out of scope here. OTOH, using different libc versions in those special
>> cases isn't such a big deal.
> 
> Shipping a libc is not trivial, either technically or due to licensing
> requirements. Most applications want to link against whatever libc the system
> already provides, if that's possible.

Shipping precompiled binaries and linking against system libraries is
always a risky game. The cleanest approach here IMHO would be building
packages for various distros (means: using their toolchains / libs).
This actually isn't as work intensive as it might sound - I'm doing this
all the day and have a bunch of helpful tools for that.

OTOH, if one really needs to be independent of distros, one should build
a complete nano-distro, where everything's installed under certain
prefix, including libc. Isn't a big deal at all - we have plenty tools
for that, daily practise in embedded world. The only difference would be
tweaking the ld scripts to set a different ld.so path.

Licensing with glibc also isn't a serious problem here. All you need to
do is be compliant with the LGPL. In short: publish all your patches to
glibc, offer the license and link dynamically. Already done that a
thousand times.

If there really is a serious demand for that, and you've got clients who
are really interested in doing that properly (a lot of companies out
there just duck and cover, and don't actually wanna talk about that :(),
let's sit together and create some consulting material on that.

>> Actually, you should talk to the compiler folks much more early, at the
>> point where you know how those features look like.
> 
> We do, but it's not enough.
> 
> GCC releases once a year, so it's easy to miss the feature freeze. 

Wait a minute ... how long does it take from the architectural design,
until the real silicon is out in the field ? I would be very surprised
if the whole process is done in a much shorter time frame.

Note: by "much more early", I meant already at the point where the spec
of the new feature exists, at least on paper.

 > Then there are Linux distros that do LTS every 2 years or so.

Why don't the few actually affected parties just upgrade their compiler
on their build machines when needed ?

Come on, on other languages like rust (often also w/ go), you're
expected to upgrade your compiler almost on weekly basis, or you can't
build the current master branches of many projects anymore :p

> Worse, those two are
> usually out of phase. For example, if you're using the current Ubuntu LTS
> today (almost July 2021), you're using 20.04, which was released one month
> before the GCC 10 release. So you're using GCC 9, released May 2019, which
> means its features were frozen on December 2018. That's an incredibly long
> lead time.

Just install the backport ?

And if the backport hasn't been done at that point in time yet, just do
that ?

>> For using certain new CPU specific features, the need for a compiler
>> upgrade really should be no excuse. And at least for vast majority of
>> cases, a proper compiler could do it much better than the average
>> programmer.
> 
> To compile the software that uses those instructions, undoubtedly. But what if
> I did that for you and you could simply download the binary for the library
> and/or plugins such that you could slot into your existing systems and CI?
> This could make a difference between adoption or not.

For me, it wouldn't, at all. I never download binaries from untrusted
sources. (except for forensic analysis).

And putting my build engineering consultant hat on (which actually is
huge part of my business), I'd clearly say NO to precompiled binaries,
unless there really is no other way - very rare cases, where one usually
has a lot of other trouble, too.

Such precompiled binaries (where you cannot control the whole process)
are dangerous in many ways, for security, as well as safety, as well
economically.

>> Uh, that's really ancient. Nobody can seriously expect modern features
>> on such an ancient distro. If people really insist spending so much
>> money for running such old code, instead of just doing a dist upgrade,
>> then I can only reply with "not our problem".
> Yes and no.
> 
> Red Hat has been incredibly successful in backporting kernel features to the
> old 3.10 that came with RHEL 7. Whether they will do that for AMX state saving
> and the system call that we're discussing here, I can't say. AFAIU, they did
> backport the AVX512 state-saving to that 3.10, so they may.

Indeed they did. No idea why they put so many resources into that,
instead of fixing their upgrade mechanisms once and for all, so a dist
upgrade is as simple and robust as e.g. on Debian. Or at least add newer
kernels to older distro bases - extra repos really dont hurt.

 From my operating perspective, I wouldn't actually call RHEL a very
professional distro, many simple things are just very complicated and
weird ... really no idea what these guys are doing there. Never could
find any serious technical explanation, must be some purely political
or ideological thing.

BUT: we're talking about about brand new silicon here. Why should
anybody - who really needs these new features - install such an ancient
OS on a brand new machine ?

 From my experience w/ larger clients, especially in public service, the
only reason is their own extremely lazy and overcomplicated bureaucratic
processes (problems that they've created on their own - the excuse that
this is due to regulatory obligations is just proven completely wrong;
I'm perfectly capable of setting up formalized management processes
that are much more agile and still conform to the usual regulations like
BSI, GDPO, 62304, etc, etc, and actually make them work as intended :p)
But exactly those organisations wouldn't be able to even buy this new
silicon anytime soon, since their own processes are just too slow in
updating their internal shopping basket.

Oh, and another interesting point: those organisations usually run all
their linux workloads in VMs (mostly ESXI, sometimes hyperv). Is AMX
already supported in these constellations ?

At that point I wonder whether the actual demand for this use case -
brand new silicon features && ancient distros && out-of-distro built
precompiled binaries && precompiled-binaries (not auditable) permitted -
is actually a sufficiently frequent use case that's worth taking care of
in such a general way at all ?

[ not talking about very specific scenarios like CERN, who're building
special machinery on their own ]

> Even if they don't, the *software* that people deploy may be the same build
> for RHEL 7 and for a modern distro that will have a 5.14 kernel. 

Now we're getting to the vital point: trying to make "universal"
binaries for various different distros. This is something I've strictly
advised against for 25 years, because with that you're putting
yourself into *a lot* of trouble (ABI compatibility between arbitrary
distros or even various distro releases always had been pretty much a
myth, only works for some specific cases). Just don't do it, unless you
*really* don't have any other chance.

What one could do is taking the distro's userland completely out of the
way by not using a single piece of it. Once upon a time, glibc could be
statically linked easily - not so easy anymore. But with a few bits of
patching you can have an entirely separate instance. (basically tweak
install paths, including the ldstub). Or pick another libc that supports
this better. Most of the other userland libs are quite easy to link
statically (yes, there're unfriendly exceptions) or at least can be
easily installed in a different location.

Now the questions of the build process for that. I'd recommend using
ptxdist for that. We have to twist a few knobs, but creating a
generic toolkit (some /opt/foo-install counterpart of DistroKit)
shouldn't take much more than 4..6 weeks, and the problem is solved
once and for all. Anybody can then just put in his own SW, fire the
build and gets a ready to use image/tarball. Pretty much like we're
doing it in embedded world for aeons.

 > That software
 > may have non-AVX, AVX2, AVX512 and AMX-specific code paths and would
 > do runtime detection of which one is best to use.

Yes, nothing new, not unusual. All the SW needs is a way to detect
whether feature X is there or not and then pick the right code paths.

> So my point is: this shouldn't be in glibc because the glibc will not have the
> new system call wrappers or TLS fields.

Yes, I'm fully on your side here. Glibc already is overloaded with too
much of those kind of things that shouldn't belong in there. Actually,
even stuff like DNS resolving IMHO doesn't belong in libc.

>> What we SW engineers need is an easy and fast method to act depending on
>> whether some CPU supports some feature (eg. a new opcode). Things like
>> cpuinfo are only a tiny piece of that. What we could really use is a
>> conditional jump/call based on whether feature X is supported - without
>> any kernel intervention. Then the machine code could be easily layed out
>> to support both cases with our without some feature X. Alternatively we
>> could have a fast trapping in useland - hw generated call already would
>> be a big help.
> 
> That's what cpuid is for. With GCC function multi-versioning or equivalent
> manually-rolled solutions, you can get exactly what you're asking for.

Not quite what I've been proposing.

My proposal would be a conditional jump opcode that directly checks for
specific features. If this is well designed, I believe that can be
resolved by the cpu's internal prefetcher unit. But for that we'd also
need some extra task status bit so the cpu knows it is enabled for the
current task.

Another approach (when we'd design a new cpu) would be a special memory
region for opcode emulation call vectors with enough room for dozens
of potentially upcoming new opcodes. If cpu encounters an unsupported
op, it jumps there. A bit similar to IRQs or exceptions, but designed in
a way that emulation can be implemented entirely in userspace, just like
if the compiler would have put an jump instead of the new op there.
With such an instruction architecture, those kind of feature upgrades
would be much much easier, w/o even touching the kernel.

This wasn't invented by me - ancient mainframes from the '70s have
prior art.

>> If we had something in that direction, we wouldn't have to have this
>> kind discussion here anymore - it would be entirely up to compiler and
>> library folks, no need for any kernel support at all.
> 
> For most features, there isn't. You don't see us discussing
> AVX512VP2INTERSECT, for example. This discussion only exists because AMX
> requires more state to be saved during context switches and signal delivery.

But over all these years, some new registers have been introduced.
I fail to imagine how context switches can be done properly w/o also
saving/restoring such new registers.

Actually I claim that (user-accessible) registers are the problem in
the first place. The good old Burroughs mainframes didn't have that problem,
since their IA doesn't even have registers - it all operates on a stack.
The hardware (or microcode) automatically caches the top-n stack elements
in internal registers, so it is as fast as explicit registers, but the
code gets much simpler and more robust (they also have fancy things like
typed memory, etc., but that's another story).

>> And one point that immediately jumps into my mind (w/o looking deeper
>> into it): it introduces completely new registers - do we now need extra
>> code for tasks switching etc ?
> 
> Yes, this is the crux of this discussion.

Yes, and that's the reason why I call that a misdesign. It shouldn't
have been a big deal to do it w/o new registers - we know from the
'70s that it's possible.


Finally, as this stuff changes the ABI completely, perhaps we should
treat it directly that way: call it a new architecture that has its
own ID in the ELF header. That way we'd have a clear border line and
wouldn't need to care about a dozen corner cases none of us has on
the radar yet.


--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-30 14:32       ` Enrico Weigelt, metux IT consult
@ 2021-06-30 14:34         ` Florian Weimer
  2021-06-30 15:16           ` Enrico Weigelt, metux IT consult
  2021-06-30 15:29         ` Thiago Macieira
  1 sibling, 1 reply; 35+ messages in thread
From: Florian Weimer @ 2021-06-30 14:34 UTC (permalink / raw)
  To: Enrico Weigelt, metux IT consult
  Cc: Thiago Macieira, hjl.tools, libc-alpha, linux-api, linux-arch, x86

* Enrico Weigelt:

> OTOH, if one really needs to be independent of distros, one should build
> a complete nano-distro, where everything's installed under a certain
> prefix, including libc. Isn't a big deal at all - we have plenty of tools
> for that, daily practice in the embedded world. The only difference would be
> tweaking the ld scripts to set a different ld.so path.

It breaks integration with system-wide settings, such as user/group
databases, host name lookup, and cryptographic policies.  In many
environments, that is not really an option.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-30 14:34         ` Florian Weimer
@ 2021-06-30 15:16           ` Enrico Weigelt, metux IT consult
  2021-06-30 15:38             ` Florian Weimer
  0 siblings, 1 reply; 35+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-06-30 15:16 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Thiago Macieira, hjl.tools, libc-alpha, linux-api, linux-arch, x86

On 30.06.21 16:34, Florian Weimer wrote:

> It breaks integration with system-wide settings, such as user/group
> databases, host name lookup, and cryptographic policies.  In many
> environments, that is not really an option.

Not necessarily; these can still be applied (and fairly simply).
You'd actually have to twist extra knobs if you wanted those weird
things to happen.

The only thing that won't work easily is when the operator forces some
custom libraries to be loaded arbitrarily into all processes. Yes,
somebody could write his own nss plugins, but that's exactly the kind
of audience that does NOT just use those (especially old) binary-only
distros. In over 20 years, inside dozens of corporations, I've seen
that exactly once. And it was me doing it.

I actually wonder what kind of binary-only application that would be
that's actually affected by that problem and actually used in the
field that way.

Do you have an actual practical (not theoretical) example ?

By the way: today's method of choice for delivering binary-only
software is containers (and I'd even count things like Steam into
that category).


--mtx

-- 
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-30 14:32       ` Enrico Weigelt, metux IT consult
  2021-06-30 14:34         ` Florian Weimer
@ 2021-06-30 15:29         ` Thiago Macieira
  2021-07-01 11:57           ` Enrico Weigelt, metux IT consult
  1 sibling, 1 reply; 35+ messages in thread
From: Thiago Macieira @ 2021-06-30 15:29 UTC (permalink / raw)
  To: fweimer, Enrico Weigelt, metux IT consult
  Cc: hjl.tools, libc-alpha, linux-api, linux-arch, x86

On Wednesday, 30 June 2021 07:32:29 PDT Enrico Weigelt, metux IT consult 
wrote:
> What does "buy-in" mean in that context ? Some other departement ? Some
> external developers ?
> 
> I tend to believe that those things should better be done by some
> independent party, maybe GNU, FSF, etc, and cpu vendors should just
> sponsor this work and provide necessary specs.

For me specifically, I need to identify some SW dept. that would take on the 
responsibility for it long-term. Abandonware would serve no one.

I wouldn't mind if this were a collaborative project under the auspices of 
freedesktop.org or similar, so long as it is cross-platform to at least macOS, 
FreeBSD and Windows. But given that it is in Intel's interest for this 
library to exist and to make it easy for people to use our CPU features, it 
seemed like a natural fit for Intel. And even if it isn't an Intel-owned 
project, we probably want to be contributors.

> Shipping precompiled binaries and linking against system libraries is
> always a risky game. The cleanest approach here IMHO would be building
> packages for the various distros (means: using their toolchains / libs).
> This actually isn't as work-intensive as it might sound - I'm doing this
> all day long and have a bunch of helpful tools for it.

I understand, but that it is easier and better for 99% of the cases does 
not mean it is so for 100%. And most especially it does not guarantee that it 
will be used by everyone. For reasons real or not, precompiled binaries 
exist. Just see Google Chrome, for example.

> Licensing with glibc also isn't a serious problem here. All you need to
> do is be compliant with the LGPL. In short: publish all your patches to
> glibc, offer the license and link dynamically. Already done that a
> thousand times.

We can agree it's an additional hurdle, which will likely cause people to 
investigate a solution that doesn't require that hurdle.

> Wait a minute ... how long does it take from the architectural design,
> until the real silicon is out in the field ? I would be very surprised
> if the whole process were done in a much shorter time frame.
> 
> Note: by "much earlier", I meant already at the point where the spec
> of the new feature exists, at least on paper.

I'm not going to comment on the timing of architectural decisions. But just 
from the example I gave: in order to be ready for a late 2021 or early 2022 
launch, we'd need to have the feature's specification published and the 
patches accepted by December 2018. That's about 3 years lead time.

How many software projects (let alone mixed software and hardware) do you know 
that know 3 years ahead of time what they will need?

>  > Then there are Linux distros that do LTS every 2 years or so.
> 
> Why don't the few actually affected parties just upgrade their compiler
> on their build machines when needed ?

Have you tried?

Besides, the whole problem here is the barrier to entry. If we don't make it 
easy for them to use the new features, they won't. And I was using this as an 
argument for why precompiled binaries will exist: the interested parties will 
take the pain to upgrade the compilers and other supporting software so that 
even the build of Open Source software is the most capable one, then release 
that binary for others who haven't. This lowers the barrier to entry 
significantly.

And this is all to justify that such functionality shouldn't be part of 
glibc, where it can't be used by those precompiled binaries which, for one 
reason or another, will exist.

It should be in a small, permissively-licensed library that will often get 
statically linked into the binary in question.

> > To compile the software that uses those instructions, undoubtedly. But
> > what if I did that for you and you could simply download the binary for
> > the library and/or plugins such that you could slot into your existing
> > systems and CI? This could make a difference between adoption or not.
> 
> For me, it wouldn't, at all. I never download binaries from untrusted
> sources. (except for forensic analysis).

I understand and I am, myself, almost like you. I do have some precompiled 
binaries (aforementioned Google Chrome), but as a rule I avoid them.

But not everyone is like the two of us.

> BUT: we're talking about brand new silicon here. Why should
> anybody - who really needs these new features - install such an ancient
> OS on a brand new machine ?

I don't know. It might be for fleet homogeneity: everything has the same SW 
installed, facilitating maintenance. Just coming up with reasons.

> > Even if they don't, the *software* that people deploy may be the same
> > build
> > for RHEL 7 and for a modern distro that will have a 5.14 kernel.
> 
> Now we're getting to the vital point: trying to make "universal"
> binaries for various different distros. This is something I've been strictly
> advising against for 25 years, because with that you're putting
> yourself into *a lot* of trouble (ABI compatibility between arbitrary
> distros or even various distro releases has always been pretty much a
> myth; it only works for some specific cases). Just don't do it, unless you
> *really* don't have any other choice.

Well, that's the point, isn't it? Are we ready to call this use-case not 
valid, so it can't be used to support the argument of a solution that needs to 
be deployable to old distros?

> > So my point is: this shouldn't be in glibc because the glibc will not have
> > the new system call wrappers or TLS fields.
> 
> Yes, I'm fully on your side here. Glibc already is overloaded with too
> many of those kinds of things that shouldn't belong in there. Actually,
> even stuff like DNS resolving IMHO doesn't belong in libc.

Thanks.

(name resolving is required by POSIX to be there, so it exists in every 
system; might as well be in every libc)

> My proposal would be a conditional jump opcode that directly checks for
> specific features. If this is well designed, I believe it can be
> resolved by the cpu's internal prefetcher unit. But for that we'd also
> need some extra task status bit so the cpu knows it is enabled for the
> current task.

That's more of a "can I use this now", instead of "can I use this ever". So 
far, the answer to the two has been the same. Therefore, there has been no 
need to have the functionality that you're describing.

> > For most features, there isn't. You don't see us discussing
> > AVX512VP2INTERSECT, for example. This discussion only exists because AMX
> > requires more state to be saved during context switches and signal
> > delivery.
> But over all these years, some new registers have been introduced.
> I fail to imagine how context switches can be done properly w/o also
> saving/restoring such new registers.

There have been a few small registers and bits of state that needed to be 
saved here and there, but the biggest blocks were:

- SSE state
- AVX state
- AVX512 state
- AMX state

The first two were small enough (and long enough ago) that the discussions 
were small and aren't relevant today. The AVX512 state was added in the past 
decade, and as you've seen from this thread, it is still a sticking point - 
and that was only about 1.5 kB.

However, the vast majority of CPU features do not add new context state.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-30 12:50       ` Enrico Weigelt, metux IT consult
@ 2021-06-30 15:36         ` Thiago Macieira
  2021-07-01  7:35           ` Enrico Weigelt, metux IT consult
  0 siblings, 1 reply; 35+ messages in thread
From: Thiago Macieira @ 2021-06-30 15:36 UTC (permalink / raw)
  To: Peter Zijlstra, Enrico Weigelt, metux IT consult
  Cc: fweimer, hjl.tools, libc-alpha, linux-api, linux-arch, x86

On Wednesday, 30 June 2021 05:50:30 PDT Enrico Weigelt, metux IT consult 
wrote:
> > No, but because it's register state and part of XSAVE, it has immediate
> > impact in ABI. In particular, the signal stack layout includes XSAVE (as
> > does ptrace()).
> 
> OMGs, I've already suspected such sickness. I don't even dare thinking
> about the consequences for compilers and library ABIs.
> 
> Does anyone here know why they designed this as inline operations ? This
> thing seems to be pretty much what typical TPUs are doing (or a subset
> of it). Why not just add a TPU next to the CPU on the same chip ?

To be clear: this is a SW ABI. It has nothing to do with the presence or 
absence of other processing units in the system.

The moment you receive a Unix signal with SA_SIGINFO, the mcontext state needs 
to be saved somewhere. Where would you save it? Please remember that:

- signal handlers can be called at any point in the execution, including
  in the middle of malloc()
- signal handlers can longjmp out of the handler back into non-handler code
- in a multithreaded application, each thread can be handling a signal 
  simultaneously

We could have the kernel hold on to that and add a system call to extract 
it, but that's an ABI change, and I think it won't work for the longjmp case.

> > Userspace will have to do something like:
> >   - check CPUID, if !AMX -> fail
> >   - issue prctl(), if error -> fail
> >   - issue XGETBV and check the AMX bit it set, if not -> fail
> 
> Can't we do this with just a prctl() call ?
> IOW: ask the kernel, who's gonna say yes or no.

That's possible. The kernel can't enable an AMX state on a system without AMX.

> Are there any situations where kernel says yes, but process still can't
> use it ? Why so ?

Today there is no such case that I can think of.

> >   - request the signal stack size / spawn threads
> 
> Signal stack is separate from the usual stack, right ?
> Why can't this all be done in one shot ?

Yes, we're talking about the sigaltstack() call.

What is "this all" in the sentence above?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-30 15:16           ` Enrico Weigelt, metux IT consult
@ 2021-06-30 15:38             ` Florian Weimer
  2021-07-01  8:08               ` Enrico Weigelt, metux IT consult
  0 siblings, 1 reply; 35+ messages in thread
From: Florian Weimer @ 2021-06-30 15:38 UTC (permalink / raw)
  To: Enrico Weigelt, metux IT consult
  Cc: Thiago Macieira, hjl.tools, libc-alpha, linux-api, linux-arch, x86

* Enrico Weigelt:

> On 30.06.21 16:34, Florian Weimer wrote:
>
>> It breaks integration with system-wide settings, such as user/group
>> databases, host name lookup, and cryptographic policies.  In many
>> environments, that is not really an option.
>
> Not necessarily; these can still be applied (and fairly simply).
> You'd actually have to twist extra knobs if you wanted those weird
> things to happen.

Sorry, this is just not true.  You cannot load system libraries such as
NSS modules or cryptographic libraries with a custom glibc because the
system glibc could be newer, and glibc does not provide that kind of
compatibility (only the other way round).

Thanks,
Florian


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-30 15:36         ` Thiago Macieira
@ 2021-07-01  7:35           ` Enrico Weigelt, metux IT consult
  0 siblings, 0 replies; 35+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-07-01  7:35 UTC (permalink / raw)
  To: Thiago Macieira, Peter Zijlstra
  Cc: fweimer, hjl.tools, libc-alpha, linux-api, linux-arch, x86

On 30.06.21 17:36, Thiago Macieira wrote:

Hi,

>> Does anyone here know why they designed this as inline operations ? This
>> thing seems to be pretty much what typical TPUs are doing (or a subset
>> of it). Why not just adding a TPU next to the CPU on the same chip ?
> 
> To be clear: this is a SW ABI. It has nothing to do the presence or absence of
> other processing units in the system.

Well, if I'm correct, it's needed because there is some additional unit
whose state needs to be saved. And that again is necessary because this
unit is controlled directly by the usual CPU instruction stream (in
contrast to separately programmed devices like a gpu, sdma, etc).

> The moment you receive a Unix signal with SA_SIGINFO, the mcontext state needs
> to be saved somewhere. Where would you save it? Please remember that:
> 
> - signal handlers can be called at any point in the execution, including
>    in the middle of malloc()
> - signal handlers can longjmp out of the handler back into non-handler code
> - in a multithreaded application, each thread can be handling a signal
>    simultaneously

Yes, the last part seems to be the most tricky point.

If we were only talking about kernel-controlled context switches (task
switches) and sighandlers always returned to the kernel, then the kernel
could handle all that internally, w/o userland ever knowing about it. But
unfortunately that's not the case :(

>>> Userspace will have to do something like:
>>>    - check CPUID, if !AMX -> fail
>>>    - issue prctl(), if error -> fail
>>>    - issue XGETBV and check the AMX bit it set, if not -> fail
>>
>> Can't we do this with just a prctl() call ?
>> IOW: ask the kernel, who's gonna say yes or no.
> 
> That's possible. The kernel can't enable an AMX state on a system without AMX.

Good, that could at least make the API somewhat simpler.

>>>    - request the signal stack size / spawn threads
>>
>> Signal stack is separate from the usual stack, right ?
>> Why can't this all be done in one shot ?
> 
> Yes, we're talking about the sigaltstack() call.
> 
> What is "this all" in the sentence above?

Taking care of a large enough signal stack along with enabling AMX in
one shot. This might not support all kinds of uses of sigaltstack(), but
do we really need to support all of that ?

IMHO, the whole AMX issue is just for *new* software (and I haven't seen
practical use of an alternative sighandler stack for aeons), so it's not
about compatibility with existing software. Theoretically we could declare
that the combination of AMX and sigaltstack() just isn't supported. (Of
course, some combinations of using old libraries might break - but even if
old library code is reused, it's still new software.)

Maybe not a completely satisfying idea, but perhaps something that's
much easier to achieve and gets the actual problem solved.


--mtx

-- 
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-30 15:38             ` Florian Weimer
@ 2021-07-01  8:08               ` Enrico Weigelt, metux IT consult
  2021-07-01  8:21                 ` Florian Weimer
  0 siblings, 1 reply; 35+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-07-01  8:08 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Thiago Macieira, hjl.tools, libc-alpha, linux-api, linux-arch, x86

On 30.06.21 17:38, Florian Weimer wrote:

>> Not necessarily, these can still be applied (and fairly simple).
>> You actually have to twist more extra knobs if to wanted those weird
>> things to happen.
> 
> Sorry, this is just not true.  You cannot load system libraries such as
> NSS modules or cryptographic libraries with a custom glibc because the
> system glibc could be newer, and glibc does not provide that kind of
> compatibility (only the other way round).

Yes, such glibc "plugins" specifically need to be built for the same
glibc version.

I've already mentioned that in 25yrs I've had such a scenario - where some
operator actually wants to load *3rdparty* nss modules (ones that are *not*
included in upstream glibc) - just *once*, and that time it was myself doing
that funny stuff (and also writing that module).

And I'm repeating my previous questions: can you name some actual real
world (not hypothetical or academical) scenarios where:

somebody really needs some binary-only application &&
needs those extra modules *into that* application &&
cannot recompile these modules into the applications's prefix &&
needs AMX in that application &&
cannot just use chroot &&
cannot put it into container ?

I happen to be one exacly those folks whose better part of the daily
business is dealing with really crazy scenarios, most people don't
even dare thinking about (also dealing with horrible stuff like having
link in binary-only objects into embedded applications which had been
compiled for different ABIs, etc) - and I can only conclude that the
above kind of scenario is really, really rare, usually caused by
completely non-technical factors (e.g. beaurocrazy) and there're always
other options.


--mtx

-- 
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-07-01  8:08               ` Enrico Weigelt, metux IT consult
@ 2021-07-01  8:21                 ` Florian Weimer
  2021-07-01 11:59                   ` Enrico Weigelt, metux IT consult
  0 siblings, 1 reply; 35+ messages in thread
From: Florian Weimer @ 2021-07-01  8:21 UTC (permalink / raw)
  To: Enrico Weigelt, metux IT consult
  Cc: Thiago Macieira, hjl.tools, libc-alpha, linux-api, linux-arch, x86

* Enrico Weigelt:

> And I'm repeating my previous questions: can you name some actual real
> world (not hypothetical or academical) scenarios where:
>
> somebody really needs some binary-only application &&
> needs those extra modules *into that* application &&
> cannot recompile these modules into the applications's prefix &&
> needs AMX in that application &&
> cannot just use chroot &&
> cannot put it into container ?

There are no real-world scenarios yet which involve AMX, so I'm not sure
what you are after with this question.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-06-30 15:29         ` Thiago Macieira
@ 2021-07-01 11:57           ` Enrico Weigelt, metux IT consult
  0 siblings, 0 replies; 35+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-07-01 11:57 UTC (permalink / raw)
  To: Thiago Macieira, fweimer
  Cc: hjl.tools, libc-alpha, linux-api, linux-arch, x86

On 30.06.21 17:29, Thiago Macieira wrote:

Hi,

> For me specifically, I need to identify some SW dept. that would take on the
> responsibility for it long-term. Abandonware would serve no one.

Right. And from what I've learned in such large organisations, finding
someone who's willing to take such responsibility isn't an easy task :o

> I wouldn't mind this were a collaborative project under the auspices of
> freedesktop.org or similar, so long as it is cross-platform to at least macOS,
> FreeBSD and Windows.

hmm, this also highly depends on what that library is actually supposed
to be doing. If we're talking just about cpuid parsing (and providing some
comfortable API), that isn't a big deal at all. (Personally, I can't
contribute much on the actual backend side on Windows or macOS, but it
shouldn't be so hard to find some people here.)

But for AMX, unfortunately, that's just a tiny fraction. Actually, it
seems that this specific information is useless on its own, since we need
to explicitly ask the kernel to set things up, and it can just say no.

> But given that it is in Intel's interest for this library to exist and
> to make it easy for people to use our CPU features, it seemed like a
> natural fit for Intel. And even if it isn't an Intel-owned project, we
> probably want to be contributors.

Great :)

> I understand, but that it is easier and better for 99% of the cases does
> not mean it is so for 100%. And most especially it does not guarantee that it
> will be used by everyone. For reasons real or not, precompiled binaries
> exist. Just see Google Chrome, for example.

Yes, there is a lot of such crap out there, and Chrome is a good bad
example. But that doesn't mean that we here should ever care about it.
Let'em deal with their weird stuff on their own.

Specifically on such applications like Chrome, their official excuses
for doing this are pretty funny, some examples:

#1. "oh, it is so hard building native packages for many distros."

     No, it's not (especially not for such huge sw development companies
     like Google, who do have the right people for that). I'm doing that
     all day long. I've even got my own tools for that, freely available
     on github. This can be done 99% fully automatically.

     The actual problem is: SW like Chrome is just a monster, which is
     very hard to build from source. This problem was created by nobody
     else than the vendor.

#2: "we need our own install/update mechanism, to make it easier for
     users to install"

     No, we've had an easy install/update mechanism for 25 years. It's
     called package management. Just make your SW easy to build and
     customize, and distros take care of it - just like w/ 100s of 1000s
     of other packages. Oh, BTW, this even exists on Windows.

#3: "we need some fast update for security fixes"

     Yes, we have that: package management. See #2.

Chrome is an excellent example of doing so many things so fundamentally
wrong. (BTW: a large portion of the security problems also comes from that
crazy SW architecture - e.g. putting so many different things into one
program that could easily be done in several entirely separate ones.)
They just made simple things extremely complicated.

I could start a rant on even worse examples like NVidia or NI here, but
out of respect for my blood pressure I'll stop here :p

>> Licensing with glibc also isn't a serious problem here. All you need to
>> do is be compliant with the LGPL. In short: publish all your patches to
>> glibc, offer the license and link dynamically. Already done that a
>> thousand times.
> 
> We can agree it's an additional hurdle, which will likely cause people to
> investigate a solution that doesn't require that hurdle.

Yes, it is *some* hurdle, but I doubt that it is a show stopper. When
you're already doing Linux SW development, you almost certainly have to
cope with such things anyway (unless you really don't use anything
of the existing ecosystem - which, in most cases, is very bad
design in the first place).

The *extra* hurdle here in that case is:

*if* you patch glibc, you have to publish the patches; if you don't
patch it, there's nothing to do.

> I'm not going to comment on the timing of architectural decisions. But just
> from the example I gave: in order to be ready for a late 2021 or early 2022
> launch, we'd need to have the feature's specification published and the
> patches accepted by December 2018. That's about 3 years lead time.

Sorry, but I still don't get why that 3-year window. We're talking about
entirely new features, where SW actually *using* them doesn't even exist.
I wonder where you get the idea that *old* SW shall magically
support something entirely new, w/o being rewritten.

Since much of the discussion here was about large and bureaucratic
organisations (which I'm coping with on a daily basis), with lots of formal
processes - those who still run the ancient distros we've been talking
about:

For those, if some software already in use by them comes around with a
new release that actually uses that new feature, it's not a minor patch
anymore, but a major jump that implies migration or a new installation.
They have to go through the whole evaluation, testing, qualification and
integration process from the start. From a process perspective, it's
basically a new product. And most likely, it's not even a standard
product (one that some department could just order centrally, like a
laptop or a standard VM or some piece of storage). It will take quite
some time before they're even able to order this new hardware.

There are basically two chances for getting such installations into those
organisations (remember: exactly the users of such old distros):

a) bring the new product into the organisation-wide standard process.
    Usually takes *at least* a whole year (often much longer). And also
    add the new hardware (complete machines that already come with the
    new CPU types) to the standard catalog (similar time frame).

    If the SW is run in the data center, which almost certainly means
    running in a VM (ESXi or HyperV), then this VM infrastructure needs
    to fully support this, too. Add another year.

b) the whole installation is completely taken out of standard processes
    (often called an "appliance"). Then all the factors that led to
    sticking to the old distro version in the first place vanish into
    thin air. (The whole reason for choosing those distros mostly comes
    from the management processes, not actually technical reasons.)
    IT-ops managers usually don't like those out-of-process "appliances"
    very much, but they're quite common.

> How many software projects (let alone mixed software and hardware) do you know
> that know 3 years ahead of time what they will need?

In the embedded world (a large portion of my business), that's the usual
case. Once you go into regulated fields like automotive or medical, that's
even required by the usual processes, and the time frames are much
longer.

But we'd been talking about the timeline of CPU feature development and
corresponding SW support. I haven't done actual CPU development myself (my
self-soldered transputer experiment from the mid '90s wouldn't count
here ;-)), but I'd suspect that from the point in time where the CPU
designers create a new feature on the sketchboard and decide how the IA
should look, until the point where the public can actually buy
standard machines with these new chips, several years pass. Let's
just randomly assume three years. Now if these CPU designers published
the new IA spec at that point, we'd have three years until
we might see the first actual machines in the field (as outlined above,
wide deployment in large organisations will take even much longer).

>>   > Then there are Linux distros that do LTS every 2 years or so.
>>
>> Why don't the few actually affected parties just upgrade their compiler
>> on their build machines when needed ?
> 
> Have you tried?

Actually, yes.

And when setting up CI infrastructures, I usually let them build
the SW with various toolchain versions and combinations. Once you've got
the basic infrastructure set up, adding more combinations is just a
matter of more iron.

> Besides, the whole problem here is barrier of entry. If we don't make it easy
> for them to use the new features, they won't. 

Let's look at it from a completely different angle:

In order to actually make use of those new features, several things have
to happen (with different stakeholders):

1. education: developers need to know how to write code that
    uses these features in a good way.
2. tech: OS and tooling support
3. customers need to be able to actually buy and deploy the new HW,
    as well as new SW that actually uses these features
    (internal purchasing, qualification processes, support, ...)

The main point we're discussing right now is #2. In this discussion
we've now learned:

1. it certainly needs new versions of OS components and tooling,
    and SW vendors need to build and deliver new SW versions for that.
2. many folks out there have problems with upgrading these things
3. the above problems aren't specific to CPU features; we encounter
    them in a wide range of other scenarios, too.

So, why don't we focus on solving these problems once and for all?

Let's try to give customers the necessary means so that they just don't
need to stick with old compilers, libraries, etc. anymore.

An important part of that story is building and shipping software for
various distros. We have the tools and methods for doing the vast
majority of that fully automatically. Basically it needs: putting the
tools together, lots of iron, and training + consulting. Been there a
lot of times.

(I actually started creating an out-of-the-box solution for that - I'm
just lacking the marketing to make it economically feasible for me.)

> And I was using this as an
> argument for why precompiled binaries will exist: the interested parties will
> take the pain to upgrade the compilers and other supporting software so that
> the build even of Open Source software is the most capable one, then release
> that binary for others who haven't. This lowers the barrier of entry
> significantly.

As mentioned, the root cause is that upgrading tooling and building
certain software isn't easy, which is also one of the main reasons
why certain SW isn't up to date in many distros. Let's solve the root
problem instead of just cleaning up its fallout again and again.

> And this is all to justify that such a functionality shouldn't be part of
> glibc, where it can't be used by those precompiled binaries which, for one
> reason or another, will exist.

I fully agree that those things don't belong in glibc (along with
many other things).

> It should be in a small, permissively-licensed library that will often get
> statically linked into the binary in question.

ACK.

>> For me, it wouldn't, at all. I never download binaries from untrusted
>> sources. (except for forensic analysis).
> 
> I understand and I am, myself, almost like you. I do have some precompiled
> binaries (aforementioned Google Chrome), but as a rule I avoid them.
> 
> But not everyone is like the two of us.

The really interesting question is: why do people want to do that at all?

IMHO, the most plausible reason is: they can't get that SW (or a release
thereof) from their distro and also can't build it on their own.

So, that's the main problem here. We should focus on eliminating the
need for such out-of-distro precompiled binaries (even for proprietary
software - make sure it is built for the distros, using the
distro's build/deployment toolchain).

>> BUT: we're talking about about brand new silicon here. Why should
>> anybody - who really needs these new features - install such an ancient
>> OS on a brand new machine ?
> 
> I don't know. It might be for fleet homogeneity: everything has the same SW
> installed, facilitating maintenance. Just coming up with reasons.

Yes, and where's that actually coming from? Bureaucratic processes that
have reached a state where they exist mostly for their own sake. Not a
technical problem. Nothing anybody in the role of software engineer
should care about - the best thing one can do at that point is to ignore
the bureaucracy and focus on technological excellence.

I'm not saying we IT folks / SW engineers shouldn't take care of
management processes - on the contrary, we should actively work on
improving them. But we can only do that when we're asked to do so and
put in control of these processes - which, unfortunately, rarely
happens.

>> Now we're getting to the vital point: trying to make "universal"
>> binaries for various different distros. This is something I've been
>> strictly advising against for 25 years, because with that you're
>> putting yourself into *a lot* of trouble (ABI compatibility between
>> arbitrary distros or even various distro releases has always been
>> pretty much a myth; it only works in some specific cases). Just don't
>> do it, unless you *really* don't have any other choice.
> 
> Well, that's the point, isn't it? Are we ready to call this use-case not
> valid, so it can't be used to support the argument of a solution that needs to
> be deployable to old distros?

Almost. The solution is simple: software needs to be built for the
distro, using the distro's build/packaging toolchain. And if some
particular distro is still missing something (e.g. a more recent
compiler), just add it.

It's actually not hard, as soon as we get out of our in-house bubbles
and act as a community, actively cooperating with each other. Yet
another problem that's not technical but social.

>> My proposal would be a conditional jump opcode that directly checks
>> for specific features. If this is well designed, I believe it can be
>> resolved by the CPU's internal prefetcher unit. But for that we'd also
>> need some extra task status bit so the CPU knows it is enabled for the
>> current task.
> 
> That's more of a "can I use this now", instead of "can I use this ever". So
> far, the answer to the two has been the same. Therefore, there has been no
> need to have the functionality that you're describing.

Not quite. There's a wide range between "now" and "ever". Yes, AMX is a
much harder problem since extra state needs to be stored on context
switches. If I were designing a completely new CPU, I'd make sure
that those things don't cause any trouble at all (as Burroughs already
did in the '70s).

But yes, that's quite academic, unless we had a chance to actually
discuss with the CPU designers and influence their decisions.

>> But over all these years, some new registers have been introduced.
>> I fail to imagine how context switches can be done properly w/o also
>> saving/restoring such new registers.
> 
> There have been a few small registers and state that need to be saved here and
> there, but the biggest blocks were:
> 
> - SSE state
> - AVX state
> - AVX512 state
> - AMX state
> 
> The first two were small enough (and long enough ago) that the discussions
> were small and aren't relevant today. The AVX512 state was added in the past
> decade. And as you've seen from this thread, that is still a sticky point, and
> that was only about 1.5 kB.

Yes, so we see the problem isn't new. I feel very sad about the
CPU designers not just repeating those mistakes but now making even
worse ones. And they don't even discuss with us SW engineers first :(

If I had been asked by them, I'd basically have recommended a dedicated
TPU (along with a very long list of other points).


--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: x86 CPU features detection for applications (and AMX)
  2021-07-01  8:21                 ` Florian Weimer
@ 2021-07-01 11:59                   ` Enrico Weigelt, metux IT consult
  2021-07-06 12:57                     ` Florian Weimer
  0 siblings, 1 reply; 35+ messages in thread
From: Enrico Weigelt, metux IT consult @ 2021-07-01 11:59 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Thiago Macieira, hjl.tools, libc-alpha, linux-api, linux-arch, x86

On 01.07.21 10:21, Florian Weimer wrote:
> * Enrico Weigelt:
> 
>> And I'm repeating my previous questions: can you name some actual real
>> world (not hypothetical or academical) scenarios where:
>>
>> somebody really needs some binary-only application &&
>> needs those extra modules *into that* application &&
>> cannot recompile these modules into the applications's prefix &&
>> needs AMX in that application &&
>> cannot just use chroot &&
>> cannot put it into container ?
> 
> There are no real-world scenarios yet which involve AMX, so I'm not sure
> what you are after with this question.

Okay, let's take AMX out of the equation (until it actually arrives
in the field). What does it look like then?


--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287


* Re: x86 CPU features detection for applications (and AMX)
  2021-07-01 11:59                   ` Enrico Weigelt, metux IT consult
@ 2021-07-06 12:57                     ` Florian Weimer
  0 siblings, 0 replies; 35+ messages in thread
From: Florian Weimer @ 2021-07-06 12:57 UTC (permalink / raw)
  To: Enrico Weigelt, metux IT consult
  Cc: Thiago Macieira, hjl.tools, libc-alpha, linux-api, linux-arch, x86

* Enrico Weigelt:

> On 01.07.21 10:21, Florian Weimer wrote:
>> * Enrico Weigelt:
>> 
>>> And I'm repeating my previous questions: can you name some actual real
>>> world (not hypothetical or academical) scenarios where:
>>>
>>> somebody really needs some binary-only application &&
>>> needs those extra modules *into that* application &&
>>> cannot recompile these modules into the applications's prefix &&
>>> needs AMX in that application &&
>>> cannot just use chroot &&
>>> cannot put it into container ?
>> There are no real-world scenarios yet which involve AMX, so I'm not
>> sure
>> what you are after with this question.
>
> Okay, let's take AMX out of the equation (until it actually arrives
> in the field). How does it look like then ?

We have customers that want to use name service switch (NSS) plugins in
proprietary software and who do not want to distribute the (GNU)
toolchain with their application.  The latter excludes
chroot/containers.  Some applications more or less have to run directly
on the host (e.g., if they have some system monitoring aspect).

Thanks,
Florian



* Re: x86 CPU features detection for applications (and AMX)
  2021-06-23 15:32 ` Dave Hansen
@ 2021-07-08  6:05   ` Florian Weimer
  2021-07-08 14:19     ` Dave Hansen
  0 siblings, 1 reply; 35+ messages in thread
From: Florian Weimer @ 2021-07-08  6:05 UTC (permalink / raw)
  To: Dave Hansen; +Cc: libc-alpha, linux-api, x86, linux-arch, H.J. Lu, linux-kernel

* Dave Hansen:

> On 6/23/21 8:04 AM, Florian Weimer wrote:
>> https://www.gnu.org/software/libc/manual/html_node/X86.html
> ...
>> Previously kernel developers have expressed dismay that we didn't
>> coordinate the interface with them.  This is why I want to raise this now.
>
> This looks basically like someone dumped a bunch of CPUID bit values and
> exposed them to applications without considering whether applications
> would ever need them.  For instance, why would an app ever care about:
>
> 	PKS – Protection keys for supervisor-mode pages.
>
> And how could glibc ever give applications accurate information about
> whether PKS "is supported by the operating system"?  It just plain
> doesn't know, or at least only knows from a really weak ABI like
> /proc/cpuinfo.

glibc is expected to mask these bits for CPU_FEATURE_USABLE because they
have unknown semantics (to glibc).

They are still exposed via HAS_CPU_FEATURE.

I argued against HAS_CPU_FEATURE because the mere presence of this
interface will introduce application bugs: applications really
must use CPU_FEATURE_USABLE instead.

I wanted to go with a curated set of bits, but we couldn't get consensus
around that.  Curiously, the present interface can expose changing CPU
state (if the kernel updates some fixed memory region accordingly), my
preferred interface would not have supported that.

> It also doesn't seem to tell applications what they want which is, "can
> I, the application, *use* this feature?"

CPU_FEATURE_USABLE is supposed to be that interface.

Thanks,
Florian



* Re: x86 CPU features detection for applications (and AMX)
  2021-06-25 23:31 ` Thiago Macieira
  2021-06-28 12:40   ` Enrico Weigelt, metux IT consult
@ 2021-07-08  7:08   ` Florian Weimer
  2021-07-08 15:13     ` Thiago Macieira
  1 sibling, 1 reply; 35+ messages in thread
From: Florian Weimer @ 2021-07-08  7:08 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: hjl.tools, libc-alpha, linux-api, linux-arch, x86

* Thiago Macieira:

> On 23 Jun 2021 17:04:27 +0200, Florian Weimer wrote:
>> We have an interface in glibc to query CPU features:
>> X86-specific Facilities
>> <https://www.gnu.org/software/libc/manual/html_node/X86.html>
>>
>> CPU_FEATURE_USABLE all preconditions for a feature are met,
>> HAS_CPU_FEATURE means it's in silicon but possibly dormant.
>> CPU_FEATURE_USABLE is supposed to look at XCR0, AT_HWCAP2 etc. before
>> enabling the relevant bit (so it cannot pass through any unknown bits).
>
> It's a nice initiative, but it doesn't help library and applications that need 
> to be either cross-platform or backwards compatible.
>
> The first problem is the cross-platformness need. Because we library and 
> application developers need to support other OSes, we'll need to deploy our 
> own CPUID-based detection. It's far better to use common code everywhere, 
> where one developer working on Linux can fix bugs in FreeBSD, macOS or Windows 
> or any of the permutations. Every platform-specific deviation adds to 
> maintenance requirements and is a source of potential latent bugs, now or in 
> the future due to refactoring. That is why doing everything in the form of 
> instructions would be far better and easier, rather than system calls.

I must say this is a rather application-specific view.  Sure, you get
consistency within the application across different targets, but for
those who work on multiple applications (but perhaps on a single
distribution/OS), things are very inconsistent.

And the reason why I started this is that CPUID-based feature detection
is dead anyway (assuming the kernel developers do not implement lazy
initialization of the AMX state).  CPUID (and ancillary data such as
XCR0) will say that AMX support is there, but it will not work unless
some (yet to be decided) steps are executed by the userspace thread.

While I consider the CPUID-based model a success (and the cross-OS
consistency may have contributed to that), its days seem to be over.
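(For reference, the dying model is the classic three-step check sketched
below - feature bit, OSXSAVE, XCR0 state bits.  This is an illustrative
sketch for GCC/Clang on x86-64, not any particular library's code.  AMX
breaks exactly this pattern: the bits can all be set while use still
requires an explicit per-thread request.)

```c
/* Classic user-space feature detection (the pre-AMX model).  */
#include <cpuid.h>

static unsigned long long
read_xcr0 (void)
{
  unsigned int eax, edx;
  __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
  return ((unsigned long long) edx << 32) | eax;
}

/* Returns 1 if AVX2 is present *and* the OS context-switches the
   required register state; 0 otherwise.  */
int
avx2_usable (void)
{
  unsigned int eax, ebx, ecx, edx;

  /* Leaf 1, ECX bit 27: OSXSAVE, i.e. XGETBV usable from user space.  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
    return 0;
  /* XCR0 bits 1 and 2: the OS saves/restores SSE and AVX state.  */
  if ((read_xcr0 () & 0x6) != 0x6)
    return 0;
  /* Leaf 7, subleaf 0, EBX bit 5: AVX2 in silicon.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (ebx & (1u << 5)) != 0;
}
```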

> [Unless said system calls were standardised and actually
> deployed. Making this a cross-platform library that is not part of
> libc would be a major step in that direction]

That won't help with AMX, as far as I can tell.

Thanks,
Florian



* Re: x86 CPU features detection for applications (and AMX)
  2021-07-08  6:05   ` Florian Weimer
@ 2021-07-08 14:19     ` Dave Hansen
  2021-07-08 14:31       ` Florian Weimer
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Hansen @ 2021-07-08 14:19 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, linux-api, x86, linux-arch, H.J. Lu, linux-kernel

On 7/7/21 11:05 PM, Florian Weimer wrote:
>> This looks basically like someone dumped a bunch of CPUID bit values and
>> exposed them to applications without considering whether applications
>> would ever need them.  For instance, why would an app ever care about:
>>
>> 	PKS – Protection keys for supervisor-mode pages.
>>
>> And how could glibc ever give applications accurate information about
>> whether PKS "is supported by the operating system"?  It just plain
>> doesn't know, or at least only knows from a really weak ABI like
>> /proc/cpuinfo.
> glibc is expected to mask these bits for CPU_FEATURE_USABLE because they
> have unknown semantics (to glibc).

OK, so if I call CPU_FEATURE_USABLE(PKS) on a system *WITH* PKS
supported in the operating system, I'll get false from an interface that
claims to be:

> This macro returns a nonzero value (true) if the processor has the
> feature name and the feature is supported by the operating system.

The interface just seems buggy by *design*.


* Re: x86 CPU features detection for applications (and AMX)
  2021-07-08 14:19     ` Dave Hansen
@ 2021-07-08 14:31       ` Florian Weimer
  2021-07-08 14:36         ` Dave Hansen
  0 siblings, 1 reply; 35+ messages in thread
From: Florian Weimer @ 2021-07-08 14:31 UTC (permalink / raw)
  To: Dave Hansen; +Cc: libc-alpha, linux-api, x86, linux-arch, H.J. Lu, linux-kernel

* Dave Hansen:

> On 7/7/21 11:05 PM, Florian Weimer wrote:
>>> This looks basically like someone dumped a bunch of CPUID bit values and
>>> exposed them to applications without considering whether applications
>>> would ever need them.  For instance, why would an app ever care about:
>>>
>>> 	PKS – Protection keys for supervisor-mode pages.
>>>
>>> And how could glibc ever give applications accurate information about
>>> whether PKS "is supported by the operating system"?  It just plain
>>> doesn't know, or at least only knows from a really weak ABI like
>>> /proc/cpuinfo.
>> glibc is expected to mask these bits for CPU_FEATURE_USABLE because they
>> have unknown semantics (to glibc).
>
> OK, so if I call CPU_FEATURE_USABLE(PKS) on a system *WITH* PKS
> supported in the operating system, I'll get false from an interface that
> claims to be:
>
>> This macro returns a nonzero value (true) if the processor has the
>> feature name and the feature is supported by the operating system.
>
> The interface just seems buggy by *design*.

Yes, but that is largely a documentation matter.  We should have said
something about “userspace” there, and that the bit needs to be known to
glibc.  There is another exception: FSGSBASE, and that's a real bug we
need to fix (it has to go through AT_HWCAP2).

If we want to avoid that, we need to go down the road of a curated set
of CPUID bits, where a bit only exists if we have taught glibc its
semantics.  You still might get a false negative by running against an
older glibc than the application was built for.  (We are not going to
force applications that e.g. look for FSGSBASE only run with a glibc
that is at least of that version which implemented semantics for the
FSGSBASE bit.)
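(For illustration, the AT_HWCAP2 path that a correct FSGSBASE check has
to take on Linux/x86 looks like this; the fallback define mirrors the
kernel's uapi header:)

```c
/* FSGSBASE must be checked via AT_HWCAP2, not CPUID, because the
   kernel gates user-space use of the instructions.  Linux/x86 only.  */
#include <sys/auxv.h>

#ifndef HWCAP2_FSGSBASE
/* Value from the kernel's arch/x86/include/uapi/asm/hwcap2.h.  */
# define HWCAP2_FSGSBASE (1 << 1)
#endif

int
fsgsbase_usable (void)
{
  return (getauxval (AT_HWCAP2) & HWCAP2_FSGSBASE) != 0;
}
```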

Thanks,
Florian



* Re: x86 CPU features detection for applications (and AMX)
  2021-07-08 14:31       ` Florian Weimer
@ 2021-07-08 14:36         ` Dave Hansen
  2021-07-08 14:41           ` Florian Weimer
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Hansen @ 2021-07-08 14:36 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, linux-api, x86, linux-arch, H.J. Lu, linux-kernel

On 7/8/21 7:31 AM, Florian Weimer wrote:
>> OK, so if I call CPU_FEATURE_USABLE(PKS) on a system *WITH* PKS
>> supported in the operating system, I'll get false from an interface that
>> claims to be:
>>
>>> This macro returns a nonzero value (true) if the processor has the
>>> feature name and the feature is supported by the operating system.
>> The interface just seems buggy by *design*.
> Yes, but that is largely a documentation matter.  We should have said
> something about “userspace” there, and that the bit needs to be known to
> glibc.  There is another exception: FSGSBASE, and that's a real bug we
> need to fix (it has to go through AT_HWCAP2).
> 
> If we want to avoid that, we need to go down the road of a curated set
> of CPUID bits, where a bit only exists if we have taught glibc its
> semantics.  You still might get a false negative by running against an
> older glibc than the application was built for.  (We are not going to
> force applications that e.g. look for FSGSBASE only run with a glibc
> that is at least of that version which implemented semantics for the
> FSGSBASE bit.)

That's kinda my whole point.

These *MUST* be curated to be meaningful.  Right now, someone just
dumped a set of CPUID bits into the documentation.

The interface really needs *three* modes:

1. Yes, the CPU/OS supports this feature
2. No, the CPU/OS doesn't support this feature
3. Hell if I know, never heard of this feature
	
The interface really conflates 2 and 3.  To me, that makes it
fundamentally flawed.
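(As a purely hypothetical sketch - all names here are invented, this is
not glibc's interface - the tristate would look something like:)

```c
/* Hypothetical tristate query that distinguishes "not supported" from
   "never heard of this feature".  Names invented for illustration.  */
#include <stddef.h>

enum feature_status
{
  FEATURE_SUPPORTED,    /* CPU and OS both support it            */
  FEATURE_UNSUPPORTED,  /* known feature, CPU or OS lacks it     */
  FEATURE_UNKNOWN       /* this library has never heard of it    */
};

enum feature_status
query_feature (unsigned int index)
{
  /* A curated table: only features with known semantics appear.  */
  static const struct { unsigned int index; int supported; } known[] =
    {
      { 0, 1 },  /* e.g. a baseline feature, always present */
      { 1, 0 },  /* e.g. a feature probed and found absent  */
    };
  for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
    if (known[i].index == index)
      return known[i].supported ? FEATURE_SUPPORTED : FEATURE_UNSUPPORTED;
  return FEATURE_UNKNOWN;  /* never conflated with UNSUPPORTED */
}
```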


* Re: x86 CPU features detection for applications (and AMX)
  2021-07-08 14:36         ` Dave Hansen
@ 2021-07-08 14:41           ` Florian Weimer
  0 siblings, 0 replies; 35+ messages in thread
From: Florian Weimer @ 2021-07-08 14:41 UTC (permalink / raw)
  To: Dave Hansen; +Cc: libc-alpha, linux-api, x86, linux-arch, H.J. Lu, linux-kernel

* Dave Hansen:

> That's kinda my whole point.
>
> These *MUST* be curated to be meaningful.  Right now, someone just
> dumped a set of CPUID bits into the documentation.
>
> The interface really needs *three* modes:
>
> 1. Yes, the CPU/OS supports this feature
> 2. No, the CPU/OS doesn't support this feature
> 3. Hell if I know, never heard of this feature
> 	
> The interface really conflates 2 and 3.  To me, that makes it
> fundamentally flawed.

That's an interesting point.

3 looks potentially more useful than the feature/usable distinction to
me.

The recent RTM change suggests that there are more states, but we
probably can't do much about such soft-disable changes.

Thanks,
Florian



* Re: x86 CPU features detection for applications (and AMX)
  2021-07-08  7:08   ` Florian Weimer
@ 2021-07-08 15:13     ` Thiago Macieira
  0 siblings, 0 replies; 35+ messages in thread
From: Thiago Macieira @ 2021-07-08 15:13 UTC (permalink / raw)
  To: Florian Weimer; +Cc: hjl.tools, libc-alpha, linux-api, linux-arch, x86

On Thursday, 8 July 2021 00:08:16 PDT Florian Weimer wrote:
> > The first problem is the cross-platformness need. Because we library and
> > application developers need to support other OSes, we'll need to deploy
> > our
> > own CPUID-based detection. It's far better to use common code everywhere,
> > where one developer working on Linux can fix bugs in FreeBSD, macOS or
> > Windows or any of the permutations. Every platform-specific deviation
> > adds to maintenance requirements and is a source of potential latent
> > bugs, now or in the future due to refactoring. That is why doing
> > everything in the form of instructions would be far better and easier,
> > rather than system calls.
> I must say this is a rather application-specific view.  Sure, you get
> consistency within the application across different targets, but for
> those who work on multiple applications (but perhaps on a single
> distribution/OS), things are very inconsistent.

Why would they be inconsistent, if the library is cross-platform?

> And the reason why I started this is that CPUID-based feature detection
> is dead anyway (assuming the kernel developers do not implement lazy
> initialization of the AMX state).  CPUID (and ancillary data such as
> XCR0) will say that AMX support is there, but it will not work unless
> some (yet to decided) steps are executed by the userspace thread.
> 
> While I consider the CPUID-based model a success (and the cross-OS
> consistency may have contributed to that), its days seem to be over.

Well, we need to design the API of this library such that we can
accommodate the various possibilities. For every CPU feature, the
library needs to be able to tell what the state of support is: "already
enabled", "possible but not enabled", or "impossible", along with a call
to enable a feature. The latter should be supported at least for the
AVX512 and AMX states. On Linux, only AMX will be tristate, but on macOS
we need the tristate for AVX512 too.

This library would then wrap all the necessary checking for OSXSAVE and XCR0, 
so the user doesn't need to worry about them or how the OS enables them, only 
the features they're interested in.

Additionally, I'd like the library to also have constant expression paths that 
evaluate to constant true if the feature was already enabled at compile time 
(e.g., -march=x86-64-v3 sets __AVX2__ and __FMA__, so you can always run AVX2 
and FMA code, without checking). But that's just icing on top.
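(A sketch of that compile-time path - helper names invented here; under
-mavx2 or -march=x86-64-v3 the compiler defines __AVX2__, the condition
folds to a constant, and the runtime probe becomes dead code:)

```c
/* Constant-expression fast path, sketched for GCC/Clang.  If the
   compilation target already guarantees AVX2, the check is a constant
   1 and the runtime probe is compiled out.  Helper names invented.  */
static int
runtime_probe_avx2 (void)
{
  return __builtin_cpu_supports ("avx2");  /* GCC/Clang builtin */
}

static inline int
cpu_has_avx2 (void)
{
#if defined (__AVX2__)
  return 1;                    /* guaranteed by the compilation target */
#else
  return runtime_probe_avx2 ();
#endif
}
```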

(it won't come as a surprise that I already have code for most of this)

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





* Re: x86 CPU features detection for applications (and AMX)
  2021-06-23 15:04 x86 CPU features detection for applications (and AMX) Florian Weimer
  2021-06-23 15:32 ` Dave Hansen
  2021-06-25 23:31 ` Thiago Macieira
@ 2021-07-08 17:56 ` Mark Brown
  2 siblings, 0 replies; 35+ messages in thread
From: Mark Brown @ 2021-07-08 17:56 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, linux-api, x86, linux-arch, H.J. Lu, Catalin Marinas,
	Will Deacon


On Wed, Jun 23, 2021 at 05:04:27PM +0200, Florian Weimer wrote:

Copying in Catalin & Will.

> We have an interface in glibc to query CPU features:

>   X86-specific Facilities
>   <https://www.gnu.org/software/libc/manual/html_node/X86.html>

> CPU_FEATURE_USABLE all preconditions for a feature are met,
> HAS_CPU_FEATURE means it's in silicon but possibly dormant.
> CPU_FEATURE_USABLE is supposed to look at XCR0, AT_HWCAP2 etc. before
> enabling the relevant bit (so it cannot pass through any unknown bits).

...

> When we designed this glibc interface, we assumed that bits would be
> static during the life-time of the process, initialized at process
> start.  That follows the model of previous x86 CPU feature enablement.

...

> This still wouldn't cover the enable/disable side, but at least it would
> work for CPU features which are modal and come and go.  The fact that we
> tell GCC to cache the returned pointer from that internal function, but
> not that the data is immutable works to our advantage here.

> On the other hand, maybe there is a way to give users a better
> interface.  Obviously we want to avoid a syscall for a simple CPU
> feature check.  And we also need something to enable/disable CPU
> features.

This enabling and disabling of CPU features sounds like something that
might also become relevant for arm64, for example I can see a use case
for having something that allows some of the more expensive features
to be masked from some userspace processes for resource management
purposes.  This sounds like a bit of a different use case to x86 AIUI
but I think there's overlap in the actual operations that would be
needed.


