linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] x86/cpufeatures: Re-arrange a few features and enumerate AVX512 BFLOAT16 intructions
@ 2019-06-17 18:00 Fenghua Yu
  2019-06-17 18:00 ` [PATCH v2 1/2] x86/cpufeatures: Combine word 11 and 12 into new scattered features word 11 Fenghua Yu
  2019-06-17 18:00 ` [PATCH v2 2/2] x86/cpufeatures: Enumerate new AVX512 BFLOAT16 instructions Fenghua Yu
  0 siblings, 2 replies; 11+ messages in thread
From: Fenghua Yu @ 2019-06-17 18:00 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Christopherson Sean J, Paolo Bonzini, Radim Krcmar,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

To enumerate AVX512 BFLOAT16 feature CPUID.7.1:EAX[5] and other future
features in CPUID.7.1:EAX, Boris suggests to create a new pure feature
bits word.

Boris further suggests to re-define word 11 as a scattered features word
and move the four X86_FEATURE_CQM_* features in existing word 11 and
word 12 into the new word 11. Then use word 12 to hold features in
CPUID.7.1:EAX including AVX512 BFLOAT16 instructions.

Change Log:
v2:
- Remove patch 0001 and keep x86_cache_max_rmid and x86_cache_occ_scale
  in cpuinfo_x86 per Thomas Gleixner's comment. 

v1:
Address all comments from Borislav Petkov:
- Re-define feature word 11 as a scattered features word and move the
  four X86_FEATURE_CQM_* features into the word
- Re-define feature word 12 to hold CPUID.7.1:EAX including BFLOAT16
- Name a dummy leaf 12 in cpuid_leafs in patch 0002 to avoid bisect
  error
- Simplify code to get number of rmid and monitoring scale in
  rdt_get_mon_l3_config()

Fenghua Yu (2):
  x86/cpufeatures: Combine word 11 and 12 into new scattered features
    word 11
  x86/cpufeatures: Enumerate new AVX512 BFLOAT16 instructions

 arch/x86/include/asm/cpufeature.h  |  4 +--
 arch/x86/include/asm/cpufeatures.h | 18 ++++++----
 arch/x86/kernel/cpu/common.c       | 56 ++++++++++++++++--------------
 arch/x86/kernel/cpu/cpuid-deps.c   |  4 +++
 arch/x86/kernel/cpu/scattered.c    |  4 +++
 arch/x86/kvm/cpuid.h               |  2 --
 6 files changed, 51 insertions(+), 37 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/2] x86/cpufeatures: Combine word 11 and 12 into new scattered features word 11
  2019-06-17 18:00 [PATCH v2 0/2] x86/cpufeatures: Re-arrange a few features and enumerate AVX512 BFLOAT16 intructions Fenghua Yu
@ 2019-06-17 18:00 ` Fenghua Yu
  2019-06-19 17:36   ` Borislav Petkov
  2019-06-20 11:37   ` [tip:x86/cpu] x86/cpufeatures: Combine word 11 and 12 into a new scattered features word tip-bot for Fenghua Yu
  2019-06-17 18:00 ` [PATCH v2 2/2] x86/cpufeatures: Enumerate new AVX512 BFLOAT16 instructions Fenghua Yu
  1 sibling, 2 replies; 11+ messages in thread
From: Fenghua Yu @ 2019-06-17 18:00 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Christopherson Sean J, Paolo Bonzini, Radim Krcmar,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

It's a waste for the four X86_FEATURE_CQM_* features to occupy two
pure feature bits words. To better utilize feature words, re-define
word 11 to host scattered features and move the four X86_FEATURE_CQM_*
features into Linux defined word 11. More scattered features can be added
in word 11 in the future.

Leaf 11 in cpuid_leafs is renamed as CPUID_LNX_4 to reflect it's a Linux
defined leaf.

Although word 12 doesn't have any feature now, leaf 12 in cpuid_leafs
still needs to be kept because cpuid_leafs must have NCAPINTS leafs.
Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful name
in the next patch when CPUID.7.1:EAX occupies world 12.

Maximum number of RMID and cache occupancy scale are retrieved from
CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
code into a separate function.

KVM doesn't support resctrl now. So it's safe to move the
X86_FEATURE_CQM_* features to scattered features word 11 for KVM.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeature.h  |  4 +--
 arch/x86/include/asm/cpufeatures.h | 17 ++++++----
 arch/x86/kernel/cpu/common.c       | 53 +++++++++++++++---------------
 arch/x86/kernel/cpu/cpuid-deps.c   |  3 ++
 arch/x86/kernel/cpu/scattered.c    |  4 +++
 arch/x86/kvm/cpuid.h               |  2 --
 6 files changed, 45 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 1d337c51f7e6..403f70c2e431 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -22,8 +22,8 @@ enum cpuid_leafs
 	CPUID_LNX_3,
 	CPUID_7_0_EBX,
 	CPUID_D_1_EAX,
-	CPUID_F_0_EDX,
-	CPUID_F_1_EDX,
+	CPUID_LNX_4,
+	CPUID_DUMMY,
 	CPUID_8000_0008_EBX,
 	CPUID_6_EAX,
 	CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 75f27ee2c263..4f0a3d093794 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -269,13 +269,16 @@
 #define X86_FEATURE_XGETBV1		(10*32+ 2) /* XGETBV with ECX = 1 instruction */
 #define X86_FEATURE_XSAVES		(10*32+ 3) /* XSAVES/XRSTORS instructions */
 
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (EDX), word 11 */
-#define X86_FEATURE_CQM_LLC		(11*32+ 1) /* LLC QoS if 1 */
-
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (EDX), word 12 */
-#define X86_FEATURE_CQM_OCCUP_LLC	(12*32+ 0) /* LLC occupancy monitoring */
-#define X86_FEATURE_CQM_MBM_TOTAL	(12*32+ 1) /* LLC Total MBM monitoring */
-#define X86_FEATURE_CQM_MBM_LOCAL	(12*32+ 2) /* LLC Local MBM monitoring */
+/*
+ * Extended auxiliary flags: Linux defined - For features scattered in various
+ * CPUID levels like 0xf, word 11.
+ *
+ * Reuse free bits when adding new feature flags!
+ */
+#define X86_FEATURE_CQM_LLC		(11*32+ 0) /* LLC QoS if 1 */
+#define X86_FEATURE_CQM_OCCUP_LLC	(11*32+ 1) /* LLC occupancy monitoring */
+#define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
+#define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2c57fffebf9b..f080be35da41 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -801,6 +801,31 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
 	}
 }
 
+static void get_cqm_info(struct cpuinfo_x86 *c)
+{
+	if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
+		u32 eax, ebx, ecx, edx;
+
+		/* QoS sub-leaf, EAX=0Fh, ECX=0 */
+		cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
+		/* will be overridden if occupancy monitoring exists */
+		c->x86_cache_max_rmid = ebx;
+
+		if (cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC) ||
+		    cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL) ||
+		    cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)) {
+			/* QoS sub-leaf, EAX=0Fh, ECX=1 */
+			cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
+
+			c->x86_cache_max_rmid = ecx;
+			c->x86_cache_occ_scale = ebx;
+		}
+	} else {
+		c->x86_cache_max_rmid = -1;
+		c->x86_cache_occ_scale = -1;
+	}
+}
+
 void get_cpu_cap(struct cpuinfo_x86 *c)
 {
 	u32 eax, ebx, ecx, edx;
@@ -832,33 +857,6 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_capability[CPUID_D_1_EAX] = eax;
 	}
 
-	/* Additional Intel-defined flags: level 0x0000000F */
-	if (c->cpuid_level >= 0x0000000F) {
-
-		/* QoS sub-leaf, EAX=0Fh, ECX=0 */
-		cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
-		c->x86_capability[CPUID_F_0_EDX] = edx;
-
-		if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
-			/* will be overridden if occupancy monitoring exists */
-			c->x86_cache_max_rmid = ebx;
-
-			/* QoS sub-leaf, EAX=0Fh, ECX=1 */
-			cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
-			c->x86_capability[CPUID_F_1_EDX] = edx;
-
-			if ((cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC)) ||
-			      ((cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL)) ||
-			       (cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)))) {
-				c->x86_cache_max_rmid = ecx;
-				c->x86_cache_occ_scale = ebx;
-			}
-		} else {
-			c->x86_cache_max_rmid = -1;
-			c->x86_cache_occ_scale = -1;
-		}
-	}
-
 	/* AMD-defined flags: level 0x80000001 */
 	eax = cpuid_eax(0x80000000);
 	c->extended_cpuid_level = eax;
@@ -889,6 +887,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 
 	init_scattered_cpuid_features(c);
 	init_speculation_control(c);
+	get_cqm_info(c);
 
 	/*
 	 * Clear/Set all flags overridden by options, after probe.
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 2c0bd38a44ab..fa07a224e7b9 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -59,6 +59,9 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_AVX512_4VNNIW,	X86_FEATURE_AVX512F   },
 	{ X86_FEATURE_AVX512_4FMAPS,	X86_FEATURE_AVX512F   },
 	{ X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_CQM_OCCUP_LLC,	X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_CQM_MBM_TOTAL,	X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_CQM_MBM_LOCAL,	X86_FEATURE_CQM_LLC   },
 	{}
 };
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 94aa1c72ca98..adf9b71386ef 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -26,6 +26,10 @@ struct cpuid_bit {
 static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_APERFMPERF,       CPUID_ECX,  0, 0x00000006, 0 },
 	{ X86_FEATURE_EPB,		CPUID_ECX,  3, 0x00000006, 0 },
+	{ X86_FEATURE_CQM_LLC,		CPUID_EDX,  1, 0x0000000f, 0 },
+	{ X86_FEATURE_CQM_OCCUP_LLC,	CPUID_EDX,  0, 0x0000000f, 1 },
+	{ X86_FEATURE_CQM_MBM_TOTAL,	CPUID_EDX,  1, 0x0000000f, 1 },
+	{ X86_FEATURE_CQM_MBM_LOCAL,	CPUID_EDX,  2, 0x0000000f, 1 },
 	{ X86_FEATURE_CAT_L3,		CPUID_EBX,  1, 0x00000010, 0 },
 	{ X86_FEATURE_CAT_L2,		CPUID_EBX,  2, 0x00000010, 0 },
 	{ X86_FEATURE_CDP_L3,		CPUID_ECX,  2, 0x00000010, 1 },
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 9a327d5b6d1f..d78a61408243 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -47,8 +47,6 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0001_ECX] = {0x80000001, 0, CPUID_ECX},
 	[CPUID_7_0_EBX]       = {         7, 0, CPUID_EBX},
 	[CPUID_D_1_EAX]       = {       0xd, 1, CPUID_EAX},
-	[CPUID_F_0_EDX]       = {       0xf, 0, CPUID_EDX},
-	[CPUID_F_1_EDX]       = {       0xf, 1, CPUID_EDX},
 	[CPUID_8000_0008_EBX] = {0x80000008, 0, CPUID_EBX},
 	[CPUID_6_EAX]         = {         6, 0, CPUID_EAX},
 	[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/2] x86/cpufeatures: Enumerate new AVX512 BFLOAT16 instructions
  2019-06-17 18:00 [PATCH v2 0/2] x86/cpufeatures: Re-arrange a few features and enumerate AVX512 BFLOAT16 intructions Fenghua Yu
  2019-06-17 18:00 ` [PATCH v2 1/2] x86/cpufeatures: Combine word 11 and 12 into new scattered features word 11 Fenghua Yu
@ 2019-06-17 18:00 ` Fenghua Yu
  2019-06-19 17:31   ` Borislav Petkov
  2019-06-20 11:38   ` [tip:x86/cpu] x86/cpufeatures: Enumerate the " tip-bot for Fenghua Yu
  1 sibling, 2 replies; 11+ messages in thread
From: Fenghua Yu @ 2019-06-17 18:00 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Christopherson Sean J, Paolo Bonzini, Radim Krcmar,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

AVX512 Vector Neural Network Instructions (VNNI) in Intel Deep Learning
Boost support BFLOAT16 format (BF16). BF16 is a short version of FP32 and
has several advantages over FP16. BF16 offers more than enough range for
deep learning training tasks and doesn't need to handle hardware exception
as this is a performance optimization. FP32 accumulation after the
multiply is essential to achieve sufficient numerical behavior on an
application level.

AVX512 BFLOAT16 instructions can be enumerated by:
	CPUID.7.1:EAX[bit 5] AVX512_BF16

Use word 12, which is empty now, to hold features in CPUID.7.1:EAX
including AVX512_BF16. Leaf CPUID_DUMMY is renamed as CPUID_7_1_EAX.

Detailed information of the CPUID bit and AVX512 BFLOAT16 instructions
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeature.h  | 2 +-
 arch/x86/include/asm/cpufeatures.h | 3 +++
 arch/x86/kernel/cpu/common.c       | 3 +++
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 403f70c2e431..58acda503817 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -23,7 +23,7 @@ enum cpuid_leafs
 	CPUID_7_0_EBX,
 	CPUID_D_1_EAX,
 	CPUID_LNX_4,
-	CPUID_DUMMY,
+	CPUID_7_1_EAX,
 	CPUID_8000_0008_EBX,
 	CPUID_6_EAX,
 	CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 4f0a3d093794..625191ceb214 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -280,6 +280,9 @@
 #define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 
+/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
+#define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
 #define X86_FEATURE_IRPERF		(13*32+ 1) /* Instructions Retired Count */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f080be35da41..645a6c52d925 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -848,6 +848,9 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_capability[CPUID_7_0_EBX] = ebx;
 		c->x86_capability[CPUID_7_ECX] = ecx;
 		c->x86_capability[CPUID_7_EDX] = edx;
+
+		cpuid_count(0x00000007, 1, &eax, &ebx, &ecx, &edx);
+		c->x86_capability[CPUID_7_1_EAX] = eax;
 	}
 
 	/* Extended state features: level 0x0000000d */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index fa07a224e7b9..a444028d8145 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -62,6 +62,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_OCCUP_LLC,	X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_TOTAL,	X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_LOCAL,	X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_AVX512_BF16,	X86_FEATURE_AVX512VL  },
 	{}
 };
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] x86/cpufeatures: Enumerate new AVX512 BFLOAT16 instructions
  2019-06-17 18:00 ` [PATCH v2 2/2] x86/cpufeatures: Enumerate new AVX512 BFLOAT16 instructions Fenghua Yu
@ 2019-06-19 17:31   ` Borislav Petkov
  2019-06-19 21:34     ` Fenghua Yu
  2019-06-20 11:38   ` [tip:x86/cpu] x86/cpufeatures: Enumerate the " tip-bot for Fenghua Yu
  1 sibling, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2019-06-19 17:31 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H Peter Anvin,
	Christopherson Sean J, Paolo Bonzini, Radim Krcmar,
	Ravi V Shankar, linux-kernel, x86

On Mon, Jun 17, 2019 at 11:00:16AM -0700, Fenghua Yu wrote:
> AVX512 Vector Neural Network Instructions (VNNI) in Intel Deep Learning
> Boost support BFLOAT16 format (BF16).

That sentence is a mouthful and I have no clue what it means. Marketing
junk? If so, either rewrite it for mere mortals or kill it.

> BF16 is a short version of FP32

FP32?

Please write out.

> and has several advantages over FP16.

Ditto.

> BF16 offers more than enough range for
> deep learning training tasks and doesn't need to handle hardware exception
> as this is a performance optimization. FP32 accumulation after the
> multiply is essential to achieve sufficient numerical behavior on an
> application level.
> 
> AVX512 BFLOAT16 instructions can be enumerated by:
> 	CPUID.7.1:EAX[bit 5] AVX512_BF16
> 
> Use word 12, which is empty now, to hold features in CPUID.7.1:EAX
> including AVX512_BF16.

... because that leaf is features only, right?

> Leaf CPUID_DUMMY is renamed as CPUID_7_1_EAX.

That's obvious from the patch, ain't it?

> Detailed information of the CPUID bit and AVX512 BFLOAT16 instructions
> can be found in the latest Intel Architecture Instruction Set Extensions
> and Future Features Programming Reference.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] x86/cpufeatures: Combine word 11 and 12 into new scattered features word 11
  2019-06-17 18:00 ` [PATCH v2 1/2] x86/cpufeatures: Combine word 11 and 12 into new scattered features word 11 Fenghua Yu
@ 2019-06-19 17:36   ` Borislav Petkov
  2019-06-19 21:24     ` Fenghua Yu
  2019-06-20 11:37   ` [tip:x86/cpu] x86/cpufeatures: Combine word 11 and 12 into a new scattered features word tip-bot for Fenghua Yu
  1 sibling, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2019-06-19 17:36 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H Peter Anvin,
	Christopherson Sean J, Paolo Bonzini, Radim Krcmar,
	Ravi V Shankar, linux-kernel, x86

On Mon, Jun 17, 2019 at 11:00:15AM -0700, Fenghua Yu wrote:
> @@ -832,33 +857,6 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
>  		c->x86_capability[CPUID_D_1_EAX] = eax;
>  	}
>  
> -	/* Additional Intel-defined flags: level 0x0000000F */
> -	if (c->cpuid_level >= 0x0000000F) {
> -
> -		/* QoS sub-leaf, EAX=0Fh, ECX=0 */
> -		cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
> -		c->x86_capability[CPUID_F_0_EDX] = edx;
> -
> -		if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
> -			/* will be overridden if occupancy monitoring exists */
> -			c->x86_cache_max_rmid = ebx;
> -
> -			/* QoS sub-leaf, EAX=0Fh, ECX=1 */
> -			cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
> -			c->x86_capability[CPUID_F_1_EDX] = edx;
> -
> -			if ((cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC)) ||
> -			      ((cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL)) ||
> -			       (cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)))) {
> -				c->x86_cache_max_rmid = ecx;
> -				c->x86_cache_occ_scale = ebx;
> -			}
> -		} else {
> -			c->x86_cache_max_rmid = -1;
> -			c->x86_cache_occ_scale = -1;
> -		}
> -	}
> -
>  	/* AMD-defined flags: level 0x80000001 */
>  	eax = cpuid_eax(0x80000000);
>  	c->extended_cpuid_level = eax;

What I meant with having a separate patch doing the carve out is to have
a single patch doing *only* code movement - no changes, no nothing. So
that it is clear what happens. Intermixing code movement and changes is
a bad idea and hard to review.

IOW, I did this:

---
From cef4f58a3da0465bbff33b2d669cc600b775f3ba Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp@suse.de>
Date: Wed, 19 Jun 2019 17:24:34 +0200
Subject: [PATCH] x86/cpufeatures: Carve out CQM features retrieval

... into a separate function for better readability. Split out from a
patch from Fenghua Yu <fenghua.yu@intel.com> to keep the mechanical,
sole code movement separate for easy review.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
---
 arch/x86/kernel/cpu/common.c | 60 ++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2c57fffebf9b..fe6ed9696467 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -801,6 +801,38 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
 	}
 }
 
+static void init_cqm(struct cpuinfo_x86 *c)
+{
+	u32 eax, ebx, ecx, edx;
+
+	/* Additional Intel-defined flags: level 0x0000000F */
+	if (c->cpuid_level >= 0x0000000F) {
+
+		/* QoS sub-leaf, EAX=0Fh, ECX=0 */
+		cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
+		c->x86_capability[CPUID_F_0_EDX] = edx;
+
+		if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
+			/* will be overridden if occupancy monitoring exists */
+			c->x86_cache_max_rmid = ebx;
+
+			/* QoS sub-leaf, EAX=0Fh, ECX=1 */
+			cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
+			c->x86_capability[CPUID_F_1_EDX] = edx;
+
+			if ((cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC)) ||
+			      ((cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL)) ||
+			       (cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)))) {
+				c->x86_cache_max_rmid = ecx;
+				c->x86_cache_occ_scale = ebx;
+			}
+		} else {
+			c->x86_cache_max_rmid = -1;
+			c->x86_cache_occ_scale = -1;
+		}
+	}
+}
+
 void get_cpu_cap(struct cpuinfo_x86 *c)
 {
 	u32 eax, ebx, ecx, edx;
@@ -832,33 +864,6 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_capability[CPUID_D_1_EAX] = eax;
 	}
 
-	/* Additional Intel-defined flags: level 0x0000000F */
-	if (c->cpuid_level >= 0x0000000F) {
-
-		/* QoS sub-leaf, EAX=0Fh, ECX=0 */
-		cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
-		c->x86_capability[CPUID_F_0_EDX] = edx;
-
-		if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
-			/* will be overridden if occupancy monitoring exists */
-			c->x86_cache_max_rmid = ebx;
-
-			/* QoS sub-leaf, EAX=0Fh, ECX=1 */
-			cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
-			c->x86_capability[CPUID_F_1_EDX] = edx;
-
-			if ((cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC)) ||
-			      ((cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL)) ||
-			       (cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)))) {
-				c->x86_cache_max_rmid = ecx;
-				c->x86_cache_occ_scale = ebx;
-			}
-		} else {
-			c->x86_cache_max_rmid = -1;
-			c->x86_cache_occ_scale = -1;
-		}
-	}
-
 	/* AMD-defined flags: level 0x80000001 */
 	eax = cpuid_eax(0x80000000);
 	c->extended_cpuid_level = eax;
@@ -889,6 +894,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 
 	init_scattered_cpuid_features(c);
 	init_speculation_control(c);
+	init_cqm(c);
 
 	/*
 	 * Clear/Set all flags overridden by options, after probe.
-- 
2.21.0

---

This way you have *pure* code movement only.

And then your second patch turns into this, which shows *exactly* what
has been changed in init_cqm().

Please have a look and send me only the now third patch with corrected
commit message.

Thx.

---
From e33527b8cde8bef84cdc90651d1a1c7a9a5234d7 Mon Sep 17 00:00:00 2001
From: Fenghua Yu <fenghua.yu@intel.com>
Date: Wed, 19 Jun 2019 18:51:09 +0200
Subject: [PATCH] x86/cpufeatures: Combine word 11 and 12 into a new
 scattered features word
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
whole feature bits words. To better utilize feature words, re-define
word 11 to host scattered features and move the four X86_FEATURE_CQM_*
features into Linux defined word 11. More scattered features can be
added in word 11 in the future.

Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
Linux-defined leaf.

Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful
name in the next patch when CPUID.7.1:EAX occupies world 12.

Maximum number of RMID and cache occupancy scale are retrieved from
CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
code into a separate function.

KVM doesn't support resctrl now. So it's safe to move the
X86_FEATURE_CQM_* features to scattered features word 11 for KVM.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Babu Moger <babu.moger@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: x86 <x86@kernel.org>
Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua.yu@intel.com
---
 arch/x86/include/asm/cpufeature.h  |  4 ++--
 arch/x86/include/asm/cpufeatures.h | 17 +++++++------
 arch/x86/kernel/cpu/common.c       | 38 ++++++++++++------------------
 arch/x86/kernel/cpu/cpuid-deps.c   |  3 +++
 arch/x86/kernel/cpu/scattered.c    |  4 ++++
 arch/x86/kvm/cpuid.h               |  2 --
 6 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 1d337c51f7e6..403f70c2e431 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -22,8 +22,8 @@ enum cpuid_leafs
 	CPUID_LNX_3,
 	CPUID_7_0_EBX,
 	CPUID_D_1_EAX,
-	CPUID_F_0_EDX,
-	CPUID_F_1_EDX,
+	CPUID_LNX_4,
+	CPUID_DUMMY,
 	CPUID_8000_0008_EBX,
 	CPUID_6_EAX,
 	CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1017b9c7dfe0..be858b86023a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -271,13 +271,16 @@
 #define X86_FEATURE_XGETBV1		(10*32+ 2) /* XGETBV with ECX = 1 instruction */
 #define X86_FEATURE_XSAVES		(10*32+ 3) /* XSAVES/XRSTORS instructions */
 
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (EDX), word 11 */
-#define X86_FEATURE_CQM_LLC		(11*32+ 1) /* LLC QoS if 1 */
-
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (EDX), word 12 */
-#define X86_FEATURE_CQM_OCCUP_LLC	(12*32+ 0) /* LLC occupancy monitoring */
-#define X86_FEATURE_CQM_MBM_TOTAL	(12*32+ 1) /* LLC Total MBM monitoring */
-#define X86_FEATURE_CQM_MBM_LOCAL	(12*32+ 2) /* LLC Local MBM monitoring */
+/*
+ * Extended auxiliary flags: Linux defined - for features scattered in various
+ * CPUID levels like 0xf, etc.
+ *
+ * Reuse free bits when adding new feature flags!
+ */
+#define X86_FEATURE_CQM_LLC		(11*32+ 0) /* LLC QoS if 1 */
+#define X86_FEATURE_CQM_OCCUP_LLC	(11*32+ 1) /* LLC occupancy monitoring */
+#define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
+#define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index fe6ed9696467..efb114298cfb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -803,33 +803,25 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
 
 static void init_cqm(struct cpuinfo_x86 *c)
 {
-	u32 eax, ebx, ecx, edx;
-
-	/* Additional Intel-defined flags: level 0x0000000F */
-	if (c->cpuid_level >= 0x0000000F) {
+	if (!cpu_has(c, X86_FEATURE_CQM_LLC)) {
+		c->x86_cache_max_rmid  = -1;
+		c->x86_cache_occ_scale = -1;
+		return;
+	}
 
-		/* QoS sub-leaf, EAX=0Fh, ECX=0 */
-		cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
-		c->x86_capability[CPUID_F_0_EDX] = edx;
+	/* will be overridden if occupancy monitoring exists */
+	c->x86_cache_max_rmid = cpuid_ebx(0xf);
 
-		if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
-			/* will be overridden if occupancy monitoring exists */
-			c->x86_cache_max_rmid = ebx;
+	if (cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC) ||
+	    cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL) ||
+	    cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)) {
+		u32 eax, ebx, ecx, edx;
 
-			/* QoS sub-leaf, EAX=0Fh, ECX=1 */
-			cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
-			c->x86_capability[CPUID_F_1_EDX] = edx;
+		/* QoS sub-leaf, EAX=0Fh, ECX=1 */
+		cpuid_count(0xf, 1, &eax, &ebx, &ecx, &edx);
 
-			if ((cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC)) ||
-			      ((cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL)) ||
-			       (cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)))) {
-				c->x86_cache_max_rmid = ecx;
-				c->x86_cache_occ_scale = ebx;
-			}
-		} else {
-			c->x86_cache_max_rmid = -1;
-			c->x86_cache_occ_scale = -1;
-		}
+		c->x86_cache_max_rmid  = ecx;
+		c->x86_cache_occ_scale = ebx;
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 2c0bd38a44ab..fa07a224e7b9 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -59,6 +59,9 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_AVX512_4VNNIW,	X86_FEATURE_AVX512F   },
 	{ X86_FEATURE_AVX512_4FMAPS,	X86_FEATURE_AVX512F   },
 	{ X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_CQM_OCCUP_LLC,	X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_CQM_MBM_TOTAL,	X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_CQM_MBM_LOCAL,	X86_FEATURE_CQM_LLC   },
 	{}
 };
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 94aa1c72ca98..adf9b71386ef 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -26,6 +26,10 @@ struct cpuid_bit {
 static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_APERFMPERF,       CPUID_ECX,  0, 0x00000006, 0 },
 	{ X86_FEATURE_EPB,		CPUID_ECX,  3, 0x00000006, 0 },
+	{ X86_FEATURE_CQM_LLC,		CPUID_EDX,  1, 0x0000000f, 0 },
+	{ X86_FEATURE_CQM_OCCUP_LLC,	CPUID_EDX,  0, 0x0000000f, 1 },
+	{ X86_FEATURE_CQM_MBM_TOTAL,	CPUID_EDX,  1, 0x0000000f, 1 },
+	{ X86_FEATURE_CQM_MBM_LOCAL,	CPUID_EDX,  2, 0x0000000f, 1 },
 	{ X86_FEATURE_CAT_L3,		CPUID_EBX,  1, 0x00000010, 0 },
 	{ X86_FEATURE_CAT_L2,		CPUID_EBX,  2, 0x00000010, 0 },
 	{ X86_FEATURE_CDP_L3,		CPUID_ECX,  2, 0x00000010, 1 },
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 9a327d5b6d1f..d78a61408243 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -47,8 +47,6 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0001_ECX] = {0x80000001, 0, CPUID_ECX},
 	[CPUID_7_0_EBX]       = {         7, 0, CPUID_EBX},
 	[CPUID_D_1_EAX]       = {       0xd, 1, CPUID_EAX},
-	[CPUID_F_0_EDX]       = {       0xf, 0, CPUID_EDX},
-	[CPUID_F_1_EDX]       = {       0xf, 1, CPUID_EDX},
 	[CPUID_8000_0008_EBX] = {0x80000008, 0, CPUID_EBX},
 	[CPUID_6_EAX]         = {         6, 0, CPUID_EAX},
 	[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
-- 
2.21.0


-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] x86/cpufeatures: Combine word 11 and 12 into new scattered features word 11
  2019-06-19 17:36   ` Borislav Petkov
@ 2019-06-19 21:24     ` Fenghua Yu
  0 siblings, 0 replies; 11+ messages in thread
From: Fenghua Yu @ 2019-06-19 21:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, H Peter Anvin,
	Christopherson Sean J, Paolo Bonzini, Radim Krcmar,
	Ravi V Shankar, linux-kernel, x86

On Wed, Jun 19, 2019 at 07:36:28PM +0200, Borislav Petkov wrote:
> On Mon, Jun 17, 2019 at 11:00:15AM -0700, Fenghua Yu wrote:
> > @@ -832,33 +857,6 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
> >  		c->x86_capability[CPUID_D_1_EAX] = eax;
> > -	/* Additional Intel-defined flags: level 0x0000000F */
> > -	if (c->cpuid_level >= 0x0000000F) {
> What I meant with having a separate patch doing the carve out is to have
> a single patch doing *only* code movement - no changes, no nothing. So
> that it is clear what happens. Intermixing code movement and changes is
> a bad idea and hard to review.
> 
> IOW, I did this:
> 
> ---
> From cef4f58a3da0465bbff33b2d669cc600b775f3ba Mon Sep 17 00:00:00 2001
> From: Borislav Petkov <bp@suse.de>
> Date: Wed, 19 Jun 2019 17:24:34 +0200
> Subject: [PATCH] x86/cpufeatures: Carve out CQM features retrieval
> 
> ... into a separate function for better readability. Split out from a
> patch from Fenghua Yu <fenghua.yu@intel.com> to keep the mechanical,
> sole code movement separate for easy review.
> 
> No functional changes.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: x86@kernel.org
> ---
>  arch/x86/kernel/cpu/common.c | 60 ++++++++++++++++++++----------------
>  1 file changed, 33 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 2c57fffebf9b..fe6ed9696467 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -801,6 +801,38 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
>  	}
>  }
>  
> +static void init_cqm(struct cpuinfo_x86 *c)
> +{
> +	u32 eax, ebx, ecx, edx;
> 
> This way you have *pure* code movement only.
> 
> And then your second patch turns into this, which shows *exactly* what
> has been changed in init_cqm().

Yes, the added patch makes this patch set more clear and readable.

> 
> Please have a look and send me only the now third patch with corrected
> commit message.
> 
> From e33527b8cde8bef84cdc90651d1a1c7a9a5234d7 Mon Sep 17 00:00:00 2001
> From: Fenghua Yu <fenghua.yu@intel.com>
> Date: Wed, 19 Jun 2019 18:51:09 +0200
> Subject: [PATCH] x86/cpufeatures: Combine word 11 and 12 into a new
>  scattered features word
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
> whole feature bits words. To better utilize feature words, re-define
> word 11 to host scattered features and move the four X86_FEATURE_CQM_*
> features into Linux defined word 11. More scattered features can be
> added in word 11 in the future.

I checked and tested the updated patch set (now three patches). They
look much better than v2.

I will send you the now third patch with corrected commit message
in the other email thread.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] x86/cpufeatures: Enumerate new AVX512 BFLOAT16 instructions
  2019-06-19 17:31   ` Borislav Petkov
@ 2019-06-19 21:34     ` Fenghua Yu
  2019-06-20 10:37       ` Borislav Petkov
  0 siblings, 1 reply; 11+ messages in thread
From: Fenghua Yu @ 2019-06-19 21:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, H Peter Anvin,
	Christopherson Sean J, Paolo Bonzini, Radim Krcmar,
	Ravi V Shankar, linux-kernel, x86

On Wed, Jun 19, 2019 at 07:31:40PM +0200, Borislav Petkov wrote:
> On Mon, Jun 17, 2019 at 11:00:16AM -0700, Fenghua Yu wrote:
> > AVX512 Vector Neural Network Instructions (VNNI) in Intel Deep Learning
> > Boost support BFLOAT16 format (BF16).
> 
> That sentence is a mouthful and I have no clue what it means. Marketing
> junk? If so, either rewrite it for mere mortals or kill it.
> 
> > BF16 is a short version of FP32
> 
> FP32?
> 
> Please write out.
> 
> > and has several advantages over FP16.
> 
> Ditto.
> 
> > BF16 offers more than enough range for
> > deep learning training tasks and doesn't need to handle hardware exception
> > as this is a performance optimization. FP32 accumulation after the
> > multiply is essential to achieve sufficient numerical behavior on an
> > application level.
> > 
> > AVX512 BFLOAT16 instructions can be enumerated by:
> > 	CPUID.7.1:EAX[bit 5] AVX512_BF16
> > 
> > Use word 12, which is empty now, to hold features in CPUID.7.1:EAX
> > including AVX512_BF16.
> 
> ... because that leaf is features only, right?
> 
> > Leaf CPUID_DUMMY is renamed as CPUID_7_1_EAX.
> 
> That's obvious from the patch, ain't it?
> 

Hi, Boris,

I corrected the commit message per your comment. Except the commit
message, I didn't change anything else.

Now I send the updated patch here. Is the patch right now?
Should I send the patch to you in another thread?

Thanks.

-Fenghua


From 6265ae0a0fac5ecf7e67b8bfc4b6791afb88e56c Mon Sep 17 00:00:00 2001
From: Fenghua Yu <fenghua.yu@intel.com>
Date: Tue, 11 Jun 2019 23:06:05 +0000
Subject: [PATCH v3 3/3] x86/cpufeatures: Enumerate new AVX512 BFLOAT16
 instructions

AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point
format (BF16) for deep learning optimization.

BF16 is a short version of 32-bit single-precision floating-point
format (FP32) and has several advantages over 16-bit half-precision
floating-point format (FP16). BF16 keeps FP32 accumulation after
multiplication without loss of precision, offers more than enough
range for deep learning training tasks, and doesn't need to handle
hardware exception.

AVX512 BFLOAT16 instructions can be enumerated by:
	CPUID.7.1:EAX[bit 5] AVX512_BF16

CPUID.7.1:EAX contains only feature bits. Re-define currently empty
word 12 as a pure features word to hold the feature bits including
AVV512_BF16.

Detailed information of the CPUID bit and AVX512 BFLOAT16 instructions
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeature.h  | 2 +-
 arch/x86/include/asm/cpufeatures.h | 3 +++
 arch/x86/kernel/cpu/common.c       | 3 +++
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 403f70c2e431..58acda503817 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -23,7 +23,7 @@ enum cpuid_leafs
 	CPUID_7_0_EBX,
 	CPUID_D_1_EAX,
 	CPUID_LNX_4,
-	CPUID_DUMMY,
+	CPUID_7_1_EAX,
 	CPUID_8000_0008_EBX,
 	CPUID_6_EAX,
 	CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 7103007abe17..a49498834c6c 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -280,6 +280,9 @@
 #define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 
+/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
+#define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
 #define X86_FEATURE_IRPERF		(13*32+ 1) /* Instructions Retired Count */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index efb114298cfb..4910cb421b82 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -847,6 +847,9 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_capability[CPUID_7_0_EBX] = ebx;
 		c->x86_capability[CPUID_7_ECX] = ecx;
 		c->x86_capability[CPUID_7_EDX] = edx;
+
+		cpuid_count(0x00000007, 1, &eax, &ebx, &ecx, &edx);
+		c->x86_capability[CPUID_7_1_EAX] = eax;
 	}
 
 	/* Extended state features: level 0x0000000d */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index fa07a224e7b9..a444028d8145 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -62,6 +62,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_OCCUP_LLC,	X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_TOTAL,	X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_LOCAL,	X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_AVX512_BF16,	X86_FEATURE_AVX512VL  },
 	{}
 };
 
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] x86/cpufeatures: Enumerate new AVX512 BFLOAT16 instructions
  2019-06-19 21:34     ` Fenghua Yu
@ 2019-06-20 10:37       ` Borislav Petkov
  2019-06-20 14:06         ` Fenghua Yu
  0 siblings, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2019-06-20 10:37 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H Peter Anvin,
	Christopherson Sean J, Paolo Bonzini, Radim Krcmar,
	Ravi V Shankar, linux-kernel, x86

On Wed, Jun 19, 2019 at 02:34:04PM -0700, Fenghua Yu wrote:
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index efb114298cfb..4910cb421b82 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -847,6 +847,9 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
>  		c->x86_capability[CPUID_7_0_EBX] = ebx;
>  		c->x86_capability[CPUID_7_ECX] = ecx;
>  		c->x86_capability[CPUID_7_EDX] = edx;
> +
> +		cpuid_count(0x00000007, 1, &eax, &ebx, &ecx, &edx);
> +		c->x86_capability[CPUID_7_1_EAX] = eax;
>  	}

You need to test the sub-leaf index validity here before accessing
subleaf 1:

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4910cb421b82..dad20bc891d5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -848,8 +848,11 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_capability[CPUID_7_ECX] = ecx;
 		c->x86_capability[CPUID_7_EDX] = edx;
 
-		cpuid_count(0x00000007, 1, &eax, &ebx, &ecx, &edx);
-		c->x86_capability[CPUID_7_1_EAX] = eax;
+		/* Check valid sub-leaf index before accessing it */
+		if (eax >= 1) {
+			cpuid_count(0x00000007, 1, &eax, &ebx, &ecx, &edx);
+			c->x86_capability[CPUID_7_1_EAX] = eax;
+		}
 	}
 
 	/* Extended state features: level 0x0000000d */

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [tip:x86/cpu] x86/cpufeatures: Combine word 11 and 12 into a new scattered features word
  2019-06-17 18:00 ` [PATCH v2 1/2] x86/cpufeatures: Combine word 11 and 12 into new scattered features word 11 Fenghua Yu
  2019-06-19 17:36   ` Borislav Petkov
@ 2019-06-20 11:37   ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot for Fenghua Yu @ 2019-06-20 11:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: pfeiner, mingo, ravi.v.shankar, Thomas.Lendacky, linux-kernel,
	hpa, sean.j.christopherson, luto, rafael.j.wysocki, mhiramat,
	jgross, yamada.masahiro, fenghua.yu, chang.seok.bae, peterz,
	pasha.tatashin, kvm, frederic, babu.moger, aaronlewis, bp,
	konrad.wilk, x86, sherry.hurwitz, pbonzini, jannh, namit, mingo,
	tglx, rkrcmar

Commit-ID:  acec0ce081de0c36459eea91647faf99296445a3
Gitweb:     https://git.kernel.org/tip/acec0ce081de0c36459eea91647faf99296445a3
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Wed, 19 Jun 2019 18:51:09 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Thu, 20 Jun 2019 12:38:44 +0200

x86/cpufeatures: Combine word 11 and 12 into a new scattered features word

It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
whole feature bits words. To better utilize feature words, re-define
word 11 to host scattered features and move the four X86_FEATURE_CQM_*
features into Linux defined word 11. More scattered features can be
added in word 11 in the future.

Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
Linux-defined leaf.

Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful
name in the next patch when CPUID.7.1:EAX occupies world 12.

Maximum number of RMID and cache occupancy scale are retrieved from
CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
code into a separate function.

KVM doesn't support resctrl now. So it's safe to move the
X86_FEATURE_CQM_* features to scattered features word 11 for KVM.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Babu Moger <babu.moger@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: x86 <x86@kernel.org>
Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua.yu@intel.com
---
 arch/x86/include/asm/cpufeature.h  |  4 ++--
 arch/x86/include/asm/cpufeatures.h | 17 ++++++++++-------
 arch/x86/kernel/cpu/common.c       | 38 +++++++++++++++-----------------------
 arch/x86/kernel/cpu/cpuid-deps.c   |  3 +++
 arch/x86/kernel/cpu/scattered.c    |  4 ++++
 arch/x86/kvm/cpuid.h               |  2 --
 6 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 1d337c51f7e6..403f70c2e431 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -22,8 +22,8 @@ enum cpuid_leafs
 	CPUID_LNX_3,
 	CPUID_7_0_EBX,
 	CPUID_D_1_EAX,
-	CPUID_F_0_EDX,
-	CPUID_F_1_EDX,
+	CPUID_LNX_4,
+	CPUID_DUMMY,
 	CPUID_8000_0008_EBX,
 	CPUID_6_EAX,
 	CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1017b9c7dfe0..be858b86023a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -271,13 +271,16 @@
 #define X86_FEATURE_XGETBV1		(10*32+ 2) /* XGETBV with ECX = 1 instruction */
 #define X86_FEATURE_XSAVES		(10*32+ 3) /* XSAVES/XRSTORS instructions */
 
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (EDX), word 11 */
-#define X86_FEATURE_CQM_LLC		(11*32+ 1) /* LLC QoS if 1 */
-
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (EDX), word 12 */
-#define X86_FEATURE_CQM_OCCUP_LLC	(12*32+ 0) /* LLC occupancy monitoring */
-#define X86_FEATURE_CQM_MBM_TOTAL	(12*32+ 1) /* LLC Total MBM monitoring */
-#define X86_FEATURE_CQM_MBM_LOCAL	(12*32+ 2) /* LLC Local MBM monitoring */
+/*
+ * Extended auxiliary flags: Linux defined - for features scattered in various
+ * CPUID levels like 0xf, etc.
+ *
+ * Reuse free bits when adding new feature flags!
+ */
+#define X86_FEATURE_CQM_LLC		(11*32+ 0) /* LLC QoS if 1 */
+#define X86_FEATURE_CQM_OCCUP_LLC	(11*32+ 1) /* LLC occupancy monitoring */
+#define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
+#define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index fe6ed9696467..efb114298cfb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -803,33 +803,25 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
 
 static void init_cqm(struct cpuinfo_x86 *c)
 {
-	u32 eax, ebx, ecx, edx;
-
-	/* Additional Intel-defined flags: level 0x0000000F */
-	if (c->cpuid_level >= 0x0000000F) {
+	if (!cpu_has(c, X86_FEATURE_CQM_LLC)) {
+		c->x86_cache_max_rmid  = -1;
+		c->x86_cache_occ_scale = -1;
+		return;
+	}
 
-		/* QoS sub-leaf, EAX=0Fh, ECX=0 */
-		cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
-		c->x86_capability[CPUID_F_0_EDX] = edx;
+	/* will be overridden if occupancy monitoring exists */
+	c->x86_cache_max_rmid = cpuid_ebx(0xf);
 
-		if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
-			/* will be overridden if occupancy monitoring exists */
-			c->x86_cache_max_rmid = ebx;
+	if (cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC) ||
+	    cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL) ||
+	    cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)) {
+		u32 eax, ebx, ecx, edx;
 
-			/* QoS sub-leaf, EAX=0Fh, ECX=1 */
-			cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
-			c->x86_capability[CPUID_F_1_EDX] = edx;
+		/* QoS sub-leaf, EAX=0Fh, ECX=1 */
+		cpuid_count(0xf, 1, &eax, &ebx, &ecx, &edx);
 
-			if ((cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC)) ||
-			      ((cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL)) ||
-			       (cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)))) {
-				c->x86_cache_max_rmid = ecx;
-				c->x86_cache_occ_scale = ebx;
-			}
-		} else {
-			c->x86_cache_max_rmid = -1;
-			c->x86_cache_occ_scale = -1;
-		}
+		c->x86_cache_max_rmid  = ecx;
+		c->x86_cache_occ_scale = ebx;
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 2c0bd38a44ab..fa07a224e7b9 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -59,6 +59,9 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_AVX512_4VNNIW,	X86_FEATURE_AVX512F   },
 	{ X86_FEATURE_AVX512_4FMAPS,	X86_FEATURE_AVX512F   },
 	{ X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_CQM_OCCUP_LLC,	X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_CQM_MBM_TOTAL,	X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_CQM_MBM_LOCAL,	X86_FEATURE_CQM_LLC   },
 	{}
 };
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 94aa1c72ca98..adf9b71386ef 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -26,6 +26,10 @@ struct cpuid_bit {
 static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_APERFMPERF,       CPUID_ECX,  0, 0x00000006, 0 },
 	{ X86_FEATURE_EPB,		CPUID_ECX,  3, 0x00000006, 0 },
+	{ X86_FEATURE_CQM_LLC,		CPUID_EDX,  1, 0x0000000f, 0 },
+	{ X86_FEATURE_CQM_OCCUP_LLC,	CPUID_EDX,  0, 0x0000000f, 1 },
+	{ X86_FEATURE_CQM_MBM_TOTAL,	CPUID_EDX,  1, 0x0000000f, 1 },
+	{ X86_FEATURE_CQM_MBM_LOCAL,	CPUID_EDX,  2, 0x0000000f, 1 },
 	{ X86_FEATURE_CAT_L3,		CPUID_EBX,  1, 0x00000010, 0 },
 	{ X86_FEATURE_CAT_L2,		CPUID_EBX,  2, 0x00000010, 0 },
 	{ X86_FEATURE_CDP_L3,		CPUID_ECX,  2, 0x00000010, 1 },
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 9a327d5b6d1f..d78a61408243 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -47,8 +47,6 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0001_ECX] = {0x80000001, 0, CPUID_ECX},
 	[CPUID_7_0_EBX]       = {         7, 0, CPUID_EBX},
 	[CPUID_D_1_EAX]       = {       0xd, 1, CPUID_EAX},
-	[CPUID_F_0_EDX]       = {       0xf, 0, CPUID_EDX},
-	[CPUID_F_1_EDX]       = {       0xf, 1, CPUID_EDX},
 	[CPUID_8000_0008_EBX] = {0x80000008, 0, CPUID_EBX},
 	[CPUID_6_EAX]         = {         6, 0, CPUID_EAX},
 	[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [tip:x86/cpu] x86/cpufeatures: Enumerate the new AVX512 BFLOAT16 instructions
  2019-06-17 18:00 ` [PATCH v2 2/2] x86/cpufeatures: Enumerate new AVX512 BFLOAT16 instructions Fenghua Yu
  2019-06-19 17:31   ` Borislav Petkov
@ 2019-06-20 11:38   ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot for Fenghua Yu @ 2019-06-20 11:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: fenghua.yu, bp, robert.hu, Thomas.Lendacky, chang.seok.bae,
	pasha.tatashin, linux-kernel, tglx, frederic, mpe, mingo, hpa,
	jannh, pfeiner, rafael.j.wysocki, sean.j.christopherson, x86,
	yamada.masahiro, mingo, pbonzini, ravi.v.shankar, namit, rkrcmar

Commit-ID:  b302e4b176d00e1cbc80148c5d0aee36751f7480
Gitweb:     https://git.kernel.org/tip/b302e4b176d00e1cbc80148c5d0aee36751f7480
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Mon, 17 Jun 2019 11:00:16 -0700
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Thu, 20 Jun 2019 12:38:49 +0200

x86/cpufeatures: Enumerate the new AVX512 BFLOAT16 instructions

AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point
format (BF16) for deep learning optimization.

BF16 is a short version of 32-bit single-precision floating-point
format (FP32) and has several advantages over 16-bit half-precision
floating-point format (FP16). BF16 keeps FP32 accumulation after
multiplication without loss of precision, offers more than enough
range for deep learning training tasks, and doesn't need to handle
hardware exception.

AVX512 BFLOAT16 instructions are enumerated in CPUID.7.1:EAX[bit 5]
AVX512_BF16.

CPUID.7.1:EAX contains only feature bits. Reuse the currently empty
word 12 as a pure features word to hold the feature bits including
AVX512_BF16.

Detailed information of the CPUID bit and AVX512 BFLOAT16 instructions
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference.

 [ bp: Check CPUID(7) subleaf validity before accessing subleaf 1. ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nadav Amit <namit@vmware.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: Robert Hoo <robert.hu@linux.intel.com>
Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: x86 <x86@kernel.org>
Link: https://lkml.kernel.org/r/1560794416-217638-3-git-send-email-fenghua.yu@intel.com
---
 arch/x86/include/asm/cpufeature.h  | 2 +-
 arch/x86/include/asm/cpufeatures.h | 3 +++
 arch/x86/kernel/cpu/common.c       | 6 ++++++
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 403f70c2e431..58acda503817 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -23,7 +23,7 @@ enum cpuid_leafs
 	CPUID_7_0_EBX,
 	CPUID_D_1_EAX,
 	CPUID_LNX_4,
-	CPUID_DUMMY,
+	CPUID_7_1_EAX,
 	CPUID_8000_0008_EBX,
 	CPUID_6_EAX,
 	CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index be858b86023a..8ecd9fac97c3 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -282,6 +282,9 @@
 #define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 
+/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
+#define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
 #define X86_FEATURE_IRPERF		(13*32+ 1) /* Instructions Retired Count */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index efb114298cfb..dad20bc891d5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -847,6 +847,12 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_capability[CPUID_7_0_EBX] = ebx;
 		c->x86_capability[CPUID_7_ECX] = ecx;
 		c->x86_capability[CPUID_7_EDX] = edx;
+
+		/* Check valid sub-leaf index before accessing it */
+		if (eax >= 1) {
+			cpuid_count(0x00000007, 1, &eax, &ebx, &ecx, &edx);
+			c->x86_capability[CPUID_7_1_EAX] = eax;
+		}
 	}
 
 	/* Extended state features: level 0x0000000d */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index fa07a224e7b9..a444028d8145 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -62,6 +62,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_OCCUP_LLC,	X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_TOTAL,	X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_LOCAL,	X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_AVX512_BF16,	X86_FEATURE_AVX512VL  },
 	{}
 };
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] x86/cpufeatures: Enumerate new AVX512 BFLOAT16 instructions
  2019-06-20 10:37       ` Borislav Petkov
@ 2019-06-20 14:06         ` Fenghua Yu
  0 siblings, 0 replies; 11+ messages in thread
From: Fenghua Yu @ 2019-06-20 14:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, H Peter Anvin,
	Christopherson Sean J, Paolo Bonzini, Radim Krcmar,
	Ravi V Shankar, linux-kernel, x86

On Thu, Jun 20, 2019 at 12:37:20PM +0200, Borislav Petkov wrote:
> On Wed, Jun 19, 2019 at 02:34:04PM -0700, Fenghua Yu wrote:
> You need to test the sub-leaf index validity here before accessing
> subleaf 1:
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 4910cb421b82..dad20bc891d5 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -848,8 +848,11 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
>  		c->x86_capability[CPUID_7_ECX] = ecx;
>  		c->x86_capability[CPUID_7_EDX] = edx;
>  
> -		cpuid_count(0x00000007, 1, &eax, &ebx, &ecx, &edx);
> -		c->x86_capability[CPUID_7_1_EAX] = eax;
> +		/* Check valid sub-leaf index before accessing it */
> +		if (eax >= 1) {
> +			cpuid_count(0x00000007, 1, &eax, &ebx, &ecx, &edx);
> +			c->x86_capability[CPUID_7_1_EAX] = eax;
> +		}

You are right.

I tested the three patches in tip tree and they work as expected.

Thank you very much!

-Fenghua

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-06-20 14:15 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-17 18:00 [PATCH v2 0/2] x86/cpufeatures: Re-arrange a few features and enumerate AVX512 BFLOAT16 intructions Fenghua Yu
2019-06-17 18:00 ` [PATCH v2 1/2] x86/cpufeatures: Combine word 11 and 12 into new scattered features word 11 Fenghua Yu
2019-06-19 17:36   ` Borislav Petkov
2019-06-19 21:24     ` Fenghua Yu
2019-06-20 11:37   ` [tip:x86/cpu] x86/cpufeatures: Combine word 11 and 12 into a new scattered features word tip-bot for Fenghua Yu
2019-06-17 18:00 ` [PATCH v2 2/2] x86/cpufeatures: Enumerate new AVX512 BFLOAT16 instructions Fenghua Yu
2019-06-19 17:31   ` Borislav Petkov
2019-06-19 21:34     ` Fenghua Yu
2019-06-20 10:37       ` Borislav Petkov
2019-06-20 14:06         ` Fenghua Yu
2019-06-20 11:38   ` [tip:x86/cpu] x86/cpufeatures: Enumerate the " tip-bot for Fenghua Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).