linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
@ 2018-12-18  4:22 Aubrey Li
  2018-12-18  4:22 ` [PATCH v6 2/3] proc: add AVX-512 usage elapsed time to /proc/pid/status Aubrey Li
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Aubrey Li @ 2018-12-18  4:22 UTC (permalink / raw)
  To: tglx, mingo, peterz, hpa
  Cc: ak, tim.c.chen, dave.hansen, arjan, aubrey.li, linux-kernel, Aubrey Li

User space tools which do automated task placement need information
about the AVX-512 usage of tasks, because AVX-512 usage can lower the
core's turbo frequency and slow down the task running on the sibling CPU.

The XSAVE hardware structure has bits that indicate when valid state
is present in registers unique to AVX-512 use.  Use these bits to
indicate when AVX-512 has been in use and add per-task AVX-512 state
timestamp tracking to context switch.

Well-written AVX-512 applications are expected to clear the AVX-512
state when not actively using AVX-512 registers, so the tracking
mechanism is imprecise and can theoretically miss AVX-512 usage during
context switch. But it has been measured to be precise enough to be
useful under real-world workloads like tensorflow and linpack.

If higher precision is required, user space tools are advised to use
the PMU-based mechanisms in combination with this one.

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86/include/asm/fpu/internal.h | 7 +++++++
 arch/x86/include/asm/fpu/types.h    | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index a38bf5a1e37a..8778ac172255 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -411,6 +411,13 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
 {
 	if (likely(use_xsave())) {
 		copy_xregs_to_kernel(&fpu->state.xsave);
+
+		/*
+		 * AVX512 state is tracked here because its use is
+		 * known to slow the max clock speed of the core.
+		 */
+		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
+			fpu->avx512_timestamp = jiffies_64;
 		return 1;
 	}
 
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 202c53918ecf..81393dabdb46 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -302,6 +302,13 @@ struct fpu {
 	 */
 	unsigned char			initialized;
 
+	/*
+	 * @avx512_timestamp:
+	 *
+	 * Records the timestamp of AVX512 use during last context switch.
+	 */
+	u64				avx512_timestamp;
+
 	/*
 	 * @state:
 	 *
-- 
2.17.1
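
For readers unfamiliar with the xfeatures check in the hunk above, here
is a minimal user-space sketch of the same test. It assumes the usual
XSAVE state-component numbering (5 = opmask, 6 = ZMM_Hi256, 7 = Hi16_ZMM).
struct task_fpu_sketch and note_avx512_use() are hypothetical names used
only for illustration, and the caller passes in the time value rather
than reading jiffies_64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XSAVE state-component numbers of the three AVX-512 components. */
enum {
	XFEATURE_OPMASK    = 5,	/* k0-k7 opmask registers */
	XFEATURE_ZMM_Hi256 = 6,	/* upper 256 bits of ZMM0-ZMM15 */
	XFEATURE_Hi16_ZMM  = 7,	/* ZMM16-ZMM31 */
};

#define XFEATURE_MASK_AVX512			\
	((1ULL << XFEATURE_OPMASK) |		\
	 (1ULL << XFEATURE_ZMM_Hi256) |		\
	 (1ULL << XFEATURE_Hi16_ZMM))

/* Hypothetical stand-in for the per-task field added by the patch. */
struct task_fpu_sketch {
	uint64_t avx512_timestamp;	/* 0 means "never observed" */
};

/*
 * Same shape as the hunk: record a timestamp only when the saved
 * XSAVE header reports at least one AVX-512 component in use.
 */
static bool note_avx512_use(struct task_fpu_sketch *fpu,
			    uint64_t xfeatures, uint64_t now)
{
	assert(fpu);
	if (!(xfeatures & XFEATURE_MASK_AVX512))
		return false;
	fpu->avx512_timestamp = now;
	return true;
}
```

A task whose header only reports x87, SSE and AVX (bits 0-2) leaves the
timestamp untouched, while any one of bits 5-7 updates it.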



* [PATCH v6 2/3] proc: add AVX-512 usage elapsed time to /proc/pid/status
  2018-12-18  4:22 [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks Aubrey Li
@ 2018-12-18  4:22 ` Aubrey Li
  2018-12-18  4:22 ` [PATCH v6 3/3] Documentation/filesystems/proc.txt: add AVX512_elapsed_ms Aubrey Li
  2018-12-18 14:14 ` [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks Thomas Gleixner
  2 siblings, 0 replies; 14+ messages in thread
From: Aubrey Li @ 2018-12-18  4:22 UTC (permalink / raw)
  To: tglx, mingo, peterz, hpa
  Cc: ak, tim.c.chen, dave.hansen, arjan, aubrey.li, linux-kernel, Aubrey Li

The use of AVX-512 components can cause a core turbo frequency drop, so
it is useful to expose the time elapsed since the last AVX-512 usage as
a heuristic hint that lets a user space job scheduler cluster the tasks
using AVX-512 together.

Example:
$ cat /proc/pid/status | grep AVX512_elapsed_ms
AVX512_elapsed_ms:	1020

The number '1020' means that 1020 milliseconds have elapsed since the
task was last switched off the CPU with AVX-512 components in use, so
the task could cause a core frequency drop.

Or:
$ cat /proc/pid/status | grep AVX512_elapsed_ms
AVX512_elapsed_ms:	-1

The number '-1' indicates that the task has not used AVX-512 components
before and is therefore unlikely to cause a frequency drop.

User space tools may want to further check by:

$ perf stat --pid <pid> -e core_power.lvl2_turbo_license -- sleep 1

 Performance counter stats for process id '3558':

     3,251,565,961      core_power.lvl2_turbo_license

       1.004031387 seconds time elapsed

A non-zero counter value confirms that the task causes a frequency drop.

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86/kernel/fpu/xstate.c | 34 ++++++++++++++++++++++++++++++++++
 fs/proc/array.c              |  5 +++++
 2 files changed, 39 insertions(+)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 87a57b7642d3..d084b1dc80a6 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -7,6 +7,7 @@
 #include <linux/cpu.h>
 #include <linux/mman.h>
 #include <linux/pkeys.h>
+#include <linux/seq_file.h>
 
 #include <asm/fpu/api.h>
 #include <asm/fpu/internal.h>
@@ -1245,3 +1246,36 @@ int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
 
 	return 0;
 }
+
+/*
+ * Report the amount of time elapsed in millisecond since last AVX512
+ * use in the task.
+ */
+void avx512_state(struct seq_file *m, struct task_struct *task)
+{
+	u64	timestamp = task->thread.fpu.avx512_timestamp;
+	s64	delta;
+
+	if (!timestamp)
+		delta = -1;
+	else {
+		WARN_ON_ONCE(jiffies_64 < timestamp);
+		delta = div_u64(jiffies64_to_nsecs(jiffies_64 - timestamp),
+				NSEC_PER_MSEC);
+	}
+
+	seq_put_decimal_ll(m, "AVX512_elapsed_ms:\t", delta);
+	seq_putc(m, '\n');
+}
+
+/*
+ * Report CPU specific thread state
+ */
+void arch_task_state(struct seq_file *m, struct task_struct *task)
+{
+	/*
+	 * Report AVX512 state if the processor and build option supported.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
+		avx512_state(m, task);
+}
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 0ceb3b6b37e7..dd88c2219f08 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -392,6 +392,10 @@ static inline void task_core_dumping(struct seq_file *m, struct mm_struct *mm)
 	seq_putc(m, '\n');
 }
 
+void __weak arch_task_state(struct seq_file *m, struct task_struct *task)
+{
+}
+
 int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task)
 {
@@ -414,6 +418,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 	task_cpus_allowed(m, task);
 	cpuset_task_status_allowed(m, task);
 	task_context_switch_counts(m, task);
+	arch_task_state(m, task);
 	return 0;
 }
 
-- 
2.17.1
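
A user space tool would consume the new field roughly as sketched below.
This is only an illustration under the assumption that the field is
formatted exactly as in the examples above ("AVX512_elapsed_ms:", a tab,
a signed decimal). parse_avx512_elapsed_ms() and read_avx512_elapsed_ms()
are hypothetical helper names, not part of this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of /proc/<pid>/status. Returns 1 and stores the value
 * in *ms if the line is the AVX512_elapsed_ms field, 0 otherwise.
 */
static int parse_avx512_elapsed_ms(const char *line, long long *ms)
{
	static const char key[] = "AVX512_elapsed_ms:";

	assert(line && ms);
	if (strncmp(line, key, sizeof(key) - 1) != 0)
		return 0;
	/* %lld skips the tab and accepts the "-1" sentinel. */
	return sscanf(line + sizeof(key) - 1, "%lld", ms) == 1;
}

/*
 * Read the field for @pid. Returns 0 on success, -1 if the file cannot
 * be opened or the field is absent (for example on a kernel without
 * this patch, or on a CPU without AVX-512).
 */
static int read_avx512_elapsed_ms(int pid, long long *ms)
{
	char path[64], line[256];
	int found = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/status", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (!found && fgets(line, sizeof(line), f))
		found = parse_avx512_elapsed_ms(line, ms);
	fclose(f);
	return found ? 0 : -1;
}
```

A caller has to treat three cases separately: the field is missing, the
value is -1 (no AVX-512 use recorded), or the value is a non-negative
number of milliseconds.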



* [PATCH v6 3/3] Documentation/filesystems/proc.txt: add AVX512_elapsed_ms
  2018-12-18  4:22 [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks Aubrey Li
  2018-12-18  4:22 ` [PATCH v6 2/3] proc: add AVX-512 usage elapsed time to /proc/pid/status Aubrey Li
@ 2018-12-18  4:22 ` Aubrey Li
  2018-12-18 14:14 ` [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks Thomas Gleixner
  2 siblings, 0 replies; 14+ messages in thread
From: Aubrey Li @ 2018-12-18  4:22 UTC (permalink / raw)
  To: tglx, mingo, peterz, hpa
  Cc: ak, tim.c.chen, dave.hansen, arjan, aubrey.li, linux-kernel, Aubrey Li

AVX512_elapsed_ms was added to /proc/<pid>/status. Document it
in Documentation/filesystems/proc.txt.

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
 Documentation/filesystems/proc.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 520f6a84cf50..c4be304bce69 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -197,6 +197,7 @@ read the file /proc/PID/status:
   Seccomp:        0
   voluntary_ctxt_switches:        0
   nonvoluntary_ctxt_switches:     1
+  AVX512_elapsed_ms:	1020
 
 This shows you nearly the same information you would get if you viewed it with
 the ps  command.  In  fact,  ps  uses  the  proc  file  system  to  obtain its
@@ -214,7 +215,7 @@ asynchronous manner and the value may not be very precise. To see a precise
 snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
 It's slow but very precise.
 
-Table 1-2: Contents of the status files (as of 4.8)
+Table 1-2: Contents of the status files (as of 4.21)
 ..............................................................................
  Field                       Content
  Name                        filename of the executable
@@ -275,6 +276,7 @@ Table 1-2: Contents of the status files (as of 4.8)
  Mems_allowed_list           Same as previous, but in "list format"
  voluntary_ctxt_switches     number of voluntary context switches
  nonvoluntary_ctxt_switches  number of non voluntary context switches
+ AVX512_elapsed_ms           time elapsed since last AVX512 use in millisecond
 ..............................................................................
 
 Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
-- 
2.17.1



* Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
  2018-12-18  4:22 [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks Aubrey Li
  2018-12-18  4:22 ` [PATCH v6 2/3] proc: add AVX-512 usage elapsed time to /proc/pid/status Aubrey Li
  2018-12-18  4:22 ` [PATCH v6 3/3] Documentation/filesystems/proc.txt: add AVX512_elapsed_ms Aubrey Li
@ 2018-12-18 14:14 ` Thomas Gleixner
  2018-12-18 15:11   ` Li, Aubrey
  2 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2018-12-18 14:14 UTC (permalink / raw)
  To: Aubrey Li
  Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan,
	linux-kernel, Aubrey Li

On Tue, 18 Dec 2018, Aubrey Li wrote:
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index a38bf5a1e37a..8778ac172255 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -411,6 +411,13 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
>  {
>  	if (likely(use_xsave())) {
>  		copy_xregs_to_kernel(&fpu->state.xsave);
> +
> +		/*
> +		 * AVX512 state is tracked here because its use is
> +		 * known to slow the max clock speed of the core.
> +		 */
> +		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
> +			fpu->avx512_timestamp = jiffies_64;

Even if unlikely this is incorrect when running a 32 bit kernel because
there jiffies_64 cannot be atomically loaded vs. a concurrent update. See
the comment in include/linux/jiffies.h right above the jiffies_64
declaration.

Thanks,

	tglx
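
To make the concern concrete, the following user-space sketch simulates
what a non-atomic 64-bit load can observe on a 32-bit CPU. It is a
single-threaded model, not kernel code: struct split_u64 and torn_load()
are hypothetical names, and the "concurrent" update is injected by hand
between the two 32-bit reads.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit counter stored as two 32-bit words, as a 32-bit CPU sees it. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

static void split_store(struct split_u64 *s, uint64_t v)
{
	s->lo = (uint32_t)v;
	s->hi = (uint32_t)(v >> 32);
}

/*
 * Simulate a torn load: the reader fetches the low word, the writer
 * then updates the whole counter to @next, and only afterwards does
 * the reader fetch the high word.
 */
static uint64_t torn_load(struct split_u64 *s, uint64_t next)
{
	uint32_t lo = s->lo;			/* old low half */

	split_store(s, next);			/* update lands in between */
	return ((uint64_t)s->hi << 32) | lo;	/* new high half */
}
```

With the counter at 0x00000000FFFFFFFF and an update to
0x0000000100000000, the torn load returns 0x00000001FFFFFFFF, which is
neither the old nor the new value and lies almost 2^32 ticks in the
future.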



* Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
  2018-12-18 14:14 ` [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks Thomas Gleixner
@ 2018-12-18 15:11   ` Li, Aubrey
  2018-12-18 15:32     ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Li, Aubrey @ 2018-12-18 15:11 UTC (permalink / raw)
  To: Thomas Gleixner, Aubrey Li
  Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, linux-kernel

On 2018/12/18 22:14, Thomas Gleixner wrote:
> On Tue, 18 Dec 2018, Aubrey Li wrote:
>> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
>> index a38bf5a1e37a..8778ac172255 100644
>> --- a/arch/x86/include/asm/fpu/internal.h
>> +++ b/arch/x86/include/asm/fpu/internal.h
>> @@ -411,6 +411,13 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
>>  {
>>  	if (likely(use_xsave())) {
>>  		copy_xregs_to_kernel(&fpu->state.xsave);
>> +
>> +		/*
>> +		 * AVX512 state is tracked here because its use is
>> +		 * known to slow the max clock speed of the core.
>> +		 */
>> +		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
>> +			fpu->avx512_timestamp = jiffies_64;
> 
> Even if unlikely this is incorrect when running a 32 bit kernel because
> there jiffies_64 cannot be atomically loaded vs. a concurrent update. See
> the comment in include/linux/jiffies.h right above the jiffies_64
> declaration.
> 
Yeah, I noticed this. Because this is under the use_xsave() condition and
also needs valid AVX512 state, a 32 bit kernel won't enter this branch.

Thanks,
-Aubrey


* Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
  2018-12-18 15:11   ` Li, Aubrey
@ 2018-12-18 15:32     ` Thomas Gleixner
  2018-12-18 16:28       ` Li, Aubrey
  2018-12-18 17:14       ` Dave Hansen
  0 siblings, 2 replies; 14+ messages in thread
From: Thomas Gleixner @ 2018-12-18 15:32 UTC (permalink / raw)
  To: Li, Aubrey
  Cc: Aubrey Li, mingo, peterz, hpa, ak, tim.c.chen, dave.hansen,
	arjan, linux-kernel

On Tue, 18 Dec 2018, Li, Aubrey wrote:

> On 2018/12/18 22:14, Thomas Gleixner wrote:
> > On Tue, 18 Dec 2018, Aubrey Li wrote:
> >> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> >> index a38bf5a1e37a..8778ac172255 100644
> >> --- a/arch/x86/include/asm/fpu/internal.h
> >> +++ b/arch/x86/include/asm/fpu/internal.h
> >> @@ -411,6 +411,13 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
> >>  {
> >>  	if (likely(use_xsave())) {
> >>  		copy_xregs_to_kernel(&fpu->state.xsave);
> >> +
> >> +		/*
> >> +		 * AVX512 state is tracked here because its use is
> >> +		 * known to slow the max clock speed of the core.
> >> +		 */
> >> +		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
> >> +			fpu->avx512_timestamp = jiffies_64;
> > 
> > Even if unlikely this is incorrect when running a 32 bit kernel because
> > there jiffies_64 cannot be atomically loaded vs. a concurrent update. See
> > the comment in include/linux/jiffies.h right above the jiffies_64
> > declaration.
> > 
> Yeah, I noticed this, because this is under use_xsave() condition, also need
> valid AVX512 state, so a 32 bit kernel won't enter this branch.

What exactly prevents a 32bit kernel from having the AVX512 feature bit
set? And if it cannot be set on 32bit, then why are you compiling that code
in at all?

Thanks,

	tglx


* Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
  2018-12-18 15:32     ` Thomas Gleixner
@ 2018-12-18 16:28       ` Li, Aubrey
  2018-12-18 21:38         ` Andi Kleen
  2018-12-18 17:14       ` Dave Hansen
  1 sibling, 1 reply; 14+ messages in thread
From: Li, Aubrey @ 2018-12-18 16:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Aubrey Li, mingo, peterz, hpa, ak, tim.c.chen, dave.hansen,
	arjan, linux-kernel

On 2018/12/18 23:32, Thomas Gleixner wrote:
> On Tue, 18 Dec 2018, Li, Aubrey wrote:
> 
>> On 2018/12/18 22:14, Thomas Gleixner wrote:
>>> On Tue, 18 Dec 2018, Aubrey Li wrote:
>>>> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
>>>> index a38bf5a1e37a..8778ac172255 100644
>>>> --- a/arch/x86/include/asm/fpu/internal.h
>>>> +++ b/arch/x86/include/asm/fpu/internal.h
>>>> @@ -411,6 +411,13 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
>>>>  {
>>>>  	if (likely(use_xsave())) {
>>>>  		copy_xregs_to_kernel(&fpu->state.xsave);
>>>> +
>>>> +		/*
>>>> +		 * AVX512 state is tracked here because its use is
>>>> +		 * known to slow the max clock speed of the core.
>>>> +		 */
>>>> +		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
>>>> +			fpu->avx512_timestamp = jiffies_64;
>>>
>>> Even if unlikely this is incorrect when running a 32 bit kernel because
>>> there jiffies_64 cannot be atomically loaded vs. a concurrent update. See
>>> the comment in include/linux/jiffies.h right above the jiffies_64
>>> declaration.
>>>
>> Yeah, I noticed this, because this is under use_xsave() condition, also need
>> valid AVX512 state, so a 32 bit kernel won't enter this branch.
> 
> What exactly prevents a 32bit kernel from having the AVX512 feature bit
> set? And if it cannot be set on 32bit, then why are you compiling that code
> in at all?
> 

I misunderstood: you mean a 32bit kernel, not a 32bit machine. Theoretically a
32bit kernel can use AVX512, but I am not sure whether anyone uses it like this.
get_jiffies_64() involves jiffies_lock operations, so it is not a good fit for
the context switch path. That is why I want to use raw jiffies_64 here. jiffies
is a good candidate, but it has the wraparound overflow issue. Other time
sources are expensive here.

Should I limit the code to run only on 64bit kernels?

Thanks,
-Aubrey


* Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
  2018-12-18 15:32     ` Thomas Gleixner
  2018-12-18 16:28       ` Li, Aubrey
@ 2018-12-18 17:14       ` Dave Hansen
  2018-12-18 23:23         ` Li, Aubrey
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Hansen @ 2018-12-18 17:14 UTC (permalink / raw)
  To: Thomas Gleixner, Li, Aubrey
  Cc: Aubrey Li, mingo, peterz, hpa, ak, tim.c.chen, arjan, linux-kernel

On 12/18/18 7:32 AM, Thomas Gleixner wrote:
> What exactly prevents a 32bit kernel from having the AVX512 feature bit
> set? And if it cannot be set on 32bit, then why are you compiling that code
> in at all?

There are three different AVX-512 states (and three bits) which Aubrey's
patch checks.  All three have different rules.  Here's a summary along
with some relevant SDM quotes from Vol1-13.6.

Opmask state: All opmask registers can be set in 32-bit mode.
ZMM_Hi256 state: "An execution of XRSTOR or XRSTORS outside 64-bit mode
		  does not update ZMM8_H–ZMM15_H."  This implies that
		  ZMM0_H-ZMM7_H *are* updated in 32-bit mode.
Hi16_ZMM state: "Outside 64-bit mode, Hi16_ZMM state is always in its
		 initial configuration."

All of Hi16_ZMM and *part* of ZMM_Hi256 can not be practically used in
32-bit mode.  But, even using part of ZMM_Hi256 means the xfeature bit
will be set.

So, 2/3 of the features can be used in 32-bit mode.  Nothing that I can
find _prevents_ those features from being used in 32-bit mode.

Aubrey, do you have information to the contrary?


* Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
  2018-12-18 16:28       ` Li, Aubrey
@ 2018-12-18 21:38         ` Andi Kleen
  2018-12-18 21:44           ` Dave Hansen
  2018-12-19  0:26           ` Li, Aubrey
  0 siblings, 2 replies; 14+ messages in thread
From: Andi Kleen @ 2018-12-18 21:38 UTC (permalink / raw)
  To: Li, Aubrey
  Cc: Thomas Gleixner, Aubrey Li, mingo, peterz, hpa, tim.c.chen,
	dave.hansen, arjan, linux-kernel

> I misunderstood, you mean 32bit kernel, not 32bit machine. Theoretically 32bit
> kernel can use AVX512, but not sure if anyone use it like this. get_jiffies_64()
> includes jiffies_lock ops so not good in context switch. So I want to use raw
> jiffies_64 here. jiffies is a good candidate but it has wraparound overflow issue.
> Other time source are expensive here.
> 
> Should I limit the code only running on 64bit kernel? 

Yes making it 64bit only should be fine.

The other alternative would be to use 32bit jiffies on 32bit. I assume
wrapping is not that big a problem here.

-Andi


* Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
  2018-12-18 21:38         ` Andi Kleen
@ 2018-12-18 21:44           ` Dave Hansen
  2018-12-18 22:05             ` Andi Kleen
  2018-12-19  0:26           ` Li, Aubrey
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Hansen @ 2018-12-18 21:44 UTC (permalink / raw)
  To: Andi Kleen, Li, Aubrey
  Cc: Thomas Gleixner, Aubrey Li, mingo, peterz, hpa, tim.c.chen,
	arjan, linux-kernel

On 12/18/18 1:38 PM, Andi Kleen wrote:
>> I misunderstood, you mean 32bit kernel, not 32bit machine. Theoretically 32bit
>> kernel can use AVX512, but not sure if anyone use it like this. get_jiffies_64()
>> includes jiffies_lock ops so not good in context switch. So I want to use raw
>> jiffies_64 here. jiffies is a good candidate but it has wraparound overflow issue.
>> Other time source are expensive here.
>>
>> Should I limit the code only running on 64bit kernel? 
> Yes making it 64bit only should be fine.

I think I'd rather just disable AVX512 itself on 32-bit and be done with
it.  I think more than half of the ~2k of XSAVE space that it consumes
in *every* *task* is just pure waste because it has to be 0's.

This ~2k of extra space is also lowmem, which makes it even more valuable.


* Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
  2018-12-18 21:44           ` Dave Hansen
@ 2018-12-18 22:05             ` Andi Kleen
  0 siblings, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2018-12-18 22:05 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Li, Aubrey, Thomas Gleixner, Aubrey Li, mingo, peterz, hpa,
	tim.c.chen, arjan, linux-kernel

On Tue, Dec 18, 2018 at 01:44:41PM -0800, Dave Hansen wrote:
> On 12/18/18 1:38 PM, Andi Kleen wrote:
> >> I misunderstood, you mean 32bit kernel, not 32bit machine. Theoretically 32bit
> >> kernel can use AVX512, but not sure if anyone use it like this. get_jiffies_64()
> >> includes jiffies_lock ops so not good in context switch. So I want to use raw
> >> jiffies_64 here. jiffies is a good candidate but it has wraparound overflow issue.
> >> Other time source are expensive here.
> >>
> >> Should I limit the code only running on 64bit kernel? 
> > Yes making it 64bit only should be fine.
> 
> I think I'd rather just disable AVX512 itself on 32-bit and be done with
> it.  I think more than half of the ~2k of XSAVE space that it consumes
> in *every* *task* is just pure waste because it has to be 0's.
> 
> This ~2k of extra space is also lowmem, which makes it even more valuable.

That will actually break programs.

If someone compiled binaries with -march=native on a system with AVX512
they wouldn't work anymore.

Don't think we can do it.


-Andi


* Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
  2018-12-18 17:14       ` Dave Hansen
@ 2018-12-18 23:23         ` Li, Aubrey
  0 siblings, 0 replies; 14+ messages in thread
From: Li, Aubrey @ 2018-12-18 23:23 UTC (permalink / raw)
  To: Dave Hansen, Thomas Gleixner
  Cc: Aubrey Li, mingo, peterz, hpa, ak, tim.c.chen, arjan, linux-kernel

On 2018/12/19 1:14, Dave Hansen wrote:
> On 12/18/18 7:32 AM, Thomas Gleixner wrote:
>> What exactly prevents a 32bit kernel from having the AVX512 feature bit
>> set? And if it cannot be set on 32bit, then why are you compiling that code
>> in at all?
> 
> There are three different AVX-512 states (and three bits) which Aubrey's
> patch checks.  All three have different rules.  Here's a summary along
> with some relevant SDM quotes from Vol1-13.6.
> 
> Opmask state: All opmask registers can be set in 32-bit mode.
> ZMM_Hi256 state: "An execution of XRSTOR or XRSTORS outside 64-bit mode
> 		  does not update ZMM8_H–ZMM15_H."  This implies that
> 		  ZMM0_H-ZMM7_H *are* updated in 32-bit mode.
> Hi16_ZMM state: "Outside 64-bit mode, Hi16_ZMM state is always in its
> 		 initial configuration."
> 
> All of Hi16_ZMM and *part* of ZMM_Hi256 can not be practically used in
> 32-bit mode.  But, even using part of ZMM_Hi256 means the xfeature bit
> will be set.
> 
> So, 2/3 of the features can be used in 32-bit mode.  Nothing that I can
> find _prevents_ those features from being used in 32-bit mode.
> 
> Aubrey, do you have information to the contrary?
> 
Thanks Dave for the summary. I have similar information from the SDM, so yes,
a 32bit kernel can have the xfeature bits set.



* Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
  2018-12-18 21:38         ` Andi Kleen
  2018-12-18 21:44           ` Dave Hansen
@ 2018-12-19  0:26           ` Li, Aubrey
  2018-12-19  9:45             ` Thomas Gleixner
  1 sibling, 1 reply; 14+ messages in thread
From: Li, Aubrey @ 2018-12-19  0:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Aubrey Li, mingo, peterz, hpa, tim.c.chen,
	dave.hansen, arjan, linux-kernel

On 2018/12/19 5:38, Andi Kleen wrote:
>> I misunderstood, you mean 32bit kernel, not 32bit machine. Theoretically 32bit
>> kernel can use AVX512, but not sure if anyone use it like this. get_jiffies_64()
>> includes jiffies_lock ops so not good in context switch. So I want to use raw
>> jiffies_64 here. jiffies is a good candidate but it has wraparound overflow issue.
>> Other time source are expensive here.
>>
>> Should I limit the code only running on 64bit kernel? 
> 
> Yes making it 64bit only should be fine.
> 


> Other alternative would be to use 32bit jiffies on 32bit. I assume
> wrapping is not that big a problem here.
> 

Thomas, is this acceptable?

Thanks,
-Aubrey


* Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks
  2018-12-19  0:26           ` Li, Aubrey
@ 2018-12-19  9:45             ` Thomas Gleixner
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2018-12-19  9:45 UTC (permalink / raw)
  To: Li, Aubrey
  Cc: Andi Kleen, Aubrey Li, mingo, peterz, hpa, tim.c.chen,
	dave.hansen, arjan, linux-kernel

On Wed, 19 Dec 2018, Li, Aubrey wrote:
> On 2018/12/19 5:38, Andi Kleen wrote:
> >> I misunderstood, you mean 32bit kernel, not 32bit machine. Theoretically 32bit
> >> kernel can use AVX512, but not sure if anyone use it like this. get_jiffies_64()
> >> includes jiffies_lock ops so not good in context switch. So I want to use raw
> >> jiffies_64 here. jiffies is a good candidate but it has wraparound overflow issue.
> >> Other time source are expensive here.
> >>
> >> Should I limit the code only running on 64bit kernel? 
> > 
> > Yes making it 64bit only should be fine.
> > 
> > Other alternative would be to use 32bit jiffies on 32bit. I assume
> > wrapping is not that big a problem here.
> > 
> Thomas, is this acceptable?

Just do the math. jiffies on 32bit wrap around depending on HZ:

 HZ=100	    248 days
 HZ=1000     24 days

So, yes it takes quite some time, but from then on the information is
bogus. Whether that matters or not is a different question. At least it
needs proper documentation.

Thanks,

	tglx
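
The arithmetic behind those figures can be checked with the sketch
below. It is an illustration only, with hypothetical helper names. The
248 and 24 day values appear to correspond to 2^31 ticks, which is the
span over which signed-difference comparisons of a 32-bit counter stay
valid. The full unsigned 2^32 range is about twice as long.

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_DAY 86400ULL

/*
 * Whole days until a tick counter advancing @hz times per second has
 * covered 2^@bits ticks. @bits must be below 64.
 */
static uint64_t days_to_cover(unsigned int bits, unsigned int hz)
{
	return ((1ULL << bits) / hz) / SECONDS_PER_DAY;
}

/*
 * Elapsed ticks between two 32-bit samples. Unsigned arithmetic is
 * modulo 2^32, so the result is correct across at most one wrap and
 * silently wrong once more than 2^32 ticks have passed.
 */
static uint32_t ticks_since(uint32_t now, uint32_t then)
{
	return now - then;
}
```

days_to_cover(31, 100) gives 248 and days_to_cover(31, 1000) gives 24.
With 32 bits the results are 497 and 49 days. Beyond that span a 32-bit
timestamp can no longer distinguish "a long time ago" from "recently",
which is the bogus information the message warns about.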


