All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output
@ 2019-04-16  6:32 Aubrey Li
  2019-04-16  6:32 ` [PATCH v15 2/3] x86,/proc/pid/status: Add AVX-512 usage elapsed time Aubrey Li
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Aubrey Li @ 2019-04-16  6:32 UTC (permalink / raw)
  To: tglx, mingo, peterz, hpa
  Cc: ak, tim.c.chen, dave.hansen, arjan, adobriyan, akpm, aubrey.li,
	linux-api, linux-kernel, Aubrey Li

The architecture specific information of the running processes could
be useful to the userland. Add support to examine process architecture
specific information externally.

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linux API <linux-api@vger.kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 fs/proc/array.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 2edbb657f859..87bc7e882d35 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -96,6 +96,11 @@
 #include <asm/processor.h>
 #include "internal.h"
 
+/* Add support for architecture specific output in /proc/pid/status */
+#ifndef	arch_proc_pid_status
+#define	arch_proc_pid_status(m, task)
+#endif
+
 void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape)
 {
 	char *buf;
@@ -424,6 +429,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 	task_cpus_allowed(m, task);
 	cpuset_task_status_allowed(m, task);
 	task_context_switch_counts(m, task);
+	arch_proc_pid_status(m, task);
 	return 0;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v15 2/3] x86,/proc/pid/status: Add AVX-512 usage elapsed time
  2019-04-16  6:32 [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output Aubrey Li
@ 2019-04-16  6:32 ` Aubrey Li
  2019-04-16  6:32 ` [PATCH v15 3/3] Documentation/filesystems/proc.txt: add AVX512_elapsed_ms Aubrey Li
  2019-04-16 23:01 ` [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output Andrew Morton
  2 siblings, 0 replies; 8+ messages in thread
From: Aubrey Li @ 2019-04-16  6:32 UTC (permalink / raw)
  To: tglx, mingo, peterz, hpa
  Cc: ak, tim.c.chen, dave.hansen, arjan, adobriyan, akpm, aubrey.li,
	linux-api, linux-kernel, Aubrey Li

AVX-512 components use could cause core turbo frequency drop. So
it's useful to expose AVX-512 usage elapsed time as a heuristic hint
for the user space job scheduler to cluster the AVX-512 using tasks
together.

Tensorflow example:
$ while [ 1 ]; do cat /proc/tid/status | grep AVX; sleep 1; done
AVX512_elapsed_ms:      4
AVX512_elapsed_ms:      8
AVX512_elapsed_ms:      4

This means that 4 milliseconds have elapsed since the AVX512 usage
of tensorflow task was detected when the task was scheduled out.

Or:
$ cat /proc/tid/status | grep AVX512_elapsed_ms
AVX512_elapsed_ms:      -1

The number '-1' indicates that no AVX512 usage recorded before
thus the task unlikely has frequency drop issue.

User space tools may want to further check by:

$ perf stat --pid <pid> -e core_power.lvl2_turbo_license -- sleep 1

 Performance counter stats for process id '3558':

     3,251,565,961      core_power.lvl2_turbo_license

       1.004031387 seconds time elapsed

Non-zero counter value confirms that the task causes frequency drop.

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linux API <linux-api@vger.kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 arch/x86/include/asm/processor.h |  4 +++
 arch/x86/kernel/fpu/xstate.c     | 42 ++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2bb3a648fc12..5a7271ab78d8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -991,4 +991,8 @@ enum l1tf_mitigations {
 
 extern enum l1tf_mitigations l1tf_mitigation;
 
+/* Add support for architecture specific output in /proc/pid/status */
+void arch_proc_pid_status(struct seq_file *m, struct task_struct *task);
+#define arch_proc_pid_status arch_proc_pid_status
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index d7432c2b1051..5e55ed9584ab 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -7,6 +7,8 @@
 #include <linux/cpu.h>
 #include <linux/mman.h>
 #include <linux/pkeys.h>
+#include <linux/seq_file.h>
+#include <linux/proc_fs.h>
 
 #include <asm/fpu/api.h>
 #include <asm/fpu/internal.h>
@@ -1243,3 +1245,43 @@ int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
 
 	return 0;
 }
+
+/*
+ * Report the amount of time elapsed in millisecond since last AVX512
+ * use in the task.
+ */
+static void avx512_status(struct seq_file *m, struct task_struct *task)
+{
+	unsigned long timestamp = READ_ONCE(task->thread.fpu.avx512_timestamp);
+	long delta;
+
+	if (!timestamp) {
+		/*
+		 * Report -1 if no AVX512 usage
+		 */
+		delta = -1;
+	} else {
+		delta = (long)(jiffies - timestamp);
+		/*
+		 * Cap to LONG_MAX if time difference > LONG_MAX
+		 */
+		if (delta < 0)
+			delta = LONG_MAX;
+		delta = jiffies_to_msecs(delta);
+	}
+
+	seq_put_decimal_ll(m, "AVX512_elapsed_ms:\t", delta);
+	seq_putc(m, '\n');
+}
+
+/*
+ * Report architecture specific information
+ */
+void arch_proc_pid_status(struct seq_file *m, struct task_struct *task)
+{
+	/*
+	 * Report AVX512 state if the processor and build option supported.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
+		avx512_status(m, task);
+}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v15 3/3] Documentation/filesystems/proc.txt: add AVX512_elapsed_ms
  2019-04-16  6:32 [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output Aubrey Li
  2019-04-16  6:32 ` [PATCH v15 2/3] x86,/proc/pid/status: Add AVX-512 usage elapsed time Aubrey Li
@ 2019-04-16  6:32 ` Aubrey Li
  2019-04-16 23:01 ` [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output Andrew Morton
  2 siblings, 0 replies; 8+ messages in thread
From: Aubrey Li @ 2019-04-16  6:32 UTC (permalink / raw)
  To: tglx, mingo, peterz, hpa
  Cc: ak, tim.c.chen, dave.hansen, arjan, adobriyan, akpm, aubrey.li,
	linux-api, linux-kernel, Aubrey Li

Added AVX512_elapsed_ms in /proc/<pid>/status. Report it
in Documentation/filesystems/proc.txt

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linux API <linux-api@vger.kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 Documentation/filesystems/proc.txt | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 66cad5c86171..c4a9e48681ad 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -207,6 +207,7 @@ read the file /proc/PID/status:
   Speculation_Store_Bypass:       thread vulnerable
   voluntary_ctxt_switches:        0
   nonvoluntary_ctxt_switches:     1
+  AVX512_elapsed_ms:	8
 
 This shows you nearly the same information you would get if you viewed it with
 the ps  command.  In  fact,  ps  uses  the  proc  file  system  to  obtain its
@@ -224,7 +225,7 @@ asynchronous manner and the value may not be very precise. To see a precise
 snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
 It's slow but very precise.
 
-Table 1-2: Contents of the status files (as of 4.19)
+Table 1-2: Contents of the status files (as of 5.1)
 ..............................................................................
  Field                       Content
  Name                        filename of the executable
@@ -289,6 +290,32 @@ Table 1-2: Contents of the status files (as of 4.19)
  Mems_allowed_list           Same as previous, but in "list format"
  voluntary_ctxt_switches     number of voluntary context switches
  nonvoluntary_ctxt_switches  number of non voluntary context switches
+ AVX512_elapsed_ms           time elapsed since last AVX512 usage recorded
+
+ AVX512_elapsed_ms:
+ ------------------
+  If AVX512 is supported on the machine, this entry shows the milliseconds
+  elapsed since the last time AVX512 usage was recorded. The recording
+  happens on a best effort basis when a task is scheduled out. This means
+  that the value depends on two factors:
+
+    1) The time which the task spent on the CPU without being scheduled
+       out. With CPU isolation and a single runnable task this can take
+       several seconds.
+
+    2) The time since the task was scheduled out last. Depending on the
+       reason for being scheduled out (time slice exhausted, syscall ...)
+       this can be arbitrary long time.
+
+  As a consequence the value cannot be considered precise and authoritative
+  information. The application which uses this information has to be aware
+  of the overall scenario on the system in order to determine whether a
+  task is a real AVX512 user or not.
+
+  A special value of '-1' indicates that no AVX512 usage was recorded, thus
+  the task is unlikely an AVX512 user, but depends on the workload and the
+  scheduling scenario, it also could be a false negative mentioned above.
+
 ..............................................................................
 
 Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output
  2019-04-16  6:32 [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output Aubrey Li
  2019-04-16  6:32 ` [PATCH v15 2/3] x86,/proc/pid/status: Add AVX-512 usage elapsed time Aubrey Li
  2019-04-16  6:32 ` [PATCH v15 3/3] Documentation/filesystems/proc.txt: add AVX512_elapsed_ms Aubrey Li
@ 2019-04-16 23:01 ` Andrew Morton
  2019-04-17 10:07   ` Li, Aubrey
  2019-04-17 19:45   ` Andy Lutomirski
  2 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2019-04-16 23:01 UTC (permalink / raw)
  To: Aubrey Li
  Cc: tglx, mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan,
	adobriyan, aubrey.li, linux-api, linux-kernel

On Tue, 16 Apr 2019 14:32:48 +0800 Aubrey Li <aubrey.li@linux.intel.com> wrote:

> The architecture specific information of the running processes could
> be useful to the userland. Add support to examine process architecture
> specific information externally.

The implementation looks just fine to me.  Have you had any feedback on
the overall desirability of adding this feature?

> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -96,6 +96,11 @@
>  #include <asm/processor.h>
>  #include "internal.h"
>  
> +/* Add support for architecture specific output in /proc/pid/status */
> +#ifndef	arch_proc_pid_status
> +#define	arch_proc_pid_status(m, task)
> +#endif

To this I suggest adding

/* arch_proc_pid_status() must be defined in asm/processor.h */

Because we've regularly had different architectures defining such things
in different headers, resulting in a mess.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output
  2019-04-16 23:01 ` [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output Andrew Morton
@ 2019-04-17 10:07   ` Li, Aubrey
  2019-04-17 19:45   ` Andy Lutomirski
  1 sibling, 0 replies; 8+ messages in thread
From: Li, Aubrey @ 2019-04-17 10:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: tglx, mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan,
	adobriyan, aubrey.li, linux-api, linux-kernel

On 2019/4/17 7:01, Andrew Morton wrote:
> On Tue, 16 Apr 2019 14:32:48 +0800 Aubrey Li <aubrey.li@linux.intel.com> wrote:
> 
>> The architecture specific information of the running processes could
>> be useful to the userland. Add support to examine process architecture
>> specific information externally.
> 
> The implementation looks just fine to me.  Have you had any feedback on
> the overall desirability of adding this feature?
> 
This is orientated by the customer's complain. When the customer colocated
their latency critical tasks with AVX512 tasks, the latency critical tasks
were affected due to core frequency slowing down by AVX512 use. So we propose
this interface for the user space tools to identify AVX512 using task and
apply user space dispatching policy. We may have subsequent effort based on
this proposal.

>> --- a/fs/proc/array.c
>> +++ b/fs/proc/array.c
>> @@ -96,6 +96,11 @@
>>  #include <asm/processor.h>
>>  #include "internal.h"
>>  
>> +/* Add support for architecture specific output in /proc/pid/status */
>> +#ifndef	arch_proc_pid_status
>> +#define	arch_proc_pid_status(m, task)
>> +#endif
> 
> To this I suggest adding
> 
> /* arch_proc_pid_status() must be defined in asm/processor.h */
> 
> Because we've regularly had different architectures defining such things
> in different headers, resulting in a mess.

Thanks, I'll add it in the next version.
-Aubrey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output
  2019-04-16 23:01 ` [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output Andrew Morton
  2019-04-17 10:07   ` Li, Aubrey
@ 2019-04-17 19:45   ` Andy Lutomirski
  2019-04-18 13:00     ` Thomas Gleixner
  1 sibling, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2019-04-17 19:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Aubrey Li, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	H. Peter Anvin, Andi Kleen, Tim Chen, Dave Hansen,
	Arjan van de Ven, Alexey Dobriyan, aubrey.li, Linux API, LKML

On Tue, Apr 16, 2019 at 4:01 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 16 Apr 2019 14:32:48 +0800 Aubrey Li <aubrey.li@linux.intel.com> wrote:
>
> > The architecture specific information of the running processes could
> > be useful to the userland. Add support to examine process architecture
> > specific information externally.
>
> The implementation looks just fine to me.  Have you had any feedback on
> the overall desirability of adding this feature?

I think I've been the most outspoken, and my not-all-that-strong
opinion is that I don't really like it.  /proc/PID/status is already a
bit of a mess, and I don't think we really want it to be a mess that
is different on different architectures.  Hence my suggestion of
/proc/PID/x86_status instead.  Or we could do /proc/PID/arch_status, I
suppose, and make sure that everything in it is namespaced.  (We could
easily end up with a situation where status fields from more than one
architecture are present.  Think i386 + x86_64.  Thata's also the case
where qemu userspace emulation is used to run binaries meant for one
architecture on a different architecture's kernel.)

--Andy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output
  2019-04-17 19:45   ` Andy Lutomirski
@ 2019-04-18 13:00     ` Thomas Gleixner
  2019-04-19 11:42       ` Li, Aubrey
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2019-04-18 13:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Morton, Aubrey Li, Ingo Molnar, Peter Zijlstra,
	H. Peter Anvin, Andi Kleen, Tim Chen, Dave Hansen,
	Arjan van de Ven, Alexey Dobriyan, aubrey.li, Linux API, LKML

On Wed, 17 Apr 2019, Andy Lutomirski wrote:

> On Tue, Apr 16, 2019 at 4:01 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Tue, 16 Apr 2019 14:32:48 +0800 Aubrey Li <aubrey.li@linux.intel.com> wrote:
> >
> > > The architecture specific information of the running processes could
> > > be useful to the userland. Add support to examine process architecture
> > > specific information externally.
> >
> > The implementation looks just fine to me.  Have you had any feedback on
> > the overall desirability of adding this feature?
> 
> I think I've been the most outspoken, and my not-all-that-strong
> opinion is that I don't really like it.  /proc/PID/status is already a
> bit of a mess, and I don't think we really want it to be a mess that
> is different on different architectures.  Hence my suggestion of
> /proc/PID/x86_status instead.  Or we could do /proc/PID/arch_status, I

arch_status looks like the right thing to do.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output
  2019-04-18 13:00     ` Thomas Gleixner
@ 2019-04-19 11:42       ` Li, Aubrey
  0 siblings, 0 replies; 8+ messages in thread
From: Li, Aubrey @ 2019-04-19 11:42 UTC (permalink / raw)
  To: Thomas Gleixner, Andy Lutomirski
  Cc: Andrew Morton, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Andi Kleen, Tim Chen, Dave Hansen, Arjan van de Ven,
	Alexey Dobriyan, aubrey.li, Linux API, LKML

On 2019/4/18 21:00, Thomas Gleixner wrote:
> On Wed, 17 Apr 2019, Andy Lutomirski wrote:
> 
>> On Tue, Apr 16, 2019 at 4:01 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>>
>>> On Tue, 16 Apr 2019 14:32:48 +0800 Aubrey Li <aubrey.li@linux.intel.com> wrote:
>>>
>>>> The architecture specific information of the running processes could
>>>> be useful to the userland. Add support to examine process architecture
>>>> specific information externally.
>>>
>>> The implementation looks just fine to me.  Have you had any feedback on
>>> the overall desirability of adding this feature?
>>
>> I think I've been the most outspoken, and my not-all-that-strong
>> opinion is that I don't really like it.  /proc/PID/status is already a
>> bit of a mess, and I don't think we really want it to be a mess that
>> is different on different architectures.  Hence my suggestion of
>> /proc/PID/x86_status instead.  Or we could do /proc/PID/arch_status, I
> 
> arch_status looks like the right thing to do.

Thanks Andy and Thomas, let me change the patch to use arch_status instead.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-04-19 19:02 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-16  6:32 [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output Aubrey Li
2019-04-16  6:32 ` [PATCH v15 2/3] x86,/proc/pid/status: Add AVX-512 usage elapsed time Aubrey Li
2019-04-16  6:32 ` [PATCH v15 3/3] Documentation/filesystems/proc.txt: add AVX512_elapsed_ms Aubrey Li
2019-04-16 23:01 ` [PATCH v15 1/3] /proc/pid/status: Add support for architecture specific output Andrew Morton
2019-04-17 10:07   ` Li, Aubrey
2019-04-17 19:45   ` Andy Lutomirski
2019-04-18 13:00     ` Thomas Gleixner
2019-04-19 11:42       ` Li, Aubrey

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.