linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space
@ 2018-03-22 16:08 Alexey Budankov
  2018-03-22 16:11 ` [PATCH v2 1/3] perf/core: store context switch out type into Perf trace Alexey Budankov
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Alexey Budankov @ 2018-03-22 16:08 UTC
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel, Andi Kleen


This series exposes the type of a thread context-switch-out event as part
of PERF_RECORD_SWITCH[_CPU_WIDE] records.

Two types of switch-out events are introduced:
a) preempt: task->state == TASK_RUNNING at switch-out time
b) yield: !preempt; it is encoded with the new
   PERF_RECORD_MISC_SWITCH_OUT_YIELD bit, set together with the
   existing switch-out bit:

   event_header->misc =
	PERF_RECORD_MISC_SWITCH_OUT | PERF_RECORD_MISC_SWITCH_OUT_YIELD
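The encoding rule above can be modelled in user space with a small sketch. It assumes PERF_RECORD_MISC_SWITCH_OUT keeps its existing value (1 << 13) and the proposed yield bit is (1 << 14), as in patch 1/3; the SKETCH_ macros and switch_misc() are hypothetical names, not kernel or perf API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values: SWITCH_OUT is bit 13 in the existing uapi header;
 * SWITCH_OUT_YIELD is the bit 14 proposed in patch 1/3. */
#define SKETCH_MISC_SWITCH_OUT        (1 << 13)
#define SKETCH_MISC_SWITCH_OUT_YIELD  (1 << 14)

/* TASK_RUNNING is 0 in the kernel; any other value is treated as !preempt. */
#define SKETCH_TASK_RUNNING 0L

/*
 * User-space model of the misc value the patched perf_event_switch()
 * stores: 0 on switch-in, SWITCH_OUT when a still-runnable task is
 * switched out (preempt), SWITCH_OUT | SWITCH_OUT_YIELD otherwise.
 */
static uint16_t switch_misc(bool sched_in, long task_state)
{
	if (sched_in)
		return 0;
	if (task_state == SKETCH_TASK_RUNNING)
		return SKETCH_MISC_SWITCH_OUT;
	return SKETCH_MISC_SWITCH_OUT | SKETCH_MISC_SWITCH_OUT_YIELD;
}
```

For instance, a task switched out while in TASK_INTERRUPTIBLE (state 1) would be recorded with misc 0x6000, while a preempted, still-runnable task gets 0x2000.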

The output of the perf report and perf script commands has been extended to
decode the new yield bit; the updated output is shown in the examples below.

The documentation has been updated to mention yield switch-out events
and their decoding symbols in perf script output.

The changes were tested manually on Fedora 27 using a patched kernel based on:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core

perf report -D -i system-wide.perf | grep _SWITCH: 

1 113807080924003 0x26db26 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT        next pid/tid: 20495/20495
1 113807080925644 0x26db4e [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN         prev pid/tid: 31479/31479
1 113807080937266 0x26db76 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT yield  next pid/tid:    16/16   
1 113807080938445 0x26db9e [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN         prev pid/tid: 20495/20495
1 113807080945455 0x26dbc6 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT yield  next pid/tid: 20495/20495

perf script --show-switch-events -F +misc -I -i system-wide.perf:

  rcu_sched     8 [003]       113800.748548: PERF_RECORD_SWITCH_CPU_WIDE IN         prev pid/tid:     0/0    
       perf 31479 [000] S     113800.748548: PERF_RECORD_SWITCH_CPU_WIDE OUT        next pid/tid:    59/59   
kworker/0:1    59 [000]       113800.748549: PERF_RECORD_SWITCH_CPU_WIDE IN         prev pid/tid: 31479/31479
  rcu_sched     8 [003] Sy    113800.748551: PERF_RECORD_SWITCH_CPU_WIDE OUT yield  next pid/tid:     0/0    
    swapper     0 [003]       113800.748551: PERF_RECORD_SWITCH_CPU_WIDE IN         prev pid/tid:     8/8

---
 Alexey Budankov (3):
	perf/core: store context switch out type into Perf trace
	perf report: extend raw dump (-D) out with switch out event type
	perf script: extend misc field decoding with switch out event type

 include/uapi/linux/perf_event.h          |  5 +++++
 kernel/events/core.c                     |  4 +++-
 tools/include/uapi/linux/perf_event.h    |  5 +++++
 tools/perf/Documentation/perf-script.txt | 17 +++++++++--------
 tools/perf/builtin-script.c              |  5 ++++-
 tools/perf/util/event.c                  |  4 +++-
 6 files changed, 29 insertions(+), 11 deletions(-)

* [PATCH v2 1/3] perf/core: store context switch out type into Perf trace
  2018-03-22 16:08 [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space Alexey Budankov
@ 2018-03-22 16:11 ` Alexey Budankov
  2018-03-22 16:13 ` [PATCH v1 2/3] perf report: extend raw dump (-D) out with switch out event type Alexey Budankov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Alexey Budankov @ 2018-03-22 16:11 UTC
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel, Andi Kleen


Store the thread context-switch-out event type in the perf trace as part of
PERF_RECORD_SWITCH[_CPU_WIDE] records.

Two types of switch-out events are introduced:
a) preempt: task->state == TASK_RUNNING, and b) yield: !preempt.

The new yield event type is encoded with a dedicated
PERF_RECORD_MISC_SWITCH_OUT_YIELD bit that extends PERF_RECORD_MISC_SWITCH_OUT,
which keeps denoting a traditional (preemption) switch-out event:

    misc = PERF_RECORD_MISC_SWITCH_OUT | PERF_RECORD_MISC_SWITCH_OUT_YIELD
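Note that the diff below gives PERF_RECORD_MISC_SWITCH_OUT_YIELD the same value, (1 << 14), as PERF_RECORD_MISC_EXACT_IP. The uapi header already reuses bit 13 for PERF_RECORD_MISC_MMAP_DATA, PERF_RECORD_MISC_COMM_EXEC and PERF_RECORD_MISC_SWITCH_OUT, so misc bits are interpreted per record type. The sketch below assumes the same holds for bit 14 (EXACT_IP only on PERF_RECORD_SAMPLE, the new bit only on switch records). The record type numbers are those of enum perf_event_type; the helper name and the SKETCH_ macros are hypothetical.

```c
#include <stdint.h>

/* Record types from enum perf_event_type in the uapi header. */
#define SKETCH_PERF_RECORD_SAMPLE           9
#define SKETCH_PERF_RECORD_SWITCH           14
#define SKETCH_PERF_RECORD_SWITCH_CPU_WIDE  15

/* Misc bits; EXACT_IP and the proposed SWITCH_OUT_YIELD share bit 14. */
#define SKETCH_MISC_SWITCH_OUT              (1 << 13)
#define SKETCH_MISC_EXACT_IP                (1 << 14)
#define SKETCH_MISC_SWITCH_OUT_YIELD        (1 << 14)

/* Hypothetical helper: interpret bit 14 of header.misc according to the
 * record type, since its meaning is not global. */
static const char *misc_bit14_meaning(uint32_t type, uint16_t misc)
{
	switch (type) {
	case SKETCH_PERF_RECORD_SAMPLE:
		return (misc & SKETCH_MISC_EXACT_IP) ? "exact-ip" : "unset";
	case SKETCH_PERF_RECORD_SWITCH:
	case SKETCH_PERF_RECORD_SWITCH_CPU_WIDE:
		if (!(misc & SKETCH_MISC_SWITCH_OUT_YIELD))
			return "unset";
		/* The patch only sets YIELD together with SWITCH_OUT. */
		return (misc & SKETCH_MISC_SWITCH_OUT) ? "yield" : "invalid";
	default:
		return "unknown";
	}
}
```

A decoder that ignored the record type would misreport every precise sample as a yield switch-out, which is why the type check comes first.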
	
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 include/uapi/linux/perf_event.h       | 5 +++++
 kernel/events/core.c                  | 4 +++-
 tools/include/uapi/linux/perf_event.h | 5 +++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 912b85b52344..f5dd823d0ff1 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -655,6 +655,11 @@ struct perf_event_mmap_page {
  * perf_event_attr::precise_ip.
  */
 #define PERF_RECORD_MISC_EXACT_IP		(1 << 14)
+/*
+ * Indicates that thread explicitly yielded cpu due to
+ * a call of some synchronization API e.g. futex system call
+ */
+#define PERF_RECORD_MISC_SWITCH_OUT_YIELD	(1 << 14)
 /*
  * Reserve the last bit to indicate some extended misc field
  */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 74a6e8f12a3c..f15af15af474 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7556,6 +7556,8 @@ static void perf_event_switch(struct task_struct *task,
 			      struct task_struct *next_prev, bool sched_in)
 {
 	struct perf_switch_event switch_event;
+	__u16 switch_type = sched_in ? 0 : PERF_RECORD_MISC_SWITCH_OUT |
+		(task->state == TASK_RUNNING ? 0 : PERF_RECORD_MISC_SWITCH_OUT_YIELD);
 
 	/* N.B. caller checks nr_switch_events != 0 */
 
@@ -7565,7 +7567,7 @@ static void perf_event_switch(struct task_struct *task,
 		.event_id	= {
 			.header = {
 				/* .type */
-				.misc = sched_in ? 0 : PERF_RECORD_MISC_SWITCH_OUT,
+				.misc = switch_type,
 				/* .size */
 			},
 			/* .next_prev_pid */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 912b85b52344..f5dd823d0ff1 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -655,6 +655,11 @@ struct perf_event_mmap_page {
  * perf_event_attr::precise_ip.
  */
 #define PERF_RECORD_MISC_EXACT_IP		(1 << 14)
+/*
+ * Indicates that thread explicitly yielded cpu due to
+ * a call of some synchronization API e.g. futex system call
+ */
+#define PERF_RECORD_MISC_SWITCH_OUT_YIELD	(1 << 14)
 /*
  * Reserve the last bit to indicate some extended misc field
  */

* [PATCH v1 2/3] perf report: extend raw dump (-D) out with switch out event type
  2018-03-22 16:08 [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space Alexey Budankov
  2018-03-22 16:11 ` [PATCH v2 1/3] perf/core: store context switch out type into Perf trace Alexey Budankov
@ 2018-03-22 16:13 ` Alexey Budankov
  2018-03-22 16:16 ` [PATCH v2 3/3] perf script: extend misc field decoding " Alexey Budankov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Alexey Budankov @ 2018-03-22 16:13 UTC
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel, Andi Kleen


Print an additional 'yield' tag for PERF_RECORD_SWITCH[_CPU_WIDE] OUT records
when the PERF_RECORD_MISC_SWITCH_OUT_YIELD bit is set in the event header misc
field, designating a synchronization context-switch-out event:

perf report -D -i system-wide.perf | grep _SWITCH: 

1 113807080924003 0x26db26 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT        next pid/tid: 20495/20495
1 113807080925644 0x26db4e [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN         prev pid/tid: 31479/31479
1 113807080937266 0x26db76 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT yield  next pid/tid:    16/16   
1 113807080938445 0x26db9e [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN         prev pid/tid: 20495/20495
1 113807080945455 0x26dbc6 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT yield  next pid/tid: 20495/20495
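The label selection in this patch can be checked in isolation with the sketch below. switch_in_out_label() follows the in_out expression from the diff; format_switch_cpu_wide() is a hypothetical helper that approximates the raw-dump line layout seen in the example output, so treat its format string as an assumption rather than the exact perf code. Bit values are assumed as in patch 1/3.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MISC_SWITCH_OUT        (1 << 13)
#define SKETCH_MISC_SWITCH_OUT_YIELD  (1 << 14)

/* Same selection as the patched in_out expression: every label is
 * 9 characters wide so the following columns stay aligned. */
static const char *switch_in_out_label(uint16_t misc)
{
	if (!(misc & SKETCH_MISC_SWITCH_OUT))
		return "IN       ";
	if (!(misc & SKETCH_MISC_SWITCH_OUT_YIELD))
		return "OUT      ";
	return "OUT yield";
}

/* Formats the tail of a PERF_RECORD_SWITCH_CPU_WIDE raw-dump line into buf;
 * the layout approximates the example output above. */
static int format_switch_cpu_wide(char *buf, size_t len, uint16_t misc,
				  unsigned int pid, unsigned int tid)
{
	bool out = misc & SKETCH_MISC_SWITCH_OUT;

	return snprintf(buf, len, " %s  %s pid/tid: %5u/%-5u\n",
			switch_in_out_label(misc), out ? "next" : "prev",
			pid, tid);
}
```

Padding every label to nine characters is what keeps the "next/prev pid/tid" column aligned across IN, OUT and OUT yield lines.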

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/util/event.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index f0a6cbd033cc..324f44f02e66 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1421,7 +1421,9 @@ size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp)
 size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp)
 {
 	bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
-	const char *in_out = out ? "OUT" : "IN ";
+	const char *in_out = !out ? "IN       " :
+		!(event->header.misc & PERF_RECORD_MISC_SWITCH_OUT_YIELD) ?
+				"OUT      " : "OUT yield";
 
 	if (event->header.type == PERF_RECORD_SWITCH)
 		return fprintf(fp, " %s\n", in_out);

* [PATCH v2 3/3] perf script: extend misc field decoding with switch out event type
  2018-03-22 16:08 [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space Alexey Budankov
  2018-03-22 16:11 ` [PATCH v2 1/3] perf/core: store context switch out type into Perf trace Alexey Budankov
  2018-03-22 16:13 ` [PATCH v1 2/3] perf report: extend raw dump (-D) out with switch out event type Alexey Budankov
@ 2018-03-22 16:16 ` Alexey Budankov
  2018-03-23  9:24 ` [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space Jiri Olsa
  2018-03-23 18:05 ` Peter Zijlstra
  4 siblings, 0 replies; 7+ messages in thread
From: Alexey Budankov @ 2018-03-22 16:16 UTC
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel, Andi Kleen


Append 'y' to the 'S' tag to designate the type of context-switch-out event:
'S' means a preemption context switch and 'Sy' means a synchronization context
switch. The documentation is extended to cover the new output.

perf script --show-switch-events -F +misc -I -i system-wide.perf:

  rcu_sched     8 [003]       113800.748548: PERF_RECORD_SWITCH_CPU_WIDE IN         prev pid/tid:     0/0    
       perf 31479 [000] S     113800.748548: PERF_RECORD_SWITCH_CPU_WIDE OUT        next pid/tid:    59/59   
kworker/0:1    59 [000]       113800.748549: PERF_RECORD_SWITCH_CPU_WIDE IN         prev pid/tid: 31479/31479
  rcu_sched     8 [003] Sy    113800.748551: PERF_RECORD_SWITCH_CPU_WIDE OUT yield  next pid/tid:     0/0    
    swapper     0 [003]       113800.748551: PERF_RECORD_SWITCH_CPU_WIDE IN         prev pid/tid:     8/8
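A minimal sketch of the 'S'/'Sy' decision, assuming the record type numbers from enum perf_event_type and the bit values from patch 1/3. switch_misc_letters() is a hypothetical helper, not the perf function itself, and it returns a string instead of printing.

```c
#include <stdint.h>

#define SKETCH_PERF_RECORD_SWITCH           14
#define SKETCH_PERF_RECORD_SWITCH_CPU_WIDE  15
#define SKETCH_MISC_SWITCH_OUT              (1 << 13)
#define SKETCH_MISC_SWITCH_OUT_YIELD        (1 << 14)

/* Letters the patched perf_sample__fprintf_start() would emit for the
 * switch bits of misc: "S" for a preemption switch-out, "Sy" for a
 * synchronization (yield) switch-out, nothing otherwise. */
static const char *switch_misc_letters(uint32_t type, uint16_t misc)
{
	if (type != SKETCH_PERF_RECORD_SWITCH &&
	    type != SKETCH_PERF_RECORD_SWITCH_CPU_WIDE)
		return "";
	if (!(misc & SKETCH_MISC_SWITCH_OUT))
		return "";
	return (misc & SKETCH_MISC_SWITCH_OUT_YIELD) ? "Sy" : "S";
}
```

In the diff itself, the SWITCH case falls through into `default: break;`, which is harmless, although an explicit `break;` would state the intent more clearly.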

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/Documentation/perf-script.txt | 17 +++++++++--------
 tools/perf/builtin-script.c              |  5 ++++-
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 36ec0257f8d3..80e94cbbf520 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -228,14 +228,15 @@ OPTIONS
 	For sample events it's possible to display misc field with -F +misc option,
 	following letters are displayed for each bit:
 
-	  PERF_RECORD_MISC_KERNEL        K
-	  PERF_RECORD_MISC_USER          U
-	  PERF_RECORD_MISC_HYPERVISOR    H
-	  PERF_RECORD_MISC_GUEST_KERNEL  G
-	  PERF_RECORD_MISC_GUEST_USER    g
-	  PERF_RECORD_MISC_MMAP_DATA*    M
-	  PERF_RECORD_MISC_COMM_EXEC     E
-	  PERF_RECORD_MISC_SWITCH_OUT    S
+	  PERF_RECORD_MISC_KERNEL               K
+	  PERF_RECORD_MISC_USER                 U
+	  PERF_RECORD_MISC_HYPERVISOR           H
+	  PERF_RECORD_MISC_GUEST_KERNEL         G
+	  PERF_RECORD_MISC_GUEST_USER           g
+	  PERF_RECORD_MISC_MMAP_DATA*           M
+	  PERF_RECORD_MISC_COMM_EXEC            E
+	  PERF_RECORD_MISC_SWITCH_OUT           S
+	  PERF_RECORD_MISC_SWITCH_OUT_YIELD     Sy
 
 	  $ perf script -F +misc ...
 	   sched-messaging  1414 K     28690.636582:       4590 cycles ...
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 313c42423393..c0a3a7297c8a 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -657,8 +657,11 @@ static int perf_sample__fprintf_start(struct perf_sample *sample,
 			break;
 		case PERF_RECORD_SWITCH:
 		case PERF_RECORD_SWITCH_CPU_WIDE:
-			if (has(SWITCH_OUT))
+			if (has(SWITCH_OUT)) {
 				ret += fprintf(fp, "S");
+				if (sample->misc & PERF_RECORD_MISC_SWITCH_OUT_YIELD)
+					ret += fprintf(fp, "y");
+			}
 		default:
 			break;
 		}

* Re: [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space
  2018-03-22 16:08 [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space Alexey Budankov
                   ` (2 preceding siblings ...)
  2018-03-22 16:16 ` [PATCH v2 3/3] perf script: extend misc field decoding " Alexey Budankov
@ 2018-03-23  9:24 ` Jiri Olsa
  2018-03-23 18:05 ` Peter Zijlstra
  4 siblings, 0 replies; 7+ messages in thread
From: Jiri Olsa @ 2018-03-23  9:24 UTC
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, linux-kernel, Andi Kleen

On Thu, Mar 22, 2018 at 07:08:25PM +0300, Alexey Budankov wrote:
> 
> Implementation of exposing context-switch-out type event as a part 
> of PERF_RECORD_SWITCH[_CPU_WIDE] record.

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

* Re: [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space
  2018-03-22 16:08 [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space Alexey Budankov
                   ` (3 preceding siblings ...)
  2018-03-23  9:24 ` [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space Jiri Olsa
@ 2018-03-23 18:05 ` Peter Zijlstra
  2018-03-23 20:38   ` Alexey Budankov
  4 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2018-03-23 18:05 UTC
  To: Alexey Budankov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-kernel, Andi Kleen

On Thu, Mar 22, 2018 at 07:08:25PM +0300, Alexey Budankov wrote:
> 
> Implementation of exposing context-switch-out type event as a part 
> of PERF_RECORD_SWITCH[_CPU_WIDE] record.
> 
> Introduced types of events assumed to be:
> a) preempt: when task->state == TASK_RUNNING
> b) yield: !preempt, encoding is done using new bit 
>    PERF_RECORD_MISC_SWITCH_OUT_YIELD like this:

A !preempt context switch isn't necessarily a yield; please don't use
that name, it means something quite specific and this isn't it.

Specifically, on Linux yield() doesn't actually change task->state, so
when task->state is set !0 it _cannot_ have been yield.

I would invert the thing and call the preempt one SWITCH_OUT_PREEMPT.
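To make the suggested inversion concrete, here is a minimal sketch. The name SWITCH_OUT_PREEMPT comes from the review comment above; keeping bit 14 for it is an assumption carried over from the posted patch, and the helper and SKETCH_ names are hypothetical. Following the point about yield(), a task that yields stays TASK_RUNNING and would therefore be flagged PREEMPT under this encoding, not as a blocking switch-out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: SWITCH_OUT as in the uapi header; bit 14 reused for the
 * suggested SWITCH_OUT_PREEMPT name. */
#define SKETCH_MISC_SWITCH_OUT          (1 << 13)
#define SKETCH_MISC_SWITCH_OUT_PREEMPT  (1 << 14)
#define SKETCH_TASK_RUNNING             0L

/* Inverted encoding: flag the switch-out only when the task was still
 * runnable (preempted, or yielded, since yield() leaves task->state
 * untouched). A task that set a non-zero state gets plain SWITCH_OUT. */
static uint16_t switch_misc_preempt(bool sched_in, long task_state)
{
	if (sched_in)
		return 0;
	if (task_state == SKETCH_TASK_RUNNING)
		return SKETCH_MISC_SWITCH_OUT | SKETCH_MISC_SWITCH_OUT_PREEMPT;
	return SKETCH_MISC_SWITCH_OUT;
}
```

The extra bit now marks the case whose meaning is unambiguous (the task was still runnable), instead of guessing a reason for every other switch-out.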

* Re: [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space
  2018-03-23 18:05 ` Peter Zijlstra
@ 2018-03-23 20:38   ` Alexey Budankov
  0 siblings, 0 replies; 7+ messages in thread
From: Alexey Budankov @ 2018-03-23 20:38 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-kernel, Andi Kleen

On 23.03.2018 21:05, Peter Zijlstra wrote:
> On Thu, Mar 22, 2018 at 07:08:25PM +0300, Alexey Budankov wrote:
>>
>> Implementation of exposing context-switch-out type event as a part 
>> of PERF_RECORD_SWITCH[_CPU_WIDE] record.
>>
>> Introduced types of events assumed to be:
>> a) preempt: when task->state == TASK_RUNNING
>> b) yield: !preempt, encoding is done using new bit 
>>    PERF_RECORD_MISC_SWITCH_OUT_YIELD like this:
> 
> A !preempt context switch isn't necessarily a yield; please don't use
> that name, it means something quite specific and this isn't it.
> 
> Specifically, on Linux yield() doesn't actually change task->state, so
> when task->state is set !0 it _cannot_ have been yield.
> 
> I would invert the thing and call the preempt one SWITCH_OUT_PREEMPT.
> 

Makes sense. That way the flag is named for exactly what it is.
Let me take care of that.

Thanks,
Alexey

Thread overview: 7+ messages
2018-03-22 16:08 [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space Alexey Budankov
2018-03-22 16:11 ` [PATCH v2 1/3] perf/core: store context switch out type into Perf trace Alexey Budankov
2018-03-22 16:13 ` [PATCH v1 2/3] perf report: extend raw dump (-D) out with switch out event type Alexey Budankov
2018-03-22 16:16 ` [PATCH v2 3/3] perf script: extend misc field decoding " Alexey Budankov
2018-03-23  9:24 ` [PATCH v2 0/3] perf/core: expose thread context switch out event type to user space Jiri Olsa
2018-03-23 18:05 ` Peter Zijlstra
2018-03-23 20:38   ` Alexey Budankov
