linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] perf sched: fix wrong conversion of task state
@ 2016-07-27 12:54 Tomoki Sekiyama
  2016-07-27 14:18 ` David Ahern
  0 siblings, 1 reply; 9+ messages in thread
From: Tomoki Sekiyama @ 2016-07-27 12:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: ltc-kernel, masumi.moritani.ju, Tomoki Sekiyama, Jiri Olsa,
	David Ahern, Namhyung Kim, Peter Zijlstra, Masami Hiramatsu

sched_out_state() converts the prev_state u64 bitmask to a char in
a wrong way, which may cause wrong results of 'perf sched latency'.
This patch fixes the conversion.

Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
---
 tools/perf/builtin-sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 0dfe8df..eb2f7f4 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -71,6 +71,7 @@ struct sched_atom {
 };
 
 #define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
+#define TASK_STATE_MASK 0x7ff
 
 enum thread_state {
 	THREAD_SLEEPING = 0,
@@ -899,7 +900,7 @@ static char sched_out_state(u64 prev_state)
 {
 	const char *str = TASK_STATE_TO_CHAR_STR;
 
-	return str[prev_state];
+	return str[ffs(prev_state & TASK_STATE_MASK)];
 }
 
 static int
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] perf sched: fix wrong conversion of task state
  2016-07-27 12:54 [PATCH] perf sched: fix wrong conversion of task state Tomoki Sekiyama
@ 2016-07-27 14:18 ` David Ahern
  2016-07-27 15:58   ` 関山友輝 / SEKIYAMA,TOMOKI
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2016-07-27 14:18 UTC (permalink / raw)
  To: Tomoki Sekiyama, linux-kernel
  Cc: ltc-kernel, masumi.moritani.ju, Jiri Olsa, Namhyung Kim,
	Peter Zijlstra, Masami Hiramatsu

On 7/27/16 6:54 AM, Tomoki Sekiyama wrote:
> sched_out_state() converts the prev_state u64 bitmask to a char in
> a wrong way, which may cause wrong results of 'perf sched latency'.
> This patch fixes the conversion.
>
> Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> ---
>  tools/perf/builtin-sched.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 0dfe8df..eb2f7f4 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -71,6 +71,7 @@ struct sched_atom {
>  };
>
>  #define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
> +#define TASK_STATE_MASK 0x7ff

The mask should not be needed and looking at top of tree there are 2 new 
states (N and n) that need to be added.

>
>  enum thread_state {
>  	THREAD_SLEEPING = 0,
> @@ -899,7 +900,7 @@ static char sched_out_state(u64 prev_state)
>  {
>  	const char *str = TASK_STATE_TO_CHAR_STR;
>
> -	return str[prev_state];
> +	return str[ffs(prev_state & TASK_STATE_MASK)];
>  }
>
>  static int
>

Handle unknown bits with '?' like the kernel does (see task_state_char 
and sched_show_task).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] perf sched: fix wrong conversion of task state
  2016-07-27 14:18 ` David Ahern
@ 2016-07-27 15:58   ` 関山友輝 / SEKIYAMA,TOMOKI
  2016-07-27 16:03     ` 関山友輝 / SEKIYAMA,TOMOKI
  2016-07-27 16:50     ` David Ahern
  0 siblings, 2 replies; 9+ messages in thread
From: 関山友輝 / SEKIYAMA,TOMOKI @ 2016-07-27 15:58 UTC (permalink / raw)
  To: David Ahern, linux-kernel
  Cc: ltc-kernel,
	森谷真寿美 /
	MORITANI,MASUMI, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
	Masami Hiramatsu

Hi David,

Thank you for the review.

On 2016/7/27 23:18, David Ahern wrote:
> On 7/27/16 6:54 AM, Tomoki Sekiyama wrote:
>> sched_out_state() converts the prev_state u64 bitmask to a char in
>> a wrong way, which may cause wrong results of 'perf sched latency'.
>> This patch fixes the conversion.
>>
>> Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
>> Cc: Jiri Olsa <jolsa@kernel.org>
>> Cc: David Ahern <dsahern@gmail.com>
>> Cc: Namhyung Kim <namhyung@kernel.org>
>> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>> ---
>>  tools/perf/builtin-sched.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>> index 0dfe8df..eb2f7f4 100644
>> --- a/tools/perf/builtin-sched.c
>> +++ b/tools/perf/builtin-sched.c
>> @@ -71,6 +71,7 @@ struct sched_atom {
>>  };
>>
>>  #define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
>> +#define TASK_STATE_MASK 0x7ff
>
>The mask should not be needed and looking at top of tree there are 2 new 
>states (N and n) that need to be added.

I couldn't find the state for 'n'. Where can I find it?
I have rechecked the code and found that __trace_sched_switch_state() uses
TASK_STATE_MAX for preempted tasks.
I will add 'N', and '+' (instead of "R+" ftrace shows) for preempted tasks.
(Also add_sched_out_event should set THREAD_WAIT_CPU for preempted tasks.)


>>
>>  enum thread_state {
>>  	THREAD_SLEEPING = 0,
>> @@ -899,7 +900,7 @@ static char sched_out_state(u64 prev_state)
>>  {
>>  	const char *str = TASK_STATE_TO_CHAR_STR;
>>
>> -	return str[prev_state];
>> +	return str[ffs(prev_state & TASK_STATE_MASK)];
>>  }
>>
>>  static int
>>
>
> Handle unknown bits with '?' like the kernel does (see task_state_char 
> and sched_show_task).

OK, I will make it return '?' for unknown states.

Will update the patch soon.

Thanks,
Tomoki Sekiyama

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] perf sched: fix wrong conversion of task state
  2016-07-27 15:58   ` 関山友輝 / SEKIYAMA,TOMOKI
@ 2016-07-27 16:03     ` 関山友輝 / SEKIYAMA,TOMOKI
  2016-07-27 16:50     ` David Ahern
  1 sibling, 0 replies; 9+ messages in thread
From: 関山友輝 / SEKIYAMA,TOMOKI @ 2016-07-27 16:03 UTC (permalink / raw)
  To: David Ahern, linux-kernel
  Cc: 森谷真寿美 /
	MORITANI,MASUMI, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
	Masami Hiramatsu, ltc-kernel

Also 'X' and 'Z' in TASK_STATE_TO_CHAR_STR need to be swapped:
  https://patchwork.kernel.org/patch/4218021/

-----Original Message-----
Hi David,

Thank you for the review.

On 2016/7/27 23:18, David Ahern wrote:
> On 7/27/16 6:54 AM, Tomoki Sekiyama wrote:
>> sched_out_state() converts the prev_state u64 bitmask to a char in
>> a wrong way, which may cause wrong results of 'perf sched latency'.
>> This patch fixes the conversion.
>>
>> Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
>> Cc: Jiri Olsa <jolsa@kernel.org>
>> Cc: David Ahern <dsahern@gmail.com>
>> Cc: Namhyung Kim <namhyung@kernel.org>
>> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>> ---
>>  tools/perf/builtin-sched.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>> index 0dfe8df..eb2f7f4 100644
>> --- a/tools/perf/builtin-sched.c
>> +++ b/tools/perf/builtin-sched.c
>> @@ -71,6 +71,7 @@ struct sched_atom {
>>  };
>>
>>  #define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
>> +#define TASK_STATE_MASK 0x7ff
>
>The mask should not be needed and looking at top of tree there are 2 new 
>states (N and n) that need to be added.

I couldn't find the state for 'n'. Where can I find it?
I have rechecked the code and found that __trace_sched_switch_state() uses
TASK_STATE_MAX for preempted tasks.
I will add 'N', and '+' (instead of "R+" ftrace shows) for preempted tasks.
(Also add_sched_out_event should set THREAD_WAIT_CPU for preempted tasks.)


>>
>>  enum thread_state {
>>  	THREAD_SLEEPING = 0,
>> @@ -899,7 +900,7 @@ static char sched_out_state(u64 prev_state)
>>  {
>>  	const char *str = TASK_STATE_TO_CHAR_STR;
>>
>> -	return str[prev_state];
>> +	return str[ffs(prev_state & TASK_STATE_MASK)];
>>  }
>>
>>  static int
>>
>
> Handle unknown bits with '?' like the kernel does (see task_state_char 
> and sched_show_task).

OK, I will make it return '?' for unknown states.

Will update the patch soon.

Thanks,
Tomoki Sekiyama

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] perf sched: fix wrong conversion of task state
  2016-07-27 15:58   ` 関山友輝 / SEKIYAMA,TOMOKI
  2016-07-27 16:03     ` 関山友輝 / SEKIYAMA,TOMOKI
@ 2016-07-27 16:50     ` David Ahern
  2016-07-28  3:10       ` Tomoki Sekiyama
  1 sibling, 1 reply; 9+ messages in thread
From: David Ahern @ 2016-07-27 16:50 UTC (permalink / raw)
  To: 関山友輝 / SEKIYAMA,TOMOKI, linux-kernel
  Cc: ltc-kernel,
	森谷真寿美 /
	MORITANI,MASUMI, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
	Masami Hiramatsu

On 7/27/16 9:58 AM, 関山友輝 / SEKIYAMA,TOMOKI wrote:
>>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>>> index 0dfe8df..eb2f7f4 100644
>>> --- a/tools/perf/builtin-sched.c
>>> +++ b/tools/perf/builtin-sched.c
>>> @@ -71,6 +71,7 @@ struct sched_atom {
>>>  };
>>>
>>>  #define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
>>> +#define TASK_STATE_MASK 0x7ff
>>
>> The mask should not be needed and looking at top of tree there are 2 new
>> states (N and n) that need to be added.
>
> I couldn't find the state for 'n'. Where can I find it?

Check Linus' tree -- top of tree; commit 7dc603c9028ea.


> I have rechecked the code and found that __trace_sched_switch_state() uses
> TASK_STATE_MAX for preempted tasks.

since TASK_STATE_MAX is kernel version dependent perf should not rely on it.

> I will add 'N', and '+' (instead of "R+" ftrace shows) for preempted tasks.
> (Also add_sched_out_event should set THREAD_WAIT_CPU for preempted tasks.)
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] perf sched: fix wrong conversion of task state
  2016-07-27 16:50     ` David Ahern
@ 2016-07-28  3:10       ` Tomoki Sekiyama
  2016-07-28 14:26         ` David Ahern
  0 siblings, 1 reply; 9+ messages in thread
From: Tomoki Sekiyama @ 2016-07-28  3:10 UTC (permalink / raw)
  To: 'David Ahern', linux-kernel
  Cc: ltc-kernel,
	森谷真寿美 /
	MORITANI,MASUMI, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
	Masami Hiramatsu

On 2016/7/28 1:50, David Ahern wrote:
>>>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>>>> index 0dfe8df..eb2f7f4 100644
>>>> --- a/tools/perf/builtin-sched.c
>>>> +++ b/tools/perf/builtin-sched.c
>>>> @@ -71,6 +71,7 @@ struct sched_atom {
>>>>  };
>>>>
>>>>  #define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
>>>> +#define TASK_STATE_MASK 0x7ff
>>>
>>> The mask should not be needed and looking at top of tree there are 2 new
>>> states (N and n) that need to be added.
>>
>> I couldn't find the state for 'n'. Where can I find it?
>
> Check Linus' tree -- top of tree; commit 7dc603c9028ea.

Thanks, I was failing to git pull by some network reason..
Will add 'n' too.

>> I have rechecked the code and found that __trace_sched_switch_state()
uses
>> TASK_STATE_MAX for preempted tasks.
>
>since TASK_STATE_MAX is kernel version dependent perf should not rely on
it.

Hmm, that is true (actually 'n' is added recently).
But that means we cannot handle preemption correctly as far as
sched:sched_switch
event uses TASK_STATE_MAX to mark preempted tasks.

Should we stop using TASK_STATE_MAX for preempted tasks in ftrace and
use (1 << 63) or something that doesn't change on kernel version instead?


This causes missing some context switches information by 'perf sched
latency'.
For example, running 2 'yes' commands (inifinitely call write(2)) on the
same
cpu by following steps results in the incomplete result:

$ taskset -c 0 yes > /dev/null &
$ taskset -c 0 yes > /dev/null &
# perf sched record sleep 1
# perf sched latency -p -C 0

like:

----------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms |
 ----------------------------------------------------------------------
  yes:14187             |    499.705 ms |       17 | avg:   12.981 ms |
  yes:14188             |    500.350 ms |       14 | avg:   12.023 ms |
  gnome-terminal-:12722 |      0.285 ms |        3 | avg:    0.025 ms |
...

where avg delay * switches for yes commands don't cover expected total delay
500 ms although one make another delayed to be executed.
(12.981 ms * 17 = 220.677 ms << 500 ms)

With the patch, perf sched latency shows:
Now 12.838 ms * 39 = 500.682 ms as expected.

----------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms |
 ----------------------------------------------------------------------
  yes:14187             |    499.705 ms |       39 | avg:   12.838 ms |
  yes:14188             |    500.350 ms |       40 | avg:   12.506 ms |
  gnome-terminal-:12722 |      0.285 ms |        3 | avg:    0.025 ms |
...


Thanks,
Tomoki Sekiyama

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] perf sched: fix wrong conversion of task state
  2016-07-28  3:10       ` Tomoki Sekiyama
@ 2016-07-28 14:26         ` David Ahern
  2016-07-29  2:30           ` 関山友輝 / SEKIYAMA,TOMOKI
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2016-07-28 14:26 UTC (permalink / raw)
  To: Tomoki Sekiyama, linux-kernel
  Cc: ltc-kernel,
	森谷真寿美 /
	MORITANI,MASUMI, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
	Masami Hiramatsu

On 7/27/16 9:10 PM, Tomoki Sekiyama wrote:
> But that means we cannot handle preemption correctly as far as
> sched:sched_switch
> event uses TASK_STATE_MAX to mark preempted tasks.
>
> Should we stop using TASK_STATE_MAX for preempted tasks in ftrace and
> use (1 << 63) or something that doesn't change on kernel version instead?

ftrace printing is within the kernel so it has consistency.

It is the save data and analyze later use case that can not assume  
TASK_STATE_MAX means preemption since the value of TASK_STATE_MAX is  
kernel version dependent.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] perf sched: fix wrong conversion of task state
  2016-07-28 14:26         ` David Ahern
@ 2016-07-29  2:30           ` 関山友輝 / SEKIYAMA,TOMOKI
  0 siblings, 0 replies; 9+ messages in thread
From: 関山友輝 / SEKIYAMA,TOMOKI @ 2016-07-29  2:30 UTC (permalink / raw)
  To: David Ahern
  Cc: linux-kernel, ltc-kernel,
	森谷真寿美 /
	MORITANI,MASUMI, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
	Masami Hiramatsu

On 2016/07/28 23:26、David Ahern wrote:
>> But that means we cannot handle preemption correctly as far as
>> sched:sched_switch
>> event uses TASK_STATE_MAX to mark preempted tasks.
>> 
>> Should we stop using TASK_STATE_MAX for preempted tasks in ftrace and
>> use (1 << 63) or something that doesn't change on kernel version instead?
> 
> ftrace printing is within the kernel so it has consistency.
Agreed.

> It is the save data and analyze later use case that can not assume TASK_STATE_MAX means preemption since the value of TASK_STATE_MAX is kernel version dependent.

That means we cannot measure delay by preemption using perf sched latency. :(
It might be helpful if the preemption mark becomes kernel-version-independent at least for future kernel version..

Anyway, this time I will post the patch to fix only invalid conversion issue of task state with this patch.

Thanks,
Tomoki Sekiyama

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] perf sched: fix wrong conversion of task state
@ 2016-07-27 12:52 Tomoki Sekiyama
  0 siblings, 0 replies; 9+ messages in thread
From: Tomoki Sekiyama @ 2016-07-27 12:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: ltc-kernel, Tomoki Sekiyama, Jiri Olsa, David Ahern,
	Namhyung Kim, Peter Zijlstra, Masami Hiramatsu

sched_out_state() converts the prev_state u64 bitmask to a char in
a wrong way, which may cause wrong results of 'perf sched latency'.
This patch fixes the conversion.

Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
---
 tools/perf/builtin-sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 0dfe8df..eb2f7f4 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -71,6 +71,7 @@ struct sched_atom {
 };
 
 #define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
+#define TASK_STATE_MASK 0x7ff
 
 enum thread_state {
 	THREAD_SLEEPING = 0,
@@ -899,7 +900,7 @@ static char sched_out_state(u64 prev_state)
 {
 	const char *str = TASK_STATE_TO_CHAR_STR;
 
-	return str[prev_state];
+	return str[ffs(prev_state & TASK_STATE_MASK)];
 }
 
 static int
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-07-29  2:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-27 12:54 [PATCH] perf sched: fix wrong conversion of task state Tomoki Sekiyama
2016-07-27 14:18 ` David Ahern
2016-07-27 15:58   ` 関山友輝 / SEKIYAMA,TOMOKI
2016-07-27 16:03     ` 関山友輝 / SEKIYAMA,TOMOKI
2016-07-27 16:50     ` David Ahern
2016-07-28  3:10       ` Tomoki Sekiyama
2016-07-28 14:26         ` David Ahern
2016-07-29  2:30           ` 関山友輝 / SEKIYAMA,TOMOKI
  -- strict thread matches above, loose matches on Subject: below --
2016-07-27 12:52 Tomoki Sekiyama

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).