[PATCH 0/9] perf sched replay: Make some improvements and fixes

linux-kernel.vger.kernel.org archive mirror

* [PATCH 0/9] perf sched replay: Make some improvements and fixes
@ 2015-03-31 13:46 Yunlong Song
  2015-03-31 13:46 ` [PATCH 1/9] perf sched replay: Use struct task_desc instead of struct task_task for correct meaning Yunlong Song
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Yunlong Song @ 2015-03-31 13:46 UTC
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

Hi,
  This series makes some improvements to perf sched replay and fixes
  several bugs in it.

Yunlong Song (9):
  perf sched replay: Use struct task_desc instead of struct task_task
    for correct meaning
  perf sched replay: Increase the MAX_PID value to fix assertion failure
    problem
  perf sched replay: Alloc the memory of pid_to_task dynamically to
    adapt to the unexpected change of pid_max
  perf sched replay: Realloc the memory of pid_to_task stepwise to adapt
    to the different pid_max configurations
  perf sched replay: Fix the segmentation fault problem caused by pr_err
    in threads
  perf sched replay: Handle the dead halt of sem_wait when
    create_tasks() fails for any task
  perf sched replay: Fix the EMFILE error caused by the limitation of
    the maximum open files
  perf sched replay: Support using -f to override perf.data file
    ownership
  perf sched replay: Use replay_repeat to calculate the runavg of cpu
    usage instead of the default value 10

 tools/perf/builtin-sched.c | 67 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 52 insertions(+), 15 deletions(-)

-- 
1.8.5.2



* [PATCH 1/9] perf sched replay: Use struct task_desc instead of struct task_task for correct meaning
  2015-03-31 13:46 [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
@ 2015-03-31 13:46 ` Yunlong Song
  2015-04-08 15:11   ` [tip:perf/core] " tip-bot for Yunlong Song
  2015-03-31 13:46 ` [PATCH 2/9] perf sched replay: Increase the MAX_PID value to fix assertion failure problem Yunlong Song
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Yunlong Song @ 2015-03-31 13:46 UTC
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

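There is no struct task_task at all; it is a typo introduced by an
earlier commit. Fix it to struct task_desc, which is what was meant, to
avoid confusion. The typo was harmless at run time, because all pointers
to structure types have the same size, but it is misleading.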
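A minimal C sketch of the same append step, for illustration only: it
assumes a simplified stand-in for struct task_desc and a hypothetical
helper append_task(). Sizing the element with sizeof(*ptr) means the
element type is never spelled out in the allocation, so a typo such as
"struct task_task" cannot slip in there.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for perf's struct task_desc (assumption). */
struct task_desc {
	unsigned long nr;
};

/*
 * Hypothetical helper: append @task to a growable array of task pointers.
 * sizeof(*grown) is the size of one element (a struct task_desc pointer),
 * so the element type never has to be written out.
 * Returns the (possibly moved) array, or NULL if realloc() fails, in which
 * case the old array is left untouched and still owned by the caller.
 */
static struct task_desc **append_task(struct task_desc **tasks,
				      unsigned long *nr_tasks,
				      struct task_desc *task)
{
	struct task_desc **grown;

	assert(nr_tasks != NULL);
	grown = realloc(tasks, (*nr_tasks + 1) * sizeof(*grown));
	if (grown == NULL)
		return NULL;
	grown[*nr_tasks] = task;
	(*nr_tasks)++;
	return grown;
}
```

Unlike the patch, which uses BUG_ON() on failure, the sketch returns
NULL and leaves the error policy to the caller.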

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
---
 tools/perf/builtin-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 3b3a5bb..a1893e8 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -346,7 +346,7 @@ static struct task_desc *register_pid(struct perf_sched *sched,
 
 	sched->pid_to_task[pid] = task;
 	sched->nr_tasks++;
-	sched->tasks = realloc(sched->tasks, sched->nr_tasks * sizeof(struct task_task *));
+	sched->tasks = realloc(sched->tasks, sched->nr_tasks * sizeof(struct task_desc *));
 	BUG_ON(!sched->tasks);
 	sched->tasks[task->nr] = task;
 
-- 
1.8.5.2



* [PATCH 2/9] perf sched replay: Increase the MAX_PID value to fix assertion failure problem
  2015-03-31 13:46 [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
  2015-03-31 13:46 ` [PATCH 1/9] perf sched replay: Use struct task_desc instead of struct task_task for correct meaning Yunlong Song
@ 2015-03-31 13:46 ` Yunlong Song
  2015-03-31 14:25   ` David Ahern
  2015-04-08 15:11   ` [tip:perf/core] " tip-bot for Yunlong Song
  2015-03-31 13:46 ` [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max Yunlong Song
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Yunlong Song @ 2015-03-31 13:46 UTC
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

The current MAX_PID is only 65536, which causes an assertion failure
when there are more than 64 CPU cores on x86_64. This is because pid_max
on x86_64 is at least PIDS_PER_CPU_DEFAULT * num_possible_cpus() (see
pidmap_init() in kernel/pid.c), where PIDS_PER_CPU_DEFAULT is 1024
(defined in include/linux/threads.h). Thus MAX_PID = 65536 covers only
65536/1024 = 64 cores. This is clearly not enough for x86_64, and any
larger pid triggers the BUG_ON(pid >= MAX_PID) assertion in
register_pid().

Increase MAX_PID from 65536 to 1024*1000, which covers x86_64 systems
with up to 1000 cores. The value is bounded by the stack size limit of
the calling process: 'ulimit -a' shows a default stack size of 8192 KB,
which comes from _STK_LIM (8*1024*1024) in
include/uapi/linux/resource.h. Since the pid_to_task array lives on the
stack, MAX_PID has to be large enough while keeping the stack usage of
the perf process just below 8192 KB: 1024*1000 pointers of 8 bytes each
take 8,192,000 bytes (8000 KB). We have calculated and tested that
1024*1000 works for MAX_PID, so perf sched replay can now be used on
x86_64 systems with up to 1000 cores without the assertion failure.
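The arithmetic above can be checked with a small C sketch. It assumes
8-byte pointers (x86_64) and uses only the constants quoted in this
message; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Constants quoted in the commit message. */
#define PIDS_PER_CPU_DEFAULT	1024UL			/* include/linux/threads.h */
#define STACK_LIMIT_BYTES	(8UL * 1024 * 1024)	/* _STK_LIM: 8192 KB */

/* Number of cores whose default pid_max (1024 per core) fits in max_pid. */
static unsigned long cores_covered(unsigned long max_pid)
{
	return max_pid / PIDS_PER_CPU_DEFAULT;
}

/* Bytes used by an array of max_pid task pointers, e.g. pid_to_task. */
static size_t pid_table_bytes(unsigned long max_pid)
{
	return (size_t)max_pid * sizeof(void *);
}

/*
 * Necessary (not sufficient) condition for the table to fit on the
 * stack: the rest of the stack also needs some room.
 */
static int table_below_stack_limit(unsigned long max_pid)
{
	assert(max_pid > 0);
	return pid_table_bytes(max_pid) < STACK_LIMIT_BYTES;
}
```

For example, cores_covered(163840) is 160, matching the test machine
below.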

Example:

Test environment: x86_64 with 160 cores

 $ cat /proc/sys/kernel/pid_max
 163840

Before this patch:

 $ perf sched replay
 run measurement overhead: 240 nsecs
 sleep measurement overhead: 55379 nsecs
 the run test took 1000004 nsecs
 the sleep test took 1059424 nsecs
 perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 65536)'
 failed.
 Aborted

After this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55397 nsecs
 the run test took 999920 nsecs
 the sleep test took 1053313 nsecs
 nr_run_events:        10
 nr_sleep_events:      1562
 nr_wakeup_events:     5
 task      0 (                  :1:         1), nr_events: 1
 task      1 (                  :2:         2), nr_events: 1
 task      2 (                  :3:         3), nr_events: 1
 task      3 (                  :5:         5), nr_events: 1
 ...

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
---
 tools/perf/builtin-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index a1893e8..c466104 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -28,7 +28,7 @@
 #define MAX_CPUS		4096
 #define COMM_LEN		20
 #define SYM_LEN			129
-#define MAX_PID			65536
+#define MAX_PID			1024000
 
 struct sched_atom;
 
-- 
1.8.5.2



* [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
  2015-03-31 13:46 [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
  2015-03-31 13:46 ` [PATCH 1/9] perf sched replay: Use struct task_desc instead of struct task_task for correct meaning Yunlong Song
  2015-03-31 13:46 ` [PATCH 2/9] perf sched replay: Increase the MAX_PID value to fix assertion failure problem Yunlong Song
@ 2015-03-31 13:46 ` Yunlong Song
  2015-03-31 14:32   ` David Ahern
  2015-04-08 15:12   ` [tip:perf/core] " tip-bot for Yunlong Song
  2015-03-31 13:46 ` [PATCH 4/9] perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations Yunlong Song
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Yunlong Song @ 2015-03-31 13:46 UTC
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

The memory of struct task_desc *pid_to_task[MAX_PID] is currently
allocated statically with a fixed size, which has two problems:

Problem 1: If pid_max, the maximum number of pids in the system, is much
smaller than MAX_PID (1024*1000), stack memory is wasted. This happens
when the machine has far fewer than 1000 CPU cores.

Problem 2: If pid_max is changed from its default to a value larger than
MAX_PID, an assertion failure occurs. pid_max can be set as high as
pid_max_max (see pidmap_init() in kernel/pid.c), which equals
PID_MAX_LIMIT. On x86_64, PID_MAX_LIMIT is 4*1024*1024 (defined in
include/linux/threads.h). This is much larger than MAX_PID, and a
pid_to_task array of that size would take 32768 KB (4*1024*1024*8/1024),
far more than the default 8192 KB stack size of the calling process.

Because of these two problems, use calloc() to allocate pid_to_task
dynamically, sized from /proc/sys/kernel/pid_max and falling back to
MAX_PID if that value cannot be read.
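A minimal standalone sketch of the same idea, for illustration only:
read_int_file() is a hypothetical stand-in for perf's sysctl__read_int(),
and struct task_desc is left opaque because only pointers to it are
stored.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_PID 1024000	/* fallback size, as in the patch */

struct task_desc;	/* opaque: only pointers are stored */

/* Hypothetical stand-in for sysctl__read_int(): parse one int from @path. */
static int read_int_file(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (f == NULL)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Allocate a zeroed pid -> task table sized by the running kernel's
 * pid_max, falling back to MAX_PID if it cannot be read. The chosen size
 * is stored in *nr_slots. Returns NULL if calloc() fails.
 */
static struct task_desc **alloc_pid_table(const char *pid_max_path,
					  int *nr_slots)
{
	int pid_max;

	if (read_int_file(pid_max_path, &pid_max) < 0 || pid_max <= 0)
		pid_max = MAX_PID;
	*nr_slots = pid_max;
	return calloc((size_t)pid_max, sizeof(struct task_desc *));
}
```

Usage would be alloc_pid_table("/proc/sys/kernel/pid_max", &slots). The
table now lives on the heap, so its size is no longer limited by the
8192 KB stack.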

Example:

Test environment: x86_64 with 160 cores

 $ cat /proc/sys/kernel/pid_max
 163840
 $ echo 1025000 > /proc/sys/kernel/pid_max
 $ cat /proc/sys/kernel/pid_max
 1025000

Run some applications until some process gets a pid greater than
MAX_PID (1024*1000).

Before this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55480 nsecs
 the run test took 1000008 nsecs
 the sleep test took 1063151 nsecs
 perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)'
 failed.
 Aborted

After this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55435 nsecs
 the run test took 1000004 nsecs
 the sleep test took 1059312 nsecs
 nr_run_events:        10
 nr_sleep_events:      1562
 nr_wakeup_events:     5
 task      0 (                  :1:         1), nr_events: 1
 task      1 (                  :2:         2), nr_events: 1
 task      2 (                  :3:         3), nr_events: 1
 task      3 (                  :5:         5), nr_events: 1
 ...

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
---
 tools/perf/builtin-sched.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index c466104..20d887b 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -23,6 +23,7 @@
 #include <semaphore.h>
 #include <pthread.h>
 #include <math.h>
+#include <api/fs/fs.h>
 
 #define PR_SET_NAME		15               /* Set process name */
 #define MAX_CPUS		4096
@@ -124,7 +125,7 @@ struct perf_sched {
 	struct perf_tool tool;
 	const char	 *sort_order;
 	unsigned long	 nr_tasks;
-	struct task_desc *pid_to_task[MAX_PID];
+	struct task_desc **pid_to_task;
 	struct task_desc **tasks;
 	const struct trace_sched_handler *tp_handler;
 	pthread_mutex_t	 start_work_mutex;
@@ -326,8 +327,14 @@ static struct task_desc *register_pid(struct perf_sched *sched,
 				      unsigned long pid, const char *comm)
 {
 	struct task_desc *task;
+	static int pid_max;
 
-	BUG_ON(pid >= MAX_PID);
+	if (sched->pid_to_task == NULL) {
+		if (sysctl__read_int("kernel/pid_max", &pid_max) < 0)
+			pid_max = MAX_PID;
+		BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
+	}
+	BUG_ON(pid >= (unsigned long)pid_max);
 
 	task = sched->pid_to_task[pid];
 
-- 
1.8.5.2



* [PATCH 4/9] perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  2015-03-31 13:46 [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
                   ` (2 preceding siblings ...)
  2015-03-31 13:46 ` [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max Yunlong Song
@ 2015-03-31 13:46 ` Yunlong Song
  2015-04-08 15:12   ` [tip:perf/core] " tip-bot for Yunlong Song
  2015-03-31 13:46 ` [PATCH 5/9] perf sched replay: Fix the segmentation fault problem caused by pr_err in threads Yunlong Song
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Yunlong Song @ 2015-03-31 13:46 UTC
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

Although pid_to_task can now be allocated with calloc() according to
the value of /proc/sys/kernel/pid_max, this does not handle the case
where pid_max is changed after 'perf sched record' has created
perf.data. If the pid_max in effect when running 'perf sched replay' is
smaller than the one in effect during 'perf sched record', recorded pids
can exceed the table size and trigger an assertion failure. To solve
this problem, realloc pid_to_task stepwise whenever the pid passed to
register_pid() is greater than or equal to the current pid_max.
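The stepwise growth can be sketched as below, assuming a hypothetical
ensure_pid_slot() helper that tracks the table size in *nr_slots instead
of the patch's static pid_max. Because realloc() does not zero newly
added memory, the new slots are cleared explicitly, just as the while
loop in the patch does.

```c
#include <assert.h>
#include <stdlib.h>

struct task_desc;	/* opaque: only pointers are stored */

/*
 * Hypothetical helper: make sure @pid is a valid index into *table,
 * growing the table to pid + 1 slots if needed. Returns 0 on success and
 * -1 if realloc() fails, in which case *table is unchanged.
 */
static int ensure_pid_slot(struct task_desc ***table, unsigned long *nr_slots,
			   unsigned long pid)
{
	struct task_desc **grown;

	assert(table != NULL && nr_slots != NULL);
	if (pid < *nr_slots)
		return 0;

	grown = realloc(*table, (pid + 1) * sizeof(*grown));
	if (grown == NULL)
		return -1;

	/* realloc() leaves the new tail uninitialized; clear it. */
	while (*nr_slots <= pid)
		grown[(*nr_slots)++] = NULL;

	*table = grown;
	return 0;
}
```

Growing to exactly pid + 1 may call realloc() once per new maximum pid;
doubling the capacity instead would amortize that cost at the price of
some unused slots.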

Example:

Test environment: x86_64 with 160 cores

 $ cat /proc/sys/kernel/pid_max
 163840
 $ perf sched record ls
 $ echo 5000 > /proc/sys/kernel/pid_max
 $ cat /proc/sys/kernel/pid_max
 5000

Before this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55356 nsecs
 the run test took 1000011 nsecs
 the sleep test took 1060940 nsecs
 perf: builtin-sched.c:337: register_pid: Assertion `!(pid >= (unsigned
 long)pid_max)' failed.
 Aborted

After this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55611 nsecs
 the run test took 1000026 nsecs
 the sleep test took 1060486 nsecs
 nr_run_events:        10
 nr_sleep_events:      1562
 nr_wakeup_events:     5
 task      0 (                  :1:         1), nr_events: 1
 task      1 (                  :2:         2), nr_events: 1
 task      2 (                  :3:         3), nr_events: 1
 task      3 (                  :5:         5), nr_events: 1
 ...

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
---
 tools/perf/builtin-sched.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 20d887b..dd71481 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -334,7 +334,12 @@ static struct task_desc *register_pid(struct perf_sched *sched,
 			pid_max = MAX_PID;
 		BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
 	}
-	BUG_ON(pid >= (unsigned long)pid_max);
+	if (pid >= (unsigned long)pid_max) {
+		BUG_ON((sched->pid_to_task = realloc(sched->pid_to_task, (pid + 1) *
+			sizeof(struct task_desc *))) == NULL);
+		while (pid >= (unsigned long)pid_max)
+			sched->pid_to_task[pid_max++] = NULL;
+	}
 
 	task = sched->pid_to_task[pid];
 
-- 
1.8.5.2



* [PATCH 5/9] perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  2015-03-31 13:46 [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
                   ` (3 preceding siblings ...)
  2015-03-31 13:46 ` [PATCH 4/9] perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations Yunlong Song
@ 2015-03-31 13:46 ` Yunlong Song
  2015-04-08 15:12   ` [tip:perf/core] " tip-bot for Yunlong Song
  2015-03-31 13:46 ` [PATCH 6/9] perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task Yunlong Song
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Yunlong Song @ 2015-03-31 13:46 UTC
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

The pr_err() in self_open_counters() prints its error message to
stderr. Unlike stdout, stderr is unbuffered, and printing to it uses a
temporary buffer on the stack of the calling thread. self_open_counters()
is called from thread_func(), which runs in each of the sched->nr_tasks
threads that create_tasks() creates concurrently. If the error happens
and pr_err() prints the error message in each of these threads, the
available stack space quickly runs out and a segmentation fault follows.
To solve this problem, move the self_open_counters() call, together with
its pr_err(), from the newly created threads into the main thread of the
perf process, where pr_err() can run safely without the segmentation
fault.
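A minimal sketch of the pattern under stated assumptions: open() on a
path such as /dev/null stands in for sys_perf_event_open(), and struct
worker_parms and run_worker() are hypothetical names. The descriptor is
opened, and any error is reported, in the creating thread; the worker
only receives the fd through its parameter struct, like the new fd field
in struct sched_thread_parms.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Per-thread parameters; the descriptor is opened by the creator. */
struct worker_parms {
	int fd;
	int result;
};

/* Worker: only uses the fd it was handed; never opens or prints errors. */
static void *worker(void *ctx)
{
	struct worker_parms *parms = ctx;
	char c;

	parms->result = (int)read(parms->fd, &c, 1);
	return NULL;
}

/*
 * Open the resource in the calling (main) thread, where a failure can be
 * reported safely, then hand the fd to a new thread and wait for it.
 * Returns 0 on success, -1 on failure.
 */
static int run_worker(const char *path, struct worker_parms *parms)
{
	pthread_t tid;

	parms->fd = open(path, O_RDONLY);
	if (parms->fd < 0) {
		perror(path);	/* reported from the main thread */
		return -1;
	}
	if (pthread_create(&tid, NULL, worker, parms) != 0) {
		close(parms->fd);
		return -1;
	}
	pthread_join(tid, NULL);
	close(parms->fd);
	return 0;
}
```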

Example:

Test environment: x86_64 with 160 cores

Before this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 Segmentation fault

After this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 ...

As shown above, the replay now continues without a segmentation fault.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
---
 tools/perf/builtin-sched.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index dd71481..7fe3b3c 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -472,6 +472,7 @@ static u64 get_cpu_usage_nsec_self(int fd)
 struct sched_thread_parms {
 	struct task_desc  *task;
 	struct perf_sched *sched;
+	int fd;
 };
 
 static void *thread_func(void *ctx)
@@ -482,13 +483,12 @@ static void *thread_func(void *ctx)
 	u64 cpu_usage_0, cpu_usage_1;
 	unsigned long i, ret;
 	char comm2[22];
-	int fd;
+	int fd = parms->fd;
 
 	zfree(&parms);
 
 	sprintf(comm2, ":%s", this_task->comm);
 	prctl(PR_SET_NAME, comm2);
-	fd = self_open_counters();
 	if (fd < 0)
 		return NULL;
 again:
@@ -540,6 +540,7 @@ static void create_tasks(struct perf_sched *sched)
 		BUG_ON(parms == NULL);
 		parms->task = task = sched->tasks[i];
 		parms->sched = sched;
+		parms->fd = self_open_counters();
 		sem_init(&task->sleep_sem, 0, 0);
 		sem_init(&task->ready_for_work, 0, 0);
 		sem_init(&task->work_done_sem, 0, 0);
-- 
1.8.5.2



* [PATCH 6/9] perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  2015-03-31 13:46 [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
                   ` (4 preceding siblings ...)
  2015-03-31 13:46 ` [PATCH 5/9] perf sched replay: Fix the segmentation fault problem caused by pr_err in threads Yunlong Song
@ 2015-03-31 13:46 ` Yunlong Song
  2015-04-08 15:13   ` [tip:perf/core] " tip-bot for Yunlong Song
  2015-03-31 13:46 ` [PATCH 7/9] perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files Yunlong Song
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Yunlong Song @ 2015-03-31 13:46 UTC
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

wait_for_tasks() calls sem_wait() for each task, e.g.
sem_wait(&task->work_done_sem). sem_wait() returns only when the
semaphore value is greater than 0; otherwise it blocks. In perf sched
replay, one task may sem_post() a semaphore of another task, so that the
semaphore operations happen in a meaningful order, e.g. sem_post,
sem_wait, sem_wait, sem_post... This order simulates the scheduling of
the tasks that were running when perf sched record ran. As a result, all
the tasks are required and all of their threads must be created
successfully. If one task (task A) fails to create its thread, another
task (task B) that needs a sem_post() from task A may block forever in
sem_wait(). This is a dead halt, since task B's thread_func can never
continue. To solve this problem, perf sched replay should exit as soon as
any task fails to create its thread.
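The patch simply calls exit() from self_open_counters(). For comparison,
here is a hedged all-or-nothing sketch, which is not what the patch
does: it opens every per-task descriptor before any task thread starts
and releases the ones already opened on the first failure, so no thread
can be left waiting in sem_wait() for a task that never runs. open() on
ordinary paths stands in for sys_perf_event_open(), and
open_all_or_none() is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open one descriptor per task before any task thread is started. If any
 * open fails, close the ones already opened and return -1, so the caller
 * can exit cleanly instead of starting threads that would block forever.
 */
static int open_all_or_none(const char *paths[], int fds[], int n)
{
	int i;

	assert(paths != NULL && fds != NULL);
	for (i = 0; i < n; i++) {
		fds[i] = open(paths[i], O_RDONLY);
		if (fds[i] < 0) {
			while (--i >= 0)
				close(fds[i]);
			return -1;
		}
	}
	return 0;
}
```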

Example:

Test environment: x86_64 with 160 cores

Before this patch:

 $ perf sched replay
 ...
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)
 ------------------------------------------------------------    <- dead halt

After this patch:

 $ perf sched replay
 ...
 task   1551 (           <unknown>:         0), nr_events: 10
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)
 $

As shown above, perf sched replay now exits after printing the error
message instead of blocking forever.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
---
 tools/perf/builtin-sched.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 7fe3b3c..3261300 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -451,10 +451,12 @@ static int self_open_counters(void)
 	fd = sys_perf_event_open(&attr, 0, -1, -1,
 				 perf_event_open_cloexec_flag());
 
-	if (fd < 0)
+	if (fd < 0) {
 		pr_err("Error: sys_perf_event_open() syscall returned "
 		       "with %d (%s)\n", fd,
 		       strerror_r(errno, sbuf, sizeof(sbuf)));
+		exit(EXIT_FAILURE);
+	}
 	return fd;
 }
 
-- 
1.8.5.2



* [PATCH 7/9] perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  2015-03-31 13:46 [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
                   ` (5 preceding siblings ...)
  2015-03-31 13:46 ` [PATCH 6/9] perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task Yunlong Song
@ 2015-03-31 13:46 ` Yunlong Song
  2015-04-07 16:49   ` David Ahern
  2015-04-08 15:13   ` [tip:perf/core] " tip-bot for Yunlong Song
  2015-03-31 13:46 ` [PATCH 8/9] perf sched replay: Support using -f to override perf.data file ownership Yunlong Song
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Yunlong Song @ 2015-03-31 13:46 UTC
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

The default soft limit on the number of open files of a process is
1024, defined as INR_OPEN_CUR in include/uapi/linux/fs.h, and the
default hard limit is 4096, defined as INR_OPEN_MAX in the same file.
Both are used to initialize RLIMIT_NOFILE in
include/asm-generic/resource.h. The soft limit is the one that is
actually enforced: by default a process can use at most 1024 file
descriptors, and opening more fails with EMFILE. This can be fixed by
raising the soft limit, as long as it does not exceed the hard limit;
going beyond the hard limit requires raising both limits together, which
needs privilege.

perf sched replay uses sys_perf_event_open() to create a file descriptor
for each task in order to read its perf events, so each task needs its
own file descriptor. On x86_64, perf.data may contain more than 1024 or
even 4096 tasks, in which case there are not enough file descriptors.
As a result, an EMFILE error occurs and stops the replay. To solve this
problem, add a '-f' option that adaptively raises the soft and hard
limits on open files.
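A minimal sketch of the limit adjustment, assuming a hypothetical
raise_nofile_limit() helper; getrlimit(), setrlimit() and RLIMIT_NOFILE
are the standard interfaces the patch itself uses. Any process may raise
rlim_cur up to rlim_max, while raising rlim_max requires privilege
(CAP_SYS_RESOURCE on Linux); without it setrlimit() fails with EPERM.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the soft RLIMIT_NOFILE by @extra descriptors,
 * also raising the hard limit if the new soft limit would exceed it.
 * Returns 0 on success, -1 with errno set (EPERM if privilege is needed).
 */
static int raise_nofile_limit(rlim_t extra)
{
	struct rlimit limit;

	if (getrlimit(RLIMIT_NOFILE, &limit) == -1)
		return -1;
	if (limit.rlim_cur == RLIM_INFINITY)
		return 0;	/* nothing to raise */

	limit.rlim_cur += extra;
	if (limit.rlim_max != RLIM_INFINITY && limit.rlim_cur > limit.rlim_max)
		limit.rlim_max = limit.rlim_cur;	/* needs privilege */

	return setrlimit(RLIMIT_NOFILE, &limit);
}
```

Like the patch, a caller would pass the number of descriptors still
needed, which is sched->nr_tasks - cur_task in the diff below.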

Example:

Test environment: x86_64 with 160 cores

 $ cat /proc/sys/kernel/pid_max
 163840
 $ cat /proc/sys/fs/file-max
 6815744
 $ ulimit -Sn
 1024
 $ ulimit -Hn
 4096

Before this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)

After this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)
 Have a try with -f option

 $ perf sched replay -f
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 ------------------------------------------------------------
 #1  : 54.401, ravg: 54.40, cpu: 3285.21 / 3285.21
 #2  : 199.548, ravg: 68.92, cpu: 4999.65 / 3456.66
 #3  : 170.483, ravg: 79.07, cpu: 1349.94 / 3245.99
 #4  : 192.034, ravg: 90.37, cpu: 1322.88 / 3053.67
 #5  : 182.929, ravg: 99.62, cpu: 1406.51 / 2888.96
 #6  : 152.974, ravg: 104.96, cpu: 1167.54 / 2716.82
 #7  : 155.579, ravg: 110.02, cpu: 2992.53 / 2744.39
 #8  : 130.557, ravg: 112.08, cpu: 1126.43 / 2582.59
 #9  : 138.520, ravg: 114.72, cpu: 1253.22 / 2449.65
 #10 : 134.328, ravg: 116.68, cpu: 1587.95 / 2363.48

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
---
 tools/perf/builtin-sched.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 3261300..5ab58c6 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -170,6 +170,7 @@ struct perf_sched {
 	u64		 cpu_last_switched[MAX_CPUS];
 	struct rb_root	 atom_root, sorted_atom_root;
 	struct list_head sort_list, cmp_pid;
+	bool force;
 };
 
 static u64 get_nsecs(void)
@@ -437,24 +438,43 @@ static u64 get_cpu_usage_nsec_parent(void)
 	return sum;
 }
 
-static int self_open_counters(void)
+static int self_open_counters(struct perf_sched *sched, unsigned long cur_task)
 {
 	struct perf_event_attr attr;
-	char sbuf[STRERR_BUFSIZE];
+	char sbuf[STRERR_BUFSIZE], info[STRERR_BUFSIZE];
 	int fd;
+	struct rlimit limit;
+	bool need_privilege = false;
 
 	memset(&attr, 0, sizeof(attr));
 
 	attr.type = PERF_TYPE_SOFTWARE;
 	attr.config = PERF_COUNT_SW_TASK_CLOCK;
 
+force_again:
 	fd = sys_perf_event_open(&attr, 0, -1, -1,
 				 perf_event_open_cloexec_flag());
 
 	if (fd < 0) {
+		if (errno == EMFILE) {
+			if (sched->force) {
+				BUG_ON(getrlimit(RLIMIT_NOFILE, &limit) == -1);
+				limit.rlim_cur += sched->nr_tasks - cur_task;
+				if (limit.rlim_cur > limit.rlim_max) {
+					limit.rlim_max = limit.rlim_cur;
+					need_privilege = true;
+				}
+				if (setrlimit(RLIMIT_NOFILE, &limit) == -1) {
+					if (need_privilege && errno == EPERM)
+						strcpy(info, "Need privilege\n");
+				} else
+					goto force_again;
+			} else
+				strcpy(info, "Have a try with -f option\n");
+		}
 		pr_err("Error: sys_perf_event_open() syscall returned "
-		       "with %d (%s)\n", fd,
-		       strerror_r(errno, sbuf, sizeof(sbuf)));
+		       "with %d (%s)\n%s", fd,
+		       strerror_r(errno, sbuf, sizeof(sbuf)), info);
 		exit(EXIT_FAILURE);
 	}
 	return fd;
@@ -542,7 +562,7 @@ static void create_tasks(struct perf_sched *sched)
 		BUG_ON(parms == NULL);
 		parms->task = task = sched->tasks[i];
 		parms->sched = sched;
-		parms->fd = self_open_counters();
+		parms->fd = self_open_counters(sched, i);
 		sem_init(&task->sleep_sem, 0, 0);
 		sem_init(&task->ready_for_work, 0, 0);
 		sem_init(&task->work_done_sem, 0, 0);
@@ -1700,6 +1720,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "be more verbose (show symbol address, etc)"),
 	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
 		    "dump raw trace in ASCII"),
+	OPT_BOOLEAN('f', "force", &sched.force, "don't complain, do it"),
 	OPT_END()
 	};
 	const struct option sched_options[] = {
-- 
1.8.5.2



* [PATCH 8/9] perf sched replay: Support using -f to override perf.data file ownership
  2015-03-31 13:46 [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
                   ` (6 preceding siblings ...)
  2015-03-31 13:46 ` [PATCH 7/9] perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files Yunlong Song
@ 2015-03-31 13:46 ` Yunlong Song
  2015-04-08 15:13   ` [tip:perf/core] " tip-bot for Yunlong Song
  2015-09-21 13:54   ` [RFC] Perf: Trigger and dump sample info to perf.data from user space ring buffer Yunlong Song
  2015-03-31 13:46 ` [PATCH 9/9] perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 Yunlong Song
  2015-04-07  3:20 ` [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
  9 siblings, 2 replies; 35+ messages in thread
From: Yunlong Song @ 2015-03-31 13:46 UTC
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

Allow perf sched to use perf.data with -f even when the file is not
owned by the current user or root.
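The ownership rule behind the "not owned by current user or root (use -f
to override)" message in the example can be summarized in a hedged
sketch: a file owned by root or by the effective user is accepted, and
anything else only when force is set. may_read_data_file() is a
hypothetical name, not perf's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical check: may a process with effective uid @euid read a data
 * file owned by @owner? Root-owned and self-owned files are accepted;
 * any other owner only with @force.
 */
static bool may_read_data_file(uid_t owner, uid_t euid, bool force)
{
	return force || owner == 0 || owner == euid;
}
```

In the example below, root (euid 0) reads a file owned by another user,
so the check passes only when -f actually reaches the perf.data reader,
which is what this patch fixes.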

Example:

 $ ls -al perf.data
 -rw------- 1 Yunlong.Song Yunlong.Song 5321918 Mar 25 15:14 perf.data
 $ sudo id
 uid=0(root) gid=0(root) groups=0(root),64(pkcs11)

Before this patch:

 $ sudo perf sched replay -f
 run measurement overhead: 98 nsecs
 sleep measurement overhead: 52909 nsecs
 the run test took 1000015 nsecs
 the sleep test took 1054253 nsecs
 File perf.data not owned by current user or root (use -f to override)

As shown above, the -f option has no effect.

After this patch:

 $ sudo perf sched replay -f
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 40514 nsecs
 the run test took 1000003 nsecs
 the sleep test took 1056098 nsecs
 nr_run_events:        10
 nr_sleep_events:      1562
 nr_wakeup_events:     5
 task      0 (                  :1:         1), nr_events: 1
 task      1 (                  :2:         2), nr_events: 1
 task      2 (                  :3:         3), nr_events: 1
 ...
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 ------------------------------------------------------------
 #1  : 50.198, ravg: 50.20, cpu: 2335.18 / 2335.18
 #2  : 219.099, ravg: 67.09, cpu: 2835.11 / 2385.17
 #3  : 238.626, ravg: 84.24, cpu: 3278.26 / 2474.48
 #4  : 200.364, ravg: 95.85, cpu: 2977.41 / 2524.77
 #5  : 176.882, ravg: 103.96, cpu: 2801.35 / 2552.43
 #6  : 191.093, ravg: 112.67, cpu: 2813.70 / 2578.56
 #7  : 189.448, ravg: 120.35, cpu: 2809.21 / 2601.62
 #8  : 200.637, ravg: 128.38, cpu: 2849.91 / 2626.45
 #9  : 248.338, ravg: 140.37, cpu: 4380.61 / 2801.87
 #10 : 511.139, ravg: 177.45, cpu: 3077.73 / 2829.45

As shown above, the -f option now works.

Besides replay, the -f option also works for latency and map.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
---
 tools/perf/builtin-sched.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 5ab58c6..7b7b798 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1487,6 +1487,7 @@ static int perf_sched__read_events(struct perf_sched *sched)
 	struct perf_data_file file = {
 		.path = input_name,
 		.mode = PERF_DATA_MODE_READ,
+		.force = sched->force,
 	};
 	int rc = -1;
 
-- 
1.8.5.2



* [PATCH 9/9] perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  2015-03-31 13:46 [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
                   ` (7 preceding siblings ...)
  2015-03-31 13:46 ` [PATCH 8/9] perf sched replay: Support using -f to override perf.data file ownership Yunlong Song
@ 2015-03-31 13:46 ` Yunlong Song
  2015-04-08 15:13   ` [tip:perf/core] " tip-bot for Yunlong Song
  2015-04-07  3:20 ` [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
  9 siblings, 1 reply; 35+ messages in thread
From: Yunlong Song @ 2015-03-31 13:46 UTC (permalink / raw)
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

Since sched->replay_repeat defaults to 10, sched->run_avg,
sched->runavg_cpu_usage and sched->runavg_parent_cpu_usage are all
computed with a hard-coded weight of 10. However, replay_repeat can be
changed with the -r option, so these calculations should use
replay_repeat instead of the constant 10 to give more accurate results.
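The update rule the patch applies to all three averages can be isolated as below. This is a sketch: runavg_update is a hypothetical name, and n stands in for sched->replay_repeat. As in the patch, the first sample seeds the average and integer division truncates; n must be at least 1, since the patch itself does not guard against 0. As a check, with n = 10, (50.198 * 9 + 219.099) / 10 = 67.09, which matches the ravg of run #2 in the output above.

```c
#include <stdint.h>

/*
 * Running average as computed in wait_for_tasks()/run_one_test() after
 * this patch: the old average is weighted by (n - 1)/n and the new
 * sample by 1/n, where n plays the role of replay_repeat (n >= 1).
 */
static uint64_t runavg_update(uint64_t avg, uint64_t sample, uint64_t n)
{
	if (!avg)
		avg = sample;	/* first sample seeds the average */
	return (avg * (n - 1) + sample) / n;
}
```

With n = 1 the average simply tracks the latest sample; larger n smooths more.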

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
---
 tools/perf/builtin-sched.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 7b7b798..5275bab 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -607,13 +607,13 @@ static void wait_for_tasks(struct perf_sched *sched)
 	cpu_usage_1 = get_cpu_usage_nsec_parent();
 	if (!sched->runavg_cpu_usage)
 		sched->runavg_cpu_usage = sched->cpu_usage;
-	sched->runavg_cpu_usage = (sched->runavg_cpu_usage * 9 + sched->cpu_usage) / 10;
+	sched->runavg_cpu_usage = (sched->runavg_cpu_usage * (sched->replay_repeat - 1) + sched->cpu_usage) / sched->replay_repeat;
 
 	sched->parent_cpu_usage = cpu_usage_1 - cpu_usage_0;
 	if (!sched->runavg_parent_cpu_usage)
 		sched->runavg_parent_cpu_usage = sched->parent_cpu_usage;
-	sched->runavg_parent_cpu_usage = (sched->runavg_parent_cpu_usage * 9 +
-					 sched->parent_cpu_usage)/10;
+	sched->runavg_parent_cpu_usage = (sched->runavg_parent_cpu_usage * (sched->replay_repeat - 1) +
+					 sched->parent_cpu_usage)/sched->replay_repeat;
 
 	ret = pthread_mutex_lock(&sched->start_work_mutex);
 	BUG_ON(ret);
@@ -645,7 +645,7 @@ static void run_one_test(struct perf_sched *sched)
 	sched->sum_fluct += fluct;
 	if (!sched->run_avg)
 		sched->run_avg = delta;
-	sched->run_avg = (sched->run_avg * 9 + delta) / 10;
+	sched->run_avg = (sched->run_avg * (sched->replay_repeat - 1) + delta) / sched->replay_repeat;
 
 	printf("#%-3ld: %0.3f, ", sched->nr_runs, (double)delta / 1000000.0);
 
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] perf sched replay: Increase the MAX_PID value to fix assertion failure problem
  2015-03-31 13:46 ` [PATCH 2/9] perf sched replay: Increase the MAX_PID value to fix assertion failure problem Yunlong Song
@ 2015-03-31 14:25   ` David Ahern
  2015-04-01  7:10     ` Yunlong Song
  2015-04-08 15:11   ` [tip:perf/core] " tip-bot for Yunlong Song
  1 sibling, 1 reply; 35+ messages in thread
From: David Ahern @ 2015-03-31 14:25 UTC (permalink / raw)
  To: Yunlong Song, a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

On 3/31/15 7:46 AM, Yunlong Song wrote:
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index a1893e8..c466104 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -28,7 +28,7 @@
>   #define MAX_CPUS		4096
>   #define COMM_LEN		20
>   #define SYM_LEN			129
> -#define MAX_PID			65536
> +#define MAX_PID			1024000
>
>   struct sched_atom;

# cat /proc/sys/kernel/pid_max
1048576

so your proposed change is still not high enough for what I need.

It would be best to make it dynamic, not static, with run time 
reallocations as needed.
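A dynamic table with run-time reallocation, as suggested here, could look roughly like the sketch below. Everything in it is hypothetical (struct pid_table, pid_table_slot, the 1024-slot starting size). It keeps the O(1) pid-indexed lookup of the original array but grows geometrically on demand instead of relying on a compile-time MAX_PID.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pid-indexed table that grows on demand. */
struct pid_table {
	void	**slots;	/* slots[pid] holds the per-pid object, or NULL */
	size_t	  nr;		/* number of allocated slots */
};

/*
 * Return the address of the slot for pid, growing the table if needed.
 * Returns NULL if the allocation fails (the old table stays valid).
 */
static void **pid_table_slot(struct pid_table *t, size_t pid)
{
	if (pid >= t->nr) {
		size_t new_nr = t->nr ? t->nr : 1024;
		void **p;

		while (new_nr <= pid)
			new_nr *= 2;	/* geometric growth keeps reallocs rare */
		p = realloc(t->slots, new_nr * sizeof(*p));
		if (!p)
			return NULL;
		/* realloc() does not zero the new tail; clear it explicitly */
		memset(p + t->nr, 0, (new_nr - t->nr) * sizeof(*p));
		t->slots = p;
		t->nr = new_nr;
	}
	return &t->slots[pid];
}
```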


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
  2015-03-31 13:46 ` [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max Yunlong Song
@ 2015-03-31 14:32   ` David Ahern
  2015-03-31 15:56     ` Arnaldo Carvalho de Melo
                       ` (2 more replies)
  2015-04-08 15:12   ` [tip:perf/core] " tip-bot for Yunlong Song
  1 sibling, 3 replies; 35+ messages in thread
From: David Ahern @ 2015-03-31 14:32 UTC (permalink / raw)
  To: Yunlong Song, a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

[-- Attachment #1: Type: text/plain, Size: 1466 bytes --]

On 3/31/15 7:46 AM, Yunlong Song wrote:
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index c466104..20d887b 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -23,6 +23,7 @@
>   #include <semaphore.h>
>   #include <pthread.h>
>   #include <math.h>
> +#include <api/fs/fs.h>
>
>   #define PR_SET_NAME		15               /* Set process name */
>   #define MAX_CPUS		4096
> @@ -124,7 +125,7 @@ struct perf_sched {
>   	struct perf_tool tool;
>   	const char	 *sort_order;
>   	unsigned long	 nr_tasks;
> -	struct task_desc *pid_to_task[MAX_PID];
> +	struct task_desc **pid_to_task;
>   	struct task_desc **tasks;
>   	const struct trace_sched_handler *tp_handler;
>   	pthread_mutex_t	 start_work_mutex;
> @@ -326,8 +327,14 @@ static struct task_desc *register_pid(struct perf_sched *sched,
>   				      unsigned long pid, const char *comm)
>   {
>   	struct task_desc *task;
> +	static int pid_max;
>
> -	BUG_ON(pid >= MAX_PID);
> +	if (sched->pid_to_task == NULL) {
> +		if (sysctl__read_int("kernel/pid_max", &pid_max) < 0)
> +			pid_max = MAX_PID;
> +		BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
> +	}
> +	BUG_ON(pid >= (unsigned long)pid_max);
>

so why the previous patch bumping the MAX_PID count if you move to 
dynamic here? And shouldn't MAX_PID get dropped here as well?

So attached is what I put together last week; just have not had time to
send it out.

[-- Attachment #2: 0003-perf-sched-Remove-max-pid-assumption-from-perf-sched.patch --]
[-- Type: text/plain, Size: 3770 bytes --]

From 159dc732e0ad66d9151e93761bc9c685872e9fa4 Mon Sep 17 00:00:00 2001
From: David Ahern <david.ahern@oracle.com>
Date: Tue, 24 Mar 2015 16:57:10 -0400
Subject: [PATCH 3/5] perf sched: Remove max pid assumption from perf-sched

'perf sched replay' currently fails on sparc64:
    $ perf sched replay
    run measurement overhead: 2475 nsecs
    sleep measurement overhead: 56165 nsecs
    the run test took 999705 nsecs
    the sleep test took 1059270 nsecs
    perf: builtin-sched.c:384: register_pid: Assertion `!(pid >= 65536)' failed.
    Aborted

The max pid limitation is removed by converting pid_to_task from a
pid based array to an intlist (rblist) with the pid as the index
and task_desc stored in the priv element.

In the process, pid is converted from long int to int.

Signed-off-by: David Ahern <david.ahern@oracle.com>
---
 tools/perf/builtin-sched.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index cc52c993a1fa..858d85396d81 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -33,13 +33,12 @@
 #define MAX_CPUS		4096
 #define COMM_LEN		20
 #define SYM_LEN			129
-#define MAX_PID			65536
 
 struct sched_atom;
 
 struct task_desc {
 	unsigned long		nr;
-	unsigned long		pid;
+	int			pid;
 	char			comm[COMM_LEN];
 
 	unsigned long		nr_events;
@@ -129,7 +128,7 @@ struct perf_sched {
 	struct perf_tool tool;
 	const char	 *sort_order;
 	unsigned long	 nr_tasks;
-	struct task_desc *pid_to_task[MAX_PID];
+	struct intlist	 *pid_to_task;
 	struct task_desc **tasks;
 	const struct trace_sched_handler *tp_handler;
 	pthread_mutex_t	 start_work_mutex;
@@ -377,14 +376,18 @@ static void add_sched_event_sleep(struct perf_sched *sched, struct task_desc *ta
 }
 
 static struct task_desc *register_pid(struct perf_sched *sched,
-				      unsigned long pid, const char *comm)
+				      int pid, const char *comm)
 {
-	struct task_desc *task;
 
-	BUG_ON(pid >= MAX_PID);
+	struct int_node *node = intlist__findnew(sched->pid_to_task, pid);
+	struct task_desc *task;
 
-	task = sched->pid_to_task[pid];
+	if (node == NULL) {
+		pr_err("Failed to allocate entry for task\n");
+		return NULL;
+	}
 
+	task = (struct task_desc *) node->priv;
 	if (task)
 		return task;
 
@@ -392,20 +395,21 @@ static struct task_desc *register_pid(struct perf_sched *sched,
 	task->pid = pid;
 	task->nr = sched->nr_tasks;
 	strcpy(task->comm, comm);
+
 	/*
 	 * every task starts in sleeping state - this gets ignored
 	 * if there's no wakeup pointing to this sleep state:
 	 */
 	add_sched_event_sleep(sched, task, 0, 0);
 
-	sched->pid_to_task[pid] = task;
+	node->priv = task;
 	sched->nr_tasks++;
 	sched->tasks = realloc(sched->tasks, sched->nr_tasks * sizeof(struct task_task *));
 	BUG_ON(!sched->tasks);
 	sched->tasks[task->nr] = task;
 
 	if (verbose)
-		printf("registered task #%ld, PID %ld (%s)\n", sched->nr_tasks, pid, comm);
+		printf("registered task #%ld, PID %d (%s)\n", sched->nr_tasks, pid, comm);
 
 	return task;
 }
@@ -418,7 +422,7 @@ static void print_task_traces(struct perf_sched *sched)
 
 	for (i = 0; i < sched->nr_tasks; i++) {
 		task = sched->tasks[i];
-		printf("task %6ld (%20s:%10ld), nr_events: %ld\n",
+		printf("task %6ld (%20s:%10d), nr_events: %ld\n",
 			task->nr, task->comm, task->pid, task->nr_events);
 	}
 }
@@ -2981,6 +2985,12 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	};
 	unsigned int i;
 
+	sched.pid_to_task = intlist__new(NULL);
+	if (sched.pid_to_task == NULL) {
+		pr_err("Failed to allocate intlist for tracking tasks\n");
+		return -ENOMEM;
+	}
+
 	for (i = 0; i < ARRAY_SIZE(sched.curr_pid); i++)
 		sched.curr_pid[i] = -1;
 
-- 
2.3.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
  2015-03-31 14:32   ` David Ahern
@ 2015-03-31 15:56     ` Arnaldo Carvalho de Melo
  2015-04-01  7:06       ` Yunlong Song
  2015-03-31 20:25     ` Arnaldo Carvalho de Melo
  2015-04-01  7:23     ` Yunlong Song
  2 siblings, 1 reply; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-31 15:56 UTC (permalink / raw)
  To: David Ahern
  Cc: Yunlong Song, a.p.zijlstra, paulus, mingo, linux-kernel, wangnan0

Em Tue, Mar 31, 2015 at 08:32:37AM -0600, David Ahern escreveu:
> On 3/31/15 7:46 AM, Yunlong Song wrote:
> >-	BUG_ON(pid >= MAX_PID);
> >+	if (sched->pid_to_task == NULL) {
> >+		if (sysctl__read_int("kernel/pid_max", &pid_max) < 0)
> >+			pid_max = MAX_PID;
> >+		BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
> >+	}
> >+	BUG_ON(pid >= (unsigned long)pid_max);
 
> so why the previous patch bumping the MAX_PID count if you move to dynamic
> here? And shouldn't MAX_PID get dropped here as well?
 
> So attached is what i put together last week; just have not had time to send
> it out.

Yunlong, can you please check/Ack this?

- Arnaldo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
  2015-03-31 14:32   ` David Ahern
  2015-03-31 15:56     ` Arnaldo Carvalho de Melo
@ 2015-03-31 20:25     ` Arnaldo Carvalho de Melo
  2015-03-31 22:26       ` David Ahern
  2015-04-01  7:23     ` Yunlong Song
  2 siblings, 1 reply; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-31 20:25 UTC (permalink / raw)
  To: David Ahern
  Cc: Yunlong Song, a.p.zijlstra, paulus, mingo, linux-kernel, wangnan0

Em Tue, Mar 31, 2015 at 08:32:37AM -0600, David Ahern escreveu:
> On 3/31/15 7:46 AM, Yunlong Song wrote:
> >diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> >index c466104..20d887b 100644
> >--- a/tools/perf/builtin-sched.c
> >+++ b/tools/perf/builtin-sched.c
> >@@ -23,6 +23,7 @@
> >  #include <semaphore.h>
> >  #include <pthread.h>
> >  #include <math.h>
> >+#include <api/fs/fs.h>
> >
> >  #define PR_SET_NAME		15               /* Set process name */
> >  #define MAX_CPUS		4096
> >@@ -124,7 +125,7 @@ struct perf_sched {
> >  	struct perf_tool tool;
> >  	const char	 *sort_order;
> >  	unsigned long	 nr_tasks;
> >-	struct task_desc *pid_to_task[MAX_PID];
> >+	struct task_desc **pid_to_task;
> >  	struct task_desc **tasks;
> >  	const struct trace_sched_handler *tp_handler;
> >  	pthread_mutex_t	 start_work_mutex;
> >@@ -326,8 +327,14 @@ static struct task_desc *register_pid(struct perf_sched *sched,
> >  				      unsigned long pid, const char *comm)
> >  {
> >  	struct task_desc *task;
> >+	static int pid_max;
> >
> >-	BUG_ON(pid >= MAX_PID);
> >+	if (sched->pid_to_task == NULL) {
> >+		if (sysctl__read_int("kernel/pid_max", &pid_max) < 0)
> >+			pid_max = MAX_PID;
> >+		BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
> >+	}
> >+	BUG_ON(pid >= (unsigned long)pid_max);
> >
> 
> so why the previous patch bumping the MAX_PID count if you move to dynamic
> here? And shouldn't MAX_PID get dropped here as well?
> 
> So attached is what i put together last week; just have not had time to send
> it out.

Hmm, we already have an rb_tree with an entry for each task, it's called
machine->threads, and it holds struct thread instances that in turn have
a ->priv pointer; can't it be used here?

- Arnaldo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
  2015-03-31 20:25     ` Arnaldo Carvalho de Melo
@ 2015-03-31 22:26       ` David Ahern
  2015-03-31 22:35         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2015-03-31 22:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Yunlong Song, a.p.zijlstra, paulus, mingo, linux-kernel, wangnan0

On 3/31/15 2:25 PM, Arnaldo Carvalho de Melo wrote:
>
> Humm, we already have an rb_tree for each task, its called
> machine->threads, and it has struct thread instances, that in turn have
> a ->priv point, can't it be used here?
>

I think that would require a lot of churn to the existing code. The 
command could definitely use some modernizing, but it will take time.

David


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
  2015-03-31 22:26       ` David Ahern
@ 2015-03-31 22:35         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-31 22:35 UTC (permalink / raw)
  To: David Ahern
  Cc: Yunlong Song, a.p.zijlstra, paulus, mingo, linux-kernel, wangnan0

Em Tue, Mar 31, 2015 at 04:26:04PM -0600, David Ahern escreveu:
> On 3/31/15 2:25 PM, Arnaldo Carvalho de Melo wrote:
> >Humm, we already have an rb_tree for each task, its called
> >machine->threads, and it has struct thread instances, that in turn have
> >a ->priv point, can't it be used here?

> I think that would require a lot of churn to the existing code. The command
> could definitely use some modernizing, but it will take time.

yeah, I've been there, some of the infrastructure changes here and there
are related to this, i.e. how to make the core more useful for tools
like 'sched' :-)

I.e. at some point it should be just a struct thread descendant, i.e.
something like:

struct sched_thread {
	struct thread thread;
	sched specific fields;
};

or have the sched specific fields accessible via thread->priv.

The former may be better performance wise due to data locality, i.e.
better cacheline usage. This is something I did, for instance, for
perf_evsel/hists_evsel.
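The two layouts can be sketched as follows. struct thread here is a two-field stand-in, not perf's real definition, and sched_thread, sched_priv and the helper names are hypothetical. The container_of macro is the usual kernel-style definition built on offsetof. Presumably the embedded variant would also need the core to allocate the larger object, which this sketch does not show.

```c
#include <stddef.h>

/* Kernel-style container_of: recover the outer struct from a member pointer. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for perf's struct thread; the real one has many more fields. */
struct thread {
	int	 tid;
	void	*priv;		/* option 2: tool-specific data hangs off here */
};

/*
 * Option 1: the sched state embeds the generic thread, so both live in
 * one allocation and tend to share cachelines.
 */
struct sched_thread {
	struct thread	thread;
	unsigned long	nr_events;
};

static struct sched_thread *to_sched_thread(struct thread *t)
{
	return container_of(t, struct sched_thread, thread);
}

/*
 * Option 2: a separate allocation reached through thread->priv, which
 * costs an extra pointer dereference (and possibly a cache miss).
 */
struct sched_priv {
	unsigned long	nr_events;
};

static struct sched_priv *thread_sched_priv(struct thread *t)
{
	return t->priv;
}
```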

- Arnaldo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
  2015-03-31 15:56     ` Arnaldo Carvalho de Melo
@ 2015-04-01  7:06       ` Yunlong Song
  2015-04-07 13:23         ` Yunlong Song
  0 siblings, 1 reply; 35+ messages in thread
From: Yunlong Song @ 2015-04-01  7:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David Ahern
  Cc: a.p.zijlstra, paulus, mingo, linux-kernel, wangnan0

On 2015/3/31 23:56, Arnaldo Carvalho de Melo wrote:
> Em Tue, Mar 31, 2015 at 08:32:37AM -0600, David Ahern escreveu:
>> On 3/31/15 7:46 AM, Yunlong Song wrote:
>>> -	BUG_ON(pid >= MAX_PID);
>>> +	if (sched->pid_to_task == NULL) {
>>> +		if (sysctl__read_int("kernel/pid_max", &pid_max) < 0)
>>> +			pid_max = MAX_PID;
>>> +		BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
>>> +	}
>>> +	BUG_ON(pid >= (unsigned long)pid_max);
>  
>> so why the previous patch bumping the MAX_PID count if you move to dynamic
>> here? And shouldn't MAX_PID get dropped here as well?
>  
>> So attached is what i put together last week; just have not had time to send
>> it out.
> 
> Yunlong, can you please check/Ack this?
> 
> - Arnaldo
> 

I have checked David's patch. The difference from my patch is that David's patch
uses an rblist to sort and search for a pid's task with a time complexity of O(log n),
while my patch uses an array (as in the original design), created dynamically, with
a time complexity of O(1). For every sample, my patch does not need to search a
tree, and the lookup cost does not depend on the total number of tasks. The maximum
value of pid_max is PID_MAX_LIMIT, which equals 4*1024*1024 on x86_64. So for each
sample, David's patch can cost O(log(4*1024*1024)) to find a pid's task, vs O(1) for
my patch. I therefore suggest using an array instead of an rblist to solve this
problem and avoid unnecessary overhead.

However, if you decide to take David's patch rather than mine, please ignore
the following 3 patches:

[PATCH 2/9] perf sched replay: Increase the MAX_PID value to fix assertion failure problem
[PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the
unexpected change of pid_max
[PATCH 4/9] perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the
different pid_max configurations

The other 6 patches in this patch set still need to be applied for the other
improvements and bug fixes.

-- 
Thanks,
Yunlong Song


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] perf sched replay: Increase the MAX_PID value to fix assertion failure problem
  2015-03-31 14:25   ` David Ahern
@ 2015-04-01  7:10     ` Yunlong Song
  0 siblings, 0 replies; 35+ messages in thread
From: Yunlong Song @ 2015-04-01  7:10 UTC (permalink / raw)
  To: David Ahern, a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

On 2015/3/31 22:25, David Ahern wrote:
> On 3/31/15 7:46 AM, Yunlong Song wrote:
>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>> index a1893e8..c466104 100644
>> --- a/tools/perf/builtin-sched.c
>> +++ b/tools/perf/builtin-sched.c
>> @@ -28,7 +28,7 @@
>>   #define MAX_CPUS        4096
>>   #define COMM_LEN        20
>>   #define SYM_LEN            129
>> -#define MAX_PID            65536
>> +#define MAX_PID            1024000
>>
>>   struct sched_atom;
> 
> # cat /proc/sys/kernel/pid_max
> 1048576
> 
> so your proposed change is still not high enough for what I need.
> 
> It would be best to make it dynamic, not static, with run time reallocations as needed.
> 
> 
Yes, please see the 3rd and 4th patches in this patch set, which allocate the memory
dynamically at run time.
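The run-time sizing in patch 3 starts from the current kernel/pid_max value; the patch reads it with perf's sysctl__read_int("kernel/pid_max", ...) and falls back to MAX_PID. A standalone equivalent might look like the sketch below. read_pid_max is a hypothetical name and the procfs path is passed in explicitly. Because a trace may have been recorded on another machine, or pid_max may have been raised later, the value read at replay time is only a starting size, which is what the stepwise realloc in patch 4 addresses.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read an integer sysctl from procfs, falling back
 * to a default (e.g. the old MAX_PID of 65536) if the file cannot be
 * opened or does not contain a positive integer.
 */
static int read_pid_max(const char *path, int fallback)
{
	FILE *f = fopen(path, "r");
	int val;

	if (!f)
		return fallback;
	if (fscanf(f, "%d", &val) != 1 || val <= 0)
		val = fallback;
	fclose(f);
	return val;
}
```

The result would then size the initial calloc(pid_max, sizeof(struct task_desc *)) as in patch 3.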

-- 
Thanks,
Yunlong Song


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
  2015-03-31 14:32   ` David Ahern
  2015-03-31 15:56     ` Arnaldo Carvalho de Melo
  2015-03-31 20:25     ` Arnaldo Carvalho de Melo
@ 2015-04-01  7:23     ` Yunlong Song
  2 siblings, 0 replies; 35+ messages in thread
From: Yunlong Song @ 2015-04-01  7:23 UTC (permalink / raw)
  To: David Ahern, a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

On 2015/3/31 22:32, David Ahern wrote:
> On 3/31/15 7:46 AM, Yunlong Song wrote:
>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>> index c466104..20d887b 100644
>> --- a/tools/perf/builtin-sched.c
>> +++ b/tools/perf/builtin-sched.c
>> @@ -23,6 +23,7 @@
>>   #include <semaphore.h>
>>   #include <pthread.h>
>>   #include <math.h>
>> +#include <api/fs/fs.h>
>>
>>   #define PR_SET_NAME        15               /* Set process name */
>>   #define MAX_CPUS        4096
>> @@ -124,7 +125,7 @@ struct perf_sched {
>>       struct perf_tool tool;
>>       const char     *sort_order;
>>       unsigned long     nr_tasks;
>> -    struct task_desc *pid_to_task[MAX_PID];
>> +    struct task_desc **pid_to_task;
>>       struct task_desc **tasks;
>>       const struct trace_sched_handler *tp_handler;
>>       pthread_mutex_t     start_work_mutex;
>> @@ -326,8 +327,14 @@ static struct task_desc *register_pid(struct perf_sched *sched,
>>                         unsigned long pid, const char *comm)
>>   {
>>       struct task_desc *task;
>> +    static int pid_max;
>>
>> -    BUG_ON(pid >= MAX_PID);
>> +    if (sched->pid_to_task == NULL) {
>> +        if (sysctl__read_int("kernel/pid_max", &pid_max) < 0)
>> +            pid_max = MAX_PID;
>> +        BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
>> +    }
>> +    BUG_ON(pid >= (unsigned long)pid_max);
>>
> 
> so why the previous patch bumping the MAX_PID count if you move to dynamic here? And shouldn't MAX_PID get dropped here as well?
> 
> So attached is what i put together last week; just have not had time to send it out.

MAX_PID from the previous patch serves as the fallback when sysctl__read_int("kernel/pid_max", &pid_max) < 0
in this patch, so I keep it. sysctl__read_int is probably very unlikely to fail, in which case MAX_PID
would not be needed at all, but keeping it does no harm, so I chose not to drop it.

-- 
Thanks,
Yunlong Song


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/9] perf sched replay: Make some improvements and fixes
  2015-03-31 13:46 [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
                   ` (8 preceding siblings ...)
  2015-03-31 13:46 ` [PATCH 9/9] perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 Yunlong Song
@ 2015-04-07  3:20 ` Yunlong Song
  2015-04-07 13:53   ` Arnaldo Carvalho de Melo
  9 siblings, 1 reply; 35+ messages in thread
From: Yunlong Song @ 2015-04-07  3:20 UTC (permalink / raw)
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

On 2015/3/31 21:46, Yunlong Song wrote:
> Hi,
>   Found some functions to improve and bugs to fix in perf sched replay.
> 
> Yunlong Song (9):
>   perf sched replay: Use struct task_desc instead of struct task_task
>     for     correct meaning
>   perf sched replay: Increase the MAX_PID value to fix assertion failure
>         problem
>   perf sched replay: Alloc the memory of pid_to_task dynamically to
>     adapt     to the unexpected change of pid_max
>   perf sched replay: Realloc the memory of pid_to_task stepwise to adapt
>         to the different pid_max configurations
>   perf sched replay: Fix the segmentation fault problem caused by pr_err
>         in threads
>   perf sched replay: Handle the dead halt of sem_wait when
>     create_tasks()     fails for any task
>   perf sched replay: Fix the EMFILE error caused by the limitation of
>     the     maximum open files
>   perf sched replay: Support using -f to override perf.data file
>     ownership
>   perf sched replay: Use replay_repeat to calculate the runavg of cpu   
>      usage instead of the default value 10
> 
>  tools/perf/builtin-sched.c | 67 +++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 52 insertions(+), 15 deletions(-)
> 

Ping...

-- 
Thanks,
Yunlong Song


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
  2015-04-01  7:06       ` Yunlong Song
@ 2015-04-07 13:23         ` Yunlong Song
  2015-04-07 15:02           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 35+ messages in thread
From: Yunlong Song @ 2015-04-07 13:23 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David Ahern
  Cc: a.p.zijlstra, paulus, mingo, linux-kernel, wangnan0

On 2015/4/1 15:06, Yunlong Song wrote:
> On 2015/3/31 23:56, Arnaldo Carvalho de Melo wrote:
>> Em Tue, Mar 31, 2015 at 08:32:37AM -0600, David Ahern escreveu:
>>> On 3/31/15 7:46 AM, Yunlong Song wrote:
>>>> -	BUG_ON(pid >= MAX_PID);
>>>> +	if (sched->pid_to_task == NULL) {
>>>> +		if (sysctl__read_int("kernel/pid_max", &pid_max) < 0)
>>>> +			pid_max = MAX_PID;
>>>> +		BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
>>>> +	}
>>>> +	BUG_ON(pid >= (unsigned long)pid_max);
>>  
>>> so why the previous patch bumping the MAX_PID count if you move to dynamic
>>> here? And shouldn't MAX_PID get dropped here as well?
>>  
>>> So attached is what i put together last week; just have not had time to send
>>> it out.
>>
>> Yunlong, can you please check/Ack this?
>>
>> - Arnaldo
>>
>>> >From 159dc732e0ad66d9151e93761bc9c685872e9fa4 Mon Sep 17 00:00:00 2001
>>> From: David Ahern <david.ahern@oracle.com>
>>> Date: Tue, 24 Mar 2015 16:57:10 -0400
>>> Subject: [PATCH 3/5] perf sched: Remove max pid assumption from perf-sched
>>>
>>> 'perf sched replay' currently fails on sparc64:
>>>     $ perf sched replay
>>>     run measurement overhead: 2475 nsecs
>>>     sleep measurement overhead: 56165 nsecs
>>>     the run test took 999705 nsecs
>>>     the sleep test took 1059270 nsecs
>>>     perf: builtin-sched.c:384: register_pid: Assertion `!(pid >= 65536)' failed.
>>>     Aborted
>>>
>>> The max pid limitation is removed by converting pid_to_task from a
>>> pid based array to an intlist (rblist) with the pid as the index
>>> and task_desc stored in the priv element.
>>>
>>> In the process pid is converted from a long int to int.
>>>
>>> Signed-off-by: David Ahern <david.ahern@oracle.com>
>>> ---
>>>  tools/perf/builtin-sched.c | 30 ++++++++++++++++++++----------
>>>  1 file changed, 20 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>>> index cc52c993a1fa..858d85396d81 100644
>>> --- a/tools/perf/builtin-sched.c
>>> +++ b/tools/perf/builtin-sched.c
>>> @@ -33,13 +33,12 @@
>>>  #define MAX_CPUS		4096
>>>  #define COMM_LEN		20
>>>  #define SYM_LEN			129
>>> -#define MAX_PID			65536
>>>  
>>>  struct sched_atom;
>>>  
>>>  struct task_desc {
>>>  	unsigned long		nr;
>>> -	unsigned long		pid;
>>> +	int			pid;
>>>  	char			comm[COMM_LEN];
>>>  
>>>  	unsigned long		nr_events;
>>> @@ -129,7 +128,7 @@ struct perf_sched {
>>>  	struct perf_tool tool;
>>>  	const char	 *sort_order;
>>>  	unsigned long	 nr_tasks;
>>> -	struct task_desc *pid_to_task[MAX_PID];
>>> +	struct intlist	 *pid_to_task;
>>>  	struct task_desc **tasks;
>>>  	const struct trace_sched_handler *tp_handler;
>>>  	pthread_mutex_t	 start_work_mutex;
>>> @@ -377,14 +376,18 @@ static void add_sched_event_sleep(struct perf_sched *sched, struct task_desc *ta
>>>  }
>>>  
>>>  static struct task_desc *register_pid(struct perf_sched *sched,
>>> -				      unsigned long pid, const char *comm)
>>> +				      int pid, const char *comm)
>>>  {
>>> -	struct task_desc *task;
>>>  
>>> -	BUG_ON(pid >= MAX_PID);
>>> +	struct int_node *node = intlist__findnew(sched->pid_to_task, pid);
>>> +	struct task_desc *task;
>>>  
>>> -	task = sched->pid_to_task[pid];
>>> +	if (node == NULL) {
>>> +		pr_err("Failed to allocate entry for task\n");
>>> +		return NULL;
>>> +	}
>>>  
>>> +	task = (struct task_desc *) node->priv;
>>>  	if (task)
>>>  		return task;
>>>  
>>> @@ -392,20 +395,21 @@ static struct task_desc *register_pid(struct perf_sched *sched,
>>>  	task->pid = pid;
>>>  	task->nr = sched->nr_tasks;
>>>  	strcpy(task->comm, comm);
>>> +
>>>  	/*
>>>  	 * every task starts in sleeping state - this gets ignored
>>>  	 * if there's no wakeup pointing to this sleep state:
>>>  	 */
>>>  	add_sched_event_sleep(sched, task, 0, 0);
>>>  
>>> -	sched->pid_to_task[pid] = task;
>>> +	node->priv = task;
>>>  	sched->nr_tasks++;
>>>  	sched->tasks = realloc(sched->tasks, sched->nr_tasks * sizeof(struct task_task *));
>>>  	BUG_ON(!sched->tasks);
>>>  	sched->tasks[task->nr] = task;
>>>  
>>>  	if (verbose)
>>> -		printf("registered task #%ld, PID %ld (%s)\n", sched->nr_tasks, pid, comm);
>>> +		printf("registered task #%ld, PID %d (%s)\n", sched->nr_tasks, pid, comm);
>>>  
>>>  	return task;
>>>  }
>>> @@ -418,7 +422,7 @@ static void print_task_traces(struct perf_sched *sched)
>>>  
>>>  	for (i = 0; i < sched->nr_tasks; i++) {
>>>  		task = sched->tasks[i];
>>> -		printf("task %6ld (%20s:%10ld), nr_events: %ld\n",
>>> +		printf("task %6ld (%20s:%10d), nr_events: %ld\n",
>>>  			task->nr, task->comm, task->pid, task->nr_events);
>>>  	}
>>>  }
>>> @@ -2981,6 +2985,12 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
>>>  	};
>>>  	unsigned int i;
>>>  
>>> +	sched.pid_to_task = intlist__new(NULL);
>>> +	if (sched.pid_to_task == NULL) {
>>> +		pr_err("Failed to allocate intlist for tracking tasks\n");
>>> +		return -ENOMEM;
>>> +	}
>>> +
>>>  	for (i = 0; i < ARRAY_SIZE(sched.curr_pid); i++)
>>>  		sched.curr_pid[i] = -1;
>>>  
>>> -- 
>>> 2.3.0
>>>
>>
>>
>> .
>>
> 
> I have checked David's patch. The difference from my patch is that David's patch
> uses an rblist to sort and search for a pid's task with a time complexity of O(log n),
> while my patch uses an array (same as the original design), created dynamically, with
> a time complexity of O(1). For every sample, my patch does not need to search the rblist,
> and its cost is independent of the total number of tasks. The maximum value of
> pid_max is PID_MAX_LIMIT, which equals 4*1024*1024 on x86_64. So for each sample,
> David's patch has to spend O(log 4*1024*1024) searching for a pid's task, vs O(1) for
> my patch. So I suggest using an array instead of an rblist to solve this problem and avoid
> the unnecessary lookup cost.
> 
> However, if you finally decide to take David's patch rather than my patch, please ignore
> the following 3 patches:
> 
> [PATCH 2/9] perf sched replay: Increase the MAX_PID value to fix assertion failure problem
> [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the
> unexpected change of pid_max
> [PATCH 4/9] perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the
> different pid_max configurations
> 
> The other 6 patches in this patch set still need to be applied for the remaining improvements
> and bug fixes.
> 

These bugs in 'perf sched replay' reproduce one after another on x86_64 (with many cores), and
really need an urgent fix.

-- 
Thanks,
Yunlong Song


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/9] perf sched replay: Make some improvements and fixes
  2015-04-07  3:20 ` [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
@ 2015-04-07 13:53   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-07 13:53 UTC (permalink / raw)
  To: Yunlong Song, David Ahern
  Cc: a.p.zijlstra, paulus, mingo, linux-kernel, wangnan0

Em Tue, Apr 07, 2015 at 11:20:42AM +0800, Yunlong Song escreveu:
> On 2015/3/31 21:46, Yunlong Song wrote:
> > Hi,
> >   Found some functions to improve and bugs to fix in perf sched replay.
> > 
> > Yunlong Song (9):
> >   perf sched replay: Use struct task_desc instead of struct task_task
> >     for     correct meaning
> >   perf sched replay: Increase the MAX_PID value to fix assertion failure
> >         problem
> >   perf sched replay: Alloc the memory of pid_to_task dynamically to
> >     adapt     to the unexpected change of pid_max
> >   perf sched replay: Realloc the memory of pid_to_task stepwise to adapt
> >         to the different pid_max configurations
> >   perf sched replay: Fix the segmentation fault problem caused by pr_err
> >         in threads
> >   perf sched replay: Handle the dead halt of sem_wait when
> >     create_tasks()     fails for any task
> >   perf sched replay: Fix the EMFILE error caused by the limitation of
> >     the     maximum open files
> >   perf sched replay: Support using -f to override perf.data file
> >     ownership
> >   perf sched replay: Use replay_repeat to calculate the runavg of cpu   
> >      usage instead of the default value 10
> > 
> >  tools/perf/builtin-sched.c | 67 +++++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 52 insertions(+), 15 deletions(-)
> > 
> 
> Ping...

All looks reasonable, applied.

David, please holler if you still have any concerns; otherwise we can
work from here, i.e. improve things with follow-on patches.

- Arnaldo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
  2015-04-07 13:23         ` Yunlong Song
@ 2015-04-07 15:02           ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-07 15:02 UTC (permalink / raw)
  To: Yunlong Song
  Cc: David Ahern, a.p.zijlstra, paulus, mingo, linux-kernel, wangnan0

Em Tue, Apr 07, 2015 at 09:23:46PM +0800, Yunlong Song escreveu:
> > The other 6 patches in this patch set still need to be applied for the remaining improvements
> > and bug fixes.
> > 
> 
> These bugs in 'perf sched replay' reproduce one after another on x86_64 (with many cores), and
> really need an urgent fix.

I already applied those, thanks,

- Arnaldo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/9] perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  2015-03-31 13:46 ` [PATCH 7/9] perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files Yunlong Song
@ 2015-04-07 16:49   ` David Ahern
  2015-04-08 15:13   ` [tip:perf/core] " tip-bot for Yunlong Song
  1 sibling, 0 replies; 35+ messages in thread
From: David Ahern @ 2015-04-07 16:49 UTC (permalink / raw)
  To: Yunlong Song, a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel, wangnan0

On 3/31/15 7:46 AM, Yunlong Song wrote:
> ---
>   tools/perf/builtin-sched.c | 31 ++++++++++++++++++++++++++-----
>   1 file changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 3261300..5ab58c6 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c

...

> @@ -1700,6 +1720,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
>   		    "be more verbose (show symbol address, etc)"),
>   	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
>   		    "dump raw trace in ASCII"),
> +	OPT_BOOLEAN('f', "force", &sched.force, "don't complain, do it"),
>   	OPT_END()
>   	};
>   	const struct option sched_options[] = {
>

Please update the documentation for this new option in
tools/perf/Documentation/perf-sched.txt

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [tip:perf/core] perf sched replay: Use struct task_desc instead of struct task_task for correct meaning
  2015-03-31 13:46 ` [PATCH 1/9] perf sched replay: Use struct task_desc instead of struct task_task for correct meaning Yunlong Song
@ 2015-04-08 15:11   ` tip-bot for Yunlong Song
  0 siblings, 0 replies; 35+ messages in thread
From: tip-bot for Yunlong Song @ 2015-04-08 15:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, yunlong.song, wangnan0, paulus, linux-kernel, tglx,
	a.p.zijlstra, mingo, hpa

Commit-ID:  0755bc4dc77a876aa60d4b3d33b5f6506f21f91b
Gitweb:     http://git.kernel.org/tip/0755bc4dc77a876aa60d4b3d33b5f6506f21f91b
Author:     Yunlong Song <yunlong.song@huawei.com>
AuthorDate: Tue, 31 Mar 2015 21:46:28 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 8 Apr 2015 09:07:19 -0300

perf sched replay: Use struct task_desc instead of struct task_task for correct meaning

There is no struct task_task at all; it is a typo in the old commits.
Fix it to struct task_desc, which is what was meant, to avoid
unnecessary misunderstanding.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-2-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 3b3a5bb..a1893e8 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -346,7 +346,7 @@ static struct task_desc *register_pid(struct perf_sched *sched,
 
 	sched->pid_to_task[pid] = task;
 	sched->nr_tasks++;
-	sched->tasks = realloc(sched->tasks, sched->nr_tasks * sizeof(struct task_task *));
+	sched->tasks = realloc(sched->tasks, sched->nr_tasks * sizeof(struct task_desc *));
 	BUG_ON(!sched->tasks);
 	sched->tasks[task->nr] = task;
 

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip:perf/core] perf sched replay: Increase the MAX_PID value to fix assertion failure problem
  2015-03-31 13:46 ` [PATCH 2/9] perf sched replay: Increase the MAX_PID value to fix assertion failure problem Yunlong Song
  2015-03-31 14:25   ` David Ahern
@ 2015-04-08 15:11   ` tip-bot for Yunlong Song
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot for Yunlong Song @ 2015-04-08 15:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: wangnan0, yunlong.song, acme, tglx, hpa, a.p.zijlstra,
	linux-kernel, paulus, mingo

Commit-ID:  a35e27d0e5d801ff75481a8f639bb4d59ea1aafa
Gitweb:     http://git.kernel.org/tip/a35e27d0e5d801ff75481a8f639bb4d59ea1aafa
Author:     Yunlong Song <yunlong.song@huawei.com>
AuthorDate: Tue, 31 Mar 2015 21:46:29 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 8 Apr 2015 09:07:21 -0300

perf sched replay: Increase the MAX_PID value to fix assertion failure problem

The current MAX_PID is only 65536, which causes an assertion failure
when there are more than 64 CPU cores on x86_64.

This is because the pid_max value on x86_64 is at least
PIDS_PER_CPU_DEFAULT * num_possible_cpus() (see function pidmap_init
defined in kernel/pid.c), where PIDS_PER_CPU_DEFAULT is 1024 (defined in
include/linux/threads.h).

Thus MAX_PID = 65536 corresponds to 65536/1024 = 64 CPU cores. This is
clearly not enough for x86_64, and larger machines hit an assertion
failure due to BUG_ON(pid >= MAX_PID) in the code.

We increase the MAX_PID value from 65536 to 1024*1000, which is enough
for x86_64 machines with up to 1000 cores.

This number was chosen according to the stack size limit of the calling
process.

Running 'ulimit -a' shows that the stack size of a process is 8192
Kbytes by default, which is defined in include/uapi/linux/resource.h
(#define _STK_LIM (8*1024*1024)).

Thus we choose a value for MAX_PID that is large enough while still
satisfying the stack size limit, i.e., the pid_to_task array keeps the
stack usage of the perf process just below 8192 Kbytes.

We have calculated and tested that 1024*1000 is OK for MAX_PID.

This means perf sched replay can now be used on x86_64 machines with up
to 1000 cores without any assertion failure.

Example:

Test environment: x86_64 with 160 cores

 $ cat /proc/sys/kernel/pid_max
 163840

Before this patch:

 $ perf sched replay
 run measurement overhead: 240 nsecs
 sleep measurement overhead: 55379 nsecs
 the run test took 1000004 nsecs
 the sleep test took 1059424 nsecs
 perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 65536)'
 failed.
 Aborted

After this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55397 nsecs
 the run test took 999920 nsecs
 the sleep test took 1053313 nsecs
 nr_run_events:        10
 nr_sleep_events:      1562
 nr_wakeup_events:     5
 task      0 (                  :1:         1), nr_events: 1
 task      1 (                  :2:         2), nr_events: 1
 task      2 (                  :3:         3), nr_events: 1
 task      3 (                  :5:         5), nr_events: 1
 ...

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-3-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index a1893e8..c466104 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -28,7 +28,7 @@
 #define MAX_CPUS		4096
 #define COMM_LEN		20
 #define SYM_LEN			129
-#define MAX_PID			65536
+#define MAX_PID			1024000

 struct sched_atom;

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip:perf/core] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
  2015-03-31 13:46 ` [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max Yunlong Song
  2015-03-31 14:32   ` David Ahern
@ 2015-04-08 15:12   ` tip-bot for Yunlong Song
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot for Yunlong Song @ 2015-04-08 15:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, mingo, paulus, wangnan0, a.p.zijlstra, hpa,
	yunlong.song, tglx

Commit-ID:  cb06ac256a16fc1a5ab063107c2b35b3b9e95102
Gitweb:     http://git.kernel.org/tip/cb06ac256a16fc1a5ab063107c2b35b3b9e95102
Author:     Yunlong Song <yunlong.song@huawei.com>
AuthorDate: Tue, 31 Mar 2015 21:46:30 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 8 Apr 2015 09:07:22 -0300

perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max

The current memory allocation of struct task_desc *pid_to_task[MAX_PID]
is fixed at compile time, which has two problems:

Problem 1: If pid_max, the maximum number of pids in the system, is much
smaller than MAX_PID (1024*1000), stack memory is wasted. This may
happen when the number of CPU cores is much smaller than 1000.

Problem 2: If pid_max is changed from the default value to a value
larger than MAX_PID, it causes an assertion failure. pid_max can be set
as high as pid_max_max (see pidmap_init defined in kernel/pid.c), which
equals PID_MAX_LIMIT. On x86_64, PID_MAX_LIMIT is 4*1024*1024 (defined
in include/linux/threads.h). This value is much larger than MAX_PID, and
a pid_to_task array of that size would take up 32768 Kbytes
(4*1024*1024*8/1024), which is much larger than the default 8192 Kbyte
stack size of the calling process.

Due to these two problems, we use calloc to allocate the memory of
pid_to_task dynamically.

Example:

Test environment: x86_64 with 160 cores

 $ cat /proc/sys/kernel/pid_max
 163840
 $ echo 1025000 > /proc/sys/kernel/pid_max
 $ cat /proc/sys/kernel/pid_max
 1025000

Run some applications until the pid of some process is greater than
the value of MAX_PID (1024*1000).

Before this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55480 nsecs
 the run test took 1000008 nsecs
 the sleep test took 1063151 nsecs
 perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)'
 failed.
 Aborted

After this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55435 nsecs
 the run test took 1000004 nsecs
 the sleep test took 1059312 nsecs
 nr_run_events:        10
 nr_sleep_events:      1562
 nr_wakeup_events:     5
 task      0 (                  :1:         1), nr_events: 1
 task      1 (                  :2:         2), nr_events: 1
 task      2 (                  :3:         3), nr_events: 1
 task      3 (                  :5:         5), nr_events: 1
 ...

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index c466104..20d887b 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -23,6 +23,7 @@
 #include <semaphore.h>
 #include <pthread.h>
 #include <math.h>
+#include <api/fs/fs.h>
 
 #define PR_SET_NAME		15               /* Set process name */
 #define MAX_CPUS		4096
@@ -124,7 +125,7 @@ struct perf_sched {
 	struct perf_tool tool;
 	const char	 *sort_order;
 	unsigned long	 nr_tasks;
-	struct task_desc *pid_to_task[MAX_PID];
+	struct task_desc **pid_to_task;
 	struct task_desc **tasks;
 	const struct trace_sched_handler *tp_handler;
 	pthread_mutex_t	 start_work_mutex;
@@ -326,8 +327,14 @@ static struct task_desc *register_pid(struct perf_sched *sched,
 				      unsigned long pid, const char *comm)
 {
 	struct task_desc *task;
+	static int pid_max;
 
-	BUG_ON(pid >= MAX_PID);
+	if (sched->pid_to_task == NULL) {
+		if (sysctl__read_int("kernel/pid_max", &pid_max) < 0)
+			pid_max = MAX_PID;
+		BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
+	}
+	BUG_ON(pid >= (unsigned long)pid_max);
 
 	task = sched->pid_to_task[pid];
 

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip:perf/core] perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  2015-03-31 13:46 ` [PATCH 4/9] perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations Yunlong Song
@ 2015-04-08 15:12   ` tip-bot for Yunlong Song
  0 siblings, 0 replies; 35+ messages in thread
From: tip-bot for Yunlong Song @ 2015-04-08 15:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, a.p.zijlstra, hpa, acme, yunlong.song, wangnan0, mingo,
	paulus, linux-kernel

Commit-ID:  3a423a5c36d1a28a258beaa7db855568b82d07ab
Gitweb:     http://git.kernel.org/tip/3a423a5c36d1a28a258beaa7db855568b82d07ab
Author:     Yunlong Song <yunlong.song@huawei.com>
AuthorDate: Tue, 31 Mar 2015 21:46:31 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 8 Apr 2015 09:07:23 -0300

perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations

Although the memory of pid_to_task can be allocated via calloc according
to the value of /proc/sys/kernel/pid_max, this cannot handle the case
where pid_max is changed after 'perf sched record' has created its
perf.data.

If the new pid_max configured for 'perf sched replay' is smaller than the
old pid_max configured for 'perf sched record', it causes an assertion
failure.

To solve this problem, we realloc the memory of pid_to_task stepwise
whenever the pid passed to register_pid is greater than or equal to the
current pid_max.

Example:

Test environment: x86_64 with 160 cores

 $ cat /proc/sys/kernel/pid_max
 163840
 $ perf sched record ls
 $ echo 5000 > /proc/sys/kernel/pid_max
 $ cat /proc/sys/kernel/pid_max
 5000

Before this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55356 nsecs
 the run test took 1000011 nsecs
 the sleep test took 1060940 nsecs
 perf: builtin-sched.c:337: register_pid: Assertion `!(pid >= (unsigned
 long)pid_max)' failed.
 Aborted

After this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55611 nsecs
 the run test took 1000026 nsecs
 the sleep test took 1060486 nsecs
 nr_run_events:        10
 nr_sleep_events:      1562
 nr_wakeup_events:     5
 task      0 (                  :1:         1), nr_events: 1
 task      1 (                  :2:         2), nr_events: 1
 task      2 (                  :3:         3), nr_events: 1
 task      3 (                  :5:         5), nr_events: 1
 ...

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-5-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 20d887b..dd71481 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -334,7 +334,12 @@ static struct task_desc *register_pid(struct perf_sched *sched,
 			pid_max = MAX_PID;
 		BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
 	}
-	BUG_ON(pid >= (unsigned long)pid_max);
+	if (pid >= (unsigned long)pid_max) {
+		BUG_ON((sched->pid_to_task = realloc(sched->pid_to_task, (pid + 1) *
+			sizeof(struct task_desc *))) == NULL);
+		while (pid >= (unsigned long)pid_max)
+			sched->pid_to_task[pid_max++] = NULL;
+	}
 
 	task = sched->pid_to_task[pid];
 

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip:perf/core] perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  2015-03-31 13:46 ` [PATCH 5/9] perf sched replay: Fix the segmentation fault problem caused by pr_err in threads Yunlong Song
@ 2015-04-08 15:12   ` tip-bot for Yunlong Song
  0 siblings, 0 replies; 35+ messages in thread
From: tip-bot for Yunlong Song @ 2015-04-08 15:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: yunlong.song, mingo, acme, tglx, linux-kernel, a.p.zijlstra,
	paulus, wangnan0, hpa

Commit-ID:  08097abc11bcee21355dd857852a807b2a30b79f
Gitweb:     http://git.kernel.org/tip/08097abc11bcee21355dd857852a807b2a30b79f
Author:     Yunlong Song <yunlong.song@huawei.com>
AuthorDate: Tue, 31 Mar 2015 21:46:32 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 8 Apr 2015 09:07:24 -0300

perf sched replay: Fix the segmentation fault problem caused by pr_err in threads

The pr_err in self_open_counters() prints an error message to stderr.
Unlike stdout, stderr uses a memory buffer on the stack of each calling
process.

The pr_err in self_open_counters() runs in thread_func, the thread
function started by create_tasks(), which concurrently creates
sched->nr_tasks threads.

If the error happens and pr_err prints the error message in each of
these threads, the stack of the perf process (8192 Kbytes by default)
quickly runs out and a segmentation fault follows.

To solve this problem, the self_open_counters() call, and with it the
pr_err, is moved from the newly created threads to the main thread of
the perf process, where pr_err can run without triggering the
segmentation fault.

Example:

Test environment: x86_64 with 160 cores

Before this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 Segmentation fault

After this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 ...

As shown above, the replay now continues without a segmentation fault.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-6-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index dd71481..7fe3b3c 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -472,6 +472,7 @@ static u64 get_cpu_usage_nsec_self(int fd)
 struct sched_thread_parms {
 	struct task_desc  *task;
 	struct perf_sched *sched;
+	int fd;
 };
 
 static void *thread_func(void *ctx)
@@ -482,13 +483,12 @@ static void *thread_func(void *ctx)
 	u64 cpu_usage_0, cpu_usage_1;
 	unsigned long i, ret;
 	char comm2[22];
-	int fd;
+	int fd = parms->fd;
 
 	zfree(&parms);
 
 	sprintf(comm2, ":%s", this_task->comm);
 	prctl(PR_SET_NAME, comm2);
-	fd = self_open_counters();
 	if (fd < 0)
 		return NULL;
 again:
@@ -540,6 +540,7 @@ static void create_tasks(struct perf_sched *sched)
 		BUG_ON(parms == NULL);
 		parms->task = task = sched->tasks[i];
 		parms->sched = sched;
+		parms->fd = self_open_counters();
 		sem_init(&task->sleep_sem, 0, 0);
 		sem_init(&task->ready_for_work, 0, 0);
 		sem_init(&task->work_done_sem, 0, 0);


* [tip:perf/core] perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  2015-03-31 13:46 ` [PATCH 6/9] perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task Yunlong Song
@ 2015-04-08 15:13   ` tip-bot for Yunlong Song
  0 siblings, 0 replies; 35+ messages in thread
From: tip-bot for Yunlong Song @ 2015-04-08 15:13 UTC
  To: linux-tip-commits
  Cc: tglx, mingo, acme, wangnan0, yunlong.song, a.p.zijlstra, hpa,
	linux-kernel, paulus

Commit-ID:  1aff59be53ef37aa9943fb5f772f03148f789bb6
Gitweb:     http://git.kernel.org/tip/1aff59be53ef37aa9943fb5f772f03148f789bb6
Author:     Yunlong Song <yunlong.song@huawei.com>
AuthorDate: Tue, 31 Mar 2015 21:46:33 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 8 Apr 2015 09:07:25 -0300

perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task

wait_for_tasks() calls sem_wait for each task, e.g.
sem_wait(&task->work_done_sem).

sem_wait returns only when work_done_sem is greater than 0; otherwise
it blocks.

In perf sched replay, one task may sem_post the work_done_sem of
another task, so the work_done_sem of each task is processed in a
specific sequence, e.g. sem_post, sem_wait, sem_wait, sem_post...

This sequence simulates the scheduling of the tasks that were running
when perf sched record was executed.

As a result, all the tasks are required, and all of their threads must
be created successfully.

If any task (task A) fails to create its thread, another task (task B)
whose work_done_sem needs a sem_post from task A is likely to block
forever in sem_wait.

This is a dead halt, since task B's thread_func can never continue.

To solve this problem, perf sched replay should exit as soon as any
task fails to create its thread.

Example:

Test environment: x86_64 with 160 cores

Before this patch:

 $ perf sched replay
 ...
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)
 ------------------------------------------------------------    <- dead halt

After this patch:

 $ perf sched replay
 ...
 task   1551 (           <unknown>:         0), nr_events: 10
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)
 $

As shown above, perf sched replay now exits after printing the error
message instead of blocking.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-7-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 7fe3b3c..3261300 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -451,10 +451,12 @@ static int self_open_counters(void)
 	fd = sys_perf_event_open(&attr, 0, -1, -1,
 				 perf_event_open_cloexec_flag());

-	if (fd < 0)
+	if (fd < 0) {
 		pr_err("Error: sys_perf_event_open() syscall returned "
 		       "with %d (%s)\n", fd,
 		       strerror_r(errno, sbuf, sizeof(sbuf)));
+		exit(EXIT_FAILURE);
+	}
 	return fd;
 }


* [tip:perf/core] perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  2015-03-31 13:46 ` [PATCH 7/9] perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files Yunlong Song
  2015-04-07 16:49   ` David Ahern
@ 2015-04-08 15:13   ` tip-bot for Yunlong Song
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot for Yunlong Song @ 2015-04-08 15:13 UTC
  To: linux-tip-commits
  Cc: yunlong.song, acme, tglx, mingo, paulus, a.p.zijlstra,
	linux-kernel, wangnan0, hpa

Commit-ID:  939cda521a24ae4dbf3beec983abd519bce56231
Gitweb:     http://git.kernel.org/tip/939cda521a24ae4dbf3beec983abd519bce56231
Author:     Yunlong Song <yunlong.song@huawei.com>
AuthorDate: Tue, 31 Mar 2015 21:46:34 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 8 Apr 2015 09:07:26 -0300

perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files

By default, the soft limit on the number of open files for a process is
1024, defined as INR_OPEN_CUR in include/uapi/linux/fs.h, and the hard
limit is 4096, defined as INR_OPEN_MAX in include/uapi/linux/fs.h.

INR_OPEN_CUR and INR_OPEN_MAX provide the initial soft and hard values
of RLIMIT_NOFILE in include/asm-generic/resource.h.

It is the soft limit that decides how many files a process may actually
have open.

In other words, a process can use at most 1024 file descriptors by
default; trying to open more fails with EMFILE.

The error can be avoided by raising the soft limit, which may not exceed
the hard limit; raising the hard limit as well requires privilege
(CAP_SYS_RESOURCE).

perf sched replay calls sys_perf_event_open to create a file descriptor
for each task in order to read that task's task-clock software event.

So each task needs its own file descriptor. On x86_64, perf.data may
record more than 1024 or even 4096 tasks, leaving too few file
descriptors available.

As a result, an EMFILE error occurs and stops the replay. To solve this
problem, add a '-f' option that makes perf adaptively raise the soft
limit, and the hard limit if necessary, on the number of open files.

Example:

Test environment: x86_64 with 160 cores

 $ cat /proc/sys/kernel/pid_max
 163840
 $ cat /proc/sys/fs/file-max
 6815744
 $ ulimit -Sn
 1024
 $ ulimit -Hn
 4096

Before this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)

After this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)
 Have a try with -f option

 $ perf sched replay -f
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 ------------------------------------------------------------
 #1  : 54.401, ravg: 54.40, cpu: 3285.21 / 3285.21
 #2  : 199.548, ravg: 68.92, cpu: 4999.65 / 3456.66
 #3  : 170.483, ravg: 79.07, cpu: 1349.94 / 3245.99
 #4  : 192.034, ravg: 90.37, cpu: 1322.88 / 3053.67
 #5  : 182.929, ravg: 99.62, cpu: 1406.51 / 2888.96
 #6  : 152.974, ravg: 104.96, cpu: 1167.54 / 2716.82
 #7  : 155.579, ravg: 110.02, cpu: 2992.53 / 2744.39
 #8  : 130.557, ravg: 112.08, cpu: 1126.43 / 2582.59
 #9  : 138.520, ravg: 114.72, cpu: 1253.22 / 2449.65
 #10 : 134.328, ravg: 116.68, cpu: 1587.95 / 2363.48

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-8-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 3261300..5ab58c6 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -170,6 +170,7 @@ struct perf_sched {
 	u64		 cpu_last_switched[MAX_CPUS];
 	struct rb_root	 atom_root, sorted_atom_root;
 	struct list_head sort_list, cmp_pid;
+	bool force;
 };
 
 static u64 get_nsecs(void)
@@ -437,24 +438,43 @@ static u64 get_cpu_usage_nsec_parent(void)
 	return sum;
 }
 
-static int self_open_counters(void)
+static int self_open_counters(struct perf_sched *sched, unsigned long cur_task)
 {
 	struct perf_event_attr attr;
-	char sbuf[STRERR_BUFSIZE];
+	char sbuf[STRERR_BUFSIZE], info[STRERR_BUFSIZE];
 	int fd;
+	struct rlimit limit;
+	bool need_privilege = false;
 
 	memset(&attr, 0, sizeof(attr));
 
 	attr.type = PERF_TYPE_SOFTWARE;
 	attr.config = PERF_COUNT_SW_TASK_CLOCK;
 
+force_again:
 	fd = sys_perf_event_open(&attr, 0, -1, -1,
 				 perf_event_open_cloexec_flag());
 
 	if (fd < 0) {
+		if (errno == EMFILE) {
+			if (sched->force) {
+				BUG_ON(getrlimit(RLIMIT_NOFILE, &limit) == -1);
+				limit.rlim_cur += sched->nr_tasks - cur_task;
+				if (limit.rlim_cur > limit.rlim_max) {
+					limit.rlim_max = limit.rlim_cur;
+					need_privilege = true;
+				}
+				if (setrlimit(RLIMIT_NOFILE, &limit) == -1) {
+					if (need_privilege && errno == EPERM)
+						strcpy(info, "Need privilege\n");
+				} else
+					goto force_again;
+			} else
+				strcpy(info, "Have a try with -f option\n");
+		}
 		pr_err("Error: sys_perf_event_open() syscall returned "
-		       "with %d (%s)\n", fd,
-		       strerror_r(errno, sbuf, sizeof(sbuf)));
+		       "with %d (%s)\n%s", fd,
+		       strerror_r(errno, sbuf, sizeof(sbuf)), info);
 		exit(EXIT_FAILURE);
 	}
 	return fd;
@@ -542,7 +562,7 @@ static void create_tasks(struct perf_sched *sched)
 		BUG_ON(parms == NULL);
 		parms->task = task = sched->tasks[i];
 		parms->sched = sched;
-		parms->fd = self_open_counters();
+		parms->fd = self_open_counters(sched, i);
 		sem_init(&task->sleep_sem, 0, 0);
 		sem_init(&task->ready_for_work, 0, 0);
 		sem_init(&task->work_done_sem, 0, 0);
@@ -1700,6 +1720,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "be more verbose (show symbol address, etc)"),
 	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
 		    "dump raw trace in ASCII"),
+	OPT_BOOLEAN('f', "force", &sched.force, "don't complain, do it"),
 	OPT_END()
 	};
 	const struct option sched_options[] = {


* [tip:perf/core] perf sched replay: Support using -f to override perf.data file ownership
  2015-03-31 13:46 ` [PATCH 8/9] perf sched replay: Support using -f to override perf.data file ownership Yunlong Song
@ 2015-04-08 15:13   ` tip-bot for Yunlong Song
  2015-09-21 13:54   ` [RFC] Perf: Trigger and dump sample info to perf.data from user space ring buffer Yunlong Song
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot for Yunlong Song @ 2015-04-08 15:13 UTC
  To: linux-tip-commits
  Cc: a.p.zijlstra, mingo, linux-kernel, yunlong.song, tglx, hpa,
	wangnan0, paulus, acme

Commit-ID:  f0dd330fdf07d295ac468660cf60341796d5d501
Gitweb:     http://git.kernel.org/tip/f0dd330fdf07d295ac468660cf60341796d5d501
Author:     Yunlong Song <yunlong.song@huawei.com>
AuthorDate: Tue, 31 Mar 2015 21:46:35 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 8 Apr 2015 09:07:26 -0300

perf sched replay: Support using -f to override perf.data file ownership

Allow perf sched to use perf.data even when the file is not owned by the
current user or root.

Example:

 $ ls -al perf.data
 -rw------- 1 Yunlong.Song Yunlong.Song 5321918 Mar 25 15:14 perf.data
 $ sudo id
 uid=0(root) gid=0(root) groups=0(root),64(pkcs11)

Before this patch:

 $ sudo perf sched replay -f
 run measurement overhead: 98 nsecs
 sleep measurement overhead: 52909 nsecs
 the run test took 1000015 nsecs
 the sleep test took 1054253 nsecs
 File perf.data not owned by current user or root (use -f to override)

As shown above, the -f option does not work at all.

After this patch:

 $ sudo perf sched replay -f
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 40514 nsecs
 the run test took 1000003 nsecs
 the sleep test took 1056098 nsecs
 nr_run_events:        10
 nr_sleep_events:      1562
 nr_wakeup_events:     5
 task      0 (                  :1:         1), nr_events: 1
 task      1 (                  :2:         2), nr_events: 1
 task      2 (                  :3:         3), nr_events: 1
 ...
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 ------------------------------------------------------------
 #1  : 50.198, ravg: 50.20, cpu: 2335.18 / 2335.18
 #2  : 219.099, ravg: 67.09, cpu: 2835.11 / 2385.17
 #3  : 238.626, ravg: 84.24, cpu: 3278.26 / 2474.48
 #4  : 200.364, ravg: 95.85, cpu: 2977.41 / 2524.77
 #5  : 176.882, ravg: 103.96, cpu: 2801.35 / 2552.43
 #6  : 191.093, ravg: 112.67, cpu: 2813.70 / 2578.56
 #7  : 189.448, ravg: 120.35, cpu: 2809.21 / 2601.62
 #8  : 200.637, ravg: 128.38, cpu: 2849.91 / 2626.45
 #9  : 248.338, ravg: 140.37, cpu: 4380.61 / 2801.87
 #10 : 511.139, ravg: 177.45, cpu: 3077.73 / 2829.45

As shown above, the -f option now works as intended.

Besides replay, the -f option also works for the latency and map
subcommands.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-9-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 5ab58c6..7b7b798 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1487,6 +1487,7 @@ static int perf_sched__read_events(struct perf_sched *sched)
 	struct perf_data_file file = {
 		.path = input_name,
 		.mode = PERF_DATA_MODE_READ,
+		.force = sched->force,
 	};
 	int rc = -1;
 


* [tip:perf/core] perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  2015-03-31 13:46 ` [PATCH 9/9] perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 Yunlong Song
@ 2015-04-08 15:13   ` tip-bot for Yunlong Song
  0 siblings, 0 replies; 35+ messages in thread
From: tip-bot for Yunlong Song @ 2015-04-08 15:13 UTC
  To: linux-tip-commits
  Cc: mingo, wangnan0, paulus, a.p.zijlstra, yunlong.song,
	linux-kernel, tglx, acme, hpa

Commit-ID:  ff5f3bbd40bfb8632f826f1f83223d95363f36af
Gitweb:     http://git.kernel.org/tip/ff5f3bbd40bfb8632f826f1f83223d95363f36af
Author:     Yunlong Song <yunlong.song@huawei.com>
AuthorDate: Tue, 31 Mar 2015 21:46:36 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 8 Apr 2015 09:07:27 -0300

perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10

sched->replay_repeat defaults to 10, and sched->run_avg,
sched->runavg_cpu_usage and sched->runavg_parent_cpu_usage are all
computed with a hard-coded weight of 10.

However, replay_repeat can be changed with the -r option, so these
calculations should use replay_repeat instead of the fixed value 10 to
give more accurate results.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-10-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 7b7b798..5275bab 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -607,13 +607,13 @@ static void wait_for_tasks(struct perf_sched *sched)
 	cpu_usage_1 = get_cpu_usage_nsec_parent();
 	if (!sched->runavg_cpu_usage)
 		sched->runavg_cpu_usage = sched->cpu_usage;
-	sched->runavg_cpu_usage = (sched->runavg_cpu_usage * 9 + sched->cpu_usage) / 10;
+	sched->runavg_cpu_usage = (sched->runavg_cpu_usage * (sched->replay_repeat - 1) + sched->cpu_usage) / sched->replay_repeat;
 
 	sched->parent_cpu_usage = cpu_usage_1 - cpu_usage_0;
 	if (!sched->runavg_parent_cpu_usage)
 		sched->runavg_parent_cpu_usage = sched->parent_cpu_usage;
-	sched->runavg_parent_cpu_usage = (sched->runavg_parent_cpu_usage * 9 +
-					 sched->parent_cpu_usage)/10;
+	sched->runavg_parent_cpu_usage = (sched->runavg_parent_cpu_usage * (sched->replay_repeat - 1) +
+					 sched->parent_cpu_usage)/sched->replay_repeat;
 
 	ret = pthread_mutex_lock(&sched->start_work_mutex);
 	BUG_ON(ret);
@@ -645,7 +645,7 @@ static void run_one_test(struct perf_sched *sched)
 	sched->sum_fluct += fluct;
 	if (!sched->run_avg)
 		sched->run_avg = delta;
-	sched->run_avg = (sched->run_avg * 9 + delta) / 10;
+	sched->run_avg = (sched->run_avg * (sched->replay_repeat - 1) + delta) / sched->replay_repeat;
 
 	printf("#%-3ld: %0.3f, ", sched->nr_runs, (double)delta / 1000000.0);
 


* [RFC] Perf: Trigger and dump sample info to perf.data from user space ring buffer
  2015-03-31 13:46 ` [PATCH 8/9] perf sched replay: Support using -f to override perf.data file ownership Yunlong Song
  2015-04-08 15:13   ` [tip:perf/core] " tip-bot for Yunlong Song
@ 2015-09-21 13:54   ` Yunlong Song
  2015-09-21 15:12     ` Borislav Petkov
  1 sibling, 1 reply; 35+ messages in thread
From: Yunlong Song @ 2015-09-21 13:54 UTC
  To: a.p.zijlstra, paulus, mingo, acme, rostedt; +Cc: linux-kernel, wangnan0

[Problem Background]

We want to run perf in daemon mode and collect the traces when an exception
(e.g., a machine crash or an application slowdown) appears. Perf may have to
run for a long time (days, weeks or even months), since we do not know when the
exception will appear, only that it eventually will (especially in a beta
product). If we simply use “perf record” as usual, two problems arise over
time: 1 writing perf.data generates a large amount of I/O, which may hurt
performance a lot; 2 perf.data keeps growing. Although eBPF can reduce the
traces in the normal case, in our case perf runs in daemon mode for a long
time, so the traces still accumulate.

[One Solution]

In fact, we only need the sample info recorded during a window just before the
exception appears; sample info from other times is of no interest. So we
propose changing the way perf produces its perf.data as follows:
 1 Let perf allocate a user space ring buffer of a reasonable size, big
 enough to store all the tracing info we care about (for a window) before the
 exception appears;
 2 Dump the sample info to the user space ring buffer; since the size of the
 ring buffer is constant, newer sample info overwrites older sample info;
 3 After a trigger (maybe an eBPF event, a signal or socket communication)
 caused by the exception, dump all the tracing info in the user space ring
 buffer to perf.data.sample.TIME#

[Use Style]

We can add an option (such as “-M size” or “--memory size”) to define the
size of the user space ring buffer and activate the user space ring buffer mode
described above. For convenience, we can add --daemon to make perf run as a
daemon:
# perf record -M size -e bpf.o -e cycles -g -F 100 -a sleep 1000000
Or
# perf record -M size -e bpf.o -e cycles -g -F 100 -a --daemon

When the exception appears, a signal is sent to perf (an eBPF event or socket
communication could also be used):
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1

When the 2nd exception appears
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2

......

When the nth exception appears
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2 … perf.data.sample.TIMEn

We can use perf report or perf script to analyze each perf.data.sample.TIME#.

Finally, we can stop perf and combine perf.data.auxiliary with all the
perf.data.sample.TIME# files into an all-in-one perf.data:
# kill -SIGUSR2 1234
# ls
perf.data

[To Do]

If the idea above is acceptable, we plan to implement it in the following steps:
1 Develop a user space ring buffer for perf, in which newer sample info
replaces older sample info.
2 Classify the tracing info into two kinds. The first kind is plain sample
events, of which we only need those created during a window just before the
exception appears; we call it Optional tracing info, and perf should dump it to
the user space ring buffer. The second kind is the info required to analyze the
sample events, such as mmap_event for DSO-related info; we call it Auxiliary
tracing info, and perf should dump it into perf.data.auxiliary, or directly
into perf.data as before.
3 Develop a trigger that makes perf dump its user space ring buffer to
perf.data.sample.TIME#, or append it to perf.data. The trigger may have three
interfaces: eBPF event, signal, and socket communication.
4 Enable perf report, perf script, etc. to analyze perf.data.auxiliary,
perf.data.sample.TIME#, or the final perf.data combined from
perf.data.auxiliary and all the perf.data.sample.TIME# files.
5 For daemon mode, perf should also support running in the background
indefinitely and being stopped by a trigger.

[Conclusion]

In short, this is a mechanism to make perf’s tracing more targeted and
efficient. We regard the size of perf.data and the cost of writing it as
expensive resources that should be spent carefully, only on the data around an
exception. The mechanism can be used in both daemon and non-daemon mode, and
offers a way to filter tracing events that complements eBPF.

Thanks,
------
Yunlong Song


* Re: [RFC] Perf: Trigger and dump sample info to perf.data from user space ring buffer
  2015-09-21 13:54   ` [RFC] Perf: Trigger and dump sample info to perf.data from user space ring buffer Yunlong Song
@ 2015-09-21 15:12     ` Borislav Petkov
  0 siblings, 0 replies; 35+ messages in thread
From: Borislav Petkov @ 2015-09-21 15:12 UTC
  To: Yunlong Song
  Cc: a.p.zijlstra, paulus, mingo, acme, rostedt, linux-kernel,
	wangnan0, Robert Richter

On Mon, Sep 21, 2015 at 09:54:39PM +0800, Yunlong Song wrote:
> [Problem Background]
> 
> We want to run perf in daemon mode and collect the traces when the exception
> (e.g., machine crashes, app performance goes down) appears. Perf may run for a
> long time (from days to weeks or even months), since we do not know when the
> exception will appear at all, however it will appear at some time (especially
> for a beta product). If we simply use “perf record” as usual, here come two

We do have patches to add perf persistent events which can run for
longer than the profiling session. We wanted to use those for RAS:

http://lwn.net/Articles/593655/

We just need to get them upstream. I guess due to lack of time and other, more
important issues, we get preempted each time ... :-\

(Leaving in the rest for reference).

> problems as time goes by: 1 there will be amounts of IOs created for writing
> perf.data which may affects the performance a lot; 2 the size of perf.data will
> be larger and larger as well. Although we can use eBPF to reduce the traces in
> normal case, but in our case, the perf runs in daemon mode for a long time and
> that will accumulate the traces as time goes by.
> 
> 
> [One Solution]
> 
> In fact, we only need to collect the sample info which is created for a while
> just before the exception appears. We do not care about the other sample info in
> other time. So perhaps we have to change the current way how perf makes its
> perf.data as follows:
>  1 Let perf allocate a user space ring buffer in a reasonable size, which is big
>  enough to store all the tracing info we care about (for a while) before the
>  exception appears;
>  2 Dump the sample info to the user space ring buffer, the size of user space
>  ring buffer is a constant value, so the newer sample info will replace the older
>  sample info;
>  3 After some kind of trigger (maybe via eBPF event, signal or socket
>  communication) which is caused by the exception situation, the user space ring
>  buffer should dump all its tracing info to perf.data.sample.TIME#
> 
> 
> [Use Style]
> 	
> We can add an option (such as “-M size” or “--memory size”) to define the
> size of the user space ring buffer and active the user space ring buffer mode
> described above. For convenience, we can add --daemon to make perf run as a
> daemon.
> # perf record -M size -e bpf.o -e cycles -g -F 100 -a sleep 1000000
> Or
> # perf record -M size -e bpf.o -e cycles -g -F 100 -a --daemon
> 
> When the exception appears, it sends a signal (may also use eBPF event or socket
> communication) to perf
> # kill -SIGUSR1 1234
> # ls
> perf.data.auxiliary perf.data.sample.TIME1
> 
> When the 2nd exception appears
> # kill -SIGUSR1 1234
> # ls
> perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2
> 
> ......
> 
> When the nth exception appears
> # kill -SIGUSR1 1234
> # ls
> perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2 … perf.data.sample.TIMEn
> 
> We can user perf report or perf script to analyze each perf.data.sample.TIME#
> 
> Or finally, we can kill perf and combine perf.data.auxiliary with all the
> perf.data.sample.TIME# to create all-in-one perf.data
> # kill --SIGUSR2 1234
> # ls
> perf.data
> 
> 
> [To Do]
> 
> If the idea mentioned above is OK, we want to realize it in the following steps:
> 1 Develop perf’s user space ring buffer, which can make newer sample info
> replace older sample info.
> 2 Classify the tracing info into two kinds, one kind is just sample event, we
> only need some of them which are created (for a while) just before the exception
> appears, we can call the first kind of tracing info as Optional tracing info,
> and perf should dump this info to the user space ring buffer; the second kind is
> the tracing info which are required to analyze the sample events, such as
> mmap_event to show the dso’s related info, we can call this second kind of
> tracing info as Auxiliary tracing info, and perf should dump this info into
> perf.data.auxiliary or just directly into perf.data as before.
> 3 Develop a trigger for perf, which can activate perf to dump its user space
> ring buffer to perf.data.sample.TIME#, or just appends them into perf.data. The
> trigger may have three interfaces, eBPF event, signal and socket communication.
> 4 Make perf report or perf script etc, have the ability to analyze the
> perf.data.auxiliary, perf.data.sample.TIME#, or the final synthetic perf.data
> combined from perf.data.auxiliary and all the perf.data.sample.TIME#
> 5 For daemon mode, we should also let perf support its running in backend all
> the time and its ending from a trigger.
> 
> 
> [Conclusion]
> 
> In fact, we realize a mechanism to make perf’s tracing more refined and
> efficient. We regard the size of perf.data and the cost of writing perf.data as
> an expensive resource, which should be used in a more careful and
> just-for-the-exception target way. This mechanism can be used both in daemon way
> or in non-daemon way. This idea can be another way to filter the tracing events
> compared to eBPF.
> 
> Thanks,
> ------
> Yunlong Song
> 

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.


Thread overview: 35+ messages
2015-03-31 13:46 [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
2015-03-31 13:46 ` [PATCH 1/9] perf sched replay: Use struct task_desc instead of struct task_task for correct meaning Yunlong Song
2015-04-08 15:11   ` [tip:perf/core] " tip-bot for Yunlong Song
2015-03-31 13:46 ` [PATCH 2/9] perf sched replay: Increase the MAX_PID value to fix assertion failure problem Yunlong Song
2015-03-31 14:25   ` David Ahern
2015-04-01  7:10     ` Yunlong Song
2015-04-08 15:11   ` [tip:perf/core] " tip-bot for Yunlong Song
2015-03-31 13:46 ` [PATCH 3/9] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max Yunlong Song
2015-03-31 14:32   ` David Ahern
2015-03-31 15:56     ` Arnaldo Carvalho de Melo
2015-04-01  7:06       ` Yunlong Song
2015-04-07 13:23         ` Yunlong Song
2015-04-07 15:02           ` Arnaldo Carvalho de Melo
2015-03-31 20:25     ` Arnaldo Carvalho de Melo
2015-03-31 22:26       ` David Ahern
2015-03-31 22:35         ` Arnaldo Carvalho de Melo
2015-04-01  7:23     ` Yunlong Song
2015-04-08 15:12   ` [tip:perf/core] " tip-bot for Yunlong Song
2015-03-31 13:46 ` [PATCH 4/9] perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations Yunlong Song
2015-04-08 15:12   ` [tip:perf/core] " tip-bot for Yunlong Song
2015-03-31 13:46 ` [PATCH 5/9] perf sched replay: Fix the segmentation fault problem caused by pr_err in threads Yunlong Song
2015-04-08 15:12   ` [tip:perf/core] " tip-bot for Yunlong Song
2015-03-31 13:46 ` [PATCH 6/9] perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task Yunlong Song
2015-04-08 15:13   ` [tip:perf/core] " tip-bot for Yunlong Song
2015-03-31 13:46 ` [PATCH 7/9] perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files Yunlong Song
2015-04-07 16:49   ` David Ahern
2015-04-08 15:13   ` [tip:perf/core] " tip-bot for Yunlong Song
2015-03-31 13:46 ` [PATCH 8/9] perf sched replay: Support using -f to override perf.data file ownership Yunlong Song
2015-04-08 15:13   ` [tip:perf/core] " tip-bot for Yunlong Song
2015-09-21 13:54   ` [RFC] Perf: Trigger and dump sample info to perf.data from user space ring buffer Yunlong Song
2015-09-21 15:12     ` Borislav Petkov
2015-03-31 13:46 ` [PATCH 9/9] perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 Yunlong Song
2015-04-08 15:13   ` [tip:perf/core] " tip-bot for Yunlong Song
2015-04-07  3:20 ` [PATCH 0/9] perf sched replay: Make some improvements and fixes Yunlong Song
2015-04-07 13:53   ` Arnaldo Carvalho de Melo
