From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754532AbbDHPMj (ORCPT ); Wed, 8 Apr 2015 11:12:39 -0400 Received: from terminus.zytor.com ([198.137.202.10]:40271 "EHLO terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932268AbbDHPMf (ORCPT ); Wed, 8 Apr 2015 11:12:35 -0400 Date: Wed, 8 Apr 2015 08:12:11 -0700 From: tip-bot for Yunlong Song Message-ID: Cc: acme@redhat.com, linux-kernel@vger.kernel.org, mingo@kernel.org, paulus@samba.org, wangnan0@huawei.com, a.p.zijlstra@chello.nl, hpa@zytor.com, yunlong.song@huawei.com, tglx@linutronix.de Reply-To: hpa@zytor.com, yunlong.song@huawei.com, tglx@linutronix.de, a.p.zijlstra@chello.nl, mingo@kernel.org, linux-kernel@vger.kernel.org, paulus@samba.org, wangnan0@huawei.com, acme@redhat.com In-Reply-To: <1427809596-29559-4-git-send-email-yunlong.song@huawei.com> References: <1427809596-29559-4-git-send-email-yunlong.song@huawei.com> To: linux-tip-commits@vger.kernel.org Subject: [tip:perf/core] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max Git-Commit-ID: cb06ac256a16fc1a5ab063107c2b35b3b9e95102 X-Mailer: tip-git-log-daemon Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Commit-ID: cb06ac256a16fc1a5ab063107c2b35b3b9e95102 Gitweb: http://git.kernel.org/tip/cb06ac256a16fc1a5ab063107c2b35b3b9e95102 Author: Yunlong Song AuthorDate: Tue, 31 Mar 2015 21:46:30 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 8 Apr 2015 09:07:22 -0300 perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max The current memory allocation of struct task_desc *pid_to_task[MAX_PID] is in a permanent and preset way, and it has two problems: Problem 1: If the pid_max, which is the max number of pids in the system, is much smaller than MAX_PID (1024*1000), then it causes a waste of stack memory. This may happen in the case where the number of cpu cores is much smaller than 1000. Problem 2: If the pid_max is changed from the default value to a value larger than MAX_PID, then it will cause assertion failure problem. The maximum value of pid_max can be set to pid_max_max (see pidmap_init defined in kernel/pid.c), which equals to PID_MAX_LIMIT. In x86_64, PID_MAX_LIMIT is 4*1024*1024 (defined in include/linux/threads.h). This value is much larger than MAX_PID, and will take up 32768 Kbytes (4*1024*1024*8/1024) for memory allocation of pid_to_task, which is much larger than the default 8192 Kbytes of the stack size of calling process. Due to these two problems, we use calloc to allocate the memory of pid_to_task dynamically. Example: Test environment: x86_64 with 160 cores $ cat /proc/sys/kernel/pid_max 163840 $ echo 1025000 > /proc/sys/kernel/pid_max $ cat /proc/sys/kernel/pid_max 1025000 Run some applications until the pid of some process is greater than the value of MAX_PID (1024*1000). Before this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55480 nsecs the run test took 1000008 nsecs the sleep test took 1063151 nsecs perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)' failed. Aborted After this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55435 nsecs the run test took 1000004 nsecs the sleep test took 1059312 nsecs nr_run_events: 10 nr_sleep_events: 1562 nr_wakeup_events: 5 task 0 ( :1: 1), nr_events: 1 task 1 ( :2: 2), nr_events: 1 task 2 ( :3: 3), nr_events: 1 task 3 ( :5: 5), nr_events: 1 ... Signed-off-by: Yunlong Song Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Wang Nan Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-sched.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c index c466104..20d887b 100644 --- a/tools/perf/builtin-sched.c +++ b/tools/perf/builtin-sched.c @@ -23,6 +23,7 @@ #include #include #include +#include #define PR_SET_NAME 15 /* Set process name */ #define MAX_CPUS 4096 @@ -124,7 +125,7 @@ struct perf_sched { struct perf_tool tool; const char *sort_order; unsigned long nr_tasks; - struct task_desc *pid_to_task[MAX_PID]; + struct task_desc **pid_to_task; struct task_desc **tasks; const struct trace_sched_handler *tp_handler; pthread_mutex_t start_work_mutex; @@ -326,8 +327,14 @@ static struct task_desc *register_pid(struct perf_sched *sched, unsigned long pid, const char *comm) { struct task_desc *task; + static int pid_max; - BUG_ON(pid >= MAX_PID); + if (sched->pid_to_task == NULL) { + if (sysctl__read_int("kernel/pid_max", &pid_max) < 0) + pid_max = MAX_PID; + BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL); + } + BUG_ON(pid >= (unsigned long)pid_max); task = sched->pid_to_task[pid];