* [patch 2.6.0-test1] per cpu times
@ 2003-07-18 16:35 Erich Focht
  2003-07-18 18:18 ` [Lse-tech] " Mike Kravetz
  0 siblings, 1 reply; 7+ messages in thread

From: Erich Focht @ 2003-07-18 16:35 UTC (permalink / raw)
To: LSE, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 236 bytes --]

This patch brings back the per CPU user & system times which one was
used to see in /proc/PID/cpu with 2.4 kernels. Useful for SMP and NUMA
scheduler development, needed for reasonable output in numabench /
numa_test.

Regards,
Erich

[-- Attachment #2: cputimes_stat-2.6.0t1.patch --]
[-- Type: text/x-diff, Size: 3786 bytes --]

diff -urN 2.6.0-test1-ia64-0/fs/proc/array.c 2.6.0-test1-ia64-na/fs/proc/array.c
--- 2.6.0-test1-ia64-0/fs/proc/array.c	2003-07-14 05:35:12.000000000 +0200
+++ 2.6.0-test1-ia64-na/fs/proc/array.c	2003-07-18 13:38:02.000000000 +0200
@@ -405,3 +405,26 @@
 	return sprintf(buffer,"%d %d %d %d %d %d %d\n",
 		       size, resident, shared, text, lib, data, 0);
 }
+
+#ifdef CONFIG_SMP
+int proc_pid_cpu(struct task_struct *task, char * buffer)
+{
+	int i, len;
+
+	len = sprintf(buffer,
+		"cpu  %lu %lu\n",
+		jiffies_to_clock_t(task->utime),
+		jiffies_to_clock_t(task->stime));
+
+	for (i = 0 ; i < NR_CPUS; i++) {
+		if (cpu_online(i))
+			len += sprintf(buffer + len, "cpu%d %lu %lu\n",
+				i,
+				jiffies_to_clock_t(task->per_cpu_utime[i]),
+				jiffies_to_clock_t(task->per_cpu_stime[i]));
+
+	}
+	len += sprintf(buffer + len, "current_cpu %d\n",task_cpu(task));
+	return len;
+}
+#endif
diff -urN 2.6.0-test1-ia64-0/fs/proc/base.c 2.6.0-test1-ia64-na/fs/proc/base.c
--- 2.6.0-test1-ia64-0/fs/proc/base.c	2003-07-14 05:35:15.000000000 +0200
+++ 2.6.0-test1-ia64-na/fs/proc/base.c	2003-07-18 13:38:02.000000000 +0200
@@ -56,6 +56,7 @@
 	PROC_PID_STAT,
 	PROC_PID_STATM,
 	PROC_PID_MAPS,
+	PROC_PID_CPU,
 	PROC_PID_MOUNTS,
 	PROC_PID_WCHAN,
 #ifdef CONFIG_SECURITY
@@ -83,6 +84,9 @@
   E(PROC_PID_CMDLINE,	"cmdline",	S_IFREG|S_IRUGO),
   E(PROC_PID_STAT,	"stat",		S_IFREG|S_IRUGO),
   E(PROC_PID_STATM,	"statm",	S_IFREG|S_IRUGO),
+#ifdef CONFIG_SMP
+  E(PROC_PID_CPU,	"cpu",		S_IFREG|S_IRUGO),
+#endif
   E(PROC_PID_MAPS,	"maps",		S_IFREG|S_IRUGO),
   E(PROC_PID_MEM,	"mem",		S_IFREG|S_IRUSR|S_IWUSR),
   E(PROC_PID_CWD,	"cwd",		S_IFLNK|S_IRWXUGO),
@@ -1170,6 +1174,12 @@
 			inode->i_fop = &proc_info_file_operations;
 			ei->op.proc_read = proc_pid_stat;
 			break;
+#ifdef CONFIG_SMP
+		case PROC_PID_CPU:
+			inode->i_fop = &proc_info_file_operations;
+			ei->op.proc_read = proc_pid_cpu;
+			break;
+#endif
 		case PROC_PID_CMDLINE:
 			inode->i_fop = &proc_info_file_operations;
 			ei->op.proc_read = proc_pid_cmdline;
diff -urN 2.6.0-test1-ia64-0/include/linux/sched.h 2.6.0-test1-ia64-na/include/linux/sched.h
--- 2.6.0-test1-ia64-0/include/linux/sched.h	2003-07-14 05:30:40.000000000 +0200
+++ 2.6.0-test1-ia64-na/include/linux/sched.h	2003-07-18 13:38:02.000000000 +0200
@@ -390,6 +390,9 @@
 	struct list_head posix_timers; /* POSIX.1b Interval Timers */
 	unsigned long utime, stime, cutime, cstime;
 	u64 start_time;
+#ifdef CONFIG_SMP
+	long per_cpu_utime[NR_CPUS], per_cpu_stime[NR_CPUS];
+#endif
 /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
 	unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
 /* process credentials */
diff -urN 2.6.0-test1-ia64-0/kernel/fork.c 2.6.0-test1-ia64-na/kernel/fork.c
--- 2.6.0-test1-ia64-0/kernel/fork.c	2003-07-14 05:30:39.000000000 +0200
+++ 2.6.0-test1-ia64-na/kernel/fork.c	2003-07-18 13:38:02.000000000 +0200
@@ -861,6 +861,14 @@
 	p->tty_old_pgrp = 0;
 	p->utime = p->stime = 0;
 	p->cutime = p->cstime = 0;
+#ifdef CONFIG_SMP
+	{
+		int i;
+
+		for(i = 0; i < NR_CPUS; i++)
+			p->per_cpu_utime[i] = p->per_cpu_stime[i] = 0;
+	}
+#endif
 	p->array = NULL;
 	p->lock_depth = -1;		/* -1 = no lock */
 	p->start_time = get_jiffies_64();
diff -urN 2.6.0-test1-ia64-0/kernel/timer.c 2.6.0-test1-ia64-na/kernel/timer.c
--- 2.6.0-test1-ia64-0/kernel/timer.c	2003-07-14 05:37:22.000000000 +0200
+++ 2.6.0-test1-ia64-na/kernel/timer.c	2003-07-18 13:38:02.000000000 +0200
@@ -720,6 +720,10 @@
 void update_one_process(struct task_struct *p, unsigned long user,
 			unsigned long system, int cpu)
 {
+#ifdef CONFIG_SMP
+	p->per_cpu_utime[cpu] += user;
+	p->per_cpu_stime[cpu] += system;
+#endif
 	do_process_times(p, user, system);
 	do_it_virt(p, user);
 	do_it_prof(p);
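For reference, the output written by proc_pid_cpu() above has three line shapes: an aggregate "cpu <utime> <stime>" line, one "cpu<N> <utime> <stime>" line per online CPU, and a trailing "current_cpu <N>" line. A hypothetical userspace parser (not part of the patch) only needs to distinguish the first two; note that a naive sscanf("cpu%d ...") would mis-parse the aggregate line, since %d skips leading whitespace:

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of the /proc/PID/cpu format emitted by proc_pid_cpu().
 * Returns the CPU index for a "cpu<N> <u> <s>" line, -1 for the
 * aggregate "cpu <u> <s>" line, and -2 for anything else (the caller
 * handles "current_cpu <N>" separately).
 */
static int parse_cpu_line(const char *line, unsigned long *utime,
                          unsigned long *stime)
{
	int cpu;

	if (strncmp(line, "cpu", 3) != 0)
		return -2;
	if (line[3] == ' ') {		/* aggregate line: "cpu <u> <s>" */
		if (sscanf(line + 3, "%lu %lu", utime, stime) == 2)
			return -1;
		return -2;
	}
	/* per-CPU line: "cpu<N> <u> <s>" */
	if (sscanf(line + 3, "%d %lu %lu", &cpu, utime, stime) == 3)
		return cpu;
	return -2;
}
```

The explicit check of `line[3]` is what keeps "cpu 5 6" from being read as CPU 5, because a `%d` conversion would happily skip the space.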
* Re: [Lse-tech] [patch 2.6.0-test1] per cpu times
  2003-07-18 16:35 [patch 2.6.0-test1] per cpu times Erich Focht
@ 2003-07-18 18:18 ` Mike Kravetz
  2003-07-18 19:57   ` William Lee Irwin III
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread

From: Mike Kravetz @ 2003-07-18 18:18 UTC (permalink / raw)
To: Erich Focht; +Cc: LSE, linux-kernel

On Fri, Jul 18, 2003 at 06:35:42PM +0200, Erich Focht wrote:
>
> This patch brings back the per CPU user & system times which one was
> used to see in /proc/PID/cpu with 2.4 kernels. Useful for SMP and NUMA
> scheduler development, needed for reasonable output in numabench /
> numa_test.
>

On a somewhat related note ...

We (Big Blue) have a performance reporting application that
would like to know how long a task sits on a runqueue before
it is actually given the CPU.  In other words, it wants to
know how long the 'runnable task' was delayed due to contention
for the CPU(s).  Of course, one could get an overall feel for
this based on total runqueue length.  However, this app would
really like this info on a per-task basis.

Does anyone else think this type of info would be useful?
A patch to compute/export this info should be straight forward
to implement.

--
Mike
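The bookkeeping Mike is asking for reduces to two values per task: a timestamp taken when the task becomes runnable, and an accumulator charged when it is actually dispatched. The sketch below is an illustrative userspace model under assumed names (`struct task_delay`, a jiffies value passed in as a plain counter), not code from any posted patch:

```c
/*
 * Toy model of per-task CPU-contention delay: stamp the task when it
 * goes onto the runqueue, accumulate the delta when it gets the CPU.
 */
struct task_delay {
	unsigned long enqueued;		/* jiffy at which task became runnable */
	unsigned long cpu_delay;	/* total jiffies spent waiting for a CPU */
};

static void task_enqueue(struct task_delay *t, unsigned long jiffies_now)
{
	t->enqueued = jiffies_now;
}

/* Returns the wait this particular dispatch contributed. */
static unsigned long task_dispatch(struct task_delay *t,
                                   unsigned long jiffies_now)
{
	unsigned long waited = jiffies_now - t->enqueued;

	t->cpu_delay += waited;
	return waited;
}
```

A real in-kernel version has to hook the wakeup and context-switch paths, which is exactly what Rick Lindsley's patch later in this thread does.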
* Re: [Lse-tech] [patch 2.6.0-test1] per cpu times
  2003-07-18 18:18 ` [Lse-tech] " Mike Kravetz
@ 2003-07-18 19:57   ` William Lee Irwin III
  2003-07-18 20:53     ` Rick Lindsley
  2 siblings, 1 reply; 7+ messages in thread

From: William Lee Irwin III @ 2003-07-18 19:57 UTC (permalink / raw)
To: Mike Kravetz; +Cc: Erich Focht, LSE, linux-kernel

On Fri, Jul 18, 2003 at 11:18:50AM -0700, Mike Kravetz wrote:
> On a somewhat related note ...
> We (Big Blue) have a performance reporting application that
> would like to know how long a task sits on a runqueue before
> it is actually given the CPU.  In other words, it wants to
> know how long the 'runnable task' was delayed due to contention
> for the CPU(s).  Of course, one could get an overall feel for
> this based on total runqueue length.  However, this app would
> really like this info on a per-task basis.
> Does anyone else think this type of info would be useful?
> A patch to compute/export this info should be straight forward
> to implement.

I wrote something to collect the standard queueing statistics a while
back but am not sure what I did with it. I think Rick Lindsley might
still have a copy around.

-- wli
* Re: [Lse-tech] [patch 2.6.0-test1] per cpu times
  2003-07-18 19:57 ` William Lee Irwin III
@ 2003-07-18 20:53   ` Rick Lindsley
  0 siblings, 0 replies; 7+ messages in thread

From: Rick Lindsley @ 2003-07-18 20:53 UTC (permalink / raw)
To: William Lee Irwin III, Mike Kravetz, Erich Focht, LSE, linux-kernel

    I wrote something to collect the standard queueing statistics a while
    back but am not sure what I did with it. I think Rick Lindsley might
    still have a copy around.

Actually, better than that -- it's in the mjb tree. But I'll repost it
here for those who might find it generally useful. I've regenerated the
patch below for 2.6.0-test1, but while it applies and compiles cleanly,
I haven't really exercised it to any extent yet in 2.6.

The code adds extra fields to the stat output for each process and each
cpu. Note that per-process info is only available while that process is
alive; once a process exits, the specific information about it is lost.
The extra fields for the cpu info in /proc decay over time instead of
accumulating, so they are more of a "wait average" (similar to a load
average) rather than a sum of wait times.
Rick

diff -rup linux-2.6.0-test1/fs/proc/array.c linux-2.6.0-qs/fs/proc/array.c
--- linux-2.6.0-test1/fs/proc/array.c	Sun Jul 13 20:35:12 2003
+++ linux-2.6.0-qs/fs/proc/array.c	Fri Jul 18 15:51:33 2003
@@ -336,7 +336,7 @@ int proc_pid_stat(struct task_struct *ta
 	read_unlock(&tasklist_lock);
 	res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
 %lu %lu %lu %lu %lu %ld %ld %ld %ld %ld %ld %llu %lu %ld %lu %lu %lu %lu %lu \
-%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu\n",
+%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %lu %lu %lu\n",
 		task->pid,
 		task->comm,
 		state,
@@ -382,7 +382,10 @@ int proc_pid_stat(struct task_struct *ta
 		task->exit_signal,
 		task_cpu(task),
 		task->rt_priority,
-		task->policy);
+		task->policy,
+		jiffies_to_clock_t(task->sched_info.inter_arrival_time),
+		jiffies_to_clock_t(task->sched_info.service_time),
+		jiffies_to_clock_t(task->sched_info.response_time));
 	if(mm)
 		mmput(mm);
 	return res;
diff -rup linux-2.6.0-test1/fs/proc/proc_misc.c linux-2.6.0-qs/fs/proc/proc_misc.c
--- linux-2.6.0-test1/fs/proc/proc_misc.c	Sun Jul 13 20:30:43 2003
+++ linux-2.6.0-qs/fs/proc/proc_misc.c	Fri Jul 18 15:51:33 2003
@@ -401,14 +401,20 @@ static int kstat_read_proc(char *page, c
 		jiffies_to_clock_t(idle),
 		jiffies_to_clock_t(iowait));
 	for (i = 0 ; i < NR_CPUS; i++){
-		if (!cpu_online(i)) continue;
-		len += sprintf(page + len, "cpu%d %u %u %u %u %u\n",
+		struct sched_info info;
+
+		if (!cpu_online(i))
+			continue;
+		cpu_sched_info(&info, i);
+		len += sprintf(page + len, "cpu%d %u %u %u %u %u %u %u %u\n",
 			i,
 			jiffies_to_clock_t(kstat_cpu(i).cpustat.user),
 			jiffies_to_clock_t(kstat_cpu(i).cpustat.nice),
 			jiffies_to_clock_t(kstat_cpu(i).cpustat.system),
 			jiffies_to_clock_t(kstat_cpu(i).cpustat.idle),
-			jiffies_to_clock_t(kstat_cpu(i).cpustat.iowait));
+			jiffies_to_clock_t(kstat_cpu(i).cpustat.iowait),
+			(uint) jiffies_to_clock_t(info.inter_arrival_time),
+			(uint) jiffies_to_clock_t(info.service_time),
+			(uint) jiffies_to_clock_t(info.response_time));
 	}
 	len += sprintf(page + len, "intr %u", sum);
diff -rup linux-2.6.0-test1/include/linux/sched.h linux-2.6.0-qs/include/linux/sched.h
--- linux-2.6.0-test1/include/linux/sched.h	Sun Jul 13 20:30:40 2003
+++ linux-2.6.0-qs/include/linux/sched.h	Fri Jul 18 15:51:33 2003
@@ -94,6 +94,9 @@ extern unsigned long nr_running(void);
 extern unsigned long nr_uninterruptible(void);
 extern unsigned long nr_iowait(void);
 
+struct sched_info;
+extern void cpu_sched_info(struct sched_info *, int);
+
 #include <linux/time.h>
 #include <linux/param.h>
 #include <linux/resource.h>
@@ -320,6 +323,13 @@ struct k_itimer {
 	struct sigqueue *sigq;		/* signal queue entry. */
 };
 
+struct sched_info {
+	/* running averages */
+	unsigned long response_time, inter_arrival_time, service_time;
+
+	/* timestamps */
+	unsigned long last_arrival, began_service;
+};
 
 struct io_context;			/* See blkdev.h */
 void exit_io_context(void);
@@ -344,6 +354,8 @@ struct task_struct {
 	unsigned long cpus_allowed;
 	unsigned int time_slice, first_time_slice;
 
+	struct sched_info sched_info;
+
 	struct list_head tasks;
 	struct list_head ptrace_children;
 	struct list_head ptrace_list;
diff -rup linux-2.6.0-test1/kernel/sched.c linux-2.6.0-qs/kernel/sched.c
--- linux-2.6.0-test1/kernel/sched.c	Sun Jul 13 20:37:14 2003
+++ linux-2.6.0-qs/kernel/sched.c	Fri Jul 18 15:52:26 2003
@@ -59,6 +59,11 @@
 #define TASK_USER_PRIO(p)	USER_PRIO((p)->static_prio)
 #define MAX_USER_PRIO		(USER_PRIO(MAX_PRIO))
 
+/* the FIXED_1 gunk is so running averages don't vanish prematurely */
+#define RAVG_WEIGHT	128
+#define RAVG_FACTOR	(RAVG_WEIGHT*FIXED_1)
+#define RUNNING_AVG(x,y) (((RAVG_WEIGHT-1)*(x)+RAVG_FACTOR*(y))/RAVG_WEIGHT)
+
 /*
  * These are the 'tuning knobs' of the scheduler:
  *
@@ -171,6 +176,8 @@ struct runqueue {
 	struct list_head migration_queue;
 
 	atomic_t nr_iowait;
+
+	struct sched_info info;
 };
 
 static DEFINE_PER_CPU(struct runqueue, runqueues);
@@ -279,6 +286,74 @@ static inline void rq_unlock(runqueue_t
 	spin_unlock_irq(&rq->lock);
 }
 
+static inline void sched_info_arrive(task_t *t)
+{
+	unsigned long now = jiffies;
+	unsigned long diff = now - t->sched_info.last_arrival;
+	struct runqueue *rq = task_rq(t);
+
+	t->sched_info.inter_arrival_time =
+		RUNNING_AVG(t->sched_info.inter_arrival_time, diff);
+	t->sched_info.last_arrival = now;
+
+	if (!rq)
+		return;
+	diff = now - rq->info.last_arrival;
+	rq->info.inter_arrival_time =
+		RUNNING_AVG(rq->info.inter_arrival_time, diff);
+	rq->info.last_arrival = now;
+}
+
+/* is this ever used? */
+static inline void sched_info_depart(task_t *t)
+{
+	struct runqueue *rq = task_rq(t);
+	unsigned long diff, now = jiffies;
+
+	diff = now - t->sched_info.began_service;
+	t->sched_info.service_time =
+		RUNNING_AVG(t->sched_info.service_time, diff);
+
+	if (!rq)
+		return;
+	diff = now - rq->info.began_service;
+	rq->info.service_time =
+		RUNNING_AVG(rq->info.service_time, diff);
+}
+
+static inline void sched_info_switch(task_t *prev, task_t *next)
+{
+	struct runqueue *rq = task_rq(prev);
+	unsigned long diff, now = jiffies;
+
+	/* prev now departs the cpu */
+	sched_info_depart(prev);
+
+	/* only for involuntary context switches */
+	if (prev->state == TASK_RUNNING)
+		sched_info_arrive(prev);
+
+	diff = now - next->sched_info.last_arrival;
+	next->sched_info.response_time =
+		RUNNING_AVG(next->sched_info.response_time, diff);
+	next->sched_info.began_service = now;
+
+	if (!rq)
+		return;
+	/* yes, reusing next's service time is valid */
+	rq->info.response_time =
+		RUNNING_AVG(rq->info.response_time, diff);
+	rq->info.began_service = now;
+
+	if (prev->state != TASK_RUNNING)
+		return;
+	/* if prev arrived subtract rq's last arrival from its arrival */
+	diff = now - rq->info.last_arrival;
+	rq->info.inter_arrival_time =
+		RUNNING_AVG(rq->info.inter_arrival_time, diff);
+	rq->info.last_arrival = now;
+}
+
 /*
  * Adding/removing a task to/from a priority array:
  */
@@ -492,15 +567,18 @@ repeat_lock_task:
 			     (p->cpus_allowed & (1UL << smp_processor_id())))) {
 			set_task_cpu(p, smp_processor_id());
+			sched_info_arrive(p);
 			task_rq_unlock(rq, &flags);
 			goto repeat_lock_task;
 		}
 		if (old_state == TASK_UNINTERRUPTIBLE)
 			rq->nr_uninterruptible--;
-		if (sync)
+		if (sync) {
+			sched_info_arrive(p);
 			__activate_task(p, rq);
-		else {
+		} else {
 			activate_task(p, rq);
+			sched_info_arrive(p);
 			if (p->prio < rq->curr->prio)
 				resched_task(rq->curr);
 		}
@@ -554,6 +632,7 @@ void wake_up_forked_process(task_t * p)
 	p->sleep_avg = p->sleep_avg * CHILD_PENALTY / 100;
 	p->prio = effective_prio(p);
 	set_task_cpu(p, smp_processor_id());
+	sched_info_arrive(p);
 
 	if (unlikely(!current->array))
 		__activate_task(p, rq);
@@ -715,6 +794,11 @@ unsigned long nr_iowait(void)
 	return sum;
 }
 
+void cpu_sched_info(struct sched_info *info, int cpu)
+{
+	memcpy(info, &cpu_rq(cpu)->info, sizeof(struct sched_info));
+}
+
 /*
  * double_rq_lock - safely lock two runqueues
  *
@@ -1337,6 +1421,7 @@ switch_tasks:
 	if (likely(prev != next)) {
 		rq->nr_switches++;
+		sched_info_switch(prev, next);
 		rq->curr = next;
 		prepare_arch_switch(rq, next);
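The RUNNING_AVG() macro in the patch above is an integer fixed-point exponentially weighted moving average. The following stand-alone model of its arithmetic assumes FIXED_1 is the kernel's load-average fixed-point unit, (1 << 11); note that the stored average is scaled, so a constant raw sample drives the value toward roughly RAVG_WEIGHT * FIXED_1 times the sample rather than the sample itself:

```c
/*
 * Self-contained model of the RUNNING_AVG() fixed-point arithmetic.
 * FIXED_1 is assumed to match the kernel's load-average scale factor.
 */
#define FIXED_1		(1UL << 11)
#define RAVG_WEIGHT	128UL
#define RAVG_FACTOR	(RAVG_WEIGHT * FIXED_1)
#define RUNNING_AVG(x, y) \
	(((RAVG_WEIGHT - 1) * (x) + RAVG_FACTOR * (y)) / RAVG_WEIGHT)

/* Feed n identical samples into a fresh average and return the result. */
static unsigned long ravg_feed(unsigned long sample, int n)
{
	unsigned long avg = 0;
	int i;

	/* each step keeps 127/128 of the old value plus a scaled sample */
	for (i = 0; i < n; i++)
		avg = RUNNING_AVG(avg, sample);
	return avg;
}
```

The first sample of 1 lands at exactly FIXED_1 (2048), which is the "gunk ... so running averages don't vanish prematurely" the patch comment mentions: without the scale factor, integer division by RAVG_WEIGHT would round small averages down to zero.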
* Re: [Lse-tech] [patch 2.6.0-test1] per cpu times
  2003-07-18 18:18 ` [Lse-tech] " Mike Kravetz
  2003-07-18 19:57   ` William Lee Irwin III
@ 2003-07-21  4:47   ` Peter Chubb
  2003-07-23 21:50   ` bill davidsen
  2 siblings, 0 replies; 7+ messages in thread

From: Peter Chubb @ 2003-07-21 4:47 UTC (permalink / raw)
To: Mike Kravetz; +Cc: Erich Focht, LSE, linux-kernel

>>>>> "Mike" == Mike Kravetz <kravetz@us.ibm.com> writes:

Mike> On Fri, Jul 18, 2003 at 06:35:42PM +0200, Erich Focht wrote:
>> This patch brings back the per CPU user & system times which one
>> was used to see in /proc/PID/cpu with 2.4 kernels. Useful for SMP
>> and NUMA scheduler development, needed for reasonable output in
>> numabench / numa_test.

Mike> On a somewhat related note ...

Mike> We (Big Blue) have a performance reporting application that
Mike> would like to know how long a task sits on a runqueue before it
Mike> is actually given the CPU.  In other words, it wants to know how
Mike> long the 'runnable task' was delayed due to contention for the
Mike> CPU(s).  Of course, one could get an overall feel for this based
Mike> on total runqueue length.  However, this app would really like
Mike> this info on a per-task basis.

Mike> Does anyone else think this type of info would be useful?

This is exactly what my microstate accounting patch does.  Per task
figures for:
	-- how long on CPU
	-- how long on active queue
	-- how long on expired queue
	-- how long sleeping for paging
	-- how long sleeping in other non-interruptible state
	-- how long sleeping interruptibly
	-- how much time stolen for interrupts
	-- how long in system call
	-- how long sleeping on Futex
	-- how long sleeping for epoll, poll or select

I haven't yet added time spent handling traps, so the ONCPU time
includes pagefault and other trap time; also I've implemented the low
level timers only for X86 and IA64.  It'd be pretty trivial to add
other architectures.

The most recent published version of the patch is at
http://www.ussg.iu.edu/hypermail/linux/kernel/0306.3/0636.html
(that one doesn't include all the states I mentioned, but the on-queue
times *are* counted)

There'll be another patch with more states soon.

--
Dr Peter Chubb			http://www.gelato.unsw.edu.au
peterc AT gelato.unsw.edu.au
You are lost in a maze of BitKeeper repositories, all slightly different.
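The accounting model behind a state list like Peter's can be sketched as a one-state-at-a-time table: a task is always in exactly one microstate, and each transition charges the elapsed time to the state being left. This is a toy illustration, not the microstate accounting patch itself; the state names and timestamp source are assumptions:

```c
/*
 * Toy microstate accounting: charge elapsed time to the state being
 * left on every transition.  The state list is a small subset of the
 * ones mentioned in the mail.
 */
enum ms_state { MS_ONCPU, MS_ONRUNQ, MS_SLEEPING, MS_NSTATES };

struct ms_task {
	enum ms_state state;			/* current microstate */
	unsigned long entered;			/* time of last transition */
	unsigned long time_in[MS_NSTATES];	/* accumulated time per state */
};

static void ms_transition(struct ms_task *t, enum ms_state next,
                          unsigned long now)
{
	t->time_in[t->state] += now - t->entered;
	t->state = next;
	t->entered = now;
}
```

Because every unit of time is charged to exactly one state, the per-state totals always sum to the task's lifetime, which is what makes this style of accounting attractive for answering "where did the time go" questions like Mike's.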
* Re: [Lse-tech] [patch 2.6.0-test1] per cpu times
  2003-07-18 18:18 ` [Lse-tech] " Mike Kravetz
  2003-07-18 19:57   ` William Lee Irwin III
  2003-07-21  4:47   ` Peter Chubb
@ 2003-07-23 21:50   ` bill davidsen
  2003-07-23 23:32     ` Peter Chubb
  2 siblings, 1 reply; 7+ messages in thread

From: bill davidsen @ 2003-07-23 21:50 UTC (permalink / raw)
To: linux-kernel

In article <20030718111850.C1627@w-mikek2.beaverton.ibm.com>,
Mike Kravetz  <kravetz@us.ibm.com> wrote:

| On a somewhat related note ...
|
| We (Big Blue) have a performance reporting application that
| would like to know how long a task sits on a runqueue before
| it is actually given the CPU.  In other words, it wants to
| know how long the 'runnable task' was delayed due to contention
| for the CPU(s).  Of course, one could get an overall feel for
| this based on total runqueue length.  However, this app would
| really like this info on a per-task basis.

This is certainly a useful number. It's easy to tell when the CPU is
"in use," but it's not easy to tell when it's "busy" and processes are
waiting for a CPU.

--
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: [Lse-tech] [patch 2.6.0-test1] per cpu times
  2003-07-23 21:50 ` bill davidsen
@ 2003-07-23 23:32   ` Peter Chubb
  0 siblings, 0 replies; 7+ messages in thread

From: Peter Chubb @ 2003-07-23 23:32 UTC (permalink / raw)
To: bill davidsen; +Cc: linux-kernel, kravetz

>>>>> "bill" == bill davidsen <davidsen@tmr.com> writes:
>>>>> "Mike" == Mike Kravetz <kravetz@us.ibm.com>

Mike> On a somewhat related note ...  We (Big Blue) have a
Mike> performance reporting application that would like to know how
Mike> long a task sits on a runqueue before it is actually given the
Mike> CPU.  In other words, it wants to know how long the 'runnable
Mike> task' was delayed due to contention for the CPU(s).  Of
Mike> course, one could get an overall feel for this based on total
Mike> runqueue length.  However, this app would really like this
Mike> info on a per-task basis.

bill> This is certainly a useful number.

This is exactly what's measured by the microstate accounting patches
I've been pushing to LKML, along with a few other useful statistics.
If you try it, please let me know: see
http://marc.theaimsgroup.com/?l=linux-kernel&m=105884469205748&w=2

--
Dr Peter Chubb			http://www.gelato.unsw.edu.au
peterc AT gelato.unsw.edu.au
You are lost in a maze of BitKeeper repositories, all slightly different.
end of thread, other threads:[~2003-07-23 23:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-07-18 16:35 [patch 2.6.0-test1] per cpu times Erich Focht
2003-07-18 18:18 ` [Lse-tech] " Mike Kravetz
2003-07-18 19:57   ` William Lee Irwin III
2003-07-18 20:53     ` Rick Lindsley
2003-07-21  4:47   ` Peter Chubb
2003-07-23 21:50   ` bill davidsen
2003-07-23 23:32     ` Peter Chubb