* [PATCH] proc: speedup /proc/stat handling @ 2012-01-20 15:59 Eric Dumazet 2012-01-20 22:55 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Eric Dumazet @ 2012-01-20 15:59 UTC (permalink / raw) To: Andrew Morton Cc: KAMEZAWA Hiroyuki, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Turner On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes, and is slow: # strace -T -o /tmp/STRACE cat /proc/stat | wc -c 5826 # grep "cpu " /tmp/STRACE read(0, "cpu 1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504> That's partly because show_stat() must be called twice, since the initial buffer size is too small (4096 bytes for fewer than 32 possible cpus). Fix this by: 1) Taking nr_irqs into account in the initial buffer sizing. 2) Using ksize() to allow better filling of the initial buffer. 3) Reducing the bloat of the "intr ..." line: don't output trailing " 0" values at the end of the irq range. An alternative to 1) would be to remember the largest m->count reached in show_stat(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> --- fs/proc/stat.c | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/fs/proc/stat.c b/fs/proc/stat.c index 121f77c..78db0fa 100644 --- a/fs/proc/stat.c +++ b/fs/proc/stat.c @@ -49,6 +49,20 @@ static u64 get_iowait_time(int cpu) return iowait; } +/* + * Most irqs at the end of the range are never used.
+ * Find the upper limit, to not output trailing " 0" values + */ +static int highest_irq_used(void) +{ + int nr; + + for (nr = nr_irqs; nr; nr--) + if (kstat_irqs(nr - 1)) + break; + return nr; +} + static int show_stat(struct seq_file *p, void *v) { int i, j; @@ -129,10 +143,10 @@ static int show_stat(struct seq_file *p, void *v) (unsigned long long)cputime64_to_clock_t(guest_nice)); } seq_printf(p, "intr %llu", (unsigned long long)sum); - - /* sum again ? it could be updated? */ - for_each_irq_nr(j) - seq_printf(p, " %u", kstat_irqs(j)); + /* Note that the "sum" value can already be obsolete. */ + j = highest_irq_used(); + for (i = 0; i < j; i++) + seq_printf(p, " %u", kstat_irqs(i)); seq_printf(p, "\nctxt %llu\n" @@ -157,14 +171,17 @@ static int show_stat(struct seq_file *p, void *v) static int stat_open(struct inode *inode, struct file *file) { - unsigned size = 4096 * (1 + num_possible_cpus() / 32); + unsigned size = 1024 + 128 * num_possible_cpus(); char *buf; struct seq_file *m; int res; + /* minimum size to display a 0 count per interrupt : 2 bytes */ + size += 2 * nr_irqs; + /* don't ask for more than the kmalloc() max size */ - if (size > KMALLOC_MAX_SIZE) - size = KMALLOC_MAX_SIZE; + size = min_t(unsigned, size, KMALLOC_MAX_SIZE); + buf = kmalloc(size, GFP_KERNEL); if (!buf) return -ENOMEM; @@ -173,7 +190,7 @@ static int stat_open(struct inode *inode, struct file *file) if (!res) { m = file->private_data; m->buf = buf; - m->size = size; + m->size = ksize(buf); } else kfree(buf); return res; ^ permalink raw reply related [flat|nested] 26+ messages in thread
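The trailing-zero trimming done by highest_irq_used() above is simple enough to model in userspace. This is an illustrative Python sketch (the list stands in for the per-IRQ counts the kernel reads via kstat_irqs(); it is not kernel code):

```python
def highest_irq_used(irq_counts):
    """Return one past the index of the last non-zero IRQ count,
    i.e. how many leading entries must still be printed."""
    nr = len(irq_counts)
    # Walk backwards past the unused (zero-count) tail of the range.
    while nr and irq_counts[nr - 1] == 0:
        nr -= 1
    return nr
```

Everything at or beyond the returned index is a trailing " 0" the patch stops emitting; since a previously unused high IRQ can fire later, the number of fields on the "intr" line can grow between reads.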
* Re: [PATCH] proc: speedup /proc/stat handling 2012-01-20 15:59 [PATCH] proc: speedup /proc/stat handling Eric Dumazet @ 2012-01-20 22:55 ` Andrew Morton 2012-01-23 10:16 ` KAMEZAWA Hiroyuki 2012-01-25 0:01 ` [PATCH v2] " Eric Dumazet 2 siblings, 0 replies; 26+ messages in thread From: Andrew Morton @ 2012-01-20 22:55 UTC (permalink / raw) To: Eric Dumazet Cc: KAMEZAWA Hiroyuki, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Tuner On Fri, 20 Jan 2012 16:59:24 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 > bytes, and is slow : > > # strace -T -o /tmp/STRACE cat /proc/stat | wc -c > 5826 > # grep "cpu " /tmp/STRACE > read(0, "cpu 1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504> > > > Thats partly because show_stat() must be called twice since initial > buffer size is too small (4096 bytes for less than 32 possible cpus) > > Fix this by : > > 1) Taking into account nr_irqs in the initial buffer sizing. > > 2) Using ksize() to allow better filling of initial buffer. > > 3) Reduce the bloat on "intr ..." line : > Dont output trailing " 0" values at the end of irq range. This one is worrisome. Mainly because the number of fields in the `intr' line can now increase over time (yes?). So if a monitoring program were to read this line and use the result to size an internal buffer then after a while it might start to drop information or to get buffer overruns. > An alternative to 1) would be to remember the largest m->count reached > in show_stat() > > > ... 
> > @@ -157,14 +171,17 @@ static int show_stat(struct seq_file *p, void *v) > > static int stat_open(struct inode *inode, struct file *file) > { > - unsigned size = 4096 * (1 + num_possible_cpus() / 32); > + unsigned size = 1024 + 128 * num_possible_cpus(); > char *buf; > struct seq_file *m; > int res; > > + /* minimum size to display a 0 count per interrupt : 2 bytes */ > + size += 2 * nr_irqs; > + > /* don't ask for more than the kmalloc() max size */ > - if (size > KMALLOC_MAX_SIZE) > - size = KMALLOC_MAX_SIZE; > + size = min_t(unsigned, size, KMALLOC_MAX_SIZE); The change looks reasonable, however the use of KMALLOC_MAX_SIZE in the existing code is worrisome. If `size' ever gets that large then there's a decent chance that the kmalloc() will simply fail and a better chance that it would cause tons of VM scanning activity, including disk writeout. But I've never seen anyone report problems in this area, so shrug. ^ permalink raw reply [flat|nested] 26+ messages in thread
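For concreteness, the old and new initial-size formulas being discussed can be compared with a quick sketch. The KMALLOC_MAX_SIZE value here is purely illustrative (the real value depends on arch and config):

```python
KMALLOC_MAX_SIZE = 4 << 20  # illustrative only; arch/config dependent

def old_size(num_possible_cpus):
    # pre-patch: 4096 bytes, plus one extra page per 32 possible cpus
    return 4096 * (1 + num_possible_cpus // 32)

def new_size(num_possible_cpus, nr_irqs):
    # post-patch: base + per-cpu estimate + 2 bytes per interrupt (" 0"),
    # clamped to the kmalloc() maximum
    return min(1024 + 128 * num_possible_cpus + 2 * nr_irqs, KMALLOC_MAX_SIZE)
```

With 16 cpus the old formula yields 4096 bytes, below the 5826-byte output in Eric's report, which forces the second show_stat() pass; the new formula scales with nr_irqs instead.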
* Re: [PATCH] proc: speedup /proc/stat handling 2012-01-20 15:59 [PATCH] proc: speedup /proc/stat handling Eric Dumazet 2012-01-20 22:55 ` Andrew Morton @ 2012-01-23 10:16 ` KAMEZAWA Hiroyuki 2012-01-23 10:33 ` Glauber Costa 2012-01-25 0:01 ` [PATCH v2] " Eric Dumazet 2 siblings, 1 reply; 26+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-01-23 10:16 UTC (permalink / raw) To: Eric Dumazet Cc: Andrew Morton, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Turner On Fri, 20 Jan 2012 16:59:24 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 > bytes, and is slow : > > # strace -T -o /tmp/STRACE cat /proc/stat | wc -c > 5826 > # grep "cpu " /tmp/STRACE > read(0, "cpu 1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504> > > > Thats partly because show_stat() must be called twice since initial > buffer size is too small (4096 bytes for less than 32 possible cpus) > > Fix this by : > > 1) Taking into account nr_irqs in the initial buffer sizing. > > 2) Using ksize() to allow better filling of initial buffer. > > 3) Reduce the bloat on "intr ..." line : > Dont output trailing " 0" values at the end of irq range. > > An alternative to 1) would be to remember the largest m->count reached > in show_stat() > Nice catch. But how about using the usual seq_file ops rather than single_open()? I just don't like a multi-page buffer for this small file... very much. A rough patch is below; the optimization may not be enough (IOW, this may be slow). == >From e6a82f75d690aba61234671bc301714fd1d0762a Mon Sep 17 00:00:00 2001 From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Date: Mon, 23 Jan 2012 19:18:31 +0900 Subject: [PATCH] proc: use usual seq_file ops for /proc/stat rather than single_open. Now, /proc/stat uses single_open() for showing information. This means all the data will be gathered and buffered at once into a (big) buf.
Now, /proc/stat shows per-cpu stats and per-IRQ stats. To get the information in one shot, it allocates a big buffer (up to KMALLOC_MAX_SIZE). Eric Dumazet reported that the bufsize calculation doesn't take the number of IRQs into account and the information cannot be read in one shot. (Because of this, seq_read() will allocate the buffer again and read the whole data again...) This patch changes /proc/stat to use seq_open() rather than single_open() and provides ->start(), ->next(), ->stop(), ->show(). By this, /proc/stat will not need to care about the buffer size. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- fs/proc/stat.c | 297 ++++++++++++++++++++++++++++++++++++++++---------------- 1 files changed, 213 insertions(+), 84 deletions(-) diff --git a/fs/proc/stat.c b/fs/proc/stat.c index 121f77c..e9f1716 100644 --- a/fs/proc/stat.c +++ b/fs/proc/stat.c @@ -49,22 +49,110 @@ static u64 get_iowait_time(int cpu) return iowait; } -static int show_stat(struct seq_file *p, void *v) +enum proc_stat_stage /* the numbers are used as *pos and iter->stage */ +{ + SHOW_TOTAL_CPU_STAT, + SHOW_PERCPU_STAT, + SHOW_TOTAL_IRQS, + SHOW_IRQ_DETAILS, + SHOW_TIMES, + SHOW_TOTAL_SOFTIRQ, + SHOW_SOFTIRQ_DETAILS, + SHOW_EOL, + END_STATS, +}; + +/* + * To reduce the number of ->next()/->show() calls, IRQ numbers are + * handled in batch.
+ */ +struct seq_stat_iter { + int stage; + unsigned long jiffies; + int cpu_iter; + int irq_iter; + int softirq_iter; + /* cached information */ + u64 irq_sum; + u64 softirq_sum; + u32 per_softirq_sums[NR_SOFTIRQS]; +}; + +void *proc_stat_start(struct seq_file *p, loff_t *pos) +{ + struct seq_stat_iter *iter = p->private; + + /* At lseek(), *pos==0 is passed.(see travers() in seq_file.c */ + if (!*pos) { + struct timespec boottime; + + memset(iter, 0, sizeof(*iter)); + iter->stage = SHOW_TOTAL_CPU_STAT; + getboottime(&boottime); + iter->jiffies = boottime.tv_sec; + } + if (iter->stage == END_STATS) + return NULL; + return iter; +} + +void proc_stat_stop(struct seq_file *p, void *v) +{ + return; +} + +void *proc_stat_next(struct seq_file *p, void *v, loff_t *pos) +{ + struct seq_stat_iter *iter = p->private; + int index; + + switch (iter->stage) { + case SHOW_TOTAL_CPU_STAT: + iter->stage = SHOW_PERCPU_STAT; + iter->cpu_iter = cpumask_first(cpu_online_mask); + break; + case SHOW_PERCPU_STAT: + index = cpumask_next(iter->cpu_iter, cpu_online_mask); + if (index >= nr_cpu_ids) + iter->stage = SHOW_TOTAL_IRQS; + else + iter->cpu_iter = index; + break; + case SHOW_TOTAL_IRQS: + iter->stage = SHOW_IRQ_DETAILS; + iter->irq_iter = 0; + break; + case SHOW_IRQ_DETAILS: + if (iter->irq_iter >= nr_irqs) /* see include/linux/irqnr.h */ + iter->stage = SHOW_TIMES; + break; + case SHOW_TIMES: + iter->stage = SHOW_TOTAL_SOFTIRQ; + break; + case SHOW_TOTAL_SOFTIRQ: + iter->stage = SHOW_SOFTIRQ_DETAILS; + break; + case SHOW_SOFTIRQ_DETAILS: + iter->stage = SHOW_EOL; + break; + case SHOW_EOL: + iter->stage = END_STATS; + return NULL; + default: + break; + } + return iter; +} + +static int show_total_cpu_stat(struct seq_file *p, struct seq_stat_iter *iter) { - int i, j; - unsigned long jif; u64 user, nice, system, idle, iowait, irq, softirq, steal; u64 guest, guest_nice; - u64 sum = 0; - u64 sum_softirq = 0; - unsigned int per_softirq_sums[NR_SOFTIRQS] = {0}; - struct timespec 
boottime; + int i, j; - user = nice = system = idle = iowait = - irq = softirq = steal = 0; + user = nice = system = idle = iowait = 0; + irq = softirq = steal = 0; guest = guest_nice = 0; - getboottime(&boottime); - jif = boottime.tv_sec; for_each_possible_cpu(i) { user += kcpustat_cpu(i).cpustat[CPUTIME_USER]; @@ -77,19 +165,19 @@ static int show_stat(struct seq_file *p, void *v) steal += kcpustat_cpu(i).cpustat[CPUTIME_STEAL]; guest += kcpustat_cpu(i).cpustat[CPUTIME_GUEST]; guest_nice += kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE]; - sum += kstat_cpu_irqs_sum(i); - sum += arch_irq_stat_cpu(i); + iter->irq_sum += kstat_cpu_irqs_sum(i); + iter->irq_sum += arch_irq_stat_cpu(i); for (j = 0; j < NR_SOFTIRQS; j++) { unsigned int softirq_stat = kstat_softirqs_cpu(j, i); - per_softirq_sums[j] += softirq_stat; - sum_softirq += softirq_stat; + iter->per_softirq_sums[j] += softirq_stat; + iter->softirq_sum += softirq_stat; } } - sum += arch_irq_stat(); + iter->irq_sum += arch_irq_stat(); - seq_printf(p, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu " + return seq_printf(p, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu " "%llu\n", (unsigned long long)cputime64_to_clock_t(user), (unsigned long long)cputime64_to_clock_t(nice), @@ -101,81 +189,122 @@ static int show_stat(struct seq_file *p, void *v) (unsigned long long)cputime64_to_clock_t(steal), (unsigned long long)cputime64_to_clock_t(guest), (unsigned long long)cputime64_to_clock_t(guest_nice)); - for_each_online_cpu(i) { - /* Copy values here to work around gcc-2.95.3, gcc-2.96 */ - user = kcpustat_cpu(i).cpustat[CPUTIME_USER]; - nice = kcpustat_cpu(i).cpustat[CPUTIME_NICE]; - system = kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]; - idle = get_idle_time(i); - iowait = get_iowait_time(i); - irq = kcpustat_cpu(i).cpustat[CPUTIME_IRQ]; - softirq = kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]; - steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL]; - guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST]; - guest_nice = 
kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE]; - seq_printf(p, - "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu " - "%llu\n", - i, - (unsigned long long)cputime64_to_clock_t(user), - (unsigned long long)cputime64_to_clock_t(nice), - (unsigned long long)cputime64_to_clock_t(system), - (unsigned long long)cputime64_to_clock_t(idle), - (unsigned long long)cputime64_to_clock_t(iowait), - (unsigned long long)cputime64_to_clock_t(irq), - (unsigned long long)cputime64_to_clock_t(softirq), - (unsigned long long)cputime64_to_clock_t(steal), - (unsigned long long)cputime64_to_clock_t(guest), - (unsigned long long)cputime64_to_clock_t(guest_nice)); - } - seq_printf(p, "intr %llu", (unsigned long long)sum); - - /* sum again ? it could be updated? */ - for_each_irq_nr(j) - seq_printf(p, " %u", kstat_irqs(j)); - - seq_printf(p, - "\nctxt %llu\n" - "btime %lu\n" - "processes %lu\n" - "procs_running %lu\n" - "procs_blocked %lu\n", - nr_context_switches(), - (unsigned long)jif, - total_forks, - nr_running(), - nr_iowait()); - - seq_printf(p, "softirq %llu", (unsigned long long)sum_softirq); - - for (i = 0; i < NR_SOFTIRQS; i++) - seq_printf(p, " %u", per_softirq_sums[i]); - seq_putc(p, '\n'); +} +static int show_online_cpu_stat(struct seq_file *p, struct seq_stat_iter *iter) +{ + u64 user, nice, system, idle, iowait, irq, softirq, steal; + u64 guest, guest_nice; + int cpu = iter->cpu_iter; + + /* Copy values here to work around gcc-2.95.3, gcc-2.96 */ + user = kcpustat_cpu(cpu).cpustat[CPUTIME_USER]; + nice = kcpustat_cpu(cpu).cpustat[CPUTIME_NICE]; + system = kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM]; + idle = get_idle_time(cpu); + iowait = get_iowait_time(cpu); + irq = kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ]; + softirq = kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ]; + steal = kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL]; + guest = kcpustat_cpu(cpu).cpustat[CPUTIME_GUEST]; + guest_nice = kcpustat_cpu(cpu).cpustat[CPUTIME_GUEST_NICE]; + return seq_printf(p, + "cpu%d %llu %llu %llu %llu 
%llu %llu %llu %llu %llu " + "%llu\n", + cpu, + (unsigned long long)cputime64_to_clock_t(user), + (unsigned long long)cputime64_to_clock_t(nice), + (unsigned long long)cputime64_to_clock_t(system), + (unsigned long long)cputime64_to_clock_t(idle), + (unsigned long long)cputime64_to_clock_t(iowait), + (unsigned long long)cputime64_to_clock_t(irq), + (unsigned long long)cputime64_to_clock_t(softirq), + (unsigned long long)cputime64_to_clock_t(steal), + (unsigned long long)cputime64_to_clock_t(guest), + (unsigned long long)cputime64_to_clock_t(guest_nice)); +} + +static int show_irq_details(struct seq_file *p, struct seq_stat_iter *iter) +{ + int ret; + /* + * We update the iterator in ->show()... seems ugly, but to avoid + * tons of function calls, print out as much as possible here + */ + do { + ret = seq_printf(p, " %u", kstat_irqs(iter->irq_iter)); + if (!ret) + iter->irq_iter += 1; + } while (!ret && iter->irq_iter < nr_irqs); + + return 0; +} + +static int show_softirq_details(struct seq_file *p, struct seq_stat_iter *iter) +{ + int ret; + + do { + ret = seq_printf(p, " %u", + iter->per_softirq_sums[iter->softirq_iter]); + if (!ret) + iter->softirq_iter += 1; + } while (!ret && iter->softirq_iter < NR_SOFTIRQS); + return 0; +} + +static int proc_stat_show(struct seq_file *p, void *v) +{ + struct seq_stat_iter *iter = v; + + switch (iter->stage) { + case SHOW_TOTAL_CPU_STAT: + return show_total_cpu_stat(p, iter); + case SHOW_PERCPU_STAT: + return show_online_cpu_stat(p, iter); + case SHOW_TOTAL_IRQS: + return seq_printf(p, "intr %llu", + (unsigned long long)iter->irq_sum); + case SHOW_IRQ_DETAILS: + return show_irq_details(p, iter); + case SHOW_TIMES: + return seq_printf(p, + "\nctxt %llu\n" + "btime %lu\n" + "processes %lu\n" + "procs_running %lu\n" + "procs_blocked %lu\n", + nr_context_switches(), + (unsigned long)iter->jiffies, + total_forks, + nr_running(), + nr_iowait()); + case SHOW_TOTAL_SOFTIRQ: + return seq_printf(p, "softirq %llu", + (unsigned long
long)iter->softirq_sum); + case SHOW_SOFTIRQ_DETAILS: + return show_softirq_details(p, iter); + case SHOW_EOL: + return seq_putc(p, '\n'); + } return 0; } +static const struct seq_operations show_stat_op = { + .start = proc_stat_start, + .next = proc_stat_next, + .stop = proc_stat_stop, + .show = proc_stat_show +}; + static int stat_open(struct inode *inode, struct file *file) { - unsigned size = 4096 * (1 + num_possible_cpus() / 32); - char *buf; - struct seq_file *m; int res; /* don't ask for more than the kmalloc() max size */ - if (size > KMALLOC_MAX_SIZE) - size = KMALLOC_MAX_SIZE; - buf = kmalloc(size, GFP_KERNEL); - if (!buf) - return -ENOMEM; - - res = single_open(file, show_stat, NULL); - if (!res) { - m = file->private_data; - m->buf = buf; - m->size = size; - } else - kfree(buf); + res = seq_open_private(file, &show_stat_op, + sizeof(struct seq_stat_iter)); + return res; } @@ -183,7 +312,7 @@ static const struct file_operations proc_stat_operations = { .open = stat_open, .read = seq_read, .llseek = seq_lseek, - .release = single_release, + .release = seq_release_private, }; static int __init proc_stat_init(void) -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 26+ messages in thread
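The control flow of the ->start()/->next()/->show() rework above is easier to see flattened into a sequence. This toy generator (illustrative only, not from the patch) visits the same stages in the same order as the seq_stat_iter state machine:

```python
def stat_stages(online_cpus):
    """Yield the record stages in the order the seq_file iterator
    emits them: one aggregate cpu line, one line per online cpu,
    the intr header and details, the counters block, the softirq
    header and details, and a final newline."""
    yield ("total_cpu", None)
    for cpu in online_cpus:
        yield ("percpu", cpu)
    for stage in ("total_irqs", "irq_details", "times",
                  "total_softirq", "softirq_details", "eol"):
        yield (stage, None)
```

Each yielded tuple corresponds to one ->show() call; in the real patch the per-cpu stage advances via cpumask_next() and the detail stages batch many prints inside a single ->show().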
* Re: [PATCH] proc: speedup /proc/stat handling 2012-01-23 10:16 ` KAMEZAWA Hiroyuki @ 2012-01-23 10:33 ` Glauber Costa 2012-01-24 1:25 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 26+ messages in thread From: Glauber Costa @ 2012-01-23 10:33 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Eric Dumazet, Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Tuner On 01/23/2012 02:16 PM, KAMEZAWA Hiroyuki wrote: > On Fri, 20 Jan 2012 16:59:24 +0100 > Eric Dumazet<eric.dumazet@gmail.com> wrote: > >> On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 >> bytes, and is slow : >> >> # strace -T -o /tmp/STRACE cat /proc/stat | wc -c >> 5826 >> # grep "cpu " /tmp/STRACE >> read(0, "cpu 1949310 19 2144714 12117253"..., 32768) = 5826<0.001504> >> >> >> Thats partly because show_stat() must be called twice since initial >> buffer size is too small (4096 bytes for less than 32 possible cpus) >> >> Fix this by : >> >> 1) Taking into account nr_irqs in the initial buffer sizing. >> >> 2) Using ksize() to allow better filling of initial buffer. >> >> 3) Reduce the bloat on "intr ..." line : >> Dont output trailing " 0" values at the end of irq range. >> >> An alternative to 1) would be to remember the largest m->count reached >> in show_stat() >> > > nice catch. But how about using usual seq_file rather than single_open() ? > I just don't like multi-page buffer for this small file...very much. > > A rough patch here, maybe optimization will not be enough. (IOW, this may be slow.) > I myself don't like it very much, at least at first sight. Even with optimizations applied, I doubt we can make this approach faster than what we currently do for /proc/stat. Also, the code gets a lot harder to read and grasp. Problem is, unlike most of the stuff using seq_file, /proc/stat shows a lot of different kinds of information, not a single kind of easily indexable information. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] proc: speedup /proc/stat handling 2012-01-23 10:33 ` Glauber Costa @ 2012-01-24 1:25 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 26+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-01-24 1:25 UTC (permalink / raw) To: Glauber Costa Cc: Eric Dumazet, Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Turner On Mon, 23 Jan 2012 14:33:54 +0400 Glauber Costa <glommer@parallels.com> wrote: > On 01/23/2012 02:16 PM, KAMEZAWA Hiroyuki wrote: > > On Fri, 20 Jan 2012 16:59:24 +0100 > > Eric Dumazet<eric.dumazet@gmail.com> wrote: >> > >> An alternative to 1) would be to remember the largest m->count reached > >> in show_stat() > >> > > > > nice catch. But how about using usual seq_file rather than single_open() ? > > I just don't like multi-page buffer for this small file...very much. > > > > A rough patch here, maybe optimization will not be enough. (IOW, this may be slow.) > > > > I myself don't like it very much, at least at first sight. > Even with optimizations applied, I doubt we can make this approach > faster than what we currently do for /proc/stat. > IIUC, most of the cost comes from printing " 0". > Also, the code gets a lot harder to read and grasp. Problem is, unlike > most of the stuff using seq_file, /proc/stat shows a lot of different > kinds of information, not a single kind of easily indexable information. > I think the current one is too simple. But yes, it may not be worth using the usual seq_file sequence. I did some optimization around number(). Because my environment is small, the size of /proc/stat is 2780 bytes. [kamezawa@bluextal test]$ wc -c /proc/stat 2780 /proc/stat The test program is this.
== test program (read 1000 times.) == #!/usr/bin/python num = 0 with open("/proc/stat") as f: while num < 1000 : data = f.read() f.seek(0, 0) num = num + 1 == Before patch (3.3-rc1) == [kamezawa@bluextal test]$ time ./stat_check.py real 0m0.142s user 0m0.022s sys 0m0.117s == After patch == [root@bluextal test]# time ./stat_check.py real 0m0.096s user 0m0.024s sys 0m0.069s == In the above, most of the improvement comes from replacing seq_printf() with seq_puts(). If the number of cpus increases, the most costly one will be kstat_irqs(). perf record after patch: 19.03% stat_check.py [kernel.kallsyms] [k] memcpy 7.83% stat_check.py [kernel.kallsyms] [k] seq_puts 6.83% stat_check.py [kernel.kallsyms] [k] kstat_irqs 5.75% stat_check.py [kernel.kallsyms] [k] radix_tree_lookup 5.24% stat_check.py libpython2.6.so.1.0 [.] 0x8ccb0 4.35% stat_check.py [kernel.kallsyms] [k] vsnprintf 3.68% stat_check.py [kernel.kallsyms] [k] sub_preempt_count 3.45% stat_check.py [kernel.kallsyms] [k] number If we can find kstat_irqs()==0 without walking all possible cpus, we can cut most of the cost... Anyway, this is my final version. I'll go back to my usual work ;) == >From dae43400c9fc7328163eeee7770dc8d7c714ea8b Mon Sep 17 00:00:00 2001 From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Date: Mon, 23 Jan 2012 19:18:31 +0900 Subject: [PATCH] proc: use usual seq_file ops for /proc/stat rather than single_open. Now, /proc/stat uses single_open() for showing information. This means all the data will be gathered and buffered at once into a (big) buf. Now, /proc/stat shows per-cpu stats and per-IRQ stats. To get the information in one shot, it allocates a big buffer (up to KMALLOC_MAX_SIZE). Eric Dumazet reported that the bufsize calculation doesn't take the number of IRQs into account and the information cannot be read in one shot. (Because of this, seq_read() will allocate the buffer again and read the whole data again...)
This patch changes /proc/stat to use seq_open() rather than single_open() and provides ->start(), ->next(), ->stop(), ->show(). By this, /proc/stat will not need to care about the buffer size. BTW, the most time-consuming routines in /proc/stat are number() and vsprintf(), used to show tons of "0"s in /proc/stat. Handling these with seq_puts() much improves performance. Changelog: - reduced looping of ->next(), ->show() - call seq_puts(" 0") instead of seq_printf(" %u") when value == 0. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- fs/proc/stat.c | 293 ++++++++++++++++++++++++++++++++++++++++++-------------- 1 files changed, 222 insertions(+), 71 deletions(-) diff --git a/fs/proc/stat.c b/fs/proc/stat.c index 121f77c..b32354c 100644 --- a/fs/proc/stat.c +++ b/fs/proc/stat.c @@ -49,22 +49,114 @@ static u64 get_iowait_time(int cpu) return iowait; } -static int show_stat(struct seq_file *p, void *v) +enum proc_stat_stage /* the numbers are used as *pos and iter->stage */ +{ + SHOW_TOTAL_CPU_STAT, + SHOW_PERCPU_STAT, + SHOW_TOTAL_IRQS, + SHOW_IRQ_DETAILS, + SHOW_TIMES, + SHOW_TOTAL_SOFTIRQ, + SHOW_SOFTIRQ_DETAILS, + SHOW_EOL, + END_STATS, +}; + +/* + * To reduce the number of ->next()/->show() calls, IRQ numbers are + * handled in batch.
+ */ +struct seq_stat_iter { + int stage; + unsigned long jiffies; + int cpu_iter; + int irq_iter; + int softirq_iter; + /* cached information */ + u64 irq_sum; + u64 softirq_sum; + u32 per_softirq_sums[NR_SOFTIRQS]; +}; + +void *proc_stat_start(struct seq_file *p, loff_t *pos) +{ + struct seq_stat_iter *iter = p->private; + + /* At lseek(), *pos==0 is passed.(see travers() in seq_file.c */ + if (!*pos) { + struct timespec boottime; + + iter->stage = SHOW_TOTAL_CPU_STAT; + iter->cpu_iter = 0; + iter->irq_iter = 0; + iter->softirq_iter = 0; + getboottime(&boottime); + iter->jiffies = boottime.tv_sec; + } + if (iter->stage == END_STATS) + return NULL; + return iter; +} + +void proc_stat_stop(struct seq_file *p, void *v) +{ + return; +} + +/* + * Unlike other seq_files, we update private itertor and it can be updated + * by ->show(). + */ +void *proc_stat_next(struct seq_file *p, void *v, loff_t *pos) +{ + struct seq_stat_iter *iter = p->private; + + switch (iter->stage) { + case SHOW_TOTAL_CPU_STAT: + iter->stage = SHOW_PERCPU_STAT; + iter->cpu_iter = cpumask_first(cpu_online_mask); + break; + case SHOW_PERCPU_STAT: /* cpu_iter is updated in ->show() */ + if (iter->cpu_iter >= nr_cpu_ids) + iter->stage = SHOW_TOTAL_IRQS; + break; + case SHOW_TOTAL_IRQS: + iter->stage = SHOW_IRQ_DETAILS; + iter->irq_iter = 0; + break; + case SHOW_IRQ_DETAILS: /* irq_iter is updated in ->show() */ + if (iter->irq_iter >= nr_irqs) /* see include/linux/irqnr.h */ + iter->stage = SHOW_TIMES; + break; + case SHOW_TIMES: + iter->stage = SHOW_TOTAL_SOFTIRQ; + break; + case SHOW_TOTAL_SOFTIRQ: + iter->stage = SHOW_SOFTIRQ_DETAILS; + iter->softirq_iter = 0; + break; + case SHOW_SOFTIRQ_DETAILS: /*softirq_iter is updated in ->show() */ + if (iter->softirq_iter >= NR_SOFTIRQS) + iter->stage = SHOW_EOL; + break; + case SHOW_EOL: + iter->stage = END_STATS; + return NULL; + default: + break; + } + return iter; +} + +static int show_total_cpu_stat(struct seq_file *p, struct seq_stat_iter *iter) { - 
int i, j; - unsigned long jif; u64 user, nice, system, idle, iowait, irq, softirq, steal; u64 guest, guest_nice; - u64 sum = 0; - u64 sum_softirq = 0; - unsigned int per_softirq_sums[NR_SOFTIRQS] = {0}; - struct timespec boottime; + int i, j; - user = nice = system = idle = iowait = - irq = softirq = steal = 0; + user = nice = system = idle = iowait = 0; + irq = softirq = steal = 0; guest = guest_nice = 0; - getboottime(&boottime); - jif = boottime.tv_sec; for_each_possible_cpu(i) { user += kcpustat_cpu(i).cpustat[CPUTIME_USER]; @@ -77,19 +169,19 @@ static int show_stat(struct seq_file *p, void *v) steal += kcpustat_cpu(i).cpustat[CPUTIME_STEAL]; guest += kcpustat_cpu(i).cpustat[CPUTIME_GUEST]; guest_nice += kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE]; - sum += kstat_cpu_irqs_sum(i); - sum += arch_irq_stat_cpu(i); + iter->irq_sum += kstat_cpu_irqs_sum(i); + iter->irq_sum += arch_irq_stat_cpu(i); for (j = 0; j < NR_SOFTIRQS; j++) { unsigned int softirq_stat = kstat_softirqs_cpu(j, i); - per_softirq_sums[j] += softirq_stat; - sum_softirq += softirq_stat; + iter->per_softirq_sums[j] += softirq_stat; + iter->softirq_sum += softirq_stat; } } - sum += arch_irq_stat(); + iter->irq_sum += arch_irq_stat(); - seq_printf(p, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu " + return seq_printf(p, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu " "%llu\n", (unsigned long long)cputime64_to_clock_t(user), (unsigned long long)cputime64_to_clock_t(nice), @@ -101,22 +193,31 @@ static int show_stat(struct seq_file *p, void *v) (unsigned long long)cputime64_to_clock_t(steal), (unsigned long long)cputime64_to_clock_t(guest), (unsigned long long)cputime64_to_clock_t(guest_nice)); - for_each_online_cpu(i) { - /* Copy values here to work around gcc-2.95.3, gcc-2.96 */ - user = kcpustat_cpu(i).cpustat[CPUTIME_USER]; - nice = kcpustat_cpu(i).cpustat[CPUTIME_NICE]; - system = kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]; - idle = get_idle_time(i); - iowait = get_iowait_time(i); - irq = 
kcpustat_cpu(i).cpustat[CPUTIME_IRQ]; - softirq = kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]; - steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL]; - guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST]; - guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE]; - seq_printf(p, +} + +static int show_online_cpu_stat(struct seq_file *p, struct seq_stat_iter *iter) +{ + u64 user, nice, system, idle, iowait, irq, softirq, steal; + u64 guest, guest_nice; + int cpu = iter->cpu_iter; + int ret; + + /* Copy values here to work around gcc-2.95.3, gcc-2.96 */ + while (cpu < nr_cpu_ids) { + user = kcpustat_cpu(cpu).cpustat[CPUTIME_USER]; + nice = kcpustat_cpu(cpu).cpustat[CPUTIME_NICE]; + system = kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM]; + idle = get_idle_time(cpu); + iowait = get_iowait_time(cpu); + irq = kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ]; + softirq = kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ]; + steal = kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL]; + guest = kcpustat_cpu(cpu).cpustat[CPUTIME_GUEST]; + guest_nice = kcpustat_cpu(cpu).cpustat[CPUTIME_GUEST_NICE]; + ret = seq_printf(p, "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu " "%llu\n", - i, + cpu, (unsigned long long)cputime64_to_clock_t(user), (unsigned long long)cputime64_to_clock_t(nice), (unsigned long long)cputime64_to_clock_t(system), @@ -127,55 +228,105 @@ static int show_stat(struct seq_file *p, void *v) (unsigned long long)cputime64_to_clock_t(steal), (unsigned long long)cputime64_to_clock_t(guest), (unsigned long long)cputime64_to_clock_t(guest_nice)); + if (ret) + return ret; + cpu = cpumask_next(iter->cpu_iter, cpu_online_mask); + iter->cpu_iter = cpu; } - seq_printf(p, "intr %llu", (unsigned long long)sum); - - /* sum again ? it could be updated? 
*/ - for_each_irq_nr(j) - seq_printf(p, " %u", kstat_irqs(j)); - - seq_printf(p, - "\nctxt %llu\n" - "btime %lu\n" - "processes %lu\n" - "procs_running %lu\n" - "procs_blocked %lu\n", - nr_context_switches(), - (unsigned long)jif, - total_forks, - nr_running(), - nr_iowait()); - - seq_printf(p, "softirq %llu", (unsigned long long)sum_softirq); - - for (i = 0; i < NR_SOFTIRQS; i++) - seq_printf(p, " %u", per_softirq_sums[i]); - seq_putc(p, '\n'); + return 0; + +} + +static int show_irq_details(struct seq_file *p, struct seq_stat_iter *iter) +{ + int ret; + /* + * We update the iterator in ->show()... seems ugly, but to avoid + * tons of function calls, print out as much as possible here + */ + do { + unsigned int irqs = kstat_irqs(iter->irq_iter); + + if (irqs) + ret = seq_printf(p, " %u", irqs); + else + ret = seq_puts(p, " 0"); + if (!ret) + iter->irq_iter += 1; + } while (!ret && iter->irq_iter < nr_irqs); + + return 0; +} + +static int show_softirq_details(struct seq_file *p, struct seq_stat_iter *iter) +{ + int ret; + + do { + int softirqs = iter->per_softirq_sums[iter->softirq_iter]; + + if (softirqs) + ret = seq_printf(p, " %u", softirqs); + else + ret = seq_puts(p, " 0"); + if (!ret) + iter->softirq_iter += 1; + } while (!ret && iter->softirq_iter < NR_SOFTIRQS); + return 0; +} + +static int proc_stat_show(struct seq_file *p, void *v) +{ + struct seq_stat_iter *iter = v; + switch (iter->stage) { + case SHOW_TOTAL_CPU_STAT: + return show_total_cpu_stat(p, iter); + case SHOW_PERCPU_STAT: + return show_online_cpu_stat(p, iter); + case SHOW_TOTAL_IRQS: + return seq_printf(p, "intr %llu", + (unsigned long long)iter->irq_sum); + case SHOW_IRQ_DETAILS: + return show_irq_details(p, iter); + case SHOW_TIMES: + return seq_printf(p, + "\nctxt %llu\n" + "btime %lu\n" + "processes %lu\n" + "procs_running %lu\n" + "procs_blocked %lu\n", + nr_context_switches(), + (unsigned long)iter->jiffies, + total_forks, + nr_running(), + nr_iowait()); + case SHOW_TOTAL_SOFTIRQ:
+ return seq_printf(p, "softirq %llu", + (unsigned long long)iter->softirq_sum); + case SHOW_SOFTIRQ_DETAILS: + return show_softirq_details(p, iter); + case SHOW_EOL: + return seq_putc(p, '\n'); + } return 0; } +static const struct seq_operations show_stat_op = { + .start = proc_stat_start, + .next = proc_stat_next, + .stop = proc_stat_stop, + .show = proc_stat_show +}; + static int stat_open(struct inode *inode, struct file *file) { - unsigned size = 4096 * (1 + num_possible_cpus() / 32); - char *buf; - struct seq_file *m; int res; /* don't ask for more than the kmalloc() max size */ - if (size > KMALLOC_MAX_SIZE) - size = KMALLOC_MAX_SIZE; - buf = kmalloc(size, GFP_KERNEL); - if (!buf) - return -ENOMEM; - - res = single_open(file, show_stat, NULL); - if (!res) { - m = file->private_data; - m->buf = buf; - m->size = size; - } else - kfree(buf); + res = seq_open_private(file, &show_stat_op, + sizeof(struct seq_stat_iter)); + return res; } @@ -183,7 +334,7 @@ static const struct file_operations proc_stat_operations = { .open = stat_open, .read = seq_read, .llseek = seq_lseek, - .release = single_release, + .release = seq_release_private, }; static int __init proc_stat_init(void) -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v2] proc: speedup /proc/stat handling
  2012-01-20 15:59 [PATCH] proc: speedup /proc/stat handling Eric Dumazet
  2012-01-20 22:55 ` Andrew Morton
  2012-01-23 10:16 ` KAMEZAWA Hiroyuki
@ 2012-01-25  0:01 ` Eric Dumazet
  2012-01-25  0:12   ` Andrew Morton
  2012-01-25  0:18   ` KAMEZAWA Hiroyuki
  2 siblings, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2012-01-25  0:01 UTC (permalink / raw)
To: Andrew Morton
Cc: KAMEZAWA Hiroyuki, Glauber Costa, Peter Zijlstra, Ingo Molnar,
    linux-kernel, Russell King - ARM Linux, Paul Tuner

On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096
bytes, and is slow:

# strace -T -o /tmp/STRACE cat /proc/stat | wc -c
5826
# grep "cpu " /tmp/STRACE
read(0, "cpu  1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504>

That's partly because show_stat() must be called twice, since the initial
buffer size is too small (4096 bytes for less than 32 possible cpus).

Fix this by:

1) Taking into account nr_irqs in the initial buffer sizing.

2) Using ksize() to allow better filling of the initial buffer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Tuner <pjt@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
---
V2: No change on "intr" line for compatibility sake.
 fs/proc/stat.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 121f77c..ac44611 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -157,11 +157,14 @@ static int show_stat(struct seq_file *p, void *v)
 
 static int stat_open(struct inode *inode, struct file *file)
 {
-	unsigned size = 4096 * (1 + num_possible_cpus() / 32);
+	unsigned size = 1024 + 128 * num_possible_cpus();
 	char *buf;
	struct seq_file *m;
 	int res;
 
+	/* minimum size to display an interrupt count : 2 bytes */
+	size += 2 * nr_irqs;
+
 	/* don't ask for more than the kmalloc() max size */
 	if (size > KMALLOC_MAX_SIZE)
 		size = KMALLOC_MAX_SIZE;
@@ -173,7 +176,7 @@ static int stat_open(struct inode *inode, struct file *file)
 	if (!res) {
 		m = file->private_data;
 		m->buf = buf;
-		m->size = size;
+		m->size = ksize(buf);
 	} else
 		kfree(buf);
 	return res;

^ permalink raw reply related	[flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling 2012-01-25 0:01 ` [PATCH v2] " Eric Dumazet @ 2012-01-25 0:12 ` Andrew Morton 2012-01-25 0:22 ` Eric Dumazet 2012-01-25 0:18 ` KAMEZAWA Hiroyuki 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2012-01-25 0:12 UTC (permalink / raw) To: Eric Dumazet Cc: KAMEZAWA Hiroyuki, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Tuner On Wed, 25 Jan 2012 01:01:23 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 > bytes, and is slow : It is pretty bad: akpm:/usr/src/25> time (for i in $(seq 1000); do; cat /proc/self/stat > /dev/null; done) (; for i in $(seq 1000); do; cat /proc/self/stat > /dev/null; done; ) 0.05s user 0.94s system 29% cpu 3.367 total (29% CPU?) > # strace -T -o /tmp/STRACE cat /proc/stat | wc -c > 5826 > # grep "cpu " /tmp/STRACE > read(0, "cpu 1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504> > Did you measure the improvement from this patch? > Thats partly because show_stat() must be called twice since initial > buffer size is too small (4096 bytes for less than 32 possible cpus) > > Fix this by : > > 1) Taking into account nr_irqs in the initial buffer sizing. > > 2) Using ksize() to allow better filling of initial buffer. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling
  2012-01-25  0:12 ` Andrew Morton
@ 2012-01-25  0:22   ` Eric Dumazet
  2012-01-25  1:27     ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2012-01-25  0:22 UTC (permalink / raw)
To: Andrew Morton
Cc: KAMEZAWA Hiroyuki, Glauber Costa, Peter Zijlstra, Ingo Molnar,
    linux-kernel, Russell King - ARM Linux, Paul Tuner

On Tue, 24 Jan 2012 16:12 -0800, Andrew Morton wrote:

> Did you measure the improvement from this patch?

Unfortunately I cannot reboot the server where I noticed this
performance problem.

On the smaller one, the performance improvement is about 20%, because the
second run of show_stat() can use data already present in the cpu cache.

On big machines, I guess the 128 bytes per possible cpu reservation can
avoid the second run (since a typical cpuXXX line is smaller than 128
bytes).

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling
  2012-01-25  0:22 ` Eric Dumazet
@ 2012-01-25  1:27   ` Andrew Morton
  2012-01-25  5:29     ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2012-01-25  1:27 UTC (permalink / raw)
To: Eric Dumazet
Cc: KAMEZAWA Hiroyuki, Glauber Costa, Peter Zijlstra, Ingo Molnar,
    linux-kernel, Russell King - ARM Linux, Paul Tuner

On Wed, 25 Jan 2012 01:22:25 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Tue, 24 Jan 2012 16:12 -0800, Andrew Morton wrote:
>
> > Did you measure the improvement from this patch?
>
> Unfortunately I can not reboot the server where I noticed this
> performance problem.
>
> On the smaller one, performance improvement is about 20%, because the
> second run of show_stat() can use data present in cpu cache.
>
> On big machines, I guess the 128 bytes per possible cpu reservation can
> avoid the second run. (since a typical cpuXXX line is smaller than 128
> bytes)

I had a fiddle on an 8-way x86_64 machine.  I'm unable to demonstrate
any improvement for either of

	time (for i in $(seq 1000); do; cat /proc/self/stat > /dev/null; done)
	time (for i in $(seq 1000); do; cat /proc/1/stat > /dev/null; done)

oh well.

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling
  2012-01-25  1:27 ` Andrew Morton
@ 2012-01-25  5:29   ` Eric Dumazet
  2012-01-26  1:04     ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2012-01-25  5:29 UTC (permalink / raw)
To: Andrew Morton
Cc: KAMEZAWA Hiroyuki, Glauber Costa, Peter Zijlstra, Ingo Molnar,
    linux-kernel, Russell King - ARM Linux, Paul Tuner

On Tue, 24 Jan 2012 17:27 -0800, Andrew Morton wrote:

> I had a fiddle on an 8-way x86_64 machine.  I'm unable to demonstrate
> any improvement for either of
>
> time (for i in $(seq 1000); do; cat /proc/self/stat > /dev/null; done)
> time (for i in $(seq 1000); do; cat /proc/1/stat > /dev/null; done)
>
> oh well.

What size is /proc/stat ?

wc -c /proc/stat

If under 4096, there is no problem with the existing code.

I had the problem on a 16-way machine.

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling
  2012-01-25  5:29 ` Eric Dumazet
@ 2012-01-26  1:04   ` Andrew Morton
  2012-01-26  9:55     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2012-01-26  1:04 UTC (permalink / raw)
To: Eric Dumazet
Cc: KAMEZAWA Hiroyuki, Glauber Costa, Peter Zijlstra, Ingo Molnar,
    linux-kernel, Russell King - ARM Linux, Paul Tuner

On Wed, 25 Jan 2012 06:29:32 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Tue, 24 Jan 2012 17:27 -0800, Andrew Morton wrote:
>
> > I had a fiddle on an 8-way x86_64 machine.  I'm unable to demonstrate
> > any improvement for either of
> >
> > time (for i in $(seq 1000); do; cat /proc/self/stat > /dev/null; done)
> > time (for i in $(seq 1000); do; cat /proc/1/stat > /dev/null; done)
> >
> > oh well.
>
> What size is /proc/stat ?

About 40mm, but it depends on the font size.

> wc -c /proc/stat
>
> If under 4096, there is no problem with existing code.

akpm2:/home/akpm> wc -c /proc/stat
2800 /proc/stat

> I had the problem on a 16-way machine.

OK..

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling
  2012-01-26  1:04 ` Andrew Morton
@ 2012-01-26  9:55   ` KAMEZAWA Hiroyuki
  2012-01-27  0:43     ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-26  9:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric Dumazet, Glauber Costa, Peter Zijlstra, Ingo Molnar,
    linux-kernel, Russell King - ARM Linux, Paul Tuner

On Wed, 25 Jan 2012 17:04:16 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 25 Jan 2012 06:29:32 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > On Tue, 24 Jan 2012 17:27 -0800, Andrew Morton wrote:
> >
> > > I had a fiddle on an 8-way x86_64 machine.  I'm unable to demonstrate
> > > any improvement for either of
> > >
> > > time (for i in $(seq 1000); do; cat /proc/self/stat > /dev/null; done)
> > > time (for i in $(seq 1000); do; cat /proc/1/stat > /dev/null; done)
> > >
> > > oh well.
> >
> > What size is /proc/stat ?
>
> About 40mm, but it depends on the font size.
>
> > wc -c /proc/stat
> >
> > If under 4096, there is no problem with existing code.
>
> akpm2:/home/akpm> wc -c /proc/stat
> 2800 /proc/stat
>
> > I had the problem on a 16-way machine.
>
> OK..

I wrote the following patch just for fun; it makes /proc/stat twice as
fast. But I'm not sure whether this kind of dirty && special printk is
worth doing or not, because I can't see the /proc/stat cost in shell
scripting.

With a python script test (read /proc/stat 1000 times):
==
#!/usr/bin/python
num = 0
with open("/proc/stat") as f:
    while num < 1000 :
        data = f.read()
        f.seek(0, 0)
        num = num + 1
==

* Before patch
[kamezawa@bluextal test]$ time ./stat_check.py
real	0m0.154s
user	0m0.020s
sys	0m0.130s

* After patch
[kamezawa@bluextal test]$ time ./stat_check.py
real	0m0.080s
user	0m0.029s
sys	0m0.048s

For the ascii interface, format_decode() and number() seem to be very
costly...
==
>From 2bd6deb7048b1e4e4d32b9fb8c6ca93259a11620 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 26 Jan 2012 18:57:19 +0900
Subject: [PATCH] snprintf_batch for sequence printing in /proc/stat

In /proc/stat, a sequence of numbers is printed out under the same
format. Reading /proc/stat consumes 20% of its cpu time just decoding
the format.

This patch adds a decode-once + print-a-sequence function to help
/proc/stat. The new (special) function is snprintf_batch(). For
example, the following is a function to print 0..10 in a sequence to
a given buffer:

unsigned long long my_next_val(void *iter, bool *last)
{
	int x = *(int *)iter;

	if (x == 11) {
		*last = true;
		return 0;
	}
	*(int *)iter += 1;
	return x;
}

{
	...
	int iter = 0;
	snprintf_batch(buf, size, " %d", &iter, my_next_val);
}

Using this reduces the cost of format decoding and makes /proc/stat,
which shows tons of numbers in ascii, faster.

With a python script which reads /proc/stat 1000 times:

* Before Patch
[kamezawa@bluextal test]$ time ./stat_check.py
real	0m0.154s
user	0m0.020s
sys	0m0.130s

* After Patch
[kamezawa@bluextal test]$ time ./stat_check.py
real	0m0.080s
user	0m0.029s
sys	0m0.048s

In my environment, the size of /proc/stat is 2700 bytes. The above
scores show that the patched one is almost twice as fast... but in
general, with shell scripting, the fork/exec/mmap etc. cost is much
larger than the cost of reading /proc/stat, so the benefit for usual
users is doubtful. If an admin or a program checks /proc/stat
periodically with a script, this improvement will give them better
latency.

Note: the added snprintf_batch() could be used in other places such as
/proc/<pid>/stat, but I'm not sure whether this simple iterator
interface is enough or not.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- fs/proc/stat.c | 16 +++++++- fs/seq_file.c | 18 +++++++++ include/linux/kernel.h | 4 ++ include/linux/seq_file.h | 3 + lib/vsprintf.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 131 insertions(+), 2 deletions(-) diff --git a/fs/proc/stat.c b/fs/proc/stat.c index 121f77c..0687d45 100644 --- a/fs/proc/stat.c +++ b/fs/proc/stat.c @@ -49,6 +49,18 @@ static u64 get_iowait_time(int cpu) return iowait; } +static unsigned long long get_next_kstat_irq(void *iter, bool *last) +{ + int pos = *(int *)iter; + + if (pos == nr_irqs) { + *last = true; + return 0; + } + *(int *)iter = pos + 1; + return kstat_irqs(pos); +} + static int show_stat(struct seq_file *p, void *v) { int i, j; @@ -131,8 +143,8 @@ static int show_stat(struct seq_file *p, void *v) seq_printf(p, "intr %llu", (unsigned long long)sum); /* sum again ? it could be updated? */ - for_each_irq_nr(j) - seq_printf(p, " %u", kstat_irqs(j)); + j = 0; + seq_printnum_batch(p, " %u", &j, get_next_kstat_irq); seq_printf(p, "\nctxt %llu\n" diff --git a/fs/seq_file.c b/fs/seq_file.c index 4023d6b..fa367ce 100644 --- a/fs/seq_file.c +++ b/fs/seq_file.c @@ -386,6 +386,24 @@ int seq_printf(struct seq_file *m, const char *f, ...) 
 }
 EXPORT_SYMBOL(seq_printf);
 
+int seq_printnum_batch(struct seq_file *m, const char *f, void *iter,
+		unsigned long long (*next_val)(void *, bool *))
+{
+	int len;
+
+	if (m->count < m->size) {
+		len = snprintf_batch(m->buf + m->count,
+				m->size - m->count, f, iter, next_val);
+		if (m->count + len < m->size) {
+			m->count += len;
+			return 0;
+		}
+	}
+	m->count = m->size;
+	return -1;
+}
+EXPORT_SYMBOL_GPL(seq_printnum_batch);
+
 /**
  * mangle_path - mangle and copy path to buffer beginning
  * @s: buffer start
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index e834342..7306174 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -320,6 +320,10 @@ extern int sscanf(const char *, const char *, ...)
 extern int vsscanf(const char *, const char *, va_list)
 	__attribute__ ((format (scanf, 2, 0)));
 
+/* special function for printing sequence values */
+extern int snprintf_batch(char *buf, size_t size, const char *fmt,
+	void *iter, unsigned long long (*next_val)(void *, bool *));
+
 extern int get_option(char **str, int *pint);
 extern char *get_options(const char *str, int nints, int *ints);
 extern unsigned long long memparse(const char *ptr, char **retptr);
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index 44f1514..5ac464a 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -86,6 +86,9 @@ int seq_write(struct seq_file *seq, const void *data, size_t len);
 __printf(2, 3) int seq_printf(struct seq_file *, const char *, ...);
 
+int seq_printnum_batch(struct seq_file *seq, const char *fmt, void *iter,
+	unsigned long long (*next_val)(void *, bool *));
+
 int seq_path(struct seq_file *, const struct path *, const char *);
 int seq_dentry(struct seq_file *, struct dentry *, const char *);
 int seq_path_root(struct seq_file *m, const struct path *path,
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 8e75003..03e45cc 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2038,3 +2038,95 @@ int sscanf(const char *buf, const char *fmt, ...)
 	return i;
 }
 EXPORT_SYMBOL(sscanf);
+
+/*
+ * snprintf_batch - format a sequence of values with a spacer
+ * @buf : the buffer to place the result into
+ * @size: the size of the buffer, including the trailing null space
+ * @fmt : the format for printing each number (some limitations below)
+ * @iter: opaque iterator state passed to @next_val
+ * @next_val: returns the next value to print; sets its bool argument
+ *	to true when the sequence is exhausted
+ *
+ * This function prints the values supplied by next_val() in a given
+ * format until next_val() finishes. Unlike printf, the available format
+ * is very limited. The allowed fmt for this function is "a%?b":
+ * - a is a one-character spacer printed before the value.
+ * - b is a one-character spacer printed after the value.
+ * - %? is an integer conversion: %d, %u, %x and other modifiers such as
+ *   h/hh/l/ll are available, and precision and field width may be given
+ *   in the format.
+ *
+ * This function is useful when you need to print a sequence of values of
+ * the same type. It stops when next_val() returns true in its second
+ * argument.
+ *
+ * This function was originally designed for /proc/stat..
+ */
+
+int snprintf_batch(char *buf, size_t size, const char *fmt, void *iter,
+	unsigned long long (*next_val)(void *, bool *))
+{
+	char *str, *end;
+	struct printf_spec spec = {0};
+	unsigned long long num;
+	bool last = false;
+	char head, foot;
+	bool easy_zero_printing = false;
+	int read;
+
+	if (WARN_ON_ONCE((int) size < 0))
+		return 0;
+
+	str = buf;
+	end = buf + size;
+	/* Make sure end is always >= buf */
+	if (end < buf) {
+		end = ((void *)-1);
+		size = end - buf;
+	}
+
+	head = foot = 0;
+
+	if (*fmt != '%') {
+		head = *fmt;
+		fmt++;
+	}
+	/* decode happens only once */
+	read = format_decode(fmt, &spec);
+	fmt += read;
+	if (*fmt)
+		foot = *fmt;
+	/* if fmt is simple, we can show '0' easily */
+	if (spec.precision == -1 &&
+	    spec.field_width == -1 &&
+	    !(spec.flags & (ZEROPAD + PLUS + SPACE + SMALL + SPECIAL)))
+		easy_zero_printing = true;
+
+	while (str < end) {
+		num = next_val(iter, &last);
+		if (last)
+			break;
+		if (str < end && head != 0) {
+			*str = head;
+			str++;
+		}
+		/* optimization for /proc/stat */
+		if (num == 0 && easy_zero_printing) {
+			*str = '0';
+			str++;
+		} else
+			str = number(str, end, num, spec);
+		if (str < end && foot != 0) {
+			*str = foot;
+			str++;
+		}
+	}
+	if (size > 0) {
+		if (str < end)
+			*str = '\0';
+		else
+			end[-1] = '\0';
+	}
+	return str - buf;
+}
+EXPORT_SYMBOL_GPL(snprintf_batch);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling 2012-01-26 9:55 ` KAMEZAWA Hiroyuki @ 2012-01-27 0:43 ` Andrew Morton 2012-01-27 1:09 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2012-01-27 0:43 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Eric Dumazet, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Tuner On Thu, 26 Jan 2012 18:55:20 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Wed, 25 Jan 2012 17:04:16 -0800 > Andrew Morton <akpm@linux-foundation.org> wrote: > > > On Wed, 25 Jan 2012 06:29:32 +0100 > > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > > Le mardi 24 janvier 2012 __ 17:27 -0800, Andrew Morton a __crit : > > > > > > > I had a fiddle on an 8-way x86_64 machine. I'm unable to demonstrate > > > > any improvement for either of > > > > > > > > time (for i in $(seq 1000); do; cat /proc/self/stat > /dev/null; done) > > > > time (for i in $(seq 1000); do; cat /proc/1/stat > /dev/null; done) > > > > > > > > oh well. > > > > > > What size is /proc/stat ? > > > > About 40mm, but it depends on the font size. > > > > > wc -c /proc/stat > > > > > > If under 4096, there is no problem with existing code. > > > > akpm2:/home/akpm> wc -c /proc/stat > > 2800 /proc/stat > > > > > I had the problem on a 16-way machine. > > > > OK.. > > > I wrote following patch just for my fun, which makes /proc/stat twice fast. > But I'm not sure whether this kind of dirty && special printk is worth to do or not.. > because I can't see /proc/stat cost at shell-scripting. It is rather a lot of not-very-general infrastructure. > > ... > > @@ -131,8 +143,8 @@ static int show_stat(struct seq_file *p, void *v) > seq_printf(p, "intr %llu", (unsigned long long)sum); > > /* sum again ? it could be updated? */ > - for_each_irq_nr(j) > - seq_printf(p, " %u", kstat_irqs(j)); > + j = 0; > + seq_printnum_batch(p, " %u", &j, get_next_kstat_irq); I expect most of these numbers are zero. 
I wonder if we would get useful speedups from

	for_each_irq_nr(j) {
		/* Apologetic comment goes here */
		if (kstat_irqs(j))
			seq_printf(p, " %u", kstat_irqs(j));
		else
			seq_puts(p, " 0");
	}

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling
  2012-01-27  0:43 ` Andrew Morton
@ 2012-01-27  1:09   ` KAMEZAWA Hiroyuki
  2012-01-27  1:18     ` Andrew Morton
  2012-01-27  7:09     ` [PATCH v2] proc: speedup /proc/stat handling Eric Dumazet
  0 siblings, 2 replies; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-27  1:09 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric Dumazet, Glauber Costa, Peter Zijlstra, Ingo Molnar,
    linux-kernel, Russell King - ARM Linux, Paul Tuner

On Thu, 26 Jan 2012 16:43:42 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 26 Jan 2012 18:55:20 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > On Wed, 25 Jan 2012 17:04:16 -0800
> > Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > > On Wed, 25 Jan 2012 06:29:32 +0100
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > > > On Tue, 24 Jan 2012 17:27 -0800, Andrew Morton wrote:
> > > >
> > > > > I had a fiddle on an 8-way x86_64 machine.  I'm unable to demonstrate
> > > > > any improvement for either of
> > > > >
> > > > > time (for i in $(seq 1000); do; cat /proc/self/stat > /dev/null; done)
> > > > > time (for i in $(seq 1000); do; cat /proc/1/stat > /dev/null; done)
> > > > >
> > > > > oh well.
> > > >
> > > > What size is /proc/stat ?
> > >
> > > About 40mm, but it depends on the font size.
> > >
> > > > wc -c /proc/stat
> > > >
> > > > If under 4096, there is no problem with existing code.
> > >
> > > akpm2:/home/akpm> wc -c /proc/stat
> > > 2800 /proc/stat
> > >
> > > > I had the problem on a 16-way machine.
> > >
> > > OK..
> >
> > I wrote the following patch just for fun; it makes /proc/stat twice as fast.
> > But I'm not sure whether this kind of dirty && special printk is worth doing or not,
> > because I can't see the /proc/stat cost in shell scripting.
>
> It is rather a lot of not-very-general infrastructure.
>
> >
> > ...
> >
> > @@ -131,8 +143,8 @@ static int show_stat(struct seq_file *p, void *v)
> >  	seq_printf(p, "intr %llu", (unsigned long long)sum);
> >  
> >  	/* sum again ? it could be updated? */
> > -	for_each_irq_nr(j)
> > -		seq_printf(p, " %u", kstat_irqs(j));
> > +	j = 0;
> > +	seq_printnum_batch(p, " %u", &j, get_next_kstat_irq);
>
> I expect most of these numbers are zero.  I wonder if we would get
> useful speedups from
>
> 	for_each_irq_nr(j) {
> 		/* Apologetic comment goes here */
> 		if (kstat_irqs(j))
> 			seq_printf(p, " %u", kstat_irqs(j));
> 		else
> 			seq_puts(p, " 0");
> 	}

Yes. This is a very good optimization and shows a large improvement.
I did this in my first try, but then did more complicated things
because that seemed uninteresting. (This is a bad habit of mine...)

I'll try again and measure the time.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling 2012-01-27 1:09 ` KAMEZAWA Hiroyuki @ 2012-01-27 1:18 ` Andrew Morton 2012-01-30 5:16 ` [PATCH] Add num_to_str() for speedup /proc/stat KAMEZAWA Hiroyuki 2012-01-27 7:09 ` [PATCH v2] proc: speedup /proc/stat handling Eric Dumazet 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2012-01-27 1:18 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Eric Dumazet, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Tuner On Fri, 27 Jan 2012 10:09:33 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > I expect most of these numbers are zero. I wonder if we would get > > useful speedups from > > > > for_each_irq_nr(j) { > > /* Apologetic comment goes here */ > > if (kstat_irqs(j)) > > seq_printf(p, " %u", kstat_irqs(j)); > > else > > seq_puts(p, " 0"); > > } > > Yes. This is very good optimization and shows much optimization. > I did this at first try but did complicated ones because it seems > not interesting. (This is my bad habit...) > > I'll try again and measure time. seq_puts() is too slow ;) I bet seq_putc(p, ' ');seq_putc(p, '0') will complete in negative time. ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH] Add num_to_str() for speedup /proc/stat
  2012-01-27  1:18 ` Andrew Morton
@ 2012-01-30  5:16   ` KAMEZAWA Hiroyuki
  2012-01-30 23:20     ` Andrew Morton
  2012-02-01 14:43     ` Andrea Righi
  0 siblings, 2 replies; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-30  5:16 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric Dumazet, Glauber Costa, Peter Zijlstra, Ingo Molnar,
    linux-kernel, Russell King - ARM Linux, Paul Tuner

On Thu, 26 Jan 2012 17:18:00 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 27 Jan 2012 10:09:33 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > > I expect most of these numbers are zero.  I wonder if we would get
> > > useful speedups from
> > >
> > > 	for_each_irq_nr(j) {
> > > 		/* Apologetic comment goes here */
> > > 		if (kstat_irqs(j))
> > > 			seq_printf(p, " %u", kstat_irqs(j));
> > > 		else
> > > 			seq_puts(p, " 0");
> > > 	}
> >
> > Yes. This is a very good optimization and shows a large improvement.
> > I did this in my first try, but then did more complicated things
> > because that seemed uninteresting. (This is a bad habit of mine...)
> >
> > I'll try again and measure the time.
>
> seq_puts() is too slow ;) I bet seq_putc(p, ' ');seq_putc(p, '0') will
> complete in negative time.

How about this? I think it is simple enough.
=
>From d1f215d4279152362721fd2ca74241fe85afe2ce Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Mon, 30 Jan 2012 14:15:12 +0900
Subject: [PATCH] Add num_to_str() for speedup /proc/stat

When reading /proc/stat, most of the time is consumed by vsnprintf() et al.

Here is a test script, reading /proc/stat 1000 times..
== stat_check.py
num = 0
with open("/proc/stat") as f:
    while num < 1000 :
        data = f.read()
        f.seek(0, 0)
        num = num + 1
==

perf shows

    20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
    13.41%  stat_check.py  [kernel.kallsyms]    [k] number
    12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
    10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
     4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
     4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf

This patch removes most of the calls to vsnprintf() by adding
num_to_str() and seq_put_decimal_ull(), which print decimal numbers
without the rich formatting provided by printf().

On my 8-cpu box:

== Before patch ==
[root@bluextal test]# time ./stat_check.py
real	0m0.150s
user	0m0.026s
sys	0m0.121s

== After patch ==
[root@bluextal test]# time ./stat_check.py
real	0m0.055s
user	0m0.022s
sys	0m0.030s

Maybe it's worth adding this simple function.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 fs/proc/stat.c           |   55 ++++++++++++++++++++--------------------
 fs/seq_file.c            |   34 ++++++++++++++++++++++++
 include/linux/kernel.h   |    8 ++++++
 include/linux/seq_file.h |    5 +++-
 lib/vsprintf.c           |   14 +++++++++++
 5 files changed, 87 insertions(+), 29 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 121f77c..0ff3b92 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -89,18 +89,19 @@ static int show_stat(struct seq_file *p, void *v)
 	}
 	sum += arch_irq_stat();
 
-	seq_printf(p, "cpu  %llu %llu %llu %llu %llu %llu %llu %llu %llu "
-		"%llu\n",
-		(unsigned long long)cputime64_to_clock_t(user),
-		(unsigned long long)cputime64_to_clock_t(nice),
-		(unsigned long long)cputime64_to_clock_t(system),
-		(unsigned long long)cputime64_to_clock_t(idle),
-		(unsigned long long)cputime64_to_clock_t(iowait),
-		(unsigned long long)cputime64_to_clock_t(irq),
-		(unsigned long long)cputime64_to_clock_t(softirq),
-		(unsigned long long)cputime64_to_clock_t(steal),
-		(unsigned long long)cputime64_to_clock_t(guest),
-		(unsigned long long)cputime64_to_clock_t(guest_nice));
+	seq_puts(p, "cpu ");
+	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user));
+	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice));
+	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(system));
+	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(idle));
+	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(iowait));
+	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(irq));
+	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(softirq));
+	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
+	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
+	seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+	seq_putc(p, '\n');
+
 	for_each_online_cpu(i) {
 		/* Copy values here to work around gcc-2.95.3, gcc-2.96 */
 		user = kcpustat_cpu(i).cpustat[CPUTIME_USER];
@@ -113,26 +114,24 @@ static int show_stat(struct seq_file *p, void *v)
 		steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
 		guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
 		guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
-		seq_printf(p,
-			"cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu "
-			"%llu\n",
-			i,
-			(unsigned long long)cputime64_to_clock_t(user),
-			(unsigned long long)cputime64_to_clock_t(nice),
-			(unsigned long long)cputime64_to_clock_t(system),
-			(unsigned long long)cputime64_to_clock_t(idle),
-			(unsigned long long)cputime64_to_clock_t(iowait),
-			(unsigned long long)cputime64_to_clock_t(irq),
-			(unsigned long long)cputime64_to_clock_t(softirq),
-			(unsigned long long)cputime64_to_clock_t(steal),
-			(unsigned long long)cputime64_to_clock_t(guest),
-			(unsigned long long)cputime64_to_clock_t(guest_nice));
+		seq_printf(p, "cpu%d", i);
+		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user));
+		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice));
+		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(system));
+		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(idle));
+		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(iowait));
+		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(irq));
+		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(softirq));
+		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
+		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
+		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+		seq_putc(p, '\n');
 	}
 	seq_printf(p, "intr %llu", (unsigned long long)sum);
 
 	/* sum again ? it could be updated? */
 	for_each_irq_nr(j)
-		seq_printf(p, " %u", kstat_irqs(j));
+		seq_put_decimal_ull(p, ' ', kstat_irqs(j));
 
 	seq_printf(p,
 		"\nctxt %llu\n"
@@ -149,7 +148,7 @@ static int show_stat(struct seq_file *p, void *v)
 	seq_printf(p, "softirq %llu", (unsigned long long)sum_softirq);
 
 	for (i = 0; i < NR_SOFTIRQS; i++)
-		seq_printf(p, " %u", per_softirq_sums[i]);
+		seq_put_decimal_ull(p, ' ', per_softirq_sums[i]);
 	seq_putc(p, '\n');
 
 	return 0;
diff --git a/fs/seq_file.c b/fs/seq_file.c
index 4023d6b..a1ccb48 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -642,6 +642,40 @@ int seq_puts(struct seq_file *m, const char *s)
 }
 EXPORT_SYMBOL(seq_puts);
 
+/*
+ * A helper routine for putting decimal numbers without the rich formatting
+ * of printf(). Only 'unsigned long long' is supported.
+ * This routine puts one byte of delimiter + the number into the seq_file.
+ * It is very quick when you show lots of numbers.
+ * In usual cases, it is better to use seq_printf(): it's easier to read.
+ */ +int seq_put_decimal_ull(struct seq_file *m, char delimiter, + unsigned long long num) +{ + int len; + + if (m->count + 2 >= m->size) /* we'll write 2 bytes at least */ + goto overflow; + + if (num < 10) { + m->buf[m->count++] = delimiter; + m->buf[m->count++] = num + '0'; + return 0; + } + + m->buf[m->count++] = delimiter; + + len = num_to_str(m->buf + m->count, m->size - m->count, num); + if (!len) + goto overflow; + m->count += len; + return 0; +overflow: + m->count = m->size; + return -1; +} +EXPORT_SYMBOL(seq_put_decimal_ull); + /** * seq_write - write arbitrary data to buffer * @seq: seq_file identifying the buffer to which data should be written diff --git a/include/linux/kernel.h b/include/linux/kernel.h index e834342..bb2da3f 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -299,6 +299,14 @@ extern long long simple_strtoll(const char *,char **,unsigned int); #define strict_strtoull kstrtoull #define strict_strtoll kstrtoll +/* + * Convert passed number to decimal string. + * returns returns the length of string. at buffer overflow, returns 0. + * + * If speed is not important, use snprintf(). It's easy to read the code. 
+ */ +extern int num_to_str(char *buf, int size, unsigned long long num); + /* lib/printf utilities */ extern __printf(2, 3) int sprintf(char *buf, const char * fmt, ...); diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h index 44f1514..b58a95c 100644 --- a/include/linux/seq_file.h +++ b/include/linux/seq_file.h @@ -122,8 +122,11 @@ void *__seq_open_private(struct file *, const struct seq_operations *, int); int seq_open_private(struct file *, const struct seq_operations *, int); int seq_release_private(struct inode *, struct file *); -#define SEQ_START_TOKEN ((void *)1) +/* defined in lib/vsprintf.c */ +int seq_put_decimal_ull(struct seq_file *m, char delimiter, + unsigned long long num); +#define SEQ_START_TOKEN ((void *)1) /* * Helpers for iteration over list_head-s in seq_files */ diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 8e75003..ffb81b1 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -212,6 +212,20 @@ char *put_dec(char *buf, unsigned long long num) } } +int num_to_str(char *buf, int size, unsigned long long num) +{ + char tmp[66]; + int idx, len; + + len = put_dec(tmp, num) - tmp; + + if (len > size) + return 0; + for (idx = 0; idx < len; ++idx) + buf[idx] = tmp[len - idx - 1]; + return len; +} + #define ZEROPAD 1 /* pad with zero */ #define SIGN 2 /* unsigned/signed long */ #define PLUS 4 /* show plus */ -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH] Add num_to_str() for speedup /proc/stat 2012-01-30 5:16 ` [PATCH] Add num_to_str() for speedup /proc/stat KAMEZAWA Hiroyuki @ 2012-01-30 23:20 ` Andrew Morton 2012-01-30 23:58 ` KAMEZAWA Hiroyuki 2012-02-01 14:43 ` Andrea Righi 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2012-01-30 23:20 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Eric Dumazet, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Tuner On Mon, 30 Jan 2012 14:16:19 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Date: Mon, 30 Jan 2012 14:15:12 +0900 > Subject: [PATCH] Add num_to_str() for speedup /proc/stat > > At reading /proc/stat, most of time is consumed by vsnprintf() at el. > > Here is a test script, reading /proc/stat 1000 times.. > > == stat_check.py > num = 0 > with open("/proc/stat") as f: > while num < 1000 : > data = f.read() > f.seek(0, 0) > num = num + 1 > == > > perf shows > > 20.39% stat_check.py [kernel.kallsyms] [k] format_decode > 13.41% stat_check.py [kernel.kallsyms] [k] number > 12.61% stat_check.py [kernel.kallsyms] [k] vsnprintf > 10.85% stat_check.py [kernel.kallsyms] [k] memcpy > 4.85% stat_check.py [kernel.kallsyms] [k] radix_tree_lookup > 4.43% stat_check.py [kernel.kallsyms] [k] seq_printf > > This patch removes most of calls to vsnprintf() by adding > num_to_str() and seq_print_decimal_ull(), which prints decimal numbers > without rich functions provided by printf(). > > On my 8cpu box. > == Before patch == > [root@bluextal test]# time ./stat_check.py > > real 0m0.150s > user 0m0.026s > sys 0m0.121s > > == After patch == > [root@bluextal test]# time ./stat_check.py > > real 0m0.055s > user 0m0.022s > sys 0m0.030s > > Maybe it's worth to add this simple function. I suppose so - the new infrastructure can be used elsewhere. 
I tried doing the

	if (kstat_irqs(j) == 0) {
		seq_putc(p, ' ');
		seq_putc(p, '0');

thing on top of this and didn't observe any improvement.

I made some changes - please review.  I'm not sure why you did "char
tmp[66]"?

From: Andrew Morton <akpm@linux-foundation.org>
Subject: procfs-add-num_to_str-to-speed-up-proc-stat-fix

- remove incorrect comment
- use less stack in num_to_str()
- move comment from .h to .c
- simplify seq_put_decimal_ull()

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/seq_file.c            |    5 ++---
 include/linux/kernel.h   |    6 ------
 include/linux/seq_file.h |    4 +---
 lib/vsprintf.c           |    8 +++++++-
 4 files changed, 10 insertions(+), 13 deletions(-)

diff -puN fs/proc/stat.c~procfs-add-num_to_str-to-speed-up-proc-stat-fix fs/proc/stat.c
diff -puN fs/seq_file.c~procfs-add-num_to_str-to-speed-up-proc-stat-fix fs/seq_file.c
--- a/fs/seq_file.c~procfs-add-num_to_str-to-speed-up-proc-stat-fix
+++ a/fs/seq_file.c
@@ -659,14 +659,13 @@ int seq_put_decimal_ull(struct seq_file
 	if (m->count + 2 >= m->size) /* we'll write 2 bytes at least */
 		goto overflow;

+	m->buf[m->count++] = delimiter;
+
 	if (num < 10) {
-		m->buf[m->count++] = delimiter;
 		m->buf[m->count++] = num + '0';
 		return 0;
 	}

-	m->buf[m->count++] = delimiter;
-
 	len = num_to_str(m->buf + m->count, m->size - m->count, num);
 	if (!len)
 		goto overflow;
diff -puN include/linux/kernel.h~procfs-add-num_to_str-to-speed-up-proc-stat-fix include/linux/kernel.h
--- a/include/linux/kernel.h~procfs-add-num_to_str-to-speed-up-proc-stat-fix
+++ a/include/linux/kernel.h
@@ -299,12 +299,6 @@ extern long long simple_strtoll(const ch
 #define strict_strtoull kstrtoull
 #define strict_strtoll kstrtoll

-/*
- * Convert passed number to decimal string.
- * returns returns the length of string. at buffer overflow, returns 0.
- *
- * If speed is not important, use snprintf(). It's easy to read the code.
- */
 extern int num_to_str(char *buf, int size, unsigned long long num);

 /* lib/printf utilities */
diff -puN include/linux/seq_file.h~procfs-add-num_to_str-to-speed-up-proc-stat-fix include/linux/seq_file.h
--- a/include/linux/seq_file.h~procfs-add-num_to_str-to-speed-up-proc-stat-fix
+++ a/include/linux/seq_file.h
@@ -121,10 +121,8 @@ int single_release(struct inode *, struc
 void *__seq_open_private(struct file *, const struct seq_operations *, int);
 int seq_open_private(struct file *, const struct seq_operations *, int);
 int seq_release_private(struct inode *, struct file *);
-
-/* defined in lib/vsprintf.c */
 int seq_put_decimal_ull(struct seq_file *m, char delimiter,
-			unsigned long long num);
+			unsigned long long num);

 #define SEQ_START_TOKEN ((void *)1)

 /*
diff -puN lib/vsprintf.c~procfs-add-num_to_str-to-speed-up-proc-stat-fix lib/vsprintf.c
--- a/lib/vsprintf.c~procfs-add-num_to_str-to-speed-up-proc-stat-fix
+++ a/lib/vsprintf.c
@@ -212,9 +212,15 @@ char *put_dec(char *buf, unsigned long l
 	}
 }

+/*
+ * Convert passed number to decimal string.
+ * Returns the length of string. On buffer overflow, returns 0.
+ *
+ * If speed is not important, use snprintf(). It's easy to read the code.
+ */
 int num_to_str(char *buf, int size, unsigned long long num)
 {
-	char tmp[66];
+	char tmp[21];		/* Enough for 2^64 in decimal */
 	int idx, len;

 	len = put_dec(tmp, num) - tmp;
_
* Re: [PATCH] Add num_to_str() for speedup /proc/stat
  2012-01-30 23:20 ` Andrew Morton
@ 2012-01-30 23:58 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-30 23:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric Dumazet, Glauber Costa, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Russell King - ARM Linux, Paul Turner

On Mon, 30 Jan 2012 15:20:51 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 30 Jan 2012 14:16:19 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Mon, 30 Jan 2012 14:15:12 +0900
> > Subject: [PATCH] Add num_to_str() for speedup /proc/stat
>
> > On my 8cpu box.
> > == Before patch ==
> > [root@bluextal test]# time ./stat_check.py
> >
> > real 0m0.150s
> > user 0m0.026s
> > sys 0m0.121s
> >
> > == After patch ==
> > [root@bluextal test]# time ./stat_check.py
> >
> > real 0m0.055s
> > user 0m0.022s
> > sys 0m0.030s
> >
> > Maybe it's worth adding this simple function.
>
> I suppose so - the new infrastructure can be used elsewhere.
>
> I tried doing the
>
> 	if (kstat_irqs(j) == 0) {
> 		seq_putc(p, ' ');
> 		seq_putc(p, '0');
>
> thing on top of this and didn't observe any improvement.
>
> I made some changes - please review.  I'm not sure why you did "char
> tmp[66]"?

Your fix seems fine to me. I'm sorry I copied tmp[66] from number()..
and yes, 0xffffffffffffffff = 18446744073709551615, so tmp[21] will be
enough.

I'll prepare a patch for /proc/<pid>/stat and see 'ps' and 'top'
performance.

Thanks,
-Kame
* Re: [PATCH] Add num_to_str() for speedup /proc/stat 2012-01-30 5:16 ` [PATCH] Add num_to_str() for speedup /proc/stat KAMEZAWA Hiroyuki 2012-01-30 23:20 ` Andrew Morton @ 2012-02-01 14:43 ` Andrea Righi 2012-02-01 23:46 ` KAMEZAWA Hiroyuki 1 sibling, 1 reply; 26+ messages in thread From: Andrea Righi @ 2012-02-01 14:43 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Andrew Morton, Eric Dumazet, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Tuner On Mon, Jan 30, 2012 at 02:16:19PM +0900, KAMEZAWA Hiroyuki wrote: ... > diff --git a/fs/proc/stat.c b/fs/proc/stat.c > index 121f77c..0ff3b92 100644 > --- a/fs/proc/stat.c > +++ b/fs/proc/stat.c > @@ -89,18 +89,19 @@ static int show_stat(struct seq_file *p, void *v) > } > sum += arch_irq_stat(); > > - seq_printf(p, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu " > - "%llu\n", > - (unsigned long long)cputime64_to_clock_t(user), > - (unsigned long long)cputime64_to_clock_t(nice), > - (unsigned long long)cputime64_to_clock_t(system), > - (unsigned long long)cputime64_to_clock_t(idle), > - (unsigned long long)cputime64_to_clock_t(iowait), > - (unsigned long long)cputime64_to_clock_t(irq), > - (unsigned long long)cputime64_to_clock_t(softirq), > - (unsigned long long)cputime64_to_clock_t(steal), > - (unsigned long long)cputime64_to_clock_t(guest), > - (unsigned long long)cputime64_to_clock_t(guest_nice)); > + seq_puts(p, "cpu "); > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user)); > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice)); > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(system)); > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(idle)); > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(iowait)); > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(irq)); > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(softirq)); > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal)); > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest)); > + 
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice)); > + seq_putc(p, '\n'); > + > for_each_online_cpu(i) { > /* Copy values here to work around gcc-2.95.3, gcc-2.96 */ > user = kcpustat_cpu(i).cpustat[CPUTIME_USER]; > @@ -113,26 +114,24 @@ static int show_stat(struct seq_file *p, void *v) > steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL]; > guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST]; > guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE]; > - seq_printf(p, > - "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu " > - "%llu\n", > - i, > - (unsigned long long)cputime64_to_clock_t(user), > - (unsigned long long)cputime64_to_clock_t(nice), > - (unsigned long long)cputime64_to_clock_t(system), > - (unsigned long long)cputime64_to_clock_t(idle), > - (unsigned long long)cputime64_to_clock_t(iowait), > - (unsigned long long)cputime64_to_clock_t(irq), > - (unsigned long long)cputime64_to_clock_t(softirq), > - (unsigned long long)cputime64_to_clock_t(steal), > - (unsigned long long)cputime64_to_clock_t(guest), > - (unsigned long long)cputime64_to_clock_t(guest_nice)); > + seq_printf(p, "cpu %d", i); ^^^^^^ mmh... if I'm not wrong this looks like an ABI change. Thanks, -Andrea --- From: Andrea Righi <andrea@betterlinux.com> Subject: procfs: avoid breaking the ABI in /proc/stat The speed up of /proc/stat changed the output of the cpu statistics adding an extra space between the string "cpu" and the cpu id. Restore the old ABI by removing this space. 
Signed-off-by: Andrea Righi <andrea@betterlinux.com> --- fs/proc/stat.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/proc/stat.c b/fs/proc/stat.c index 0ff3b92..7ed7efa 100644 --- a/fs/proc/stat.c +++ b/fs/proc/stat.c @@ -114,7 +114,7 @@ static int show_stat(struct seq_file *p, void *v) steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL]; guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST]; guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE]; - seq_printf(p, "cpu %d", i); + seq_printf(p, "cpu%d", i); seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user)); seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice)); seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(system)); ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH] Add num_to_str() for speedup /proc/stat 2012-02-01 14:43 ` Andrea Righi @ 2012-02-01 23:46 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 26+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-02-01 23:46 UTC (permalink / raw) To: Andrea Righi Cc: Andrew Morton, Eric Dumazet, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Tuner On Wed, 1 Feb 2012 15:43:33 +0100 Andrea Righi <andrea@betterlinux.com> wrote: > On Mon, Jan 30, 2012 at 02:16:19PM +0900, KAMEZAWA Hiroyuki wrote: > ... > > diff --git a/fs/proc/stat.c b/fs/proc/stat.c > > index 121f77c..0ff3b92 100644 > > --- a/fs/proc/stat.c > > +++ b/fs/proc/stat.c > > @@ -89,18 +89,19 @@ static int show_stat(struct seq_file *p, void *v) > > } > > sum += arch_irq_stat(); > > > > - seq_printf(p, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu " > > - "%llu\n", > > - (unsigned long long)cputime64_to_clock_t(user), > > - (unsigned long long)cputime64_to_clock_t(nice), > > - (unsigned long long)cputime64_to_clock_t(system), > > - (unsigned long long)cputime64_to_clock_t(idle), > > - (unsigned long long)cputime64_to_clock_t(iowait), > > - (unsigned long long)cputime64_to_clock_t(irq), > > - (unsigned long long)cputime64_to_clock_t(softirq), > > - (unsigned long long)cputime64_to_clock_t(steal), > > - (unsigned long long)cputime64_to_clock_t(guest), > > - (unsigned long long)cputime64_to_clock_t(guest_nice)); > > + seq_puts(p, "cpu "); > > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user)); > > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice)); > > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(system)); > > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(idle)); > > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(iowait)); > > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(irq)); > > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(softirq)); > > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal)); > > + seq_put_decimal_ull(p, ' ', 
cputime64_to_clock_t(guest)); > > + seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice)); > > + seq_putc(p, '\n'); > > + > > for_each_online_cpu(i) { > > /* Copy values here to work around gcc-2.95.3, gcc-2.96 */ > > user = kcpustat_cpu(i).cpustat[CPUTIME_USER]; > > @@ -113,26 +114,24 @@ static int show_stat(struct seq_file *p, void *v) > > steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL]; > > guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST]; > > guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE]; > > - seq_printf(p, > > - "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu " > > - "%llu\n", > > - i, > > - (unsigned long long)cputime64_to_clock_t(user), > > - (unsigned long long)cputime64_to_clock_t(nice), > > - (unsigned long long)cputime64_to_clock_t(system), > > - (unsigned long long)cputime64_to_clock_t(idle), > > - (unsigned long long)cputime64_to_clock_t(iowait), > > - (unsigned long long)cputime64_to_clock_t(irq), > > - (unsigned long long)cputime64_to_clock_t(softirq), > > - (unsigned long long)cputime64_to_clock_t(steal), > > - (unsigned long long)cputime64_to_clock_t(guest), > > - (unsigned long long)cputime64_to_clock_t(guest_nice)); > > + seq_printf(p, "cpu %d", i); > ^^^^^^ > mmh... if I'm not wrong this looks like an ABI change. > > Thanks, > -Andrea > > --- > From: Andrea Righi <andrea@betterlinux.com> > Subject: procfs: avoid breaking the ABI in /proc/stat > > The speed up of /proc/stat changed the output of the cpu statistics > adding an extra space between the string "cpu" and the cpu id. > > Restore the old ABI by removing this space. 
>
> Signed-off-by: Andrea Righi <andrea@betterlinux.com>
> ---
>  fs/proc/stat.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/proc/stat.c b/fs/proc/stat.c
> index 0ff3b92..7ed7efa 100644
> --- a/fs/proc/stat.c
> +++ b/fs/proc/stat.c
> @@ -114,7 +114,7 @@ static int show_stat(struct seq_file *p, void *v)
>  		steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
>  		guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
>  		guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
> -		seq_printf(p, "cpu %d", i);
> +		seq_printf(p, "cpu%d", i);

Ah, I'm very sorry. And thank you very much for catching this early!

-Kame
* Re: [PATCH v2] proc: speedup /proc/stat handling
  2012-01-27  1:09 ` KAMEZAWA Hiroyuki
  2012-01-27  1:18 ` Andrew Morton
@ 2012-01-27  7:09 ` Eric Dumazet
  1 sibling, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2012-01-27 7:09 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Glauber Costa, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Russell King - ARM Linux, Paul Turner

On Friday, 27 January 2012 at 10:09 +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 26 Jan 2012 16:43:42 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > I expect most of these numbers are zero.  I wonder if we would get
> > useful speedups from
> >
> >	for_each_irq_nr(j) {
> >		/* Apologetic comment goes here */
> >		if (kstat_irqs(j))
> >			seq_printf(p, " %u", kstat_irqs(j));
> >		else
> >			seq_puts(p, " 0");
> >	}
>
> Yes. This is a very good optimization and shows much improvement.
> I did this at first try but then did more complicated ones because it
> seemed not interesting. (This is my bad habit...)
>
> I'll try again and measure time.
> Also make it generic maybe, and evaluate kstat_irqs(i) once.

	seq_put_u32(struct seq_file *m, u32 val);

Converting a "u32" to its decimal representation should be fast, even
for non-zero values.

Many other users could benefit from this.
* Re: [PATCH v2] proc: speedup /proc/stat handling 2012-01-25 0:01 ` [PATCH v2] " Eric Dumazet 2012-01-25 0:12 ` Andrew Morton @ 2012-01-25 0:18 ` KAMEZAWA Hiroyuki 2012-01-25 0:26 ` Eric Dumazet 1 sibling, 1 reply; 26+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-01-25 0:18 UTC (permalink / raw) To: Eric Dumazet Cc: Andrew Morton, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Tuner On Wed, 25 Jan 2012 01:01:23 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 > bytes, and is slow : > > # strace -T -o /tmp/STRACE cat /proc/stat | wc -c > 5826 > # grep "cpu " /tmp/STRACE > read(0, "cpu 1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504> > > > Thats partly because show_stat() must be called twice since initial > buffer size is too small (4096 bytes for less than 32 possible cpus) > > Fix this by : > > 1) Taking into account nr_irqs in the initial buffer sizing. > > 2) Using ksize() to allow better filling of initial buffer. > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > Cc: Glauber Costa <glommer@parallels.com> > Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Paul Tuner <pjt@google.com> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> > Cc: Ingo Molnar <mingo@elte.hu> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> BTW, what is the reason of this change ? > - unsigned size = 4096 * (1 + num_possible_cpus() / 32); > + unsigned size = 1024 + 128 * num_possible_cpus(); I think size of buffer is affected by the number of online cpus. (Maybe 128 is enough but please add comment why 128 ?) And > size += 2 * nr_irqs; If this assumption fails, the issue comes again... -Kame ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling 2012-01-25 0:18 ` KAMEZAWA Hiroyuki @ 2012-01-25 0:26 ` Eric Dumazet 2012-01-30 8:06 ` Jörg-Volker Peetz 0 siblings, 1 reply; 26+ messages in thread From: Eric Dumazet @ 2012-01-25 0:26 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Andrew Morton, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Tuner Le mercredi 25 janvier 2012 à 09:18 +0900, KAMEZAWA Hiroyuki a écrit : > BTW, what is the reason of this change ? > > > - unsigned size = 4096 * (1 + num_possible_cpus() / 32); > > + unsigned size = 1024 + 128 * num_possible_cpus(); > > I think size of buffer is affected by the number of online cpus. > (Maybe 128 is enough but please add comment why 128 ?) > There is no change, as 4096/32 is 128 bytes per cpu. Only change is granularity is not any more 32 cpus, but one. Of course, kmalloc() is going to roundup to next power of two anyway. So real allocation is bigger, unless we switch to vmalloc() eventually. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling 2012-01-25 0:26 ` Eric Dumazet @ 2012-01-30 8:06 ` Jörg-Volker Peetz 2012-01-30 9:25 ` Eric Dumazet 0 siblings, 1 reply; 26+ messages in thread From: Jörg-Volker Peetz @ 2012-01-30 8:06 UTC (permalink / raw) To: linux-kernel Cc: KAMEZAWA Hiroyuki, Andrew Morton, Glauber Costa, Peter Zijlstra, Ingo Molnar, linux-kernel, Russell King - ARM Linux, Paul Tuner Eric Dumazet wrote, on 01/25/12 01:26: > Le mercredi 25 janvier 2012 à 09:18 +0900, KAMEZAWA Hiroyuki a écrit : > >> BTW, what is the reason of this change ? >> >>> - unsigned size = 4096 * (1 + num_possible_cpus() / 32); >>> + unsigned size = 1024 + 128 * num_possible_cpus(); >> >> I think size of buffer is affected by the number of online cpus. >> (Maybe 128 is enough but please add comment why 128 ?) >> > > There is no change, as 4096/32 is 128 bytes per cpu. > Wrong math, only num_possible_cpus() is divided by 32. Thus, - unsigned size = 4096 * (1 + num_possible_cpus() / 32); + unsigned size = 4096 + 128 * num_possible_cpus(); <snip> -- Best regards, Jörg-Volker. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v2] proc: speedup /proc/stat handling 2012-01-30 8:06 ` Jörg-Volker Peetz @ 2012-01-30 9:25 ` Eric Dumazet 2012-01-30 10:00 ` Jörg-Volker Peetz 0 siblings, 1 reply; 26+ messages in thread From: Eric Dumazet @ 2012-01-30 9:25 UTC (permalink / raw) To: Jörg-Volker Peetz Cc: linux-kernel, KAMEZAWA Hiroyuki, Andrew Morton, Glauber Costa, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux, Paul Tuner Le lundi 30 janvier 2012 à 09:06 +0100, Jörg-Volker Peetz a écrit : > Eric Dumazet wrote, on 01/25/12 01:26: > > Le mercredi 25 janvier 2012 à 09:18 +0900, KAMEZAWA Hiroyuki a écrit : > > > >> BTW, what is the reason of this change ? > >> > >>> - unsigned size = 4096 * (1 + num_possible_cpus() / 32); > >>> + unsigned size = 1024 + 128 * num_possible_cpus(); > >> > >> I think size of buffer is affected by the number of online cpus. > >> (Maybe 128 is enough but please add comment why 128 ?) > >> > > > > There is no change, as 4096/32 is 128 bytes per cpu. > > > > Wrong math, only num_possible_cpus() is divided by 32. Thus, > > - unsigned size = 4096 * (1 + num_possible_cpus() / 32); > + unsigned size = 4096 + 128 * num_possible_cpus(); > > <snip> It is good math, once you take the time to think a bit about it. The original question was about the 128 * num_possible_cpus() 4096/32 is 128 as I said. The 4096 -> 1024 is just taking into account fact that once you do the correct computations, you dont need initial 4096 value, and 1024 is more than enough. Example on a dual core machine : # dmesg|grep nr_irq [ 0.000000] nr_irqs_gsi: 40 [ 0.000000] NR_IRQS:2304 nr_irqs:712 16 size = 1024 + 2*128 + 2*712 = 2704 bytes (rounded to 4096 by kmalloc()) # wc -c /proc/stat 1767 /proc/stat Problem with original math was that for a machine with 16 cpus or a machine with 1 cpu, we ended with the same 4096 value. That was a real problem. 
If we instead use "unsigned size = 4096 + 128 * num_possible_cpus();" as you suggest, we would always allocate 2 pages of memory; this is not needed at all for typical 1/2/4-way machines.
* Re: [PATCH v2] proc: speedup /proc/stat handling 2012-01-30 9:25 ` Eric Dumazet @ 2012-01-30 10:00 ` Jörg-Volker Peetz 0 siblings, 0 replies; 26+ messages in thread From: Jörg-Volker Peetz @ 2012-01-30 10:00 UTC (permalink / raw) To: linux-kernel Cc: linux-kernel, KAMEZAWA Hiroyuki, Andrew Morton, Glauber Costa, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux, Paul Tuner Eric Dumazet wrote, on 01/30/12 10:25: > Le lundi 30 janvier 2012 à 09:06 +0100, Jörg-Volker Peetz a écrit : >> Eric Dumazet wrote, on 01/25/12 01:26: >>> Le mercredi 25 janvier 2012 à 09:18 +0900, KAMEZAWA Hiroyuki a écrit : >>> >>>> BTW, what is the reason of this change ? >>>> >>>>> - unsigned size = 4096 * (1 + num_possible_cpus() / 32); >>>>> + unsigned size = 1024 + 128 * num_possible_cpus(); >>>> >>>> I think size of buffer is affected by the number of online cpus. >>>> (Maybe 128 is enough but please add comment why 128 ?) >>>> >>> >>> There is no change, as 4096/32 is 128 bytes per cpu. >>> >> >> Wrong math, only num_possible_cpus() is divided by 32. Thus, >> >> - unsigned size = 4096 * (1 + num_possible_cpus() / 32); >> + unsigned size = 4096 + 128 * num_possible_cpus(); >> >> <snip> > > > It is good math, once you take the time to think a bit about it. > > The original question was about the 128 * num_possible_cpus() > > 4096/32 is 128 as I said. > > The 4096 -> 1024 is just taking into account fact that once you do the > correct computations, you dont need initial 4096 value, and 1024 is more > than enough. Thanks for your explanation, that makes sense now. At first, I was mislead by your comment "There is no change" which was apparently related to the second term in the formula. 
> > Example on a dual core machine : > > # dmesg|grep nr_irq > [ 0.000000] nr_irqs_gsi: 40 > [ 0.000000] NR_IRQS:2304 nr_irqs:712 16 > > size = 1024 + 2*128 + 2*712 = 2704 bytes (rounded to 4096 by kmalloc()) > > # wc -c /proc/stat > 1767 /proc/stat > > Problem with original math was that for a machine with 16 cpus or a > machine with 1 cpu, we ended with the same 4096 value. That was a real > problem. > > If we instead use "unsigned size = 4096 + 128 * num_possible_cpus();" as > you suggest, we would always allocate 2 pages of memory, this is not > needed at all for typical 1/2/4 way machines. > -- Best regards, Jörg-Volker. ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2012-02-01 23:48 UTC | newest]

Thread overview: 26+ messages
2012-01-20 15:59 [PATCH] proc: speedup /proc/stat handling Eric Dumazet
2012-01-20 22:55 ` Andrew Morton
2012-01-23 10:16   ` KAMEZAWA Hiroyuki
2012-01-23 10:33     ` Glauber Costa
2012-01-24  1:25       ` KAMEZAWA Hiroyuki
2012-01-25  0:01 ` [PATCH v2] " Eric Dumazet
2012-01-25  0:12   ` Andrew Morton
2012-01-25  0:22     ` Eric Dumazet
2012-01-25  1:27       ` Andrew Morton
2012-01-25  5:29         ` Eric Dumazet
2012-01-26  1:04           ` Andrew Morton
2012-01-26  9:55             ` KAMEZAWA Hiroyuki
2012-01-27  0:43               ` Andrew Morton
2012-01-27  1:09                 ` KAMEZAWA Hiroyuki
2012-01-27  1:18                   ` Andrew Morton
2012-01-30  5:16                     ` [PATCH] Add num_to_str() for speedup /proc/stat KAMEZAWA Hiroyuki
2012-01-30 23:20                       ` Andrew Morton
2012-01-30 23:58                         ` KAMEZAWA Hiroyuki
2012-02-01 14:43                       ` Andrea Righi
2012-02-01 23:46                         ` KAMEZAWA Hiroyuki
2012-01-27  7:09                 ` [PATCH v2] proc: speedup /proc/stat handling Eric Dumazet
2012-01-25  0:18   ` KAMEZAWA Hiroyuki
2012-01-25  0:26     ` Eric Dumazet
2012-01-30  8:06       ` Jörg-Volker Peetz
2012-01-30  9:25         ` Eric Dumazet
2012-01-30 10:00           ` Jörg-Volker Peetz