linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched: reduce /proc/schedstat access times
From: Eric Dumazet @ 2012-02-02 20:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Peter Zijlstra

On a 16-CPU NUMA machine, /proc/schedstat can be quite long:

# wc -c /proc/schedstat 
8355 /proc/schedstat

It appears show_schedstat() is called three times, because the
initial seq_file buffer size is underestimated.

The seq buffer must be reallocated twice, so we spend three times
longer than necessary.

A quick fix is to track the maximum length reached by
show_schedstat() instead of guessing it.

Also use seq_bitmap() to avoid the mask_str temporary buffer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/stats.c |   19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 2a581ba..34bb9c8 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -11,15 +11,11 @@
  * format, so that tools can adapt (or abort)
  */
 #define SCHEDSTAT_VERSION 15
+static size_t max_schedstat_len = 4096;
 
 static int show_schedstat(struct seq_file *seq, void *v)
 {
 	int cpu;
-	int mask_len = DIV_ROUND_UP(NR_CPUS, 32) * 9;
-	char *mask_str = kmalloc(mask_len, GFP_KERNEL);
-
-	if (mask_str == NULL)
-		return -ENOMEM;
 
 	seq_printf(seq, "version %d\n", SCHEDSTAT_VERSION);
 	seq_printf(seq, "timestamp %lu\n", jiffies);
@@ -47,9 +43,9 @@ static int show_schedstat(struct seq_file *seq, void *v)
 		for_each_domain(cpu, sd) {
 			enum cpu_idle_type itype;
 
-			cpumask_scnprintf(mask_str, mask_len,
-					  sched_domain_span(sd));
-			seq_printf(seq, "domain%d %s", dcount++, mask_str);
+			seq_printf(seq, "domain%d ", dcount++);
+			seq_bitmap(seq, cpumask_bits(sched_domain_span(sd)),
+				   nr_cpumask_bits);
 			for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
 					itype++) {
 				seq_printf(seq, " %u %u %u %u %u %u %u %u",
@@ -73,14 +69,13 @@ static int show_schedstat(struct seq_file *seq, void *v)
 		rcu_read_unlock();
 #endif
 	}
-	kfree(mask_str);
+	max_schedstat_len = max(max_schedstat_len, seq->count);
 	return 0;
 }
 
 static int schedstat_open(struct inode *inode, struct file *file)
 {
-	unsigned int size = PAGE_SIZE * (1 + num_online_cpus() / 32);
-	char *buf = kmalloc(size, GFP_KERNEL);
+	char *buf = kmalloc(max_schedstat_len, GFP_KERNEL);
 	struct seq_file *m;
 	int res;
 
@@ -90,7 +85,7 @@ static int schedstat_open(struct inode *inode, struct file *file)
 	if (!res) {
 		m = file->private_data;
 		m->buf = buf;
-		m->size = size;
+		m->size = ksize(buf);
 	} else
 		kfree(buf);
 	return res;




* Re: [PATCH] sched: reduce /proc/schedstat access times
From: Ingo Molnar @ 2012-02-03  7:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, Peter Zijlstra, Arnaldo Carvalho de Melo


* Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On a 16-CPU NUMA machine, /proc/schedstat can be quite long:
> 
> # wc -c /proc/schedstat 
> 8355 /proc/schedstat

Btw., the long-term goal would be to make the schedstats info 
fully available via perf and integrate it into 'perf sched' - or 
'perf stat --sched' or 'perf schedstat' (whichever variant suits 
the person who first implements it).

> @@ -47,9 +43,9 @@ static int show_schedstat(struct seq_file *seq, void *v)
>  		for_each_domain(cpu, sd) {
>  			enum cpu_idle_type itype;
>  
> -			cpumask_scnprintf(mask_str, mask_len,
> -					  sched_domain_span(sd));
> -			seq_printf(seq, "domain%d %s", dcount++, mask_str);
> +			seq_printf(seq, "domain%d ", dcount++);
> +			seq_bitmap(seq, cpumask_bits(sched_domain_span(sd)),
> +				   nr_cpumask_bits);
>  			for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
>  					itype++) {
>  				seq_printf(seq, " %u %u %u %u %u %u %u %u",

that way, via perf, all information gets passed in a binary 
fashion through the perf ring-buffer, so there's no formatting 
overhead (only during post-processing), no restart artifacts due 
to seqfile limitations, etc.

Thanks,

	Ingo

