* [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat
@ 2011-01-17 10:49 Ciju Rajan K
  2011-01-17 10:52 ` [PATCH 1/2 v1.0]sched: Removing unused fields from /proc/schedstat Ciju Rajan K
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Ciju Rajan K @ 2011-01-17 10:49 UTC (permalink / raw)
  To: lkml
  Cc: Peter Zijlstra, Ingo Molnar, Bharata B Rao, Srivatsa Vaddagiri,
	Ciju Rajan K

Hi All,

Here is the first version of the patch set, which updates the /proc/schedstat statistics. Please review the patches and let me know your thoughts.

Ciju Rajan K  (2):
    sched: Removing unused fields from /proc/schedstat
    sched: Updating the sched-stat documentation

 Documentation/scheduler/sched-stats.txt |  140 +++++++++++---------------------
 include/linux/sched.h                   |   11 --
 kernel/sched.c                          |    1 
 kernel/sched_debug.c                    |    1 
 kernel/sched_stats.h                    |   13 +-
 5 files changed, 56 insertions(+), 110 deletions(-)

-Ciju


* [PATCH 1/2 v1.0]sched: Removing unused fields from /proc/schedstat
  2011-01-17 10:49 [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat Ciju Rajan K
@ 2011-01-17 10:52 ` Ciju Rajan K
  2011-01-17 10:54 ` [PATCH 2/2 v1.0]sched: Updating the sched-stat documentation Ciju Rajan K
  2011-01-17 16:54 ` [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat Peter Zijlstra
  2 siblings, 0 replies; 13+ messages in thread
From: Ciju Rajan K @ 2011-01-17 10:52 UTC (permalink / raw)
  To: lkml
  Cc: Ciju Rajan K, Peter Zijlstra, Ingo Molnar, Bharata B Rao,
	Srivatsa Vaddagiri

[-- Attachment #1: Type: text/plain, Size: 112 bytes --]

This patch removes the unused statistics from /proc/schedstat. It also updates the corresponding run queue and sched_domain structure fields.

[-- Attachment #2: patch1_v1.patch --]
[-- Type: text/plain, Size: 2880 bytes --]

Signed-off-by: Ciju Rajan K <ciju@linux.vnet.ibm.com>
---

diff -Naurp a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h	2011-01-17 00:49:08.000000000 +0530
+++ b/include/linux/sched.h	2011-01-17 01:01:27.000000000 +0530
@@ -943,20 +943,9 @@ struct sched_domain {
 	unsigned int alb_failed;
 	unsigned int alb_pushed;
 
-	/* SD_BALANCE_EXEC stats */
-	unsigned int sbe_count;
-	unsigned int sbe_balanced;
-	unsigned int sbe_pushed;
-
-	/* SD_BALANCE_FORK stats */
-	unsigned int sbf_count;
-	unsigned int sbf_balanced;
-	unsigned int sbf_pushed;
-
 	/* try_to_wake_up() stats */
 	unsigned int ttwu_wake_remote;
 	unsigned int ttwu_move_affine;
-	unsigned int ttwu_move_balance;
 #endif
 #ifdef CONFIG_SCHED_DEBUG
 	char *name;
diff -Naurp a/kernel/sched.c b/kernel/sched.c
--- a/kernel/sched.c	2011-01-17 00:32:14.000000000 +0530
+++ b/kernel/sched.c	2011-01-17 00:38:24.000000000 +0530
@@ -545,7 +545,6 @@ struct rq {
 	unsigned int yld_count;
 
 	/* schedule() stats */
-	unsigned int sched_switch;
 	unsigned int sched_count;
 	unsigned int sched_goidle;
 
diff -Naurp a/kernel/sched_debug.c b/kernel/sched_debug.c
--- a/kernel/sched_debug.c	2011-01-17 00:31:36.000000000 +0530
+++ b/kernel/sched_debug.c	2011-01-17 00:36:52.000000000 +0530
@@ -286,7 +286,6 @@ static void print_cpu(struct seq_file *m
 
 	P(yld_count);
 
-	P(sched_switch);
 	P(sched_count);
 	P(sched_goidle);
 #ifdef CONFIG_SMP
diff -Naurp a/kernel/sched_stats.h b/kernel/sched_stats.h
--- a/kernel/sched_stats.h	2011-01-17 00:31:56.000000000 +0530
+++ b/kernel/sched_stats.h	2011-01-17 11:49:50.000000000 +0530
@@ -4,7 +4,7 @@
  * bump this up when changing the output format or the meaning of an existing
  * format, so that tools can adapt (or abort)
  */
-#define SCHEDSTAT_VERSION 15
+#define SCHEDSTAT_VERSION 16
 
 static int show_schedstat(struct seq_file *seq, void *v)
 {
@@ -26,9 +26,9 @@ static int show_schedstat(struct seq_fil
 
 		/* runqueue-specific stats */
 		seq_printf(seq,
-		    "cpu%d %u %u %u %u %u %u %llu %llu %lu",
+		    "cpu%d %u %u %u %u %u %llu %llu %lu",
 		    cpu, rq->yld_count,
-		    rq->sched_switch, rq->sched_count, rq->sched_goidle,
+		    rq->sched_count, rq->sched_goidle,
 		    rq->ttwu_count, rq->ttwu_local,
 		    rq->rq_cpu_time,
 		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount);
@@ -57,12 +57,9 @@ static int show_schedstat(struct seq_fil
 				    sd->lb_nobusyg[itype]);
 			}
 			seq_printf(seq,
-				   " %u %u %u %u %u %u %u %u %u %u %u %u\n",
+				   " %u %u %u %u %u\n",
 			    sd->alb_count, sd->alb_failed, sd->alb_pushed,
-			    sd->sbe_count, sd->sbe_balanced, sd->sbe_pushed,
-			    sd->sbf_count, sd->sbf_balanced, sd->sbf_pushed,
-			    sd->ttwu_wake_remote, sd->ttwu_move_affine,
-			    sd->ttwu_move_balance);
+			    sd->ttwu_wake_remote, sd->ttwu_move_affine);
 		}
 		preempt_enable();
 #endif


* [PATCH 2/2 v1.0]sched: Updating the sched-stat documentation
  2011-01-17 10:49 [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat Ciju Rajan K
  2011-01-17 10:52 ` [PATCH 1/2 v1.0]sched: Removing unused fields from /proc/schedstat Ciju Rajan K
@ 2011-01-17 10:54 ` Ciju Rajan K
  2011-01-17 16:54 ` [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat Peter Zijlstra
  2 siblings, 0 replies; 13+ messages in thread
From: Ciju Rajan K @ 2011-01-17 10:54 UTC (permalink / raw)
  To: lkml
  Cc: Ciju Rajan K, Peter Zijlstra, Ingo Molnar, Bharata B Rao,
	Srivatsa Vaddagiri

[-- Attachment #1: Type: text/plain, Size: 50 bytes --]

This patch updates the sched-stat documentation. 

[-- Attachment #2: patch2_v1.patch --]
[-- Type: text/plain, Size: 8550 bytes --]

Signed-off-by: Ciju Rajan K <ciju@linux.vnet.ibm.com>
---

diff -Naurp a/Documentation/scheduler/sched-stats.txt b/Documentation/scheduler/sched-stats.txt
--- a/Documentation/scheduler/sched-stats.txt	2011-01-17 01:07:47.000000000 +0530
+++ b/Documentation/scheduler/sched-stats.txt	2011-01-17 15:32:05.000000000 +0530
@@ -26,119 +26,81 @@ Note that any such script will necessari
 reason to change versions is changes in the output format.  For those wishing
 to write their own scripts, the fields are described here.
 
+The first two fields of /proc/schedstat indicates the version (current
+version is 16) and jiffies values. The following values are from 
+cpu & domain statistics.
+
 CPU statistics
 --------------
-cpu<N> 1 2 3 4 5 6 7 8 9 10 11 12
+The format is like this:
+
+cpu<N> 1 2 3 4 5 6 7 8
 
-NOTE: In the sched_yield() statistics, the active queue is considered empty
-    if it has only one process in it, since obviously the process calling
-    sched_yield() is that process.
-
-First four fields are sched_yield() statistics:
-     1) # of times both the active and the expired queue were empty
-     2) # of times just the active queue was empty
-     3) # of times just the expired queue was empty
-     4) # of times sched_yield() was called
-
-Next three are schedule() statistics:
-     5) # of times we switched to the expired queue and reused it
-     6) # of times schedule() was called
-     7) # of times schedule() left the processor idle
-
-Next two are try_to_wake_up() statistics:
-     8) # of times try_to_wake_up() was called
-     9) # of times try_to_wake_up() was called to wake up the local cpu
-
-Next three are statistics describing scheduling latency:
-    10) sum of all time spent running by tasks on this processor (in jiffies)
-    11) sum of all time spent waiting to run by tasks on this processor (in
-        jiffies)
-    12) # of timeslices run on this cpu
+NOTE: In the sched_yield() statistics, the active queue is considered
+      empty if it has only one process in it, since obviously the 
+      process calling sched_yield() is that process.
+
+     1) # of times sched_yield() was called on this CPU
+     2) # of times scheduler runs on this CPU
+     3) # of times scheduler picks idle task as next task on this CPU
+     4) # of times try_to_wake_up() is run on this CPU 
+        (Number of times task wakeup is attempted from this CPU)
+     5) # of times try_to_wake_up() wakes up a task on the same CPU 
+        (local wakeup)
+     6) Time(ns) for which tasks have run on this CPU
+     7) Time(ns) for which tasks on this CPU's runqueue have waited 
+        before getting to run on the CPU
+     8) # of tasks that have run on this CPU
 
 
 Domain statistics
 -----------------
-One of these is produced per domain for each cpu described. (Note that if
-CONFIG_SMP is not defined, *no* domains are utilized and these lines
-will not appear in the output.)
+One of these is produced per domain for each cpu described. 
+(Note that if CONFIG_SMP is not defined, *no* domains are utilized
+ and these lines will not appear in the output.)
 
-domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
+domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
 
 The first field is a bit mask indicating what cpus this domain operates over.
 
-The next 24 are a variety of load_balance() statistics in grouped into types
-of idleness (idle, busy, and newly idle):
+The next 24 are a variety of load_balance() statistics grouped into
+types of idleness (idle, busy, and newly idle). The three idle 
+states are:
+
+CPU_NEWLY_IDLE:    Load balancer is being run on a CPU which is 
+                   about to enter IDLE state
+CPU_IDLE:          This state is entered after CPU_NEWLY_IDLE 
+                   state fails to find a new task for this CPU
+CPU_NOT_IDLE:      Load balancer is being run on a CPU when it is 
+                   not in IDLE state (busy times)
+
+There are eight stats available for each of the three idle states:
 
-     1) # of times in this domain load_balance() was called when the
-        cpu was idle
+     1) # of times in this domain load_balance() was called
      2) # of times in this domain load_balance() checked but found
-        the load did not require balancing when the cpu was idle
+        the load did not require balancing
      3) # of times in this domain load_balance() tried to move one or
-        more tasks and failed, when the cpu was idle
+        more tasks and failed
      4) sum of imbalances discovered (if any) with each call to
-        load_balance() in this domain when the cpu was idle
-     5) # of times in this domain pull_task() was called when the cpu
-        was idle
+        load_balance() in this domain
+     5) # of times in this domain pull_task() was called
      6) # of times in this domain pull_task() was called even though
-        the target task was cache-hot when idle
+        the target task was cache-hot
      7) # of times in this domain load_balance() was called but did
-        not find a busier queue while the cpu was idle
-     8) # of times in this domain a busier queue was found while the
-        cpu was idle but no busier group was found
-
-     9) # of times in this domain load_balance() was called when the
-        cpu was busy
-    10) # of times in this domain load_balance() checked but found the
-        load did not require balancing when busy
-    11) # of times in this domain load_balance() tried to move one or
-        more tasks and failed, when the cpu was busy
-    12) sum of imbalances discovered (if any) with each call to
-        load_balance() in this domain when the cpu was busy
-    13) # of times in this domain pull_task() was called when busy
-    14) # of times in this domain pull_task() was called even though the
-        target task was cache-hot when busy
-    15) # of times in this domain load_balance() was called but did not
-        find a busier queue while the cpu was busy
-    16) # of times in this domain a busier queue was found while the cpu
-        was busy but no busier group was found
-
-    17) # of times in this domain load_balance() was called when the
-        cpu was just becoming idle
-    18) # of times in this domain load_balance() checked but found the
-        load did not require balancing when the cpu was just becoming idle
-    19) # of times in this domain load_balance() tried to move one or more
-        tasks and failed, when the cpu was just becoming idle
-    20) sum of imbalances discovered (if any) with each call to
-        load_balance() in this domain when the cpu was just becoming idle
-    21) # of times in this domain pull_task() was called when newly idle
-    22) # of times in this domain pull_task() was called even though the
-        target task was cache-hot when just becoming idle
-    23) # of times in this domain load_balance() was called but did not
-        find a busier queue while the cpu was just becoming idle
-    24) # of times in this domain a busier queue was found while the cpu
-        was just becoming idle but no busier group was found
-
+        not find a busier queue
+     8) # of times in this domain a busier queue was found but no 
+        busier group was found
+     
    Next three are active_load_balance() statistics:
     25) # of times active_load_balance() was called
     26) # of times active_load_balance() tried to move a task and failed
     27) # of times active_load_balance() successfully moved a task
 
-   Next three are sched_balance_exec() statistics:
-    28) sbe_cnt is not used
-    29) sbe_balanced is not used
-    30) sbe_pushed is not used
-
-   Next three are sched_balance_fork() statistics:
-    31) sbf_cnt is not used
-    32) sbf_balanced is not used
-    33) sbf_pushed is not used
-
-   Next three are try_to_wake_up() statistics:
-    34) # of times in this domain try_to_wake_up() awoke a task that
+   Next two are try_to_wake_up() statistics:
+    28) # of times in this domain try_to_wake_up() awoke a task that
         last ran on a different cpu in this domain
-    35) # of times in this domain try_to_wake_up() moved a task to the
+    29) # of times in this domain try_to_wake_up() moved a task to the
         waking cpu because it was cache-cold on its own cpu anyway
-    36) # of times in this domain try_to_wake_up() started passive balancing
 
 /proc/<pid>/schedstat
 ----------------


* Re: [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat
  2011-01-17 10:49 [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat Ciju Rajan K
  2011-01-17 10:52 ` [PATCH 1/2 v1.0]sched: Removing unused fields from /proc/schedstat Ciju Rajan K
  2011-01-17 10:54 ` [PATCH 2/2 v1.0]sched: Updating the sched-stat documentation Ciju Rajan K
@ 2011-01-17 16:54 ` Peter Zijlstra
  2011-01-18  6:01   ` Ciju Rajan K
  2 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2011-01-17 16:54 UTC (permalink / raw)
  To: Ciju Rajan K; +Cc: lkml, Ingo Molnar, Bharata B Rao, Srivatsa Vaddagiri

On Mon, 2011-01-17 at 16:19 +0530, Ciju Rajan K wrote:
> Hi All,
> 
> Here is the first version of the patch set, which updates
> the /proc/schedstat statistics. Please review the patches and let me
> know your thoughts.

What's the impact on existing userspace? Does the change warrant the
effort of changing the userspace tools?

Also, attachment fail.


* Re: [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat
  2011-01-17 16:54 ` [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat Peter Zijlstra
@ 2011-01-18  6:01   ` Ciju Rajan K
  2011-01-18  6:04     ` [PATCH 1/2 v1.0]sched: Removing unused fields from /proc/schedstat Ciju Rajan K
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Ciju Rajan K @ 2011-01-18  6:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, Ingo Molnar, Bharata B Rao, Srivatsa Vaddagiri, Ciju Rajan K

Hi Peter,


On 01/17/2011 10:24 PM, Peter Zijlstra wrote:
> On Mon, 2011-01-17 at 16:19 +0530, Ciju Rajan K wrote:
>> Hi All,
>>
>> Here is the first version of the patch set, which updates
>> the /proc/schedstat statistics. Please review the patches and let me
>> know your thoughts.
> 
> What's the impact on existing userspace? Does the change warrant the
> effort of changing the userspace tools?

Since the removed fields are not at the end, there might be some changes required in the userspace scripts. But the benefit is that we will have more relevant stats in /proc/schedstat.

Most userspace applications will be checking the version field of /proc/schedstat. Since the version is bumped for these changes, there should not be any breakage for the existing applications. One more advantage is that whoever looks at /proc/schedstat will see the actual numbers rather than rows of zeros; basically, the readability improves.
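To illustrate the point about the version field (a minimal, hypothetical sketch, not part of these patches), a tool could refuse to parse an unknown format, in the spirit of the "tools can adapt (or abort)" comment in kernel/sched_stats.h:

/*
 * Hypothetical example only: bail out unless /proc/schedstat reports
 * the format version this tool was written against (16 after these
 * patches).
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	FILE *fp = fopen("/proc/schedstat", "r");
	unsigned int version;

	if (!fp) {
		perror("/proc/schedstat");
		return EXIT_FAILURE;
	}
	if (fscanf(fp, "version %u", &version) != 1 || version != 16) {
		fprintf(stderr, "unsupported schedstat version\n");
		fclose(fp);
		return EXIT_FAILURE;
	}
	/* ... parse the version 16 cpu/domain lines here ... */
	fclose(fp);
	return EXIT_SUCCESS;
}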

"http://eaglet.rain.com/rick/linux/schedstat/" admits that the format for /proc/schedstat can change depending upon the version. If you are ok with the /proc/schedstat updates, I can also send a patch to update "http://eaglet.rain.com/rick/linux/schedstat/v15/latency.c" program which parses some of the schedstat entries.


> 
> Also, attachment fail.

It should be ok now.

-Ciju


* [PATCH 1/2 v1.0]sched: Removing unused fields from /proc/schedstat
  2011-01-18  6:01   ` Ciju Rajan K
@ 2011-01-18  6:04     ` Ciju Rajan K
  2011-01-18  6:04     ` [PATCH 2/2 v1.0]sched: Updating the sched-stat documentation Ciju Rajan K
  2011-01-18  7:29     ` [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat Satoru Takeuchi
  2 siblings, 0 replies; 13+ messages in thread
From: Ciju Rajan K @ 2011-01-18  6:04 UTC (permalink / raw)
  To: Peter Zijlstra, lkml
  Cc: Ciju Rajan K, Ingo Molnar, Bharata B Rao, Srivatsa Vaddagiri

sched: Updating the fields of /proc/schedstat 

This patch removes the unused statistics from /proc/schedstat.
It also updates the corresponding run queue and sched_domain structure fields.

Signed-off-by: Ciju Rajan K <ciju@linux.vnet.ibm.com>
---

diff -Naurp a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h	2011-01-17 00:49:08.000000000 +0530
+++ b/include/linux/sched.h	2011-01-17 01:01:27.000000000 +0530
@@ -943,20 +943,9 @@ struct sched_domain {
 	unsigned int alb_failed;
 	unsigned int alb_pushed;

-	/* SD_BALANCE_EXEC stats */
-	unsigned int sbe_count;
-	unsigned int sbe_balanced;
-	unsigned int sbe_pushed;
-
-	/* SD_BALANCE_FORK stats */
-	unsigned int sbf_count;
-	unsigned int sbf_balanced;
-	unsigned int sbf_pushed;
-
 	/* try_to_wake_up() stats */
 	unsigned int ttwu_wake_remote;
 	unsigned int ttwu_move_affine;
-	unsigned int ttwu_move_balance;
 #endif
 #ifdef CONFIG_SCHED_DEBUG
 	char *name;
diff -Naurp a/kernel/sched.c b/kernel/sched.c
--- a/kernel/sched.c	2011-01-17 00:32:14.000000000 +0530
+++ b/kernel/sched.c	2011-01-17 00:38:24.000000000 +0530
@@ -545,7 +545,6 @@ struct rq {
 	unsigned int yld_count;

 	/* schedule() stats */
-	unsigned int sched_switch;
 	unsigned int sched_count;
 	unsigned int sched_goidle;

diff -Naurp a/kernel/sched_debug.c b/kernel/sched_debug.c
--- a/kernel/sched_debug.c	2011-01-17 00:31:36.000000000 +0530
+++ b/kernel/sched_debug.c	2011-01-17 00:36:52.000000000 +0530
@@ -286,7 +286,6 @@ static void print_cpu(struct seq_file *m

 	P(yld_count);

-	P(sched_switch);
 	P(sched_count);
 	P(sched_goidle);
 #ifdef CONFIG_SMP
diff -Naurp a/kernel/sched_stats.h b/kernel/sched_stats.h
--- a/kernel/sched_stats.h	2011-01-17 00:31:56.000000000 +0530
+++ b/kernel/sched_stats.h	2011-01-17 11:49:50.000000000 +0530
@@ -4,7 +4,7 @@
  * bump this up when changing the output format or the meaning of an existing
  * format, so that tools can adapt (or abort)
  */
-#define SCHEDSTAT_VERSION 15
+#define SCHEDSTAT_VERSION 16

 static int show_schedstat(struct seq_file *seq, void *v)
 {
@@ -26,9 +26,9 @@ static int show_schedstat(struct seq_fil

 		/* runqueue-specific stats */
 		seq_printf(seq,
-		    "cpu%d %u %u %u %u %u %u %llu %llu %lu",
+		    "cpu%d %u %u %u %u %u %llu %llu %lu",
 		    cpu, rq->yld_count,
-		    rq->sched_switch, rq->sched_count, rq->sched_goidle,
+		    rq->sched_count, rq->sched_goidle,
 		    rq->ttwu_count, rq->ttwu_local,
 		    rq->rq_cpu_time,
 		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount);
@@ -57,12 +57,9 @@ static int show_schedstat(struct seq_fil
 				    sd->lb_nobusyg[itype]);
 			}
 			seq_printf(seq,
-				   " %u %u %u %u %u %u %u %u %u %u %u %u\n",
+				   " %u %u %u %u %u\n",
 			    sd->alb_count, sd->alb_failed, sd->alb_pushed,
-			    sd->sbe_count, sd->sbe_balanced, sd->sbe_pushed,
-			    sd->sbf_count, sd->sbf_balanced, sd->sbf_pushed,
-			    sd->ttwu_wake_remote, sd->ttwu_move_affine,
-			    sd->ttwu_move_balance);
+			    sd->ttwu_wake_remote, sd->ttwu_move_affine);
 		}
 		preempt_enable();
 #endif



* [PATCH 2/2 v1.0]sched: Updating the sched-stat documentation
  2011-01-18  6:01   ` Ciju Rajan K
  2011-01-18  6:04     ` [PATCH 1/2 v1.0]sched: Removing unused fields from /proc/schedstat Ciju Rajan K
@ 2011-01-18  6:04     ` Ciju Rajan K
  2011-01-19  7:11       ` Satoru Takeuchi
  2011-01-18  7:29     ` [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat Satoru Takeuchi
  2 siblings, 1 reply; 13+ messages in thread
From: Ciju Rajan K @ 2011-01-18  6:04 UTC (permalink / raw)
  To: Peter Zijlstra, lkml
  Cc: Ciju Rajan K, Ingo Molnar, Bharata B Rao, Srivatsa Vaddagiri

sched: Updating the sched-stat documentation

Some of the unused fields are removed from /proc/schedstat.
This is the documentation changes reflecting the same.

Signed-off-by: Ciju Rajan K <ciju@linux.vnet.ibm.com>
---

diff -Naurp a/Documentation/scheduler/sched-stats.txt b/Documentation/scheduler/sched-stats.txt
--- a/Documentation/scheduler/sched-stats.txt	2011-01-17 01:07:47.000000000 +0530
+++ b/Documentation/scheduler/sched-stats.txt	2011-01-17 15:32:05.000000000 +0530
@@ -26,119 +26,81 @@ Note that any such script will necessari
 reason to change versions is changes in the output format.  For those wishing
 to write their own scripts, the fields are described here.

+The first two fields of /proc/schedstat indicates the version (current
+version is 16) and jiffies values. The following values are from 
+cpu & domain statistics.
+
 CPU statistics
 --------------
-cpu<N> 1 2 3 4 5 6 7 8 9 10 11 12
+The format is like this:
+
+cpu<N> 1 2 3 4 5 6 7 8

-NOTE: In the sched_yield() statistics, the active queue is considered empty
-    if it has only one process in it, since obviously the process calling
-    sched_yield() is that process.
-
-First four fields are sched_yield() statistics:
-     1) # of times both the active and the expired queue were empty
-     2) # of times just the active queue was empty
-     3) # of times just the expired queue was empty
-     4) # of times sched_yield() was called
-
-Next three are schedule() statistics:
-     5) # of times we switched to the expired queue and reused it
-     6) # of times schedule() was called
-     7) # of times schedule() left the processor idle
-
-Next two are try_to_wake_up() statistics:
-     8) # of times try_to_wake_up() was called
-     9) # of times try_to_wake_up() was called to wake up the local cpu
-
-Next three are statistics describing scheduling latency:
-    10) sum of all time spent running by tasks on this processor (in jiffies)
-    11) sum of all time spent waiting to run by tasks on this processor (in
-        jiffies)
-    12) # of timeslices run on this cpu
+NOTE: In the sched_yield() statistics, the active queue is considered
+      empty if it has only one process in it, since obviously the 
+      process calling sched_yield() is that process.
+
+     1) # of times sched_yield() was called on this CPU
+     2) # of times scheduler runs on this CPU
+     3) # of times scheduler picks idle task as next task on this CPU
+     4) # of times try_to_wake_up() is run on this CPU 
+        (Number of times task wakeup is attempted from this CPU)
+     5) # of times try_to_wake_up() wakes up a task on the same CPU 
+        (local wakeup)
+     6) Time(ns) for which tasks have run on this CPU
+     7) Time(ns) for which tasks on this CPU's runqueue have waited 
+        before getting to run on the CPU
+     8) # of tasks that have run on this CPU


 Domain statistics
 -----------------
-One of these is produced per domain for each cpu described. (Note that if
-CONFIG_SMP is not defined, *no* domains are utilized and these lines
-will not appear in the output.)
+One of these is produced per domain for each cpu described. 
+(Note that if CONFIG_SMP is not defined, *no* domains are utilized
+ and these lines will not appear in the output.)

-domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
+domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

 The first field is a bit mask indicating what cpus this domain operates over.

-The next 24 are a variety of load_balance() statistics in grouped into types
-of idleness (idle, busy, and newly idle):
+The next 24 are a variety of load_balance() statistics grouped into
+types of idleness (idle, busy, and newly idle). The three idle 
+states are:
+
+CPU_NEWLY_IDLE:    Load balancer is being run on a CPU which is 
+                   about to enter IDLE state
+CPU_IDLE:          This state is entered after CPU_NEWLY_IDLE 
+                   state fails to find a new task for this CPU
+CPU_NOT_IDLE:      Load balancer is being run on a CPU when it is 
+                   not in IDLE state (busy times)
+
+There are eight stats available for each of the three idle states:

-     1) # of times in this domain load_balance() was called when the
-        cpu was idle
+     1) # of times in this domain load_balance() was called
      2) # of times in this domain load_balance() checked but found
-        the load did not require balancing when the cpu was idle
+        the load did not require balancing
      3) # of times in this domain load_balance() tried to move one or
-        more tasks and failed, when the cpu was idle
+        more tasks and failed
      4) sum of imbalances discovered (if any) with each call to
-        load_balance() in this domain when the cpu was idle
-     5) # of times in this domain pull_task() was called when the cpu
-        was idle
+        load_balance() in this domain
+     5) # of times in this domain pull_task() was called
      6) # of times in this domain pull_task() was called even though
-        the target task was cache-hot when idle
+        the target task was cache-hot
      7) # of times in this domain load_balance() was called but did
-        not find a busier queue while the cpu was idle
-     8) # of times in this domain a busier queue was found while the
-        cpu was idle but no busier group was found
-
-     9) # of times in this domain load_balance() was called when the
-        cpu was busy
-    10) # of times in this domain load_balance() checked but found the
-        load did not require balancing when busy
-    11) # of times in this domain load_balance() tried to move one or
-        more tasks and failed, when the cpu was busy
-    12) sum of imbalances discovered (if any) with each call to
-        load_balance() in this domain when the cpu was busy
-    13) # of times in this domain pull_task() was called when busy
-    14) # of times in this domain pull_task() was called even though the
-        target task was cache-hot when busy
-    15) # of times in this domain load_balance() was called but did not
-        find a busier queue while the cpu was busy
-    16) # of times in this domain a busier queue was found while the cpu
-        was busy but no busier group was found
-
-    17) # of times in this domain load_balance() was called when the
-        cpu was just becoming idle
-    18) # of times in this domain load_balance() checked but found the
-        load did not require balancing when the cpu was just becoming idle
-    19) # of times in this domain load_balance() tried to move one or more
-        tasks and failed, when the cpu was just becoming idle
-    20) sum of imbalances discovered (if any) with each call to
-        load_balance() in this domain when the cpu was just becoming idle
-    21) # of times in this domain pull_task() was called when newly idle
-    22) # of times in this domain pull_task() was called even though the
-        target task was cache-hot when just becoming idle
-    23) # of times in this domain load_balance() was called but did not
-        find a busier queue while the cpu was just becoming idle
-    24) # of times in this domain a busier queue was found while the cpu
-        was just becoming idle but no busier group was found
-
+        not find a busier queue
+     8) # of times in this domain a busier queue was found but no 
+        busier group was found
+     
    Next three are active_load_balance() statistics:
     25) # of times active_load_balance() was called
     26) # of times active_load_balance() tried to move a task and failed
     27) # of times active_load_balance() successfully moved a task

-   Next three are sched_balance_exec() statistics:
-    28) sbe_cnt is not used
-    29) sbe_balanced is not used
-    30) sbe_pushed is not used
-
-   Next three are sched_balance_fork() statistics:
-    31) sbf_cnt is not used
-    32) sbf_balanced is not used
-    33) sbf_pushed is not used
-
-   Next three are try_to_wake_up() statistics:
-    34) # of times in this domain try_to_wake_up() awoke a task that
+   Next two are try_to_wake_up() statistics:
+    28) # of times in this domain try_to_wake_up() awoke a task that
         last ran on a different cpu in this domain
-    35) # of times in this domain try_to_wake_up() moved a task to the
+    29) # of times in this domain try_to_wake_up() moved a task to the
         waking cpu because it was cache-cold on its own cpu anyway
-    36) # of times in this domain try_to_wake_up() started passive balancing

 /proc/<pid>/schedstat
 ----------------


* Re: [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat
  2011-01-18  6:01   ` Ciju Rajan K
  2011-01-18  6:04     ` [PATCH 1/2 v1.0]sched: Removing unused fields from /proc/schedstat Ciju Rajan K
  2011-01-18  6:04     ` [PATCH 2/2 v1.0]sched: Updating the sched-stat documentation Ciju Rajan K
@ 2011-01-18  7:29     ` Satoru Takeuchi
  2011-01-18  7:50       ` Ciju Rajan K
  2 siblings, 1 reply; 13+ messages in thread
From: Satoru Takeuchi @ 2011-01-18  7:29 UTC (permalink / raw)
  To: Ciju Rajan K
  Cc: Peter Zijlstra, lkml, Ingo Molnar, Bharata B Rao, Srivatsa Vaddagiri

Hi Ciju,

(2011/01/18 15:01), Ciju Rajan K wrote:
> Hi Peter,
>
>
> On 01/17/2011 10:24 PM, Peter Zijlstra wrote:
>> On Mon, 2011-01-17 at 16:19 +0530, Ciju Rajan K wrote:
>>> Hi All,
>>>
>>> Here is the first version of the patch set, which updates
>>> the /proc/schedstat statistics. Please review the patches and let me
>>> know your thoughts.
>>
>> What's the impact on existing userspace? Does the change warrant the
>> effort of changing the userspace tools?
>
> Since the removed fields are not at the end, there might be some changes required in the userspace scripts. But the benefit is that we will have more relevant stats in /proc/schedstat.
>
> Most userspace applications will be checking the version field of /proc/schedstat. Since the version is bumped for these changes, there should not be any breakage for the existing applications. One more advantage is that whoever looks at /proc/schedstat will see the actual numbers rather than rows of zeros; basically, the readability improves.
>
> "http://eaglet.rain.com/rick/linux/schedstat/" acknowledges that the format of /proc/schedstat can change depending upon the version. If you are OK with the /proc/schedstat updates, I can also send a patch to update the "http://eaglet.rain.com/rick/linux/schedstat/v15/latency.c" program, which parses some of the schedstat entries.

I don't like these patches because they break backward compatibility.
If there are any users who use these fields, they will no longer be able
to get the information which these fields provide.
In this context, `user' means not only applications but also people
who refer to /proc/schedstat directly.

In fact, although I can't say "command XXX refers to these field",
I sometimes check {sbe_*,sbf_*} to confirm load_balance behavior
by issuing, for example,

===============================================================================
watch /proc/schedstat
===============================================================================

or

===============================================================================
while true ; do
   cat /proc/schedstat >>schedstat_log
   sleep 10
done
===============================================================================

Thanks
Satoru

>
>
>>
>> Also, attachment fail.
>
> It should be ok now.
>
> -Ciju




* Re: [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat
  2011-01-18  7:29     ` [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat Satoru Takeuchi
@ 2011-01-18  7:50       ` Ciju Rajan K
  2011-01-19  6:42         ` Satoru Takeuchi
  0 siblings, 1 reply; 13+ messages in thread
From: Ciju Rajan K @ 2011-01-18  7:50 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: Peter Zijlstra, lkml, Ingo Molnar, Bharata B Rao,
	Srivatsa Vaddagiri, Ciju Rajan K



On 01/18/2011 12:59 PM, Satoru Takeuchi wrote:
> Hi Ciju,
> 
Hello Satoru,

> I don't like these patches because they break backward compatibility.
> If there are any users who use these fields, they will no longer be able
> to get the information which these fields provide.
> In this context, `user' means not only applications but also people
> who refer to /proc/schedstat directly.

This patch set removes only those fields which are currently not in use.
If you observe /proc/schedstat, the following fields are not
being updated.

For each cpu:

2) sched_switch

For each domain:

28) sd->sbe_count
29) sd->sbe_balanced
30) sd->sbe_pushed
31) sd->sbf_count
32) sd->sbf_balanced 
33) sd->sbf_pushed
36) sd->ttwu_move_balance

The serial numbers indicate the positions of the fields in version 15
of /proc/schedstat.
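For reference, patch 1/2 changes the per-cpu line format to "cpu%d %u %u %u %u %u %llu %llu %lu"; a minimal, hypothetical parser for the version 16 layout (illustrative only, not part of the patches) could read it as:

#include <stdio.h>

/*
 * Illustrative sketch only: parse one version 16 per-cpu line in the
 * field order printed by show_schedstat() after this patch set.
 */
static int parse_cpu_line_v16(const char *line)
{
	int cpu;
	unsigned int yld_count, sched_count, sched_goidle;
	unsigned int ttwu_count, ttwu_local;
	unsigned long long rq_cpu_time, run_delay;
	unsigned long pcount;

	return sscanf(line, "cpu%d %u %u %u %u %u %llu %llu %lu",
		      &cpu, &yld_count, &sched_count, &sched_goidle,
		      &ttwu_count, &ttwu_local, &rq_cpu_time,
		      &run_delay, &pcount) == 9;
}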
> 
> In fact, although I can't say "command XXX refers to these field",
> I sometimes check {sbe_*,sbf_*} to confirm load_balance behavior
> by issuing, for example,
> 
> ===============================================================================
> watch /proc/schedstat
> ===============================================================================
> 
> or
> 
> ===============================================================================
> while true ; do
> cat /proc/schedstat >>schedstat_log
> sleep 10
> done
> ===============================================================================

The only concern is that users might have to update their scripts to get the
correct positions of the fields, which the scripts anyway have to take care of
depending on the version of /proc/schedstat.

I hope this addresses your concern.

-Ciju
> 
> Thanks
> Satoru
> 
>>
>>
>>>
>>> Also, attachment fail.
>>
>> It should be ok now.
>>
>> -Ciju


* Re: [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat
  2011-01-18  7:50       ` Ciju Rajan K
@ 2011-01-19  6:42         ` Satoru Takeuchi
  2011-01-19 15:55           ` Ciju Rajan K
  0 siblings, 1 reply; 13+ messages in thread
From: Satoru Takeuchi @ 2011-01-19  6:42 UTC (permalink / raw)
  To: Ciju Rajan K
  Cc: Peter Zijlstra, lkml, Ingo Molnar, Bharata B Rao, Srivatsa Vaddagiri

Hi Ciju,

(2011/01/18 16:50), Ciju Rajan K wrote:
>
>
> On 01/18/2011 12:59 PM, Satoru Takeuchi wrote:
>> Hi Ciju,
>>
> Hello Satoru,
>
>> I don't like these patches because they break backward compatibility.
>> If there are any users who use these fields, they will no longer be able
>> to get the information which these fields provide.
>> In this context, `user' means not only applications but also people
>> who refer to /proc/schedstat directly.
>
> This patch set removes only those fields which are currently not in use.
> If you observe /proc/schedstat, the following fields are not
> being updated.

Ah... I misunderstood the meaning of `unused' and complained based on
too old a kernel source. Sorry.

I confirmed that these fields are indeed not touched by the upstream kernel
at all. So I think it's OK if any userland tools are updated in sync
with this change. Is its benefit more than its cost? In my
understanding, the benefit is improved readability and some reduction in
memory footprint, and the cost is changing all userspace tools that refer
to /proc/schedstat.

# Unfortunately I don't know how much it costs.

Thanks,
Satoru

>
> For each cpu:
>
> 2) sched_switch
>
> For each domain:
>
> 28) sd->sbe_count
> 29) sd->sbe_balanced
> 30) sd->sbe_pushed
> 31) sd->sbf_count
> 32) sd->sbf_balanced
> 33) sd->sbf_pushed
> 36) sd->ttwu_move_balance
>
> The serial numbers indicate the positions of the fields in version 15
> of /proc/schedstat.
>>
>> In fact, although I can't say "command XXX refers to these field",
>> I sometimes check {sbe_*,sbf_*} to confirm load_balance behavior
>> by issuing, for example,
>>
>> ===============================================================================
>> watch /proc/schedstat
>> ===============================================================================
>>
>> or
>>
>> ===============================================================================
>> while true ; do
>> cat /proc/schedstat>>schedstat_log
>> sleep 10
>> done
>> ===============================================================================
>
> The only concern is that users might have to update their scripts to get the
> correct positions of the fields, which the scripts anyway have to take care of
> depending on the version of /proc/schedstat.
>
> I hope this addresses your concern.
>
> -Ciju
>>
>> Thanks
>> Satoru
>>
>>>
>>>
>>>>
>>>> Also, attachment fail.
>>>
>>> It should be ok now.
>>>
>>> -Ciju




* Re: [PATCH 2/2 v1.0]sched: Updating the sched-stat documentation
  2011-01-18  6:04     ` [PATCH 2/2 v1.0]sched: Updating the sched-stat documentation Ciju Rajan K
@ 2011-01-19  7:11       ` Satoru Takeuchi
  2011-01-19 15:47         ` Ciju Rajan K
  0 siblings, 1 reply; 13+ messages in thread
From: Satoru Takeuchi @ 2011-01-19  7:11 UTC (permalink / raw)
  To: Ciju Rajan K
  Cc: Peter Zijlstra, lkml, Ingo Molnar, Bharata B Rao, Srivatsa Vaddagiri

Hi Ciju,

(2011/01/18 15:04), Ciju Rajan K wrote:
> sched: Updating the sched-stat documentation
>
> Some of the unused fields are removed from /proc/schedstat.
> This is the documentation changes reflecting the same.
>
> Signed-off-by: Ciju Rajan K<ciju@linux.vnet.ibm.com>
> ---
>
> diff -Naurp a/Documentation/scheduler/sched-stats.txt b/Documentation/scheduler/sched-stats.txt
> --- a/Documentation/scheduler/sched-stats.txt	2011-01-17 01:07:47.000000000 +0530
> +++ b/Documentation/scheduler/sched-stats.txt	2011-01-17 15:32:05.000000000 +0530
> @@ -26,119 +26,81 @@ Note that any such script will necessari
>   reason to change versions is changes in the output format.  For those wishing
>   to write their own scripts, the fields are described here.
>
> +The first two fields of /proc/schedstat indicates the version (current
> +version is 16) and jiffies values. The following values are from
> +cpu&  domain statistics.

cpu & domain statistics.

> +
>   CPU statistics
>   --------------
> -cpu<N>  1 2 3 4 5 6 7 8 9 10 11 12
> +The format is like this:
> +
> +cpu<N>  1 2 3 4 5 6 7 8
>
> -NOTE: In the sched_yield() statistics, the active queue is considered empty
> -    if it has only one process in it, since obviously the process calling
> -    sched_yield() is that process.
> -
> -First four fields are sched_yield() statistics:
> -     1) # of times both the active and the expired queue were empty
> -     2) # of times just the active queue was empty
> -     3) # of times just the expired queue was empty
> -     4) # of times sched_yield() was called
> -
> -Next three are schedule() statistics:
> -     5) # of times we switched to the expired queue and reused it
> -     6) # of times schedule() was called
> -     7) # of times schedule() left the processor idle
> -
> -Next two are try_to_wake_up() statistics:
> -     8) # of times try_to_wake_up() was called
> -     9) # of times try_to_wake_up() was called to wake up the local cpu
> -
> -Next three are statistics describing scheduling latency:
> -    10) sum of all time spent running by tasks on this processor (in jiffies)
> -    11) sum of all time spent waiting to run by tasks on this processor (in
> -        jiffies)
> -    12) # of timeslices run on this cpu
> +NOTE: In the sched_yield() statistics, the active queue is considered
> +      empty if it has only one process in it, since obviously the
> +      process calling sched_yield() is that process.

There are no active/expired queues any more.

> +
> +     1) # of times sched_yield() was called on this CPU
> +     2) # of times scheduler runs on this CPU
> +     3) # of times scheduler picks idle task as next task on this CPU
> +     4) # of times try_to_wake_up() is run on this CPU
> +        (Number of times task wakeup is attempted from this CPU)
> +     5) # of times try_to_wake_up() wakes up a task on the same CPU
> +        (local wakeup)
> +     6) Time(ns) for which tasks have run on this CPU
> +     7) Time(ns) for which tasks on this CPU's runqueue have waited
> +        before getting to run on the CPU
> +     8) # of tasks that have run on this CPU
>
>
>   Domain statistics
>   -----------------
> -One of these is produced per domain for each cpu described. (Note that if
> -CONFIG_SMP is not defined, *no* domains are utilized and these lines
> -will not appear in the output.)
> +One of these is produced per domain for each cpu described.
> +(Note that if CONFIG_SMP is not defined, *no* domains are utilized
> + and these lines will not appear in the output.)
>
> -domain<N>  <cpumask>  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
> +domain<N>  <cpumask>  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
>
>   The first field is a bit mask indicating what cpus this domain operates over.
>
> -The next 24 are a variety of load_balance() statistics in grouped into types
> -of idleness (idle, busy, and newly idle):
> +The next 24 are a variety of load_balance() statistics grouped into
> +types of idleness (idle, busy, and newly idle). The three idle
> +states are:
> +
> +CPU_NEWLY_IDLE:    Load balancer is being run on a CPU which is
> +                   about to enter IDLE state
> +CPU_IDLE:          This state is entered after CPU_NEWLY_IDLE
> +                   state fails to find a new task for this CPU
> +CPU_NOT_IDLE:      Load balancer is being run on a CPU when it is
> +                   not in IDLE state (busy times)
> +
> +There are eight stats available for each of the three idle states:

It would be more helpful if the description said which fields belong to
which state, e.g. fields 1-8 for CPU_NEWLY_IDLE, 9-16 for CPU_IDLE, and
17-24 for CPU_NOT_IDLE. The current description doesn't say whether
the order is

  1) lb_count[CPU_NEWLY_IDLE],
  2) lb_balanced[CPU_NEWLY_IDLE],
  3) lb_failed[CPU_NEWLY_IDLE],
  ...

or

  1) lb_count[CPU_NEWLY_IDLE]
  2) lb_count[CPU_IDLE]
  3) lb_count[CPU_NOT_IDLE]
  ...

Thanks,
Satoru

>
> -     1) # of times in this domain load_balance() was called when the
> -        cpu was idle
> +     1) # of times in this domain load_balance() was called
>        2) # of times in this domain load_balance() checked but found
> -        the load did not require balancing when the cpu was idle
> +        the load did not require balancing
>        3) # of times in this domain load_balance() tried to move one or
> -        more tasks and failed, when the cpu was idle
> +        more tasks and failed
>        4) sum of imbalances discovered (if any) with each call to
> -        load_balance() in this domain when the cpu was idle
> -     5) # of times in this domain pull_task() was called when the cpu
> -        was idle
> +        load_balance() in this domain
> +     5) # of times in this domain pull_task() was called
>        6) # of times in this domain pull_task() was called even though
> -        the target task was cache-hot when idle
> +        the target task was cache-hot
>        7) # of times in this domain load_balance() was called but did
> -        not find a busier queue while the cpu was idle
> -     8) # of times in this domain a busier queue was found while the
> -        cpu was idle but no busier group was found
> -
> -     9) # of times in this domain load_balance() was called when the
> -        cpu was busy
> -    10) # of times in this domain load_balance() checked but found the
> -        load did not require balancing when busy
> -    11) # of times in this domain load_balance() tried to move one or
> -        more tasks and failed, when the cpu was busy
> -    12) sum of imbalances discovered (if any) with each call to
> -        load_balance() in this domain when the cpu was busy
> -    13) # of times in this domain pull_task() was called when busy
> -    14) # of times in this domain pull_task() was called even though the
> -        target task was cache-hot when busy
> -    15) # of times in this domain load_balance() was called but did not
> -        find a busier queue while the cpu was busy
> -    16) # of times in this domain a busier queue was found while the cpu
> -        was busy but no busier group was found
> -
> -    17) # of times in this domain load_balance() was called when the
> -        cpu was just becoming idle
> -    18) # of times in this domain load_balance() checked but found the
> -        load did not require balancing when the cpu was just becoming idle
> -    19) # of times in this domain load_balance() tried to move one or more
> -        tasks and failed, when the cpu was just becoming idle
> -    20) sum of imbalances discovered (if any) with each call to
> -        load_balance() in this domain when the cpu was just becoming idle
> -    21) # of times in this domain pull_task() was called when newly idle
> -    22) # of times in this domain pull_task() was called even though the
> -        target task was cache-hot when just becoming idle
> -    23) # of times in this domain load_balance() was called but did not
> -        find a busier queue while the cpu was just becoming idle
> -    24) # of times in this domain a busier queue was found while the cpu
> -        was just becoming idle but no busier group was found
> -
> +        not find a busier queue
> +     8) # of times in this domain a busier queue was found but no
> +        busier group was found
> +
>      Next three are active_load_balance() statistics:
>       25) # of times active_load_balance() was called
>       26) # of times active_load_balance() tried to move a task and failed
>       27) # of times active_load_balance() successfully moved a task
>
> -   Next three are sched_balance_exec() statistics:
> -    28) sbe_cnt is not used
> -    29) sbe_balanced is not used
> -    30) sbe_pushed is not used
> -
> -   Next three are sched_balance_fork() statistics:
> -    31) sbf_cnt is not used
> -    32) sbf_balanced is not used
> -    33) sbf_pushed is not used
> -
> -   Next three are try_to_wake_up() statistics:
> -    34) # of times in this domain try_to_wake_up() awoke a task that
> +   Next two are try_to_wake_up() statistics:
> +    28) # of times in this domain try_to_wake_up() awoke a task that
>           last ran on a different cpu in this domain
> -    35) # of times in this domain try_to_wake_up() moved a task to the
> +    29) # of times in this domain try_to_wake_up() moved a task to the
>           waking cpu because it was cache-cold on its own cpu anyway
> -    36) # of times in this domain try_to_wake_up() started passive balancing
>
>   /proc/<pid>/schedstat
>   ----------------
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
>




* Re: [PATCH 2/2 v1.0]sched: Updating the sched-stat documentation
  2011-01-19  7:11       ` Satoru Takeuchi
@ 2011-01-19 15:47         ` Ciju Rajan K
  0 siblings, 0 replies; 13+ messages in thread
From: Ciju Rajan K @ 2011-01-19 15:47 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: Peter Zijlstra, lkml, Ingo Molnar, Bharata B Rao,
	Srivatsa Vaddagiri, Ciju Rajan K

Hi Satoru,


>> +The first two fields of /proc/schedstat indicates the version (current
>> +version is 16) and jiffies values. The following values are from
>> +cpu& domain statistics.
> 
> cpu & domain statistics.
> 

Will correct it in the next version.


>> +
>> +CPU_NEWLY_IDLE: Load balancer is being run on a CPU which is
>> + about to enter IDLE state
>> +CPU_IDLE: This state is entered after CPU_NEWLY_IDLE
>> + state fails to find a new task for this CPU
>> +CPU_NOT_IDLE: Load balancer is being run on a CPU when it is
>> + not in IDLE state (busy times)
>> +
>> +There are eight stats available for each of the three idle states:
> 
> It would be more helpful if the description said which fields belong to
> which state, e.g. fields 1-8 for CPU_NEWLY_IDLE, 9-16 for CPU_IDLE, and
> 17-24 for CPU_NOT_IDLE. The current description doesn't say whether
> the order is
> 
> 1) lb_count[CPU_NEWLY_IDLE],
> 2) lb_balanced[CPU_NEWLY_IDLE],
> 3) lb_failed[CPU_NEWLY_IDLE],
> ...
> 
> or
> 
> 1) lb_count[CPU_NEWLY_IDLE]
> 2) lb_count[CPU_IDLE]
> 3) lb_count[CPU_NOT_IDLE]
> ...
> 
Agreed. There is a certain ambiguity there. I will find a better way
to mention it.

Thanks for reviewing this. I will send the next version soon.

-Ciju
> Thanks,
> Satoru
> 
>>
>> - 1) # of times in this domain load_balance() was called when the
>> - cpu was idle
>> + 1) # of times in this domain load_balance() was called
>> 2) # of times in this domain load_balance() checked but found
>> - the load did not require balancing when the cpu was idle
>> + the load did not require balancing
>> 3) # of times in this domain load_balance() tried to move one or
>> - more tasks and failed, when the cpu was idle
>> + more tasks and failed
>> 4) sum of imbalances discovered (if any) with each call to
>> - load_balance() in this domain when the cpu was idle
>> - 5) # of times in this domain pull_task() was called when the cpu
>> - was idle
>> + load_balance() in this domain
>> + 5) # of times in this domain pull_task() was called
>> 6) # of times in this domain pull_task() was called even though
>> - the target task was cache-hot when idle
>> + the target task was cache-hot
>> 7) # of times in this domain load_balance() was called but did
>> - not find a busier queue while the cpu was idle
>> - 8) # of times in this domain a busier queue was found while the
>> - cpu was idle but no busier group was found
>> -
>> - 9) # of times in this domain load_balance() was called when the
>> - cpu was busy
>> - 10) # of times in this domain load_balance() checked but found the
>> - load did not require balancing when busy
>> - 11) # of times in this domain load_balance() tried to move one or
>> - more tasks and failed, when the cpu was busy
>> - 12) sum of imbalances discovered (if any) with each call to
>> - load_balance() in this domain when the cpu was busy
>> - 13) # of times in this domain pull_task() was called when busy
>> - 14) # of times in this domain pull_task() was called even though the
>> - target task was cache-hot when busy
>> - 15) # of times in this domain load_balance() was called but did not
>> - find a busier queue while the cpu was busy
>> - 16) # of times in this domain a busier queue was found while the cpu
>> - was busy but no busier group was found
>> -
>> - 17) # of times in this domain load_balance() was called when the
>> - cpu was just becoming idle
>> - 18) # of times in this domain load_balance() checked but found the
>> - load did not require balancing when the cpu was just becoming idle
>> - 19) # of times in this domain load_balance() tried to move one or more
>> - tasks and failed, when the cpu was just becoming idle
>> - 20) sum of imbalances discovered (if any) with each call to
>> - load_balance() in this domain when the cpu was just becoming idle
>> - 21) # of times in this domain pull_task() was called when newly idle
>> - 22) # of times in this domain pull_task() was called even though the
>> - target task was cache-hot when just becoming idle
>> - 23) # of times in this domain load_balance() was called but did not
>> - find a busier queue while the cpu was just becoming idle
>> - 24) # of times in this domain a busier queue was found while the cpu
>> - was just becoming idle but no busier group was found
>> -
>> + not find a busier queue
>> + 8) # of times in this domain a busier queue was found but no
>> + busier group was found
>> +
>> Next three are active_load_balance() statistics:
>> 25) # of times active_load_balance() was called
>> 26) # of times active_load_balance() tried to move a task and failed
>> 27) # of times active_load_balance() successfully moved a task
>>
>> - Next three are sched_balance_exec() statistics:
>> - 28) sbe_cnt is not used
>> - 29) sbe_balanced is not used
>> - 30) sbe_pushed is not used
>> -
>> - Next three are sched_balance_fork() statistics:
>> - 31) sbf_cnt is not used
>> - 32) sbf_balanced is not used
>> - 33) sbf_pushed is not used
>> -
>> - Next three are try_to_wake_up() statistics:
>> - 34) # of times in this domain try_to_wake_up() awoke a task that
>> + Next two are try_to_wake_up() statistics:
>> + 28) # of times in this domain try_to_wake_up() awoke a task that
>> last ran on a different cpu in this domain
>> - 35) # of times in this domain try_to_wake_up() moved a task to the
>> + 29) # of times in this domain try_to_wake_up() moved a task to the
>> waking cpu because it was cache-cold on its own cpu anyway
>> - 36) # of times in this domain try_to_wake_up() started passive balancing
>>
>> /proc/<pid>/schedstat
>> ----------------


* Re: [RFC][PATCH 0/2 v1.0]sched: updating /proc/schedstat
  2011-01-19  6:42         ` Satoru Takeuchi
@ 2011-01-19 15:55           ` Ciju Rajan K
  0 siblings, 0 replies; 13+ messages in thread
From: Ciju Rajan K @ 2011-01-19 15:55 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: Peter Zijlstra, lkml, Ingo Molnar, Bharata B Rao,
	Srivatsa Vaddagiri, Ciju Rajan K


>>
>> This patch set removes only those fields which are currently not in use.
>> If you observe the fields of /proc/schedstat the following fields are not
>> being updated.
> 
> Ah... I misunderstood the meaning of `unused' and complained based on
> too old a kernel source. Sorry.

> I confirmed that these fields are indeed not touched by the upstream kernel
> at all. So I think it's OK if any userland tools are updated in sync
> with this change.

Thanks for verifying it.

> Is its benefit more than its cost? In my
> understanding, the benefit is improved readability and some reduction in
> memory footprint, and the cost is changing all userspace tools that refer
> to /proc/schedstat.

In my opinion we should go ahead with updating /proc/schedstat. 
As I mentioned earlier, we can have an updated script for 
http://eaglet.rain.com/rick/linux/schedstat/ 

> 
> # Unfortunately I don't know how much it costs.

The changes required in userspace would be very small.
It's just a matter of re-ordering the values. I also can't think
of a scenario much more complicated than this.
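As a concrete illustration of that re-ordering (a hypothetical sketch derived from the field numbering in patch 2/2, not part of the patches themselves), the surviving per-domain fields shift like this:

/*
 * Illustrative sketch only: map a version 15 per-domain field position
 * to its version 16 position.  Fields 1-27 are unchanged, the old
 * try_to_wake_up() fields 34 and 35 become 28 and 29, and fields
 * 28-33 and 36 no longer exist.
 */
static int domain_field_v15_to_v16(int v15_pos)
{
	if (v15_pos >= 1 && v15_pos <= 27)
		return v15_pos;			/* unchanged */
	if (v15_pos == 34 || v15_pos == 35)
		return v15_pos - 6;		/* now fields 28 and 29 */
	return -1;				/* removed in version 16 */
}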

-Ciju
> 

