All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] RFC cyclictest: clean up --latency ordering in getopt
@ 2012-05-07 21:42 Frank Rowand
  2012-05-07 21:47 ` Frank Rowand
  0 siblings, 1 reply; 8+ messages in thread
From: Frank Rowand @ 2012-05-07 21:42 UTC (permalink / raw)
  To: linux-rt-users, williams; +Cc: jkacur, dvhart


cyclictest option '-e', '--latency' was added out of alphabetic
order and was not added to the help text.  Clean up before
following patch which will add a new option.

Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
---
 src/cyclictest/cyclictest.c |   17 	9 +	8 -	0 !
 1 file changed, 9 insertions(+), 8 deletions(-)

Index: b/src/cyclictest/cyclictest.c
===================================================================
--- a/src/cyclictest/cyclictest.c
+++ b/src/cyclictest/cyclictest.c
@@ -834,6 +834,7 @@ static void display_help(int error)
 	       "-D       --duration=t      specify a length for the test run\n"
 	       "                           default is in seconds, but 'm', 'h', or 'd' maybe added\n"
 	       "                           to modify value to minutes, hours or days\n"
+	       "-e       --latency=PM_QOS  write PM_QOS to /dev/cpu_dma_latency\n"
 	       "-E       --event           event tracing (used with -b)\n"
 	       "-f       --ftrace          function trace (when -b is active)\n"
 	       "-h       --histogram=US    dump a latency histogram to stdout after the run\n"
@@ -963,6 +964,7 @@ static void process_options (int argc, c
 			{"clock", required_argument, NULL, 'c'},
 			{"context", no_argument, NULL, 'C'},
 			{"distance", required_argument, NULL, 'd'},
+			{"latency", required_argument, NULL, 'e'},
 			{"event", no_argument, NULL, 'E'},
 			{"ftrace", no_argument, NULL, 'f'},
 			{"histogram", required_argument, NULL, 'h'},
@@ -992,10 +994,9 @@ static void process_options (int argc, c
 			{"traceopt", required_argument, NULL, 'O'},
 			{"smp", no_argument, NULL, 'S'},
 			{"numa", no_argument, NULL, 'U'},
-			{"latency", required_argument, NULL, 'e'},
 			{NULL, 0, NULL, 0}
 		};
-		int c = getopt_long(argc, argv, "a::b:Bc:Cd:Efh:H:i:Il:MnNo:O:p:PmqrsSt::uUvD:wWT:y:e:",
+		int c = getopt_long(argc, argv, "a::b:Bc:Cd:e:Efh:H:i:Il:MnNo:O:p:PmqrsSt::uUvD:wWT:y:",
 				    long_options, &option_index);
 		if (c == -1)
 			break;
@@ -1019,6 +1020,12 @@ static void process_options (int argc, c
 		case 'c': clocksel = atoi(optarg); break;
 		case 'C': tracetype = CTXTSWITCH; break;
 		case 'd': distance = atoi(optarg); break;
+		case 'e': /* power management latency target value */
+			  /* note: default is 0 (zero) */
+			latency_target_value = atoi(optarg);
+			if (latency_target_value < 0)
+				latency_target_value = 0;
+			break;
 		case 'E': enable_events = 1; break;
 		case 'f': tracetype = FUNCTION; ftrace = 1; break;
 		case 'H': histofall = 1; /* fall through */
@@ -1101,12 +1108,6 @@ static void process_options (int argc, c
 			warn("ignoring --numa or -U\n");
 #endif
 			break;
-		case 'e': /* power management latency target value */
-			  /* note: default is 0 (zero) */
-			latency_target_value = atoi(optarg);
-			if (latency_target_value < 0)
-				latency_target_value = 0;
-			break;
 
 		case '?': display_help(0); break;
 		}


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] RFC cyclictest: clean up --latency ordering in getopt
  2012-05-07 21:42 [PATCH 1/2] RFC cyclictest: clean up --latency ordering in getopt Frank Rowand
@ 2012-05-07 21:47 ` Frank Rowand
  2012-05-07 21:52   ` Frank Rowand
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Frank Rowand @ 2012-05-07 21:47 UTC (permalink / raw)
  To: linux-rt-users, williams; +Cc: jkacur, dvhart


cyclictest: ARM panda clock resolution will be ~30 usec unless
CONFIG_OMAP_32K_TIMER=n, resulting in a poor latency report.

This patch does _not_ fix the problem, it merely provides the
instrumentation to make it visible.  The value of measured
resolution is useful information for any system.

Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
---
 src/cyclictest/cyclictest.c |   87 	86 +	1 -	0 !
 1 file changed, 86 insertions(+), 1 deletion(-)

Index: b/src/cyclictest/cyclictest.c
===================================================================
--- a/src/cyclictest/cyclictest.c
+++ b/src/cyclictest/cyclictest.c
@@ -171,6 +171,7 @@ static int duration = 0;
 static int use_nsecs = 0;
 static int refresh_on_max;
 static int force_sched_other;
+static int check_clock_resolution_loops = 0;
 
 static pthread_cond_t refresh_on_max_cond = PTHREAD_COND_INITIALIZER;
 static pthread_mutex_t refresh_on_max_lock = PTHREAD_MUTEX_INITIALIZER;
@@ -854,6 +855,9 @@ static void display_help(int error)
 	       "-P       --preemptoff      Preempt off tracing (used with -b)\n"
 	       "-q       --quiet           print only a summary on exit\n"
 	       "-r       --relative        use relative timer instead of absolute\n"
+	       "-R LOOP  --resolution=LOOP check clock resolution, calling clock_gettime() LOOP\n"
+	       "                           times.  list of clock_gettime() values will be\n"
+	       "                           reported with -v\n"
 	       "-s       --system          use sys_nanosleep and sys_setitimer\n"
 	       "-t       --threads         one thread per available processor\n"
 	       "-t [NUM] --threads=NUM     number of threads:\n"
@@ -982,6 +986,7 @@ static void process_options (int argc, c
 			{"preemptoff", no_argument, NULL, 'P'},
 			{"quiet", no_argument, NULL, 'q'},
 			{"relative", no_argument, NULL, 'r'},
+			{"resolution", required_argument, NULL, 'R'},
 			{"system", no_argument, NULL, 's'},
 			{"threads", optional_argument, NULL, 't'},
 			{"unbuffered", no_argument, NULL, 'u'},
@@ -996,7 +1001,7 @@ static void process_options (int argc, c
 			{"numa", no_argument, NULL, 'U'},
 			{NULL, 0, NULL, 0}
 		};
-		int c = getopt_long(argc, argv, "a::b:Bc:Cd:e:Efh:H:i:Il:MnNo:O:p:PmqrsSt::uUvD:wWT:y:",
+		int c = getopt_long(argc, argv, "a::b:Bc:Cd:e:Efh:H:i:Il:MnNo:O:p:PmqrR:sSt::uUvD:wWT:y:",
 				    long_options, &option_index);
 		if (c == -1)
 			break;
@@ -1061,6 +1066,14 @@ static void process_options (int argc, c
 			break;
 		case 'q': quiet = 1; break;
 		case 'r': timermode = TIMER_RELTIME; break;
+		case 'R':
+			{
+			char *endptr;
+			check_clock_resolution_loops = strtol(optarg, &endptr, 10);
+			if ((endptr == optarg) || !check_clock_resolution_loops)
+				check_clock_resolution_loops = -1;
+			}
+			break;
 		case 's': use_system = MODE_SYS_OFFSET; break;
 		case 't':
 			if (smp) {
@@ -1132,6 +1145,11 @@ static void process_options (int argc, c
 	} else if (tracelimit)
 		fileprefix = procfileprefix;
 
+	if (check_clock_resolution_loops == -1) {
+		warn("-R requires positive integer value for LOOP\n");
+		error = 1;
+	}
+
 	if (clocksel < 0 || clocksel > ARRAY_SIZE(clocksources))
 		error = 1;
 
@@ -1391,6 +1409,73 @@ int main(int argc, char **argv)
 	if (check_timer())
 		warn("High resolution timers not available\n");
 
+	if (check_clock_resolution_loops) {
+		int clock;
+		uint64_t diff;
+		int k;
+		struct timespec *now;
+		struct timespec prev;
+		struct timespec res;
+		int zero_diff = 0;
+		uint64_t min_non_zero_diff = UINT64_MAX;
+		uint64_t reported_resolution = UINT64_MAX;
+
+		now = calloc(check_clock_resolution_loops, sizeof(*now));
+		clock = clocksources[clocksel];
+
+		if (clock_getres(clock, &res)) {
+			warn("clock_getres failed");
+		} else {
+			reported_resolution = (NSEC_PER_SEC * res.tv_sec) + res.tv_nsec;
+		}
+
+		clock_gettime(clock, &prev);
+
+		for (k=0; k < check_clock_resolution_loops; k++) {
+			clock_gettime(clock, &now[k]);
+		}
+
+		if (verbose) {
+			printf("\n\nFor consecutive calls to clock_gettime():\n\n");
+			printf("time, delta time (nsec)\n\n");
+		}
+
+		prev = now[0];
+		for (k=1; k < check_clock_resolution_loops; k++) {
+
+			/* always show delta in ns */
+			diff = calcdiff_ns(now[k], prev);
+
+			if (diff == 0) {
+				zero_diff = 1;
+			} else if (diff < min_non_zero_diff) {
+				min_non_zero_diff = diff;
+			}
+
+			if (verbose) {
+				printf("%ld.%06ld", now[k].tv_sec, now[k].tv_nsec);
+				printf("  %llu\n", diff);
+			}
+
+			prev = now[k];
+		}
+
+		if (verbose ||
+		    (min_non_zero_diff && (min_non_zero_diff > reported_resolution))) {
+		    	/*
+			 * Measured clock resolution includes the time to call
+			 * clock_gettime(), so it will be slightly larger than
+			 * actual resolution.
+			 */
+			fprintf(stderr, "reported clock resolution: %lld nsec\n",
+				reported_resolution);
+			fprintf(stderr, "measured clock resolution less than: %llu nsec\n",
+				min_non_zero_diff);
+		}
+
+		free(now);
+	}
+
 	mode = use_nanosleep + use_system;
 
 	sigemptyset(&sigset);


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] RFC cyclictest: clean up --latency ordering in getopt
  2012-05-07 21:47 ` Frank Rowand
@ 2012-05-07 21:52   ` Frank Rowand
  2012-05-07 22:00   ` Frank Rowand
  2012-05-07 22:05   ` Frank Rowand
  2 siblings, 0 replies; 8+ messages in thread
From: Frank Rowand @ 2012-05-07 21:52 UTC (permalink / raw)
  To: Rowand, Frank; +Cc: linux-rt-users, williams, jkacur, dvhart

[-- Attachment #1: Type: text/plain, Size: 146 bytes --]

The attachment is graphs that show the latency results for the ARM
panda with a 'bad' 32khz timer, compared to a higher resolution timer.

-Frank

[-- Attachment #2: zzz_clock_resolution_check_graphs.png --]
[-- Type: image/png, Size: 27043 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] RFC cyclictest: clean up --latency ordering in getopt
  2012-05-07 21:47 ` Frank Rowand
  2012-05-07 21:52   ` Frank Rowand
@ 2012-05-07 22:00   ` Frank Rowand
  2012-05-08  0:30     ` Luis Claudio R. Goncalves
  2012-05-07 22:05   ` Frank Rowand
  2 siblings, 1 reply; 8+ messages in thread
From: Frank Rowand @ 2012-05-07 22:00 UTC (permalink / raw)
  To: Rowand, Frank; +Cc: linux-rt-users, williams, jkacur, dvhart

This is the resulting message on the ARM panda with a 'bad'
32khz timer:


  # cyclictest -q -p 80 -t -n -l 10 -h ${hist_bins} -R 100
  reported clock resolution: 1 nsec
  measured clock resolution less than: 30517 nsec


A possible follow on patch would be to generate a hard
error (fail the test) if the measured resolution was
above some unreasonable value (perhaps > 1 msec), but
allow the hard fail to be overridden with yet another
command line option.  Any opinions about that?

-Frank


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] RFC cyclictest: clean up --latency ordering in getopt
  2012-05-07 21:47 ` Frank Rowand
  2012-05-07 21:52   ` Frank Rowand
  2012-05-07 22:00   ` Frank Rowand
@ 2012-05-07 22:05   ` Frank Rowand
  2 siblings, 0 replies; 8+ messages in thread
From: Frank Rowand @ 2012-05-07 22:05 UTC (permalink / raw)
  To: linux-rt-users, williams; +Cc: jkacur, dvhart

Darn it, the subject should have been:

  [PATCH 2/2] RFC cyclictest: report large measured clock latency


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] RFC cyclictest: clean up --latency ordering in getopt
  2012-05-07 22:00   ` Frank Rowand
@ 2012-05-08  0:30     ` Luis Claudio R. Goncalves
  2012-05-08  2:05       ` Frank Rowand
  0 siblings, 1 reply; 8+ messages in thread
From: Luis Claudio R. Goncalves @ 2012-05-08  0:30 UTC (permalink / raw)
  To: Frank Rowand; +Cc: Rowand, Frank, linux-rt-users, williams, jkacur, dvhart

On Mon, May 07, 2012 at 03:00:17PM -0700, Frank Rowand wrote:
| This is the resulting message on the ARM panda with a 'bad'
| 32khz timer:
| 
| 
|   # cyclictest -q -p 80 -t -n -l 10 -h ${hist_bins} -R 100
|   reported clock resolution: 1 nsec
|   measured clock resolution less than: 30517 nsec

How about using a fixed loop size (say 1000000 clock reads) to define
the average cost of reading the clock (the second value presented above)
instead of a variable amount of iterations? Reading the clock twice and
calculating the average could lead to wrong impressions. Also, it would
be interesting to run such test under a real time priority (FIFO:2, maybe?)
to avoid too much external interference on the readings, mainly involuntary
context switches.

Having too different values called 'clock resolution' may be a good source
of confusion. The value of clock_getres() is the resolution, as per the
system jargon, and the second value should be granularity, reading cost,
the-average-time-it-takes-to-read-the-clock or something alike.
 
| A possible follow on patch would be to generate a hard
| error (fail the test) if the measured resolution was
| above some unreasonable value (perhaps > 1 msec), but
| allow the hard fail to be overridden with yet another
| command line option.  Any opinions about that?

My suggestion is to keep the current behavior and add an option to
stop/complain case the clock has a poor resolution or has a reading
cost too high.

Luis
-- 
[ Luis Claudio R. Goncalves             Red Hat  -  Realtime Team ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9  2696 7203 D980 A448 C8F8 ]


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] RFC cyclictest: clean up --latency ordering in getopt
  2012-05-08  0:30     ` Luis Claudio R. Goncalves
@ 2012-05-08  2:05       ` Frank Rowand
  2012-05-08 13:53         ` Luis Claudio R. Goncalves
  0 siblings, 1 reply; 8+ messages in thread
From: Frank Rowand @ 2012-05-08  2:05 UTC (permalink / raw)
  To: Luis Claudio R. Goncalves
  Cc: Rowand, Frank, linux-rt-users, williams, jkacur, dvhart

On 05/07/12 17:30, Luis Claudio R. Goncalves wrote:
> On Mon, May 07, 2012 at 03:00:17PM -0700, Frank Rowand wrote:
> | This is the resulting message on the ARM panda with a 'bad'
> | 32khz timer:
> | 
> | 
> |   # cyclictest -q -p 80 -t -n -l 10 -h ${hist_bins} -R 100
> |   reported clock resolution: 1 nsec
> |   measured clock resolution less than: 30517 nsec
> 
> How about using a fixed loop size (say 1000000 clock reads) to define
> the average cost of reading the clock (the second value presented above)
> instead of a variable amount of iterations? Reading the clock twice and
> calculating the average could lead to wrong impressions. Also, it would
> be interesting to run such test under a real time priority (FIFO:2, maybe?)
> to avoid too much external interference on the readings, mainly involuntary
> context switches.
> 
> Having too different values called 'clock resolution' may be a good source
> of confusion. The value of clock_getres() is the resolution, as per the
> system jargon, and the second value should be granularity, reading cost,
> the-average-time-it-takes-to-read-the-clock or something alike.

The 'measured clock resolution' is not intended to be representative of the
time it takes for clock_gettime() to execute.  It really is a measure of
the resolution that the clock is actually delivering.  It is the smallest
amount that the actual clock actually increments.

Look at the patch again:
If consecutive calls to clock_gettime() return the same value, then the
clock resolution is larger than the execution time of clock_gettime().
If this is the case, then when the clock moves to a new value, the
difference from the old value is representative of the clock resolution.

The patch calls clock_gettime() a number of times in a loop, collecting
timestamps.  The timestamps are then scanned, looking for:

  1) does a difference of zero occur?
  2) what is the minimum non-zero difference?

If a difference of zero is found, then the clock resolution is larger
than the execution time of clock_gettime().  If this resolution is
_also_ greater than the resolution that the clock claims from
clock_getres(), then this is reported.

Reporting the minimum non-zero difference is being generous.
Saying that the measured resolution is "less than" is also
being generous.  What can I say, I'm trying to be optimistic? :-)

This might be more obvious if I show you the actual clock_gettime()
values from a run on an ARM panda:

   # cyclictest -q -p 80 --smp -l 1 -R 50 -v
   
   
   For consecutive calls to clock_gettime():
   
   time, delta time (nsec)
   
   8720.940737610  0
   8720.940737610  0
   8720.940737610  0
   8720.940737610  0
   8720.940737610  0
   8720.940737610  0
   8720.940737610  0
   8720.940768124  30514
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940768124  0
   8720.940798638  30514
   8720.940798638  0
   8720.940798638  0
   reported clock resolution: 1 nsec
   measured clock resolution less than: 30514 nsec


>  
> | A possible follow on patch would be to generate a hard
> | error (fail the test) if the measured resolution was
> | above some unreasonable value (perhaps > 1 msec), but
> | allow the hard fail to be overridden with yet another
> | command line option.  Any opinions about that?
> 
> My suggestion is to keep the current behavior and add an option to
> stop/complain case the clock has a poor resolution or has a reading
> cost too high.
> 
> Luis



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] RFC cyclictest: clean up --latency ordering in getopt
  2012-05-08  2:05       ` Frank Rowand
@ 2012-05-08 13:53         ` Luis Claudio R. Goncalves
  0 siblings, 0 replies; 8+ messages in thread
From: Luis Claudio R. Goncalves @ 2012-05-08 13:53 UTC (permalink / raw)
  To: Frank Rowand; +Cc: Rowand, Frank, linux-rt-users, williams, jkacur, dvhart

On Mon, May 07, 2012 at 07:05:23PM -0700, Frank Rowand wrote:
| On 05/07/12 17:30, Luis Claudio R. Goncalves wrote:
| > On Mon, May 07, 2012 at 03:00:17PM -0700, Frank Rowand wrote:
| > | This is the resulting message on the ARM panda with a 'bad'
| > | 32khz timer:
| > | 
| > | 
| > |   # cyclictest -q -p 80 -t -n -l 10 -h ${hist_bins} -R 100
| > |   reported clock resolution: 1 nsec
| > |   measured clock resolution less than: 30517 nsec
| > 
| > How about using a fixed loop size (say 1000000 clock reads) to define
| > the average cost of reading the clock (the second value presented above)
| > instead of a variable amount of iterations? Reading the clock twice and
| > calculating the average could lead to wrong impressions. Also, it would
| > be interesting to run such test under a real time priority (FIFO:2, maybe?)
| > to avoid too much external interference on the readings, mainly involuntary
| > context switches.
| > 
| > Having too different values called 'clock resolution' may be a good source
| > of confusion. The value of clock_getres() is the resolution, as per the
| > system jargon, and the second value should be granularity, reading cost,
| > the-average-time-it-takes-to-read-the-clock or something alike.
| 
| The 'measured clock resolution' is not intended to be representative of the
| time it takes for clock_gettime() to execute.  It really is a measure of
| the resolution that the clock is actually delivering.  It is the smallest
| amount that the actual clock actually increments.

Sorry, I read your patch 'upside-down' and I was not considering the
possibility of clock_getres() returning a totally bogus value. Wouldn't
that be the case to fix the resolution information for this clocksource
in the kernel?

How about running your test for a fixed amount of time? 250ms sounds like a
safe amount of time and if the readings do not change in 250ms you surely
have a problem with the clock. A configurable amount of repetitions may
lead to wrong results... if you are locky enough to have the clock valu
changing right in th middle of you 10x loop, you may see latencies later
that are just fruit of the unreliable clock.

You could also, when reporting the findings about the clock resolution,
that said resolution could mask latencies as big as it. Or induce in the
results latencies as big as it.

Anyway, nice patch.

Cheers,
Luis
-- 
[ Luis Claudio R. Goncalves             Red Hat  -  Realtime Team ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9  2696 7203 D980 A448 C8F8 ]


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-05-08 13:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-07 21:42 [PATCH 1/2] RFC cyclictest: clean up --latency ordering in getopt Frank Rowand
2012-05-07 21:47 ` Frank Rowand
2012-05-07 21:52   ` Frank Rowand
2012-05-07 22:00   ` Frank Rowand
2012-05-08  0:30     ` Luis Claudio R. Goncalves
2012-05-08  2:05       ` Frank Rowand
2012-05-08 13:53         ` Luis Claudio R. Goncalves
2012-05-07 22:05   ` Frank Rowand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.