All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] IRQ next prediction and mbed governor
@ 2019-03-08 21:20 Daniel Lezcano
  2019-03-08 21:20 ` [PATCH 1/3] genirq/timings: Remove variance computation code Daniel Lezcano
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Daniel Lezcano @ 2019-03-08 21:20 UTC (permalink / raw)
  To: rjw, tglx; +Cc: ulf.hansson, linux-pm, linux-kernel

The following patchset provides the missing bits to predict the next
event on the the current CPU by identifying three categories of wakeup
sources, the interrupts from the devices, the timers and the IPI
rescheduling.

Initially, the interrupt prediction was based on the statistical
normal law, discarding interrupt intervals out of the gaussian curve,
thus allowing to predict only interrupts coming in regular intervals.
Unfortunately this approach prevents to handle any kind of irregular
intervals, closing the door for more accuracy in the prediction.

With a derived array suffixes algorithm, it is possible to quickly
detect the repeating patterns in less than 1us per interrupt on x86.

The algorithm is described in the documentation. It is simple in
in apparence but it needs some attention to understand how it allows
to detect the pattern and return a guess estimate of the next event.

And finally, a new cpuidle governor is added to make use of these
predictions for the different wake up sources but targetting embedded
systems.

Daniel Lezcano (3):
  genirq/timings: Remove variance computation code
  genirq/timings: Add array suffix computation code
  cpuidle/drivers/mbed: Add new governor for embedded systems

 drivers/cpuidle/Kconfig            |  11 +-
 drivers/cpuidle/governors/Makefile |   1 +
 drivers/cpuidle/governors/mbed.c   | 143 +++++++++
 kernel/irq/timings.c               | 488 +++++++++++++++++++----------
 4 files changed, 485 insertions(+), 158 deletions(-)
 create mode 100644 drivers/cpuidle/governors/mbed.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] genirq/timings: Remove variance computation code
  2019-03-08 21:20 [PATCH 0/3] IRQ next prediction and mbed governor Daniel Lezcano
@ 2019-03-08 21:20 ` Daniel Lezcano
  2019-03-08 21:20 ` [PATCH 2/3] genirq/timings: Add array suffix " Daniel Lezcano
  2019-03-08 21:20 ` [PATCH 3/3] cpuidle/drivers/mbed: Add new governor for embedded systems Daniel Lezcano
  2 siblings, 0 replies; 10+ messages in thread
From: Daniel Lezcano @ 2019-03-08 21:20 UTC (permalink / raw)
  To: rjw, tglx; +Cc: ulf.hansson, linux-pm, linux-kernel

The next patch will introduce another approach to compute the next
interrupt based on the array suffixes derived algorithm. This one
will replace the variance computation code.

The patch review will be too complex if we change the code little by
little, it is much simpler to remove the variance code and add in the
next patch the suffix array code.

Remove the variance code in timings.c

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/irq/timings.c | 252 +------------------------------------------
 1 file changed, 2 insertions(+), 250 deletions(-)

diff --git a/kernel/irq/timings.c b/kernel/irq/timings.c
index 1e4cb63a5c82..3cde046a2bc8 100644
--- a/kernel/irq/timings.c
+++ b/kernel/irq/timings.c
@@ -8,7 +8,6 @@
 #include <linux/interrupt.h>
 #include <linux/idr.h>
 #include <linux/irq.h>
-#include <linux/math64.h>
 
 #include <trace/events/irq.h>
 
@@ -19,13 +18,7 @@ DEFINE_STATIC_KEY_FALSE(irq_timing_enabled);
 DEFINE_PER_CPU(struct irq_timings, irq_timings);
 
 struct irqt_stat {
-	u64	next_evt;
-	u64	last_ts;
-	u64	variance;
-	u32	avg;
-	u32	nr_samples;
-	int	anomalies;
-	int	valid;
+	u64     next_evt;
 };
 
 static DEFINE_IDR(irqt_stats);
@@ -40,184 +33,6 @@ void irq_timings_disable(void)
 	static_branch_disable(&irq_timing_enabled);
 }
 
-/**
- * irqs_update - update the irq timing statistics with a new timestamp
- *
- * @irqs: an irqt_stat struct pointer
- * @ts: the new timestamp
- *
- * The statistics are computed online, in other words, the code is
- * designed to compute the statistics on a stream of values rather
- * than doing multiple passes on the values to compute the average,
- * then the variance. The integer division introduces a loss of
- * precision but with an acceptable error margin regarding the results
- * we would have with the double floating precision: we are dealing
- * with nanosec, so big numbers, consequently the mantisse is
- * negligeable, especially when converting the time in usec
- * afterwards.
- *
- * The computation happens at idle time. When the CPU is not idle, the
- * interrupts' timestamps are stored in the circular buffer, when the
- * CPU goes idle and this routine is called, all the buffer's values
- * are injected in the statistical model continuying to extend the
- * statistics from the previous busy-idle cycle.
- *
- * The observations showed a device will trigger a burst of periodic
- * interrupts followed by one or two peaks of longer time, for
- * instance when a SD card device flushes its cache, then the periodic
- * intervals occur again. A one second inactivity period resets the
- * stats, that gives us the certitude the statistical values won't
- * exceed 1x10^9, thus the computation won't overflow.
- *
- * Basically, the purpose of the algorithm is to watch the periodic
- * interrupts and eliminate the peaks.
- *
- * An interrupt is considered periodically stable if the interval of
- * its occurences follow the normal distribution, thus the values
- * comply with:
- *
- *      avg - 3 x stddev < value < avg + 3 x stddev
- *
- * Which can be simplified to:
- *
- *      -3 x stddev < value - avg < 3 x stddev
- *
- *      abs(value - avg) < 3 x stddev
- *
- * In order to save a costly square root computation, we use the
- * variance. For the record, stddev = sqrt(variance). The equation
- * above becomes:
- *
- *      abs(value - avg) < 3 x sqrt(variance)
- *
- * And finally we square it:
- *
- *      (value - avg) ^ 2 < (3 x sqrt(variance)) ^ 2
- *
- *      (value - avg) x (value - avg) < 9 x variance
- *
- * Statistically speaking, any values out of this interval is
- * considered as an anomaly and is discarded. However, a normal
- * distribution appears when the number of samples is 30 (it is the
- * rule of thumb in statistics, cf. "30 samples" on Internet). When
- * there are three consecutive anomalies, the statistics are resetted.
- *
- */
-static void irqs_update(struct irqt_stat *irqs, u64 ts)
-{
-	u64 old_ts = irqs->last_ts;
-	u64 variance = 0;
-	u64 interval;
-	s64 diff;
-
-	/*
-	 * The timestamps are absolute time values, we need to compute
-	 * the timing interval between two interrupts.
-	 */
-	irqs->last_ts = ts;
-
-	/*
-	 * The interval type is u64 in order to deal with the same
-	 * type in our computation, that prevent mindfuck issues with
-	 * overflow, sign and division.
-	 */
-	interval = ts - old_ts;
-
-	/*
-	 * The interrupt triggered more than one second apart, that
-	 * ends the sequence as predictible for our purpose. In this
-	 * case, assume we have the beginning of a sequence and the
-	 * timestamp is the first value. As it is impossible to
-	 * predict anything at this point, return.
-	 *
-	 * Note the first timestamp of the sequence will always fall
-	 * in this test because the old_ts is zero. That is what we
-	 * want as we need another timestamp to compute an interval.
-	 */
-	if (interval >= NSEC_PER_SEC) {
-		memset(irqs, 0, sizeof(*irqs));
-		irqs->last_ts = ts;
-		return;
-	}
-
-	/*
-	 * Pre-compute the delta with the average as the result is
-	 * used several times in this function.
-	 */
-	diff = interval - irqs->avg;
-
-	/*
-	 * Increment the number of samples.
-	 */
-	irqs->nr_samples++;
-
-	/*
-	 * Online variance divided by the number of elements if there
-	 * is more than one sample.  Normally the formula is division
-	 * by nr_samples - 1 but we assume the number of element will be
-	 * more than 32 and dividing by 32 instead of 31 is enough
-	 * precise.
-	 */
-	if (likely(irqs->nr_samples > 1))
-		variance = irqs->variance >> IRQ_TIMINGS_SHIFT;
-
-	/*
-	 * The rule of thumb in statistics for the normal distribution
-	 * is having at least 30 samples in order to have the model to
-	 * apply. Values outside the interval are considered as an
-	 * anomaly.
-	 */
-	if ((irqs->nr_samples >= 30) && ((diff * diff) > (9 * variance))) {
-		/*
-		 * After three consecutive anomalies, we reset the
-		 * stats as it is no longer stable enough.
-		 */
-		if (irqs->anomalies++ >= 3) {
-			memset(irqs, 0, sizeof(*irqs));
-			irqs->last_ts = ts;
-			return;
-		}
-	} else {
-		/*
-		 * The anomalies must be consecutives, so at this
-		 * point, we reset the anomalies counter.
-		 */
-		irqs->anomalies = 0;
-	}
-
-	/*
-	 * The interrupt is considered stable enough to try to predict
-	 * the next event on it.
-	 */
-	irqs->valid = 1;
-
-	/*
-	 * Online average algorithm:
-	 *
-	 *  new_average = average + ((value - average) / count)
-	 *
-	 * The variance computation depends on the new average
-	 * to be computed here first.
-	 *
-	 */
-	irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT);
-
-	/*
-	 * Online variance algorithm:
-	 *
-	 *  new_variance = variance + (value - average) x (value - new_average)
-	 *
-	 * Warning: irqs->avg is updated with the line above, hence
-	 * 'interval - irqs->avg' is no longer equal to 'diff'
-	 */
-	irqs->variance = irqs->variance + (diff * (interval - irqs->avg));
-
-	/*
-	 * Update the next event
-	 */
-	irqs->next_evt = ts + irqs->avg;
-}
-
 /**
  * irq_timings_next_event - Return when the next event is supposed to arrive
  *
@@ -246,12 +61,6 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
  */
 u64 irq_timings_next_event(u64 now)
 {
-	struct irq_timings *irqts = this_cpu_ptr(&irq_timings);
-	struct irqt_stat *irqs;
-	struct irqt_stat __percpu *s;
-	u64 ts, next_evt = U64_MAX;
-	int i, irq = 0;
-
 	/*
 	 * This function must be called with the local irq disabled in
 	 * order to prevent the timings circular buffer to be updated
@@ -259,64 +68,7 @@ u64 irq_timings_next_event(u64 now)
 	 */
 	lockdep_assert_irqs_disabled();
 
-	/*
-	 * Number of elements in the circular buffer: If it happens it
-	 * was flushed before, then the number of elements could be
-	 * smaller than IRQ_TIMINGS_SIZE, so the count is used,
-	 * otherwise the array size is used as we wrapped. The index
-	 * begins from zero when we did not wrap. That could be done
-	 * in a nicer way with the proper circular array structure
-	 * type but with the cost of extra computation in the
-	 * interrupt handler hot path. We choose efficiency.
-	 *
-	 * Inject measured irq/timestamp to the statistical model
-	 * while decrementing the counter because we consume the data
-	 * from our circular buffer.
-	 */
-	for (i = irqts->count & IRQ_TIMINGS_MASK,
-		     irqts->count = min(IRQ_TIMINGS_SIZE, irqts->count);
-	     irqts->count > 0; irqts->count--, i = (i + 1) & IRQ_TIMINGS_MASK) {
-
-		irq = irq_timing_decode(irqts->values[i], &ts);
-
-		s = idr_find(&irqt_stats, irq);
-		if (s) {
-			irqs = this_cpu_ptr(s);
-			irqs_update(irqs, ts);
-		}
-	}
-
-	/*
-	 * Look in the list of interrupts' statistics, the earliest
-	 * next event.
-	 */
-	idr_for_each_entry(&irqt_stats, s, i) {
-
-		irqs = this_cpu_ptr(s);
-
-		if (!irqs->valid)
-			continue;
-
-		if (irqs->next_evt <= now) {
-			irq = i;
-			next_evt = now;
-
-			/*
-			 * This interrupt mustn't use in the future
-			 * until new events occur and update the
-			 * statistics.
-			 */
-			irqs->valid = 0;
-			break;
-		}
-
-		if (irqs->next_evt < next_evt) {
-			irq = i;
-			next_evt = irqs->next_evt;
-		}
-	}
-
-	return next_evt;
+	return 0;
 }
 
 void irq_timings_free(int irq)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] genirq/timings: Add array suffix computation code
  2019-03-08 21:20 [PATCH 0/3] IRQ next prediction and mbed governor Daniel Lezcano
  2019-03-08 21:20 ` [PATCH 1/3] genirq/timings: Remove variance computation code Daniel Lezcano
@ 2019-03-08 21:20 ` Daniel Lezcano
  2019-03-24 17:44   ` Thomas Gleixner
  2019-03-08 21:20 ` [PATCH 3/3] cpuidle/drivers/mbed: Add new governor for embedded systems Daniel Lezcano
  2 siblings, 1 reply; 10+ messages in thread
From: Daniel Lezcano @ 2019-03-08 21:20 UTC (permalink / raw)
  To: rjw, tglx; +Cc: ulf.hansson, linux-pm, linux-kernel

The previous variance was discarding values from the timings when they
were considered as anomalies as stated by the normal law statistical
model.

However in the interrupt life, we can have multiple anomalies due to
the nature of the device generating the interrupts, and most of the
time we can observe a repeating pattern, that is particulary true for
network, console, MMC or SSD devices.

With the variance approach, we miss the patterns and we can only deal with the
interrupt coming in regular intervals.

After a long investigation, this patch provides the array suffixes
derived algorithm where we can detect regular and repeating patterns
interrupt events.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/irq/timings.c | 432 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 427 insertions(+), 5 deletions(-)

diff --git a/kernel/irq/timings.c b/kernel/irq/timings.c
index 3cde046a2bc8..f88465563bd8 100644
--- a/kernel/irq/timings.c
+++ b/kernel/irq/timings.c
@@ -8,6 +8,8 @@
 #include <linux/interrupt.h>
 #include <linux/idr.h>
 #include <linux/irq.h>
+#include <linux/math64.h>
+#include <linux/log2.h>
 
 #include <trace/events/irq.h>
 
@@ -17,10 +19,6 @@ DEFINE_STATIC_KEY_FALSE(irq_timing_enabled);
 
 DEFINE_PER_CPU(struct irq_timings, irq_timings);
 
-struct irqt_stat {
-	u64     next_evt;
-};
-
 static DEFINE_IDR(irqt_stats);
 
 void irq_timings_enable(void)
@@ -33,6 +31,381 @@ void irq_timings_disable(void)
 	static_branch_disable(&irq_timing_enabled);
 }
 
+/*
+ * The main goal of this algorithm is to predict the next interrupt
+ * occurrence on the current CPU.
+ *
+ * Currently, the interrupt timings are stored in a circular array
+ * buffer every time there is an interrupt, as a tuple: the interrupt
+ * number and the associated timestamp when the event occurred <irq,
+ * timestamp>.
+ *
+ * For every interrupt occurring in a short period of time, we can
+ * measure the elapsed time between the occurrences for the same
+ * interrupt and we end up with a suite of intervals. The experience
+ * showed the interrupts are often coming following a periodic
+ * pattern.
+ *
+ * The objective of the algorithm is to find out this periodic pattern
+ * in a fastest way and use its period to predict the next irq event.
+ *
+ * When the next interrupt event is requested, we are in the situation
+ * where the interrupts are disabled and the circular buffer
+ * containing the timings is filled with the events which happened
+ * after the previous next-interrupt-event request.
+ *
+ * At this point, we read the circular buffer and we fill the irq
+ * related statistics structure. After this step, the circular array
+ * containing the timings is empty because all the values are
+ * dispatched in their corresponding buffers.
+ *
+ * Now for each interrupt, we can predict the next event by using the
+ * suffix array, log interval and exponential moving average
+ *
+ * 1. Suffix array
+ *
+ * Suffix array is an array of all the suffixes of a string. It is
+ * widely used as a data structure for compression, text search, ...
+ * For instance for the word 'banana', the suffixes will be: 'banana'
+ * 'anana' 'nana' 'ana' 'na' 'a'
+ *
+ * Usually, the suffix array is sorted but for our purpose it is
+ * not necessary and won't provide any improvement in the context of
+ * the solved problem where we clearly define the boundaries of the
+ * search by a max period and min period.
+ *
+ * The suffix array will build a suite of intervals of different
+ * length and will look for the repetition of each suite. If the suite
+ * is repeating then we have the period because it is the length of
+ * the suite whatever its position in the buffer.
+ *
+ * 2. Log interval
+ *
+ * We saw the irq timings allow to compute the interval of the
+ * occurrences for a specific interrupt. We can reasonibly assume the
+ * longer is the interval, the higher is the error for the next event
+ * and we can consider storing those interval values into an array
+ * where each slot in the array correspond to an interval at the power
+ * of 2 of the index. For example, index 12 will contain values
+ * between 2^11 and 2^12.
+ *
+ * At the end we have an array of values where at each index defines a
+ * [2^index - 1, 2 ^ index] interval values allowing to store a large
+ * number of values inside a small array.
+ *
+ * For example, if we have the value 1123, then we store it at
+ * ilog2(1123) = 10 index value.
+ *
+ * Storing those value at the specific index is done by computing an
+ * exponential moving average for this specific slot. For instance,
+ * for values 1800, 1123, 1453, ... fall under the same slot (10) and
+ * the exponential moving average is computed every time a new value
+ * is stored at this slot.
+ *
+ * 3. Exponential Moving Average
+ *
+ * The EMA is largely used to track a signal for stocks or as a low
+ * pass filter. The magic of the formula, is it is very simple and the
+ * reactivity of the average can be tuned with the factors called
+ * alpha.
+ *
+ * The higher the alphas are, the faster the average respond to the
+ * signal change. In our case, if a slot in the array is a big
+ * interval, we can have numbers with a big difference between
+ * them. The impact of those differences in the average computation
+ * can be tuned by changing the alpha value.
+ *
+ *
+ *  -- The algorithm --
+ *
+ * We saw the different processing above, now let's see how they are
+ * used together.
+ *
+ * For each interrupt:
+ *	For each interval:
+ *		Compute the index = ilog2(interval)
+ *		Compute a new_ema(buffer[index], interval)
+ *		Store the index in a circular buffer
+ *
+ *	Compute the suffix array of the indexes
+ *
+ *	For each suffix:
+ *		If the suffix is reverse-found 3 times
+ *			Return suffix
+ *
+ *	Return Not found
+ *
+ * However we can not have endless suffix array to be build, it won't
+ * make sense and it will add an extra overhead, so we can restrict
+ * this to a maximum suffix length of 5 and a minimum suffix length of
+ * 2. The experience showed 5 is the majority of the maximum pattern
+ * period found for different devices.
+ *
+ * The result is a pattern finding less than 1us for an interrupt.
+ *
+ * Example based on real values:
+ *
+ * Example 1 : MMC write/read interrupt interval:
+ *
+ *	223947, 1240, 1384, 1386, 1386,
+ *	217416, 1236, 1384, 1386, 1387,
+ *	214719, 1241, 1386, 1387, 1384,
+ *	213696, 1234, 1384, 1386, 1388,
+ *	219904, 1240, 1385, 1389, 1385,
+ *	212240, 1240, 1386, 1386, 1386,
+ *	214415, 1236, 1384, 1386, 1387,
+ *	214276, 1234, 1384, 1388, ?
+ *
+ * For each element, apply ilog2(value)
+ *
+ *	15, 8, 8, 8, 8,
+ *	15, 8, 8, 8, 8,
+ *	15, 8, 8, 8, 8,
+ *	15, 8, 8, 8, 8,
+ *	15, 8, 8, 8, 8,
+ *	15, 8, 8, 8, 8,
+ *	15, 8, 8, 8, 8,
+ *	15, 8, 8, 8, ?
+ *
+ * Max period of 5, we take the last (max_period * 3) 15 elements as
+ * we can be confident if the pattern repeats itself three times it is
+ * a repeating pattern.
+ *
+ *	             8,
+ *	15, 8, 8, 8, 8,
+ *	15, 8, 8, 8, 8,
+ *	15, 8, 8, 8, ?
+ *
+ * Suffixes are:
+ *
+ *  1) 8, 15, 8, 8, 8  <- max period
+ *  2) 8, 15, 8, 8
+ *  3) 8, 15, 8
+ *  4) 8, 15
+ *
+ * From there we search the repeating pattern for each suffix.
+ *
+ * buffer: 8, 15, 8, 8, 8, 8, 15, 8, 8, 8, 8, 15, 8, 8, 8
+ *         |   |  |  |  |  |   |  |  |  |  |   |  |  |  |
+ *         8, 15, 8, 8, 8  |   |  |  |  |  |   |  |  |  |
+ *                         8, 15, 8, 8, 8  |   |  |  |  |
+ *                                         8, 15, 8, 8, 8
+ *
+ * When moving the suffix, we found exactly 3 matches.
+ *
+ * The first suffix with period 5 is repeating.
+ *
+ * The next event is (3 * max_period) % suffix_period
+ *
+ * In this example, the result 0, so the next event is suffix[0] => 8
+ *
+ * However, 8 is the index in the array of exponential moving average
+ * which was calculated on the fly when storing the values, so the
+ * interval is ema[8] = 1366
+ *
+ *
+ * Example 2:
+ *
+ *	4, 3, 5, 100,
+ *	3, 3, 5, 117,
+ *	4, 4, 5, 112,
+ *	4, 3, 4, 110,
+ *	3, 5, 3, 117,
+ *	4, 4, 5, 112,
+ *	4, 3, 4, 110,
+ *	3, 4, 5, 112,
+ *	4, 3, 4, 110
+ *
+ * ilog2
+ *
+ *	0, 0, 0, 4,
+ *	0, 0, 0, 4,
+ *	0, 0, 0, 4,
+ *	0, 0, 0, 4,
+ *	0, 0, 0, 4,
+ *	0, 0, 0, 4,
+ *	0, 0, 0, 4,
+ *	0, 0, 0, 4,
+ *	0, 0, 0, 4
+ *
+ * Max period 5:
+ *	   0, 0, 4,
+ *	0, 0, 0, 4,
+ *	0, 0, 0, 4,
+ *	0, 0, 0, 4
+ *
+ * Suffixes:
+ *
+ *  1) 0, 0, 4, 0, 0
+ *  2) 0, 0, 4, 0
+ *  3) 0, 0, 4
+ *  4) 0, 0
+ *
+ * buffer: 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4
+ *         |  |  |  |  |  |  X
+ *         0, 0, 4, 0, 0, |  X
+ *                        0, 0
+ *
+ * buffer: 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 0, 4
+ *         |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
+ *         0, 0, 4, 0, |  |  |  |  |  |  |  |  |  |  |
+ *                     0, 0, 4, 0, |  |  |  |  |  |  |
+ *                                 0, 0, 4, 0, |  |  |
+ *                                             0  0  4
+ *
+ * Pattern is found 3 times, the remaining is 1 which results from
+ * (max_period * 3) % suffix_period. This value is the index in the
+ * suffix arrays. The suffix array for a period 4 has the value 4
+ * at index 1.
+ */
+#define EMA_ALPHA_VAL		64
+#define EMA_ALPHA_SHIFT		7
+
+#define PREDICTION_PERIOD_MIN	2
+#define PREDICTION_PERIOD_MAX	5
+#define PREDICTION_FACTOR	4
+#define PREDICTION_MAX		10 /* 2 ^ PREDICTION_MAX useconds */
+#define PREDICTION_BUFFER_SIZE	16 /* slots for EMAs, hardly more than 16 */
+
+struct irqt_stat {
+	u64	last_ts;
+	int	ema_time[PREDICTION_BUFFER_SIZE];
+	int	timings[IRQ_TIMINGS_SIZE];
+	int	circ_timings[IRQ_TIMINGS_SIZE];
+	int	count;
+};
+
+/*
+ * Exponential moving average computation
+ */
+static int irq_timings_ema_new(s64 value, s64 ema_old)
+{
+	if (likely(ema_old))
+		return ema_old + (((value - ema_old) * EMA_ALPHA_VAL) >>
+				  EMA_ALPHA_SHIFT);
+	return value;
+}
+
+static int irq_timings_next_event_index(int *buffer, size_t len, int period_max)
+{
+	int i;
+
+	for (i = period_max; i >= PREDICTION_PERIOD_MIN ; i--) {
+
+		int *begin = &buffer[len - (i * 3)];
+		int *ptr = begin;
+
+		while (!memcmp(ptr, begin, i * sizeof(*ptr))) {
+			ptr += i;
+			if (ptr >= &buffer[len])
+				return begin[((i * 3) % i)];
+		}
+	}
+
+	return -1;
+}
+
+static u64 __irq_timings_next_event(struct irqt_stat *irqs, int irq, u64 now)
+{
+	int index, i;
+	int period_max;
+	int count, start;
+	int min = INT_MAX;
+
+	if ((now - irqs->last_ts) >= NSEC_PER_SEC) {
+		irqs->count = irqs->last_ts = 0;
+		return U64_MAX;
+	}
+
+	period_max = irqs->count > (3 * PREDICTION_PERIOD_MAX) ?
+		PREDICTION_PERIOD_MAX : irqs->count / 3;
+
+	if (period_max <= PREDICTION_PERIOD_MIN)
+		return U64_MAX;
+
+	/*
+	 * 'count' will depends if the circular buffer wrapped or not
+	 */
+	count = irqs->count < IRQ_TIMINGS_SIZE ?
+		irqs->count : IRQ_TIMINGS_SIZE;
+
+	start = irqs->count < IRQ_TIMINGS_SIZE ?
+		0 : (irqs->count & IRQ_TIMINGS_MASK);
+
+	/*
+	 * Copy the content of the circular buffer into another buffer
+	 * in order to linearize the buffer instead of dealing with
+	 * wrapping indexes and shifted array which will be prone to
+	 * error and extremelly difficult to debug.
+	 */
+	for (i = 0; i < count; i++) {
+		irqs->timings[i] = irqs->circ_timings[(start + i) &
+						      IRQ_TIMINGS_MASK];
+		min = min_t(int, irqs->timings[i], min);
+	}
+
+	index = irq_timings_next_event_index(irqs->timings, count, period_max);
+	if (index < 0)
+		return irqs->last_ts + min;
+
+	return irqs->last_ts + irqs->ema_time[index];
+}
+
+static inline void irq_timings_store(int irq, struct irqt_stat *irqs, u64 ts)
+{
+	u64 old_ts = irqs->last_ts;
+	u64 interval;
+	int index;
+
+	/*
+	 * The timestamps are absolute time values, we need to compute
+	 * the timing interval between two interrupts.
+	 */
+	irqs->last_ts = ts;
+
+	/*
+	 * The interval type is u64 in order to deal with the same
+	 * type in our computation, that prevent mindfuck issues with
+	 * overflow, sign and division.
+	 */
+	interval = ts - old_ts;
+
+	/*
+	 * The interrupt triggered more than one second apart, that
+	 * ends the sequence as predictible for our purpose. In this
+	 * case, assume we have the beginning of a sequence and the
+	 * timestamp is the first value. As it is impossible to
+	 * predict anything at this point, return.
+	 *
+	 * Note the first timestamp of the sequence will always fall
+	 * in this test because the old_ts is zero. That is what we
+	 * want as we need another timestamp to compute an interval.
+	 */
+	if (interval >= NSEC_PER_SEC) {
+		irqs->count = 0;
+		return;
+	}
+
+	/*
+	 * Get the index in the ema table for this interrupt. The
+	 * PREDICTION_FACTOR increase the interval size for the array
+	 * of exponential average.
+	 */
+	index = likely(interval) ?
+		ilog2((interval >> 10) / PREDICTION_FACTOR) : 0;
+
+	/*
+	 * Store the index as an element of the pattern in another
+	 * circular array.
+	 */
+	irqs->circ_timings[irqs->count & IRQ_TIMINGS_MASK] = index;
+
+	irqs->ema_time[index] = irq_timings_ema_new(interval,
+						    irqs->ema_time[index]);
+
+	irqs->count++;
+}
+
 /**
  * irq_timings_next_event - Return when the next event is supposed to arrive
  *
@@ -61,6 +434,12 @@ void irq_timings_disable(void)
  */
 u64 irq_timings_next_event(u64 now)
 {
+	struct irq_timings *irqts = this_cpu_ptr(&irq_timings);
+	struct irqt_stat *irqs;
+	struct irqt_stat __percpu *s;
+	u64 ts, next_evt = U64_MAX;
+	int i, irq = 0;
+
 	/*
 	 * This function must be called with the local irq disabled in
 	 * order to prevent the timings circular buffer to be updated
@@ -68,7 +447,50 @@ u64 irq_timings_next_event(u64 now)
 	 */
 	lockdep_assert_irqs_disabled();
 
-	return 0;
+	if (!irqts->count)
+		return next_evt;
+
+	/*
+	 * Number of elements in the circular buffer: If it happens it
+	 * was flushed before, then the number of elements could be
+	 * smaller than IRQ_TIMINGS_SIZE, so the count is used,
+	 * otherwise the array size is used as we wrapped. The index
+	 * begins from zero when we did not wrap. That could be done
+	 * in a nicer way with the proper circular array structure
+	 * type but with the cost of extra computation in the
+	 * interrupt handler hot path. We choose efficiency.
+	 *
+	 * Inject measured irq/timestamp to the pattern prediction
+	 * model while decrementing the counter because we consume the
+	 * data from our circular buffer.
+	 */
+	for (i = (irqts->count & IRQ_TIMINGS_MASK) - 1,
+		     irqts->count = min(IRQ_TIMINGS_SIZE, irqts->count);
+	     irqts->count > 0; irqts->count--, i = (i + 1) & IRQ_TIMINGS_MASK) {
+
+		irq = irq_timing_decode(irqts->values[i], &ts);
+		s = idr_find(&irqt_stats, irq);
+		if (s)
+			irq_timings_store(irq, this_cpu_ptr(s), ts);
+	}
+
+	/*
+	 * Look in the list of interrupts' statistics, the earliest
+	 * next event.
+	 */
+	idr_for_each_entry(&irqt_stats, s, i) {
+
+		irqs = this_cpu_ptr(s);
+
+		ts = __irq_timings_next_event(irqs, i, now);
+		if (ts <= now)
+			return now;
+
+		if (ts < next_evt)
+			next_evt = ts;
+	}
+
+	return next_evt;
 }
 
 void irq_timings_free(int irq)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] cpuidle/drivers/mbed: Add new governor for embedded systems
  2019-03-08 21:20 [PATCH 0/3] IRQ next prediction and mbed governor Daniel Lezcano
  2019-03-08 21:20 ` [PATCH 1/3] genirq/timings: Remove variance computation code Daniel Lezcano
  2019-03-08 21:20 ` [PATCH 2/3] genirq/timings: Add array suffix " Daniel Lezcano
@ 2019-03-08 21:20 ` Daniel Lezcano
  2019-04-02 13:22   ` Quentin Perret
  2 siblings, 1 reply; 10+ messages in thread
From: Daniel Lezcano @ 2019-03-08 21:20 UTC (permalink / raw)
  To: rjw, tglx; +Cc: ulf.hansson, linux-pm, linux-kernel

The objective is the same for all the governors: save energy, but at
the end the governors menu, ladder and teo aim to improve the
performances with an acceptable energy drop for some workloads which
are identified for servers and desktops (with the help of a firmware).

It is very difficult to do changes in these governors for embedded
systems without impacting performances on servers/desktops or ruin the
optimizations for the workloads on these platforms.

The mbed governor is a new governor targeting embedded systems
running on battery where the energy saving has a higher priority than
servers or desktops. This governor aims to save energy as much as
possible but accepting a performance degradation.

In this way, we can optimize the governor for specific mobile workload
and more generally embedded systems without impacting other platforms.

The governor is based on the irq timings where we predict the next
interrupt occurrences on the current CPU and the next timer. It is
well suited for mobile and more generally embedded systems where the
interrupts are usually pinned on one CPU and where the power is more
important than the performances.

The multimedia applications on the embedded system spawns several
threads which are migrated across the different CPUs and waking
between them up. In order to catch this situation we have also to
track the idle task rescheduling duration with a relative degree of
confidence as the scheduler is involved in the task migrations. The
resched information is in the scope of the governor via the reflect
callback.

The governor begins with a clean foundation basing the prediction on
the irq behavior returned by the irq timings, the timers and the idle
task rescheduling. The advantage of the approach is we have a full
view of the wakeup sources as we identify them separately and then we
can control the situation without relying on biased heuristics.

Despite the naive idle state selection for this first iteration, the
governor provides a performance improvement of 30% for Jankbench
throughout for the same amount of energy than the menu governor and a
similar energy saving for other workloads. There are areas of
improvement for the different workloads which will be optimized
iteratively with non-regression testing for previous identified
workloads on an Android reference platform.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/cpuidle/Kconfig            |  11 ++-
 drivers/cpuidle/governors/Makefile |   1 +
 drivers/cpuidle/governors/mbed.c   | 143 +++++++++++++++++++++++++++++
 3 files changed, 154 insertions(+), 1 deletion(-)
 create mode 100644 drivers/cpuidle/governors/mbed.c

diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index 8caccbbd7353..6fef94c9bde0 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -4,7 +4,7 @@ config CPU_IDLE
 	bool "CPU idle PM support"
 	default y if ACPI || PPC_PSERIES
 	select CPU_IDLE_GOV_LADDER if (!NO_HZ && !NO_HZ_IDLE)
-	select CPU_IDLE_GOV_MENU if (NO_HZ || NO_HZ_IDLE) && !CPU_IDLE_GOV_TEO
+	select CPU_IDLE_GOV_MENU if (NO_HZ || NO_HZ_IDLE) && !CPU_IDLE_GOV_TEO && !CPU_IDLE_GOV_MBED
 	help
 	  CPU idle is a generic framework for supporting software-controlled
 	  idle processor power management.  It includes modular cross-platform
@@ -32,6 +32,15 @@ config CPU_IDLE_GOV_TEO
 	  Some workloads benefit from using it and it generally should be safe
 	  to use.  Say Y here if you are not happy with the alternatives.
 
+config CPU_IDLE_GOV_MBED
+	bool "Embedded governor"
+	select IRQ_TIMINGS
+	help
+	  The embedded governor is based on irq timings measurements and
+	  pattern research combined with the next timer. This governor
+	  suits very well on embedded systems where the interrupts are
+	  grouped on a single core and the power is the priority.
+
 config DT_IDLE_STATES
 	bool
 
diff --git a/drivers/cpuidle/governors/Makefile b/drivers/cpuidle/governors/Makefile
index 4d8aff5248a8..d4571935575c 100644
--- a/drivers/cpuidle/governors/Makefile
+++ b/drivers/cpuidle/governors/Makefile
@@ -5,3 +5,4 @@
 obj-$(CONFIG_CPU_IDLE_GOV_LADDER) += ladder.o
 obj-$(CONFIG_CPU_IDLE_GOV_MENU) += menu.o
 obj-$(CONFIG_CPU_IDLE_GOV_TEO) += teo.o
+obj-$(CONFIG_CPU_IDLE_GOV_MBED) += mbed.o
diff --git a/drivers/cpuidle/governors/mbed.c b/drivers/cpuidle/governors/mbed.c
new file mode 100644
index 000000000000..8d8c8ba92533
--- /dev/null
+++ b/drivers/cpuidle/governors/mbed.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019, Linaro Ltd
+ * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
+ */
+#include <linux/cpuidle.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/tick.h>
+#include <linux/interrupt.h>
+#include <linux/sched/clock.h>
+
+struct mbed_device {
+	u64 idle_ema_avg;
+	u64 idle_total;
+	unsigned long last_jiffies;
+};
+
+#define EMA_ALPHA_VAL		64
+#define EMA_ALPHA_SHIFT		7
+#define MAX_RESCHED_INTERVAL_MS	100
+
+static DEFINE_PER_CPU(struct mbed_device, mbed_devices);
+
+static int mbed_ema_new(s64 value, s64 ema_old)
+{
+	if (likely(ema_old))
+		return ema_old + (((value - ema_old) * EMA_ALPHA_VAL) >>
+				  EMA_ALPHA_SHIFT);
+	return value;
+}
+
+static void mbed_reflect(struct cpuidle_device *dev, int index)
+{
+        struct mbed_device *mbed_dev = this_cpu_ptr(&mbed_devices);
+
+	/*
+	 * The idle task was not rescheduled since
+	 * MAX_RESCHED_INTERVAL_MS, let's consider the duration is
+	 * long enough to clear our stats.
+	 */
+	if (time_after(jiffies, mbed_dev->last_jiffies +
+		       msecs_to_jiffies(MAX_RESCHED_INTERVAL_MS)))
+		mbed_dev->idle_ema_avg = 0;
+
+	/*
+	 * Sum all the residencies in order to compute the total
+	 * duration of the idle task.
+	 */
+	mbed_dev->idle_total += dev->last_residency;
+
+	/*
+	 * We exited the idle state with the need_resched() flag, the
+	 * idle task will be rescheduled, so store the duration the
+	 * idle task was scheduled in an exponential moving average and
+	 * reset the total of the idle duration.
+	 */
+	if (need_resched()) {
+		mbed_dev->idle_ema_avg = mbed_ema_new(mbed_dev->idle_total,
+						      mbed_dev->idle_ema_avg);
+		mbed_dev->idle_total = 0;
+		mbed_dev->last_jiffies = jiffies;
+	}
+}
+
+static int mbed_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
+		       bool *stop_tick)
+{
+        struct mbed_device *mbed_dev = this_cpu_ptr(&mbed_devices);
+	int latency_req = cpuidle_governor_latency_req(dev->cpu);
+	int i, index = 0;
+	ktime_t delta_next;
+	u64 now, irq_length, timer_length;
+	u64 idle_duration_us;
+
+	/*
+	 * Get the present time as reference for the next steps
+	 */
+	now = local_clock();
+
+	/*
+	 * Get the next interrupt event giving the 'now' as a
+	 * reference, if the next event appears to have already
+	 * expired then we get the 'now' returned which ends up with a
+	 * zero duration.
+	 */
+	irq_length = irq_timings_next_event(now) - now;
+
+	/*
+	 * Get the timer duration before expiration.
+	 */
+	timer_length = ktime_to_ns(tick_nohz_get_sleep_length(&delta_next));
+
+	/*
+	 * Get the smallest duration between the timer and the irq next event.
+	 */
+	idle_duration_us = min_t(u64, irq_length, timer_length) / NSEC_PER_USEC;
+
+	/*
+	 * Get the idle task duration average if the information is
+	 * available.
+	 */
+	if (mbed_dev->idle_ema_avg)
+		idle_duration_us = min_t(u64, idle_duration_us,
+					 mbed_dev->idle_ema_avg);
+
+	for (i = 0; i < drv->state_count; i++) {
+		struct cpuidle_state *s = &drv->states[i];
+		struct cpuidle_state_usage *su = &dev->states_usage[i];
+
+		if (s->disabled || su->disable)
+			continue;
+
+		if (s->exit_latency > latency_req)
+			break;
+
+		if (s->target_residency > idle_duration_us)
+			break;
+
+		index = i;
+	}
+
+	if (!index)
+		*stop_tick = false;
+
+	return index;
+}
+
+static struct cpuidle_governor mbed_governor = {
+	.name =		"mbed",
+	.rating =	20,
+	.select =	mbed_select,
+	.reflect =	mbed_reflect,
+};
+
+static int __init init_governor(void)
+{
+	irq_timings_enable();
+	return cpuidle_register_governor(&mbed_governor);
+}
+
+postcore_initcall(init_governor);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] genirq/timings: Add array suffix computation code
  2019-03-08 21:20 ` [PATCH 2/3] genirq/timings: Add array suffix " Daniel Lezcano
@ 2019-03-24 17:44   ` Thomas Gleixner
  2019-03-26 15:22     ` Daniel Lezcano
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2019-03-24 17:44 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: rjw, ulf.hansson, linux-pm, linux-kernel

Daniel,

On Fri, 8 Mar 2019, Daniel Lezcano wrote:

> The previous variance was discarding values from the timings when they
> were considered as anomalies as stated by the normal law statistical
> model.
> 
> However in the interrupt life, we can have multiple anomalies due to
> the nature of the device generating the interrupts, and most of the
> time we can observe a repeating pattern, that is particulary true for
> network, console, MMC or SSD devices.
> 
> With the variance approach, we miss the patterns and we can only deal with the
> interrupt coming in regular intervals.
> 
> After a long investigation, this patch provides the array suffixes
> derived algorithm where we can detect regular and repeating patterns
> interrupt events.

'This patch', 'we' .... Documentation/process/submitting-patches.rst :)

> +#define EMA_ALPHA_VAL		64
> +#define EMA_ALPHA_SHIFT		7
> +
> +#define PREDICTION_PERIOD_MIN	2
> +#define PREDICTION_PERIOD_MAX	5
> +#define PREDICTION_FACTOR	4
> +#define PREDICTION_MAX		10 /* 2 ^ PREDICTION_MAX useconds */
> +#define PREDICTION_BUFFER_SIZE	16 /* slots for EMAs, hardly more than 16 */
> +
> +struct irqt_stat {
> +	u64	last_ts;
> +	int	ema_time[PREDICTION_BUFFER_SIZE];
> +	int	timings[IRQ_TIMINGS_SIZE];
> +	int	circ_timings[IRQ_TIMINGS_SIZE];
> +	int	count;
> +};
> +
> +/*
> + * Exponential moving average computation
> + */
> +static int irq_timings_ema_new(s64 value, s64 ema_old)

There is a mixed bag of s64/u64 all over this code. Please stay
consistent. We had enough sign confusion bugs in the past.

> +{
> +	if (likely(ema_old))
> +		return ema_old + (((value - ema_old) * EMA_ALPHA_VAL) >>
> +				  EMA_ALPHA_SHIFT);

Eew. really.

> +	return value;

What's wrong with doing:

	if (unlikely(!ema_old))
		return value;

	value = (value - ema_old) * EMA_ALPHA_VAL;
	return ema_old + value >> EMA_ALPHA_SHIFT;

Hmm? Becomes too readable, right?

> +}
> +
> +static int irq_timings_next_event_index(int *buffer, size_t len, int period_max)
> +{
> +	int i;
> +
> +	for (i = period_max; i >= PREDICTION_PERIOD_MIN ; i--) {
> +
> +		int *begin = &buffer[len - (i * 3)];
> +		int *ptr = begin;
> +
> +		while (!memcmp(ptr, begin, i * sizeof(*ptr))) {
> +			ptr += i;
> +			if (ptr >= &buffer[len])
> +				return begin[((i * 3) % i)];
> +		}

Some comment for these magic loops might be helpful.

> +	}
> +
> +	return -1;
> +}
> +
> +static u64 __irq_timings_next_event(struct irqt_stat *irqs, int irq, u64 now)
> +{
> +	int index, i;
> +	int period_max;
> +	int count, start;
> +	int min = INT_MAX;

Please coalesce same type variables in one line.

> +
> +	if ((now - irqs->last_ts) >= NSEC_PER_SEC) {
> +		irqs->count = irqs->last_ts = 0;
> +		return U64_MAX;
> +	}
> +
> +	period_max = irqs->count > (3 * PREDICTION_PERIOD_MAX) ?
> +		PREDICTION_PERIOD_MAX : irqs->count / 3;
> +
> +	if (period_max <= PREDICTION_PERIOD_MIN)
> +		return U64_MAX;

The above lacks a comment as it's confusing the casual reader.

> +
> +	/*
> +	 * 'count' will depends if the circular buffer wrapped or not
> +	 */
> +	count = irqs->count < IRQ_TIMINGS_SIZE ?
> +		irqs->count : IRQ_TIMINGS_SIZE;
> +
> +	start = irqs->count < IRQ_TIMINGS_SIZE ?
> +		0 : (irqs->count & IRQ_TIMINGS_MASK);
> +
> +	/*
> +	 * Copy the content of the circular buffer into another buffer
> +	 * in order to linearize the buffer instead of dealing with
> +	 * wrapping indexes and shifted array which will be prone to
> +	 * error and extremelly difficult to debug.
> +	 */
> +	for (i = 0; i < count; i++) {
> +		irqs->timings[i] = irqs->circ_timings[(start + i) &
> +						      IRQ_TIMINGS_MASK];

Using helper variables increases readability by avoiding the ugly line
breaks.

> +		min = min_t(int, irqs->timings[i], min);
> +	}
> +
> +	index = irq_timings_next_event_index(irqs->timings, count, period_max);
> +	if (index < 0)
> +		return irqs->last_ts + min;

Again type mismatch. return u64 + int 

> +
> +	return irqs->last_ts + irqs->ema_time[index];
> +}
> +
>  /**
>   * irq_timings_next_event - Return when the next event is supposed to arrive
>   *
> @@ -61,6 +434,12 @@ void irq_timings_disable(void)
>   */
>  u64 irq_timings_next_event(u64 now)
>  {
> +	struct irq_timings *irqts = this_cpu_ptr(&irq_timings);
> +	struct irqt_stat *irqs;
> +	struct irqt_stat __percpu *s;
> +	u64 ts, next_evt = U64_MAX;
> +	int i, irq = 0;
> +
>  	/*
>  	 * This function must be called with the local irq disabled in
>  	 * order to prevent the timings circular buffer to be updated
> @@ -68,7 +447,50 @@ u64 irq_timings_next_event(u64 now)
>  	 */
>  	lockdep_assert_irqs_disabled();
>  
> -	return 0;
> +	if (!irqts->count)
> +		return next_evt;
> +
> +	/*
> +	 * Number of elements in the circular buffer: If it happens it
> +	 * was flushed before, then the number of elements could be
> +	 * smaller than IRQ_TIMINGS_SIZE, so the count is used,
> +	 * otherwise the array size is used as we wrapped. The index
> +	 * begins from zero when we did not wrap. That could be done
> +	 * in a nicer way with the proper circular array structure
> +	 * type but with the cost of extra computation in the
> +	 * interrupt handler hot path. We choose efficiency.
> +	 *
> +	 * Inject measured irq/timestamp to the pattern prediction
> +	 * model while decrementing the counter because we consume the
> +	 * data from our circular buffer.
> +	 */
> +	for (i = (irqts->count & IRQ_TIMINGS_MASK) - 1,
> +		     irqts->count = min(IRQ_TIMINGS_SIZE, irqts->count);
> +	     irqts->count > 0; irqts->count--, i = (i + 1) & IRQ_TIMINGS_MASK) {

Uuurgh, this is unparseable. The initialization of i and irqts->coint can
move before the loop and then the rest becomes readable.

> +
> +		irq = irq_timing_decode(irqts->values[i], &ts);
> +		s = idr_find(&irqt_stats, irq);
> +		if (s)
> +			irq_timings_store(irq, this_cpu_ptr(s), ts);
> +	}

Other than that this looks good to me. Nice work!

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] genirq/timings: Add array suffix computation code
  2019-03-24 17:44   ` Thomas Gleixner
@ 2019-03-26 15:22     ` Daniel Lezcano
  2019-03-26 16:04       ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Lezcano @ 2019-03-26 15:22 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: rjw, ulf.hansson, linux-pm, linux-kernel


Hi Thomas,

thanks for reviewing this patch.

[ ... ]

>> +
>> +/*
>> + * Exponential moving average computation
>> + */
>> +static int irq_timings_ema_new(s64 value, s64 ema_old)
> 
> There is a mixed bag of s64/u64 all over this code. Please stay
> consistent. We had enough sign confusion bugs in the past.

Right.

I have a question, ema_old and value will be always u64 type and the
function irq_timings_ema_new() will return an u64 ...

> 	value = (value - ema_old) * EMA_ALPHA_VAL;
> 	return ema_old + value >> EMA_ALPHA_SHIFT;

... how can I deal with the operations above when value < ema_old ?

Shall I use an intermediate s64 ?

eg:

	s64 aux = (value - ema_old) * EMA_ALPHA_VAL;
	return ema_old + aux >> EMA_ALPHA_SHIFT;
?

[ ... ]

 > Other than that this looks good to me. Nice work!

Thanks, I appreciate.

-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] genirq/timings: Add array suffix computation code
  2019-03-26 15:22     ` Daniel Lezcano
@ 2019-03-26 16:04       ` Thomas Gleixner
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2019-03-26 16:04 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: rjw, ulf.hansson, linux-pm, linux-kernel

Daniel,

On Tue, 26 Mar 2019, Daniel Lezcano wrote:
> >> +/*
> >> + * Exponential moving average computation
> >> + */
> >> +static int irq_timings_ema_new(s64 value, s64 ema_old)
> > 
> > There is a mixed bag of s64/u64 all over this code. Please stay
> > consistent. We had enough sign confusion bugs in the past.
> 
> Right.
> 
> I have a question, ema_old and value will be always u64 type and the
> function irq_timings_ema_new() will return an u64 ...
> 
> > 	value = (value - ema_old) * EMA_ALPHA_VAL;
> > 	return ema_old + value >> EMA_ALPHA_SHIFT;
> 
> ... how can I deal with the operations above when value < ema_old ?
> 
> Shall I use an intermediate s64 ?
> 
> eg:
> 
> 	s64 aux = (value - ema_old) * EMA_ALPHA_VAL;
> 	return ema_old + aux >> EMA_ALPHA_SHIFT;
> ?

That should work if ema_old is not ever having the topmost bit set :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] cpuidle/drivers/mbed: Add new governor for embedded systems
  2019-03-08 21:20 ` [PATCH 3/3] cpuidle/drivers/mbed: Add new governor for embedded systems Daniel Lezcano
@ 2019-04-02 13:22   ` Quentin Perret
  2019-04-02 16:02     ` Daniel Lezcano
  0 siblings, 1 reply; 10+ messages in thread
From: Quentin Perret @ 2019-04-02 13:22 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: rjw, tglx, ulf.hansson, linux-pm, linux-kernel

Hi Daniel,

On Friday 08 Mar 2019 at 22:20:47 (+0100), Daniel Lezcano wrote:
> Despite the naive idle state selection for this first iteration, the
> governor provides a performance improvement of 30% for Jankbench
> throughout for the same amount of energy than the menu governor

Wow, that sounds really good. Which kernel and board did you use for
testing ? What's the perf. metric here ? The 99% percentile of frame
completion time ?

Cheers,
Quentin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] cpuidle/drivers/mbed: Add new governor for embedded systems
  2019-04-02 13:22   ` Quentin Perret
@ 2019-04-02 16:02     ` Daniel Lezcano
  2019-04-02 16:26       ` Quentin Perret
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Lezcano @ 2019-04-02 16:02 UTC (permalink / raw)
  To: Quentin Perret; +Cc: rjw, tglx, ulf.hansson, linux-pm, linux-kernel

On 02/04/2019 15:22, Quentin Perret wrote:
> Hi Daniel,
> 
> On Friday 08 Mar 2019 at 22:20:47 (+0100), Daniel Lezcano wrote:
>> Despite the naive idle state selection for this first iteration, the
>> governor provides a performance improvement of 30% for Jankbench
>> throughout for the same amount of energy than the menu governor
> 
> Wow, that sounds really good. Which kernel and board did you use for
> testing ? What's the perf. metric here ? The 99% percentile of frame
> completion time ?

Hi Quentin,

I tried on a hikey960 with an android kernel 4.19.

The used percentile is the 95%, the tests are jankbench list_view and
edit_text, and exoplayer. Values are normalized against the menu governor:

list_view:
 frame duration -35%
 count: +20%
 energy: +1%

edit text:
 frame duration -45%
 count: +27%
 energy: -0.3%

For audio and video playback, there is no frame drop.

audio
 - energy: -2.8%

video
 - energy: -3.8%

These are preliminary results.

  -- Daniel

-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] cpuidle/drivers/mbed: Add new governor for embedded systems
  2019-04-02 16:02     ` Daniel Lezcano
@ 2019-04-02 16:26       ` Quentin Perret
  0 siblings, 0 replies; 10+ messages in thread
From: Quentin Perret @ 2019-04-02 16:26 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: rjw, tglx, ulf.hansson, linux-pm, linux-kernel

On Tuesday 02 Apr 2019 at 18:02:02 (+0200), Daniel Lezcano wrote:
> I tried on a hikey960 with an android kernel 4.19.

OK. ISTR Hikey 960 had quite a few problems with idle states recently.
At some point it was so horribly broken that you could get something
like a 3x perf bonus in Jankbench _and_ save energy by just disabling
all idles states no ? I haven't been following closely but have these
issues been resolved ?

In any cases, your results seem to go in the right direction, and that's
not a blocker to go on with the review so :-)

> 
> The used percentile is the 95%, the tests are jankbench list_view and
> edit_text, and exoplayer. Values are normalized against the menu governor:
> 
> list_view:
>  frame duration -35%
>  count: +20%
>  energy: +1%
> 
> edit text:
>  frame duration -45%
>  count: +27%
>  energy: -0.3%
> 
> For audio and video playback, there is no frame drop.
> 
> audio
>  - energy: -2.8%
> 
> video
>  - energy: -3.8%
> 
> These are preliminary results.

Thanks for the details. Assuming the board support isn't too broken,
these look pretty good.

Cheers,
Quentin

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-04-02 16:26 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-08 21:20 [PATCH 0/3] IRQ next prediction and mbed governor Daniel Lezcano
2019-03-08 21:20 ` [PATCH 1/3] genirq/timings: Remove variance computation code Daniel Lezcano
2019-03-08 21:20 ` [PATCH 2/3] genirq/timings: Add array suffix " Daniel Lezcano
2019-03-24 17:44   ` Thomas Gleixner
2019-03-26 15:22     ` Daniel Lezcano
2019-03-26 16:04       ` Thomas Gleixner
2019-03-08 21:20 ` [PATCH 3/3] cpuidle/drivers/mbed: Add new governor for embedded systems Daniel Lezcano
2019-04-02 13:22   ` Quentin Perret
2019-04-02 16:02     ` Daniel Lezcano
2019-04-02 16:26       ` Quentin Perret

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.