linux-kselftest.vger.kernel.org archive mirror
* [RFC v3 0/2] CPU-Idle latency selftest framework
@ 2021-04-04  8:33 Pratik Rajesh Sampat
  2021-04-04  8:33 ` [RFC v3 1/2] cpuidle: Extract IPI based and timer based wakeup latency from idle states Pratik Rajesh Sampat
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Pratik Rajesh Sampat @ 2021-04-04  8:33 UTC
  To: rjw, daniel.lezcano, shuah, dsmythies, ego, svaidy, linux-pm,
	linux-kernel, linux-kselftest, pratik.r.sampat, psampat

Changelog
RFC v2-->v3

Based on comments by Doug Smythies,
1. Changed the commit log to state that the test must be run as super
user.
2. Added a comment specifying a method to run the test bash script
without recompiling.
3. Enabled all the idle states after the experiments complete so that
the system is in a coherent state after the tests have run.
4. Corrected the return status of a CPU that cannot be off-lined.

RFC v2: https://lkml.org/lkml/2021/4/1/615
---
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations between the advertised latency and
residency values and those actually observed.

The patchset measures latencies for two kinds of events: IPIs and timers.
As this is a software-only mechanism, there will be additional latency
from the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU,
and the measured latencies should be viewed relative to that baseline.
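
For reference, the selftest in this series generates that 100 percent
load by pinning a trivial workload to the CPU under test, roughly:

    # Keep the destination CPU busy (same approach as cpuidle.sh)
    taskset -c $dest_cpu cat /dev/random > /dev/null &
    task_pid=$!
    sleep 1    # wait for the workload to reach 100% CPU usage
    # ... take the baseline measurement here ...
    kill $task_pid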

To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.

The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/:

IPI test:
    ipi_cpu_dest = Destination CPU for the IPI
    ipi_cpu_src = Origin of the IPI
    ipi_latency_ns = Measured latency time in ns
Timeout test:
    timeout_cpu_src = CPU on which the timer is to be queued
    timeout_expected_ns = Timer duration
    timeout_diff_ns = Difference between actual and expected timer duration
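
These knobs can also be exercised by hand. As an illustration, assuming
the module is loaded and debugfs is mounted at /sys/kernel/debug, a
minimal manual sequence could look like:

    cd /sys/kernel/debug/latency_test
    # IPI test: fire an IPI at CPU 4; the write blocks until the
    # measurement completes, so the result can be read back immediately
    echo 4 > ipi_cpu_dest
    cat ipi_cpu_src ipi_latency_ns
    # Timer test: queue a 10us timer on the current CPU, then give it
    # time to fire before reading the result
    echo 10000 > timeout_expected_ns
    sleep 0.2
    cat timeout_cpu_src timeout_diff_ns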

Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - State0: 3265
# Observed Average IPI latency(ns) - State1: 3507
# Observed Average IPI latency(ns) - State2: 3739
# Observed Average IPI latency(ns) - State3: 3807
# Observed Average IPI latency(ns) - State4: 17070
# Observed Average IPI latency(ns) - State5: 1038174
# Observed Average IPI latency(ns) - State6: 1068784
# 
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - State0: 1640
# Observed Average timeout diff(ns) - State1: 1764
# Observed Average timeout diff(ns) - State2: 1715
# Observed Average timeout diff(ns) - State3: 1845
# Observed Average timeout diff(ns) - State4: 16581
# Observed Average timeout diff(ns) - State5: 939977
# Observed Average timeout diff(ns) - State6: 1073024


Things to keep in mind:

1. This kernel module + bash driver does not guarantee idleness on a
   core when the IPI or the timer is armed. It only invokes sleep and
   hopes that the core is idle once the IPI/timer is delivered to it.
   Hence this program must be run on a completely idle system for best
   results.

2. Even on a completely idle system, there may be book-keeping or
   jitter tasks that can run on the core we want to be idle. This can
   create outliers in the latency measurement. Thankfully, these
   outliers should be large enough to be easily weeded out (a rough
   filtering sketch follows this list).

3. A userspace-only selftest variant was also sent out as an RFC, based
   on suggestions over the previous patchset, to simplify the kernel
   complexity. However, the userspace-only approach had more noise in
   the latency measurement due to userspace-kernel interactions, which
   led to run-to-run variance and a less accurate test.
   Another downside of the nature of a userspace program is that it
   takes orders of magnitude longer to complete a full system test
   compared to the kernel framework.
   RFC patch: https://lkml.org/lkml/2020/9/2/356

4. For Intel systems, the timer based latencies don't exactly measure
   idle wakeup latency. This is because of a hardware optimization
   mechanism that pre-arms a CPU when a timer is set to wake it up.
   That doesn't make this metric useless for Intel systems; it just
   means that it measures IPI/timer response latency rather than idle
   wakeup latency.
   (Source: https://lkml.org/lkml/2020/9/2/610)
   As a solution to this problem, a hardware based latency analyzer has
   been devised by Artem Bityutskiy from Intel:
   https://youtu.be/Opk92aQyvt0?t=8266
   https://intel.github.io/wult/
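
As noted in point 2, jitter outliers are typically orders of magnitude
larger than the rest of the samples, so a simple threshold is enough to
weed them out. A rough post-processing sketch (illustrative only, not
part of this patchset; assumes one latency sample per line in
samples.txt) could be:

    # Drop samples more than 3x the median, then average the rest
    sort -n samples.txt | awk '{ a[NR] = $1 } END {
        m = a[int((NR + 1) / 2)]        # (lower) median
        for (i = 1; i <= NR; i++)
            if (a[i] <= 3 * m) { sum += a[i]; n++ }
        if (n) print sum / n
    }'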

Pratik Rajesh Sampat (2):
  cpuidle: Extract IPI based and timer based wakeup latency from idle
    states
  selftest/cpuidle: Add support for cpuidle latency measurement

 drivers/cpuidle/Makefile                   |   1 +
 drivers/cpuidle/test-cpuidle_latency.c     | 157 ++++++++++
 lib/Kconfig.debug                          |  10 +
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/cpuidle/Makefile   |   6 +
 tools/testing/selftests/cpuidle/cpuidle.sh | 326 +++++++++++++++++++++
 tools/testing/selftests/cpuidle/settings   |   2 +
 7 files changed, 503 insertions(+)
 create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
 create mode 100644 tools/testing/selftests/cpuidle/Makefile
 create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
 create mode 100644 tools/testing/selftests/cpuidle/settings

-- 
2.17.1



* [RFC v3 1/2] cpuidle: Extract IPI based and timer based wakeup latency from idle states
  2021-04-04  8:33 [RFC v3 0/2] CPU-Idle latency selftest framework Pratik Rajesh Sampat
@ 2021-04-04  8:33 ` Pratik Rajesh Sampat
  2021-04-04  8:33 ` [RFC v3 2/2] selftest/cpuidle: Add support for cpuidle latency measurement Pratik Rajesh Sampat
  2021-04-09  5:23 ` [RFC v3 0/2] CPU-Idle latency selftest framework Doug Smythies
  2 siblings, 0 replies; 6+ messages in thread
From: Pratik Rajesh Sampat @ 2021-04-04  8:33 UTC
  To: rjw, daniel.lezcano, shuah, dsmythies, ego, svaidy, linux-pm,
	linux-kernel, linux-kselftest, pratik.r.sampat, psampat

Introduce a mechanism to fire directed IPIs from a specified source CPU
to a specified target CPU and measure the wakeup latency incurred.

Also, introduce a mechanism to queue an HR timer on a specified CPU and
subsequently measure the time taken to wake up the CPU.

Finally, define a simple debugfs interface to control the knobs for
firing the IPI and timer events on a specified CPU and to view the
incurred idle wakeup latencies.
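
Since the source of an IPI is simply whichever CPU performs the debugfs
write, a particular source can be chosen by pinning the writer; for
example (illustrative, assuming debugfs is mounted at /sys/kernel/debug):

    # Fire an IPI from CPU 0 at CPU 4 and read back the wakeup latency
    taskset -c 0 sh -c 'echo 4 > /sys/kernel/debug/latency_test/ipi_cpu_dest'
    cat /sys/kernel/debug/latency_test/ipi_latency_ns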

Signed-off-by: Pratik Rajesh Sampat <psampat@linux.ibm.com>
---
 drivers/cpuidle/Makefile               |   1 +
 drivers/cpuidle/test-cpuidle_latency.c | 157 +++++++++++++++++++++++++
 lib/Kconfig.debug                      |  10 ++
 3 files changed, 168 insertions(+)
 create mode 100644 drivers/cpuidle/test-cpuidle_latency.c

diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 26bbc5e74123..3b4ee06a9164 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
 obj-$(CONFIG_DT_IDLE_STATES)		  += dt_idle_states.o
 obj-$(CONFIG_ARCH_HAS_CPU_RELAX)	  += poll_state.o
 obj-$(CONFIG_HALTPOLL_CPUIDLE)		  += cpuidle-haltpoll.o
+obj-$(CONFIG_IDLE_LATENCY_SELFTEST)	  += test-cpuidle_latency.o
 
 ##################################################################################
 # ARM SoC drivers
diff --git a/drivers/cpuidle/test-cpuidle_latency.c b/drivers/cpuidle/test-cpuidle_latency.c
new file mode 100644
index 000000000000..f138011ac225
--- /dev/null
+++ b/drivers/cpuidle/test-cpuidle_latency.c
@@ -0,0 +1,157 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Module-based API test facility for cpuidle latency using IPIs and timers
+ */
+
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/*
+ * IPI based wakeup latencies
+ * Measure the time taken for a CPU to wake up on an IPI sent from another CPU
+ * The latency measured also includes the latency of sending the IPI
+ */
+struct latency {
+	unsigned int src_cpu;
+	unsigned int dest_cpu;
+	ktime_t time_start;
+	ktime_t time_end;
+	u64 latency_ns;
+} ipi_wakeup;
+
+static void measure_latency(void *info)
+{
+	struct latency *v;
+	ktime_t time_diff;
+
+	v = (struct latency *)info;
+	v->time_end = ktime_get();
+	time_diff = ktime_sub(v->time_end, v->time_start);
+	v->latency_ns = ktime_to_ns(time_diff);
+}
+
+static void run_smp_call_function_test(unsigned int cpu)
+{
+	ipi_wakeup.src_cpu = smp_processor_id();
+	ipi_wakeup.dest_cpu = cpu;
+	ipi_wakeup.time_start = ktime_get();
+	smp_call_function_single(cpu, measure_latency, &ipi_wakeup, 1);
+}
+
+/*
+ * Timer based wakeup latencies
+ * Measure the time taken for a CPU to wake up on a timer being armed and fired
+ */
+struct timer_data {
+	unsigned int src_cpu;
+	u64 timeout;
+	ktime_t time_start;
+	ktime_t time_end;
+	struct hrtimer timer;
+	u64 timeout_diff_ns;
+} timer_wakeup;
+
+static enum hrtimer_restart timer_called(struct hrtimer *hrtimer)
+{
+	struct timer_data *w;
+	ktime_t time_diff;
+
+	w = container_of(hrtimer, struct timer_data, timer);
+	w->time_end = ktime_get();
+
+	time_diff = ktime_sub(w->time_end, w->time_start);
+	time_diff = ktime_sub(time_diff, ns_to_ktime(w->timeout));
+	w->timeout_diff_ns = ktime_to_ns(time_diff);
+	return HRTIMER_NORESTART;
+}
+
+static void run_timer_test(unsigned int ns)
+{
+	hrtimer_init(&timer_wakeup.timer, CLOCK_MONOTONIC,
+		     HRTIMER_MODE_REL);
+	timer_wakeup.timer.function = timer_called;
+	timer_wakeup.src_cpu = smp_processor_id();
+	timer_wakeup.timeout = ns;
+	timer_wakeup.time_start = ktime_get();
+
+	hrtimer_start(&timer_wakeup.timer, ns_to_ktime(ns),
+		      HRTIMER_MODE_REL_PINNED);
+}
+
+static struct dentry *dir;
+
+static int cpu_read_op(void *data, u64 *dest_cpu)
+{
+	*dest_cpu = ipi_wakeup.dest_cpu;
+	return 0;
+}
+
+static int cpu_write_op(void *data, u64 value)
+{
+	run_smp_call_function_test(value);
+	return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(ipi_ops, cpu_read_op, cpu_write_op, "%llu\n");
+
+static int timeout_read_op(void *data, u64 *timeout)
+{
+	*timeout = timer_wakeup.timeout;
+	return 0;
+}
+
+static int timeout_write_op(void *data, u64 value)
+{
+	run_timer_test(value);
+	return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(timeout_ops, timeout_read_op, timeout_write_op, "%llu\n");
+
+static int __init latency_init(void)
+{
+	struct dentry *temp;
+
+	dir = debugfs_create_dir("latency_test", 0);
+	if (!dir) {
+		pr_alert("latency_test: failed to create /sys/kernel/debug/latency_test\n");
+		return -1;
+	}
+	temp = debugfs_create_file("ipi_cpu_dest",
+				   0666,
+				   dir,
+				   NULL,
+				   &ipi_ops);
+	if (!temp) {
+		pr_alert("latency_test: failed to create /sys/kernel/debug/latency_test/ipi_cpu_dest\n");
+		return -1;
+	}
+	debugfs_create_u64("ipi_latency_ns", 0444, dir, &ipi_wakeup.latency_ns);
+	debugfs_create_u32("ipi_cpu_src", 0444, dir, &ipi_wakeup.src_cpu);
+
+	temp = debugfs_create_file("timeout_expected_ns",
+				   0666,
+				   dir,
+				   NULL,
+				   &timeout_ops);
+	if (!temp) {
+		pr_alert("latency_test: failed to create /sys/kernel/debug/latency_test/timeout_expected_ns\n");
+		return -1;
+	}
+	debugfs_create_u64("timeout_diff_ns", 0444, dir, &timer_wakeup.timeout_diff_ns);
+	debugfs_create_u32("timeout_cpu_src", 0444, dir, &timer_wakeup.src_cpu);
+	pr_info("Latency Test module loaded\n");
+	return 0;
+}
+
+static void __exit latency_cleanup(void)
+{
+	pr_info("Cleaning up Latency Test module.\n");
+	debugfs_remove_recursive(dir);
+}
+
+module_init(latency_init);
+module_exit(latency_cleanup);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("IBM Corporation");
+MODULE_DESCRIPTION("Measuring idle latency for IPIs and Timers");
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2779c29d9981..60fa46a99a4f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1513,6 +1513,16 @@ config DEBUG_KOBJECT
 	  If you say Y here, some extra kobject debugging messages will be sent
 	  to the syslog.
 
+config IDLE_LATENCY_SELFTEST
+	tristate "Cpuidle latency selftests"
+	depends on CPU_IDLE
+	help
+	  This option provides a kernel module that uses IPIs and timers to
+	  measure cpuidle wakeup latency.
+
+	  Say M if you want these selftests to build as a module.
+	  Say N if you are unsure.
+
 config DEBUG_KOBJECT_RELEASE
 	bool "kobject release debugging"
 	depends on DEBUG_OBJECTS_TIMERS
-- 
2.17.1



* [RFC v3 2/2] selftest/cpuidle: Add support for cpuidle latency measurement
  2021-04-04  8:33 [RFC v3 0/2] CPU-Idle latency selftest framework Pratik Rajesh Sampat
  2021-04-04  8:33 ` [RFC v3 1/2] cpuidle: Extract IPI based and timer based wakeup latency from idle states Pratik Rajesh Sampat
@ 2021-04-04  8:33 ` Pratik Rajesh Sampat
  2021-04-09  5:23 ` [RFC v3 0/2] CPU-Idle latency selftest framework Doug Smythies
  2 siblings, 0 replies; 6+ messages in thread
From: Pratik Rajesh Sampat @ 2021-04-04  8:33 UTC
  To: rjw, daniel.lezcano, shuah, dsmythies, ego, svaidy, linux-pm,
	linux-kernel, linux-kselftest, pratik.r.sampat, psampat

The cpuidle latency selftest provides support to systematically extract,
analyse and present IPI and timer based wakeup latencies for each CPU
and each idle state available on the system.

The selftest leverages the test-cpuidle_latency module's debugfs
interface to interact with and extract latency information from the
kernel.

The selftest inserts the module if not already inserted, disables all
the idle states and enables them one by one, testing the following:
1. Keeping the source CPU constant, iterate through all the CPUs,
  measuring IPI latency for the baseline (CPU is busy with a
  "cat /dev/random > /dev/null" workload) and then when the CPU is
  allowed to be at rest
2. Iterating through all the CPUs, queue timers with expected durations
  equivalent to the residency of the deepest idle state enabled, and
  extract the difference between the actual wakeup time and the
  expected timer duration

The timer-based test produces run-to-run variance on some Intel-based
systems that sport a "C-state pre-wake" mechanism, which can pre-wake
a CPU from an idle state when timers are armed. Systems and
architectures that don't have this mechanism can leverage the timer
tests with the -i option.

To run this test specifically:
$ sudo make -C tools/testing/selftests TARGETS="cpuidle" run_tests

The script can also take a few optional arguments:
        [-h <help>]
        [-i <run timer tests>]
        [-m <location of the module>]
        [-o <location of the output>]
        [-v <verbose> (run on all cpus)]
Default output location: tools/testing/selftests/cpuidle/cpuidle.log

To run the test without re-compiling:
$ cd tools/testing/selftests/cpuidle/
$ sudo ./cpuidle.sh
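
For instance, a run that also includes the timer tests, runs on all
CPUs, and writes the log to a custom location could be invoked as:
$ sudo ./cpuidle.sh -i -v -o /tmp/cpuidle.log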

Signed-off-by: Pratik Rajesh Sampat <psampat@linux.ibm.com>
---
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/cpuidle/Makefile   |   6 +
 tools/testing/selftests/cpuidle/cpuidle.sh | 326 +++++++++++++++++++++
 tools/testing/selftests/cpuidle/settings   |   2 +
 4 files changed, 335 insertions(+)
 create mode 100644 tools/testing/selftests/cpuidle/Makefile
 create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
 create mode 100644 tools/testing/selftests/cpuidle/settings

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 6c575cf34a71..561eb67a4839 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -7,6 +7,7 @@ TARGETS += cgroup
 TARGETS += clone3
 TARGETS += core
 TARGETS += cpufreq
+TARGETS += cpuidle
 TARGETS += cpu-hotplug
 TARGETS += drivers/dma-buf
 TARGETS += efivarfs
diff --git a/tools/testing/selftests/cpuidle/Makefile b/tools/testing/selftests/cpuidle/Makefile
new file mode 100644
index 000000000000..cbe13feced34
--- /dev/null
+++ b/tools/testing/selftests/cpuidle/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+all:
+
+TEST_PROGS := cpuidle.sh
+
+include ../lib.mk
\ No newline at end of file
diff --git a/tools/testing/selftests/cpuidle/cpuidle.sh b/tools/testing/selftests/cpuidle/cpuidle.sh
new file mode 100755
index 000000000000..e3d0c853be8f
--- /dev/null
+++ b/tools/testing/selftests/cpuidle/cpuidle.sh
@@ -0,0 +1,326 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+LOG=cpuidle.log
+MODULE=/lib/modules/$(uname -r)/kernel/drivers/cpuidle/test-cpuidle_latency.ko
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+SMT=$(lscpu | grep "Thread(s) per core" | awk '{print $4}')
+RUN_TIMER_TEST=0
+
+DISABLE=1
+ENABLE=0
+
+helpme()
+{
+	printf "Usage: $0 [-h] [-imov args]
+	[-h <help>]
+	[-i <run timer tests>]
+	[-m <location of the module>]
+	[-o <location of the output>]
+	[-v <verbose>]
+	\n"
+	exit 2
+}
+
+parse_arguments()
+{
+	while getopts him:o:v arg
+	do
+		case $arg in
+			h) # --help
+				helpme
+				;;
+			i) # run timer tests
+				RUN_TIMER_TEST=1
+				;;
+			m) # --mod-file
+				MODULE=$OPTARG
+				;;
+			o) # output log files
+				LOG=$OPTARG
+				;;
+			v) # Verbose mode
+				SMT=1
+				;;
+			\?)
+				helpme
+				;;
+		esac
+	done
+}
+
+ins_mod()
+{
+	debugfs_file=/sys/kernel/debug/latency_test/ipi_latency_ns
+	# Check if the module is already loaded
+	if [ -f "$debugfs_file" ]; then
+		printf "Module already loaded\n\n"
+		return 0
+	fi
+	# Try to load the module
+	if [ ! -f "$MODULE" ]; then
+		printf "$MODULE module does not exist. Exiting\n"
+		exit $ksft_skip
+	fi
+	printf "Inserting $MODULE module\n\n"
+	insmod $MODULE
+	if [ $? != 0 ]; then
+		printf "Insmod $MODULE failed\n"
+		exit $ksft_skip
+	fi
+}
+
+compute_average()
+{
+	arr=("$@")
+	sum=0
+	size=${#arr[@]}
+	if [ $size == 0 ]; then
+		avg=0
+		return 1
+	fi
+	for i in "${arr[@]}"
+	do
+		sum=$((sum + i))
+	done
+	avg=$((sum/size))
+}
+
+cpu_is_online()
+{
+	cpu=$1
+	if [ ! -f "/sys/devices/system/cpu/cpu$cpu/online" ]; then
+		echo 1
+		return
+	fi
+	status=$(cat /sys/devices/system/cpu/cpu$cpu/online)
+	echo $status
+}
+
+# Perform operation on each CPU for the given state
+# $1 - Operation: enable (0) / disable (1)
+# $2 - State to enable
+op_state()
+{
+	for ((cpu=0; cpu<NUM_CPUS; cpu++))
+	do
+		local cpu_status=$(cpu_is_online $cpu)
+		if [ $cpu_status == 0 ]; then
+			continue
+		fi
+		echo $1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$2/disable
+	done
+}
+
+cpuidle_enable_state()
+{
+	state=$1
+	op_state $ENABLE $state
+}
+
+cpuidle_disable_state()
+{
+	state=$1
+	op_state $DISABLE $state
+}
+
+# Enable/Disable all stop states for all CPUs
+# $1 - Operation: enable (0) / disable (1)
+op_cpuidle()
+{
+	for ((state=0; state<NUM_STATES; state++))
+	do
+		op_state $1 $state
+	done
+}
+
+# Extract latency in microseconds and convert to nanoseconds
+extract_latency()
+{
+	for ((state=0; state<NUM_STATES; state++))
+	do
+		latency=$(($(cat /sys/devices/system/cpu/cpu0/cpuidle/state$state/latency) * 1000))
+		latency_arr+=($latency)
+	done
+}
+
+# Run the IPI test
+# $1 run for baseline - busy cpu or regular environment
+# $2 destination cpu
+ipi_test_once()
+{
+	dest_cpu=$2
+	if [ "$1" = "baseline" ]; then
+		# Keep the CPU busy
+		taskset -c $dest_cpu cat /dev/random > /dev/null &
+		task_pid=$!
+		# Wait for the workload to achieve 100% CPU usage
+		sleep 1
+	fi
+	taskset 0x1 echo $dest_cpu > /sys/kernel/debug/latency_test/ipi_cpu_dest
+	ipi_latency=$(cat /sys/kernel/debug/latency_test/ipi_latency_ns)
+	src_cpu=$(cat /sys/kernel/debug/latency_test/ipi_cpu_src)
+	if [ "$1" = "baseline" ]; then
+		kill $task_pid
+		wait $task_pid 2>/dev/null
+	fi
+}
+
+# Incrementally Enable idle states one by one and compute the latency
+run_ipi_tests()
+{
+	extract_latency
+	# Disable idle states for CPUs
+	op_cpuidle $DISABLE
+
+	declare -a avg_arr
+	echo -e "--IPI Latency Test---" | tee -a $LOG
+
+	echo -e "--Baseline IPI Latency measurement: CPU Busy--" >> $LOG
+	printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG
+	for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT))
+	do
+		local cpu_status=$(cpu_is_online $cpu)
+		if [ $cpu_status == 0 ]; then
+			continue
+		fi
+		ipi_test_once "baseline" $cpu
+		printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG
+		avg_arr+=($ipi_latency)
+	done
+	compute_average "${avg_arr[@]}"
+	echo -e "Baseline Avg IPI latency(ns): $avg" | tee -a $LOG
+
+	for ((state=0; state<NUM_STATES; state++))
+	do
+		unset avg_arr
+		echo -e "---Enabling state: $state---" >> $LOG
+		cpuidle_enable_state $state
+		printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG
+		for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT))
+		do
+			local cpu_status=$(cpu_is_online $cpu)
+			if [ $cpu_status == 0 ]; then
+				continue
+			fi
+			# Running IPI test and logging results
+			sleep 1
+			ipi_test_once "test" $cpu
+			printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG
+			avg_arr+=($ipi_latency)
+		done
+		compute_average "${avg_arr[@]}"
+		echo -e "Expected IPI latency(ns): ${latency_arr[$state]}" >> $LOG
+		echo -e "Observed Avg IPI latency(ns) - State $state: $avg" | tee -a $LOG
+		cpuidle_disable_state $state
+	done
+}
+
+# Extract the residency in microseconds and convert to nanoseconds.
+# Add 200 ns so that the timer stays for a little longer than the residency
+extract_residency()
+{
+	for ((state=0; state<NUM_STATES; state++))
+	do
+		residency=$(($(cat /sys/devices/system/cpu/cpu0/cpuidle/state$state/residency) * 1000 + 200))
+		residency_arr+=($residency)
+	done
+}
+
+# Run the Timeout test
+# $1 run for baseline - busy cpu or regular environment
+# $2 destination cpu
+# $3 timeout
+timeout_test_once()
+{
+	dest_cpu=$2
+	if [ "$1" = "baseline" ]; then
+		# Keep the CPU busy
+		taskset -c $dest_cpu cat /dev/random > /dev/null &
+		task_pid=$!
+		# Wait for the workload to achieve 100% CPU usage
+		sleep 1
+	fi
+	taskset -c $dest_cpu echo $3 > /sys/kernel/debug/latency_test/timeout_expected_ns
+	# Wait for the result to populate
+	sleep 0.1
+	timeout_diff=$(cat /sys/kernel/debug/latency_test/timeout_diff_ns)
+	src_cpu=$(cat /sys/kernel/debug/latency_test/timeout_cpu_src)
+	if [ "$1" = "baseline" ]; then
+		kill $task_pid
+		wait $task_pid 2>/dev/null
+	fi
+}
+
+run_timeout_tests()
+{
+	extract_residency
+	# Disable idle states for all CPUs
+	op_cpuidle $DISABLE
+
+	declare -a avg_arr
+	echo -e "\n--Timeout Latency Test--" | tee -a $LOG
+
+	echo -e "--Baseline Timeout Latency measurement: CPU Busy--" >> $LOG
+	printf "%s %10s\n" "Wakeup_src" "Baseline_delay(ns)" >> $LOG
+	for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT))
+	do
+		local cpu_status=$(cpu_is_online $cpu)
+		if [ $cpu_status == 0 ]; then
+			continue
+		fi
+		timeout_test_once "baseline" $cpu 1000000
+		printf "%-3s %13s\n" $src_cpu $timeout_diff >> $LOG
+		avg_arr+=($timeout_diff)
+	done
+	compute_average "${avg_arr[@]}"
+	echo -e "Baseline Avg timeout diff(ns): $avg" | tee -a $LOG
+
+	for ((state=0; state<NUM_STATES; state++))
+	do
+		unset avg_arr
+		echo -e "---Enabling state: $state---" >> $LOG
+		cpuidle_enable_state $state
+		printf "%s %10s\n" "Wakeup_src" "Delay(ns)" >> $LOG
+		for ((cpu=0; cpu<NUM_CPUS; cpu+=SMT))
+		do
+			local cpu_status=$(cpu_is_online $cpu)
+			if [ $cpu_status == 0 ]; then
+				continue
+			fi
+			timeout_test_once "test" $cpu 1000000
+			printf "%-3s %13s\n" $src_cpu $timeout_diff >> $LOG
+			avg_arr+=($timeout_diff)
+		done
+		compute_average "${avg_arr[@]}"
+		echo -e "Expected timeout(ns): ${residency_arr[$state]}" >> $LOG
+		echo -e "Observed Avg timeout diff(ns) - State $state: $avg" | tee -a $LOG
+		cpuidle_disable_state $state
+	done
+}
+
+declare -a residency_arr
+declare -a latency_arr
+
+parse_arguments $@
+
+rm -f $LOG
+touch $LOG
+NUM_CPUS=$(nproc --all)
+NUM_STATES=$(ls -1 /sys/devices/system/cpu/cpu0/cpuidle/ | wc -l)
+
+ins_mod $MODULE
+
+run_ipi_tests
+if [ $RUN_TIMER_TEST == 1 ]; then
+	run_timeout_tests
+fi
+
+# Enable all idle states for all CPUs
+op_cpuidle $ENABLE
+printf "Removing $MODULE module\n"
+printf "Full Output logged at: $LOG\n"
+rmmod $MODULE
diff --git a/tools/testing/selftests/cpuidle/settings b/tools/testing/selftests/cpuidle/settings
new file mode 100644
index 000000000000..a26c38a70d77
--- /dev/null
+++ b/tools/testing/selftests/cpuidle/settings
@@ -0,0 +1,2 @@
+timeout=0
+
-- 
2.17.1



* Re: [RFC v3 0/2] CPU-Idle latency selftest framework
  2021-04-04  8:33 [RFC v3 0/2] CPU-Idle latency selftest framework Pratik Rajesh Sampat
  2021-04-04  8:33 ` [RFC v3 1/2] cpuidle: Extract IPI based and timer based wakeup latency from idle states Pratik Rajesh Sampat
  2021-04-04  8:33 ` [RFC v3 2/2] selftest/cpuidle: Add support for cpuidle latency measurement Pratik Rajesh Sampat
@ 2021-04-09  5:23 ` Doug Smythies
  2021-04-09  7:43   ` Pratik Sampat
  2 siblings, 1 reply; 6+ messages in thread
From: Doug Smythies @ 2021-04-09  5:23 UTC
  To: Pratik Rajesh Sampat
  Cc: rjw, Daniel Lezcano, shuah, ego, svaidy, Linux PM list,
	Linux Kernel Mailing List, linux-kselftest, pratik.r.sampat

Hi Pratik,

I tried V3 on an Intel i5-10600K processor with 6 cores and 12 CPUs.
The core to cpu mappings are:
core 0 has cpus 0 and 6
core 1 has cpus 1 and 7
core 2 has cpus 2 and 8
core 3 has cpus 3 and 9
core 4 has cpus 4 and 10
core 5 has cpus 5 and 11

By default, it will test CPUs 0,2,4,6,8,10 on cores 0,2,4,0,2,4.
Wouldn't it make more sense to test each core once?
With the source CPU always 0, I think the results from the
destination CPUs 0 and 6, on core 0, bias the results, at
least in the deeper idle states. They don't make much difference in
the shallow states. Myself, I wouldn't include them in the results.
Example, where I used the -v option for all CPUs:

--IPI Latency Test---
--Baseline IPI Latency measurement: CPU Busy--
SRC_CPU   DEST_CPU IPI_Latency(ns)
0            0          101
0            1          790
0            2          609
0            3          595
0            4          737
0            5          759
0            6          780
0            7          741
0            8          574
0            9          681
0           10          527
0           11          552
Baseline Avg IPI latency(ns): 620  <<<< suggest 656 here
---Enabling state: 0---
SRC_CPU   DEST_CPU IPI_Latency(ns)
0            0           76
0            1          471
0            2          420
0            3          462
0            4          454
0            5          468
0            6          453
0            7          473
0            8          380
0            9          483
0           10          492
0           11          454
Expected IPI latency(ns): 0
Observed Avg IPI latency(ns) - State 0: 423 <<<<< suggest 456 here
---Enabling state: 1---
SRC_CPU   DEST_CPU IPI_Latency(ns)
0            0          112
0            1          866
0            2          663
0            3          851
0            4         1090
0            5         1314
0            6         1941
0            7         1458
0            8          687
0            9          802
0           10         1041
0           11         1284
Expected IPI latency(ns): 1000
Observed Avg IPI latency(ns) - State 1: 1009 <<<< suggest 1006 here
---Enabling state: 2---
SRC_CPU   DEST_CPU IPI_Latency(ns)
0            0           75
0            1        16362
0            2        16785
0            3        19650
0            4        17356
0            5        17606
0            6         2217
0            7        17958
0            8        17332
0            9        16615
0           10        17382
0           11        17423
Expected IPI latency(ns): 120000
Observed Avg IPI latency(ns) - State 2: 14730 <<<< suggest 17447 here
---Enabling state: 3---
SRC_CPU   DEST_CPU IPI_Latency(ns)
0            0          103
0            1        17416
0            2        17961
0            3        16651
0            4        17867
0            5        17726
0            6         2178
0            7        16620
0            8        20951
0            9        16567
0           10        17131
0           11        17563
Expected IPI latency(ns): 1034000
Observed Avg IPI latency(ns) - State 3: 14894 <<<< suggest 17645 here

Hope this helps.

... Doug


* Re: [RFC v3 0/2] CPU-Idle latency selftest framework
  2021-04-09  5:23 ` [RFC v3 0/2] CPU-Idle latency selftest framework Doug Smythies
@ 2021-04-09  7:43   ` Pratik Sampat
  2021-04-09 14:26     ` Doug Smythies
  0 siblings, 1 reply; 6+ messages in thread
From: Pratik Sampat @ 2021-04-09  7:43 UTC
  To: Doug Smythies
  Cc: rjw, Daniel Lezcano, shuah, ego, svaidy, Linux PM list,
	Linux Kernel Mailing List, linux-kselftest, pratik.r.sampat

Hello Doug,

On 09/04/21 10:53 am, Doug Smythies wrote:
> Hi Pratik,
>
> I tried V3 on an Intel i5-10600K processor with 6 cores and 12 CPUs.
> The core to cpu mappings are:
> core 0 has cpus 0 and 6
> core 1 has cpus 1 and 7
> core 2 has cpus 2 and 8
> core 3 has cpus 3 and 9
> core 4 has cpus 4 and 10
> core 5 has cpus 5 and 11
>
> By default, it will test CPUs 0,2,4,6,8,10 on cores 0,2,4,0,2,4.
> Wouldn't it make more sense to test each core once?

Ideally it would be better to run on all the CPUs; however, on larger
systems that I'm testing on, with hundreds of cores and a high thread
count, the execution time increases while not particularly bringing any
additional information to the table.

That is why it made sense to run on only one of the threads of each
core, to make the experiment faster while preserving accuracy.

To handle various thread topologies, it may be worthwhile to parse
/sys/devices/system/cpu/cpuX/topology/thread_siblings_list for each CPU
and use this information to run only once per physical core, rather than
assuming the topology.
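
A rough sketch of that idea (untested, and assuming the standard sysfs
topology layout) could look like:

    # Pick exactly one CPU per physical core via thread_siblings_list
    declare -A seen_core
    test_cpus=()
    for cpu_dir in /sys/devices/system/cpu/cpu[0-9]*; do
        cpu=${cpu_dir##*cpu}
        siblings=$(cat "$cpu_dir/topology/thread_siblings_list")
        if [ -z "${seen_core[$siblings]}" ]; then
            seen_core[$siblings]=1
            test_cpus+=("$cpu")
        fi
    done
    echo "CPUs to test: ${test_cpus[*]}"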

What are your thoughts on a mechanism like this?

> With the source CPU always 0, I think the results from the
> destination CPUs 0 and 6, on core 0, bias the results, at
> least in the deeper idle states. They don't make much difference in
> the shallow states. Myself, I wouldn't include them in the results.

I agree, the CPU0->CPU0 same-core interaction is causing a bias. I could
omit that observation while computing the average.

In the verbose mode I'll omit all the threads of CPU0's core, and in the
default (quick) mode just CPU0's latency can be omitted while computing
the average.

Thank you,
Pratik

> Example, where I used the -v option for all CPUs:
>
> --IPI Latency Test---
> --Baseline IPI Latency measurement: CPU Busy--
> SRC_CPU   DEST_CPU IPI_Latency(ns)
> 0            0          101
> 0            1          790
> 0            2          609
> 0            3          595
> 0            4          737
> 0            5          759
> 0            6          780
> 0            7          741
> 0            8          574
> 0            9          681
> 0           10          527
> 0           11          552
> Baseline Avg IPI latency(ns): 620  <<<< suggest 656 here
> ---Enabling state: 0---
> SRC_CPU   DEST_CPU IPI_Latency(ns)
> 0            0           76
> 0            1          471
> 0            2          420
> 0            3          462
> 0            4          454
> 0            5          468
> 0            6          453
> 0            7          473
> 0            8          380
> 0            9          483
> 0           10          492
> 0           11          454
> Expected IPI latency(ns): 0
> Observed Avg IPI latency(ns) - State 0: 423 <<<<< suggest 456 here
> ---Enabling state: 1---
> SRC_CPU   DEST_CPU IPI_Latency(ns)
> 0            0          112
> 0            1          866
> 0            2          663
> 0            3          851
> 0            4         1090
> 0            5         1314
> 0            6         1941
> 0            7         1458
> 0            8          687
> 0            9          802
> 0           10         1041
> 0           11         1284
> Expected IPI latency(ns): 1000
> Observed Avg IPI latency(ns) - State 1: 1009 <<<< suggest 1006 here
> ---Enabling state: 2---
> SRC_CPU   DEST_CPU IPI_Latency(ns)
> 0            0           75
> 0            1        16362
> 0            2        16785
> 0            3        19650
> 0            4        17356
> 0            5        17606
> 0            6         2217
> 0            7        17958
> 0            8        17332
> 0            9        16615
> 0           10        17382
> 0           11        17423
> Expected IPI latency(ns): 120000
> Observed Avg IPI latency(ns) - State 2: 14730 <<<< suggest 17447 here
> ---Enabling state: 3---
> SRC_CPU   DEST_CPU IPI_Latency(ns)
> 0            0          103
> 0            1        17416
> 0            2        17961
> 0            3        16651
> 0            4        17867
> 0            5        17726
> 0            6         2178
> 0            7        16620
> 0            8        20951
> 0            9        16567
> 0           10        17131
> 0           11        17563
> Expected IPI latency(ns): 1034000
> Observed Avg IPI latency(ns) - State 3: 14894 <<<< suggest 17645 here
>
> Hope this helps.
>
> ... Doug



* Re: [RFC v3 0/2] CPU-Idle latency selftest framework
  2021-04-09  7:43   ` Pratik Sampat
@ 2021-04-09 14:26     ` Doug Smythies
  0 siblings, 0 replies; 6+ messages in thread
From: Doug Smythies @ 2021-04-09 14:26 UTC
  To: Pratik Sampat
  Cc: rjw, Daniel Lezcano, shuah, ego, svaidy, Linux PM list,
	Linux Kernel Mailing List, linux-kselftest, pratik.r.sampat,
	dsmythies

On Fri, Apr 9, 2021 at 12:43 AM Pratik Sampat <psampat@linux.ibm.com> wrote:
> On 09/04/21 10:53 am, Doug Smythies wrote:
> > I tried V3 on an Intel i5-10600K processor with 6 cores and 12 CPUs.
> > The core to cpu mappings are:
> > core 0 has cpus 0 and 6
> > core 1 has cpus 1 and 7
> > core 2 has cpus 2 and 8
> > core 3 has cpus 3 and 9
> > core 4 has cpus 4 and 10
> > core 5 has cpus 5 and 11
> >
> > By default, it will test CPUs 0,2,4,6,8,10 on cores 0,2,4,0,2,4.
> > Wouldn't it make more sense to test each core once?
>
> Ideally it would be better to run on all the CPUs; however, on larger
> systems that I'm testing on, with hundreds of cores and a high thread
> count, the execution time increases while not particularly bringing any
> additional information to the table.
>
> That is why it made sense to run on only one of the threads of each
> core, to make the experiment faster while preserving accuracy.
>
> To handle various thread topologies, it may be worthwhile to parse
> /sys/devices/system/cpu/cpuX/topology/thread_siblings_list for each CPU
> and use this information to run only once per physical core, rather than
> assuming the topology.
>
> What are your thoughts on a mechanism like this?

Yes, seems like a good solution.

... Doug
