linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] selftests/resctrl: Fixes to failing tests
@ 2023-09-11 11:19 Ilpo Järvinen
  2023-09-11 11:19 ` [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal Ilpo Järvinen
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-11 11:19 UTC (permalink / raw)
  To: Reinette Chatre, Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman
  Cc: linux-kernel, Shaopeng Tan, stable, Ilpo Järvinen

Fix three issues with resctrl selftests.

The signal handling fix became necessary after the mount/umount fixes.

The other two came up when I ran resctrl selftests across the server
fleet in our lab to validate the upcoming CAT test rewrite (the rewrite
is not part of this series).

These are developed and should apply cleanly at least on top the
benchmark cleanup series (might apply cleanly also w/o the benchmark
series, I didn't test).

Ilpo Järvinen (5):
  selftests/resctrl: Extend signal handler coverage to unmount on
    receiving signal
  selftests/resctrl: Remove duplicate feature check from CMT test
  selftests/resctrl: Refactor feature check to use resource and feature
    name
  selftests/resctrl: Fix feature checks
  selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests

 tools/testing/selftests/resctrl/cat_test.c    |  8 ---
 tools/testing/selftests/resctrl/cmt_test.c    |  3 -
 tools/testing/selftests/resctrl/mba_test.c    |  2 +-
 tools/testing/selftests/resctrl/mbm_test.c    |  2 +-
 tools/testing/selftests/resctrl/resctrl.h     |  6 +-
 .../testing/selftests/resctrl/resctrl_tests.c | 37 ++++++++--
 tools/testing/selftests/resctrl/resctrl_val.c | 22 +++---
 tools/testing/selftests/resctrl/resctrlfs.c   | 69 ++++++++-----------
 8 files changed, 73 insertions(+), 76 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal
  2023-09-11 11:19 [PATCH 0/5] selftests/resctrl: Fixes to failing tests Ilpo Järvinen
@ 2023-09-11 11:19 ` Ilpo Järvinen
  2023-09-12 22:06   ` Reinette Chatre
  2023-09-11 11:19 ` [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test Ilpo Järvinen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-11 11:19 UTC (permalink / raw)
  To: Reinette Chatre, Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman
  Cc: linux-kernel, Shaopeng Tan, stable, Ilpo Järvinen

Unmounting resctrl FS has been moved into the per test functions in
resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move
resctrl FS mount/umount to higher level"). In case a signal (SIGINT,
SIGTERM, or SIGHUP) is received, the running selftest is aborted by
ctrlc_handler() which then unmounts resctrl fs before exiting. The
current section between signal_handler_register() and
signal_handler_unregister(), however, does not cover the entire
duration when resctrl FS is mounted.

Move signal_handler_register() and signal_handler_unregister() call
into the test functions in resctrl_tests.c to properly unmount resctrl
fs. Adjust child process kill() call in ctrlc_handler() to only be
invoked if the child was already forked.

Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: <stable@vger.kernel.org>
---
 tools/testing/selftests/resctrl/cat_test.c    |  8 -------
 .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++
 tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++---------
 3 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index 97b87285ab2a..224ba8544d8a 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
 		strcpy(param.filename, RESULT_FILE_NAME1);
 		param.num_of_runs = 0;
 		param.cpu_no = sibling_cpu_no;
-	} else {
-		ret = signal_handler_register();
-		if (ret) {
-			kill(bm_pid, SIGKILL);
-			goto out;
-		}
 	}
 
 	remove(param.filename);
@@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
 		}
 		close(pipefd[0]);
 		kill(bm_pid, SIGKILL);
-		signal_handler_unregister();
 	}
 
-out:
 	cat_test_cleanup();
 
 	return ret;
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 823672a20a43..3d66fbdc2df3 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
 
 	ksft_print_msg("Starting MBM BW change ...\n");
 
+	res = signal_handler_register();
+	if (res)
+		return;
+
 	res = mount_resctrlfs();
 	if (res) {
+		signal_handler_unregister();
 		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
 		return;
 	}
@@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
 
 umount:
 	umount_resctrlfs();
+	signal_handler_unregister();
 }
 
 static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
@@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
 
 	ksft_print_msg("Starting MBA Schemata change ...\n");
 
+	res = signal_handler_register();
+	if (res)
+		return;
+
 	res = mount_resctrlfs();
 	if (res) {
+		signal_handler_unregister();
 		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
 		return;
 	}
@@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
 
 umount:
 	umount_resctrlfs();
+	signal_handler_unregister();
 }
 
 static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no)
@@ -123,8 +135,13 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no)
 
 	ksft_print_msg("Starting CMT test ...\n");
 
+	res = signal_handler_register();
+	if (res)
+		return;
+
 	res = mount_resctrlfs();
 	if (res) {
+		signal_handler_unregister();
 		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
 		return;
 	}
@@ -141,6 +158,7 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no)
 
 umount:
 	umount_resctrlfs();
+	signal_handler_unregister();
 }
 
 static void run_cat_test(int cpu_no, int no_of_bits)
@@ -149,8 +167,13 @@ static void run_cat_test(int cpu_no, int no_of_bits)
 
 	ksft_print_msg("Starting CAT test ...\n");
 
+	res = signal_handler_register();
+	if (res)
+		return;
+
 	res = mount_resctrlfs();
 	if (res) {
+		signal_handler_unregister();
 		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
 		return;
 	}
@@ -165,6 +188,7 @@ static void run_cat_test(int cpu_no, int no_of_bits)
 
 umount:
 	umount_resctrlfs();
+	signal_handler_unregister();
 }
 
 int main(int argc, char **argv)
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 51963a6f2186..a9fe61133119 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -468,7 +468,9 @@ pid_t bm_pid, ppid;
 
 void ctrlc_handler(int signum, siginfo_t *info, void *ptr)
 {
-	kill(bm_pid, SIGKILL);
+	/* Only kill child after bm_pid is set after fork() */
+	if (bm_pid)
+		kill(bm_pid, SIGKILL);
 	umount_resctrlfs();
 	tests_cleanup();
 	ksft_print_msg("Ending\n\n");
@@ -485,6 +487,8 @@ int signal_handler_register(void)
 	struct sigaction sigact;
 	int ret = 0;
 
+	bm_pid = 0;
+
 	sigact.sa_sigaction = ctrlc_handler;
 	sigemptyset(&sigact.sa_mask);
 	sigact.sa_flags = SA_SIGINFO;
@@ -706,10 +710,6 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
 
 	ksft_print_msg("Benchmark PID: %d\n", bm_pid);
 
-	ret = signal_handler_register();
-	if (ret)
-		goto out;
-
 	/*
 	 * The cast removes constness but nothing mutates benchmark_cmd within
 	 * the context of this process. At the receiving process, it becomes
@@ -721,19 +721,19 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
 	/* Taskset benchmark to specified cpu */
 	ret = taskset_benchmark(bm_pid, param->cpu_no);
 	if (ret)
-		goto unregister;
+		goto out;
 
 	/* Write benchmark to specified control&monitoring grp in resctrl FS */
 	ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp,
 				      resctrl_val);
 	if (ret)
-		goto unregister;
+		goto out;
 
 	if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) ||
 	    !strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
 		ret = initialize_mem_bw_imc();
 		if (ret)
-			goto unregister;
+			goto out;
 
 		initialize_mem_bw_resctrl(param->ctrlgrp, param->mongrp,
 					  param->cpu_no, resctrl_val);
@@ -748,7 +748,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
 		    sizeof(pipe_message)) {
 			perror("# failed reading message from child process");
 			close(pipefd[0]);
-			goto unregister;
+			goto out;
 		}
 	}
 	close(pipefd[0]);
@@ -757,7 +757,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
 	if (sigqueue(bm_pid, SIGUSR1, value) == -1) {
 		perror("# sigqueue SIGUSR1 to child");
 		ret = errno;
-		goto unregister;
+		goto out;
 	}
 
 	/* Give benchmark enough time to fully run */
@@ -786,8 +786,6 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
 		}
 	}
 
-unregister:
-	signal_handler_unregister();
 out:
 	kill(bm_pid, SIGKILL);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test
  2023-09-11 11:19 [PATCH 0/5] selftests/resctrl: Fixes to failing tests Ilpo Järvinen
  2023-09-11 11:19 ` [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal Ilpo Järvinen
@ 2023-09-11 11:19 ` Ilpo Järvinen
  2023-09-12 22:06   ` Reinette Chatre
  2023-09-11 11:19 ` [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name Ilpo Järvinen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-11 11:19 UTC (permalink / raw)
  To: Reinette Chatre, Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman
  Cc: linux-kernel, Shaopeng Tan, stable, Ilpo Järvinen

The test runner run_cmt_test() in resctrl_tests.c checks for CMT
feature and does not run cmt_resctrl_val() if CMT is not supported.
Then cmt_resctrl_val() also check is CMT is supported.

Remove the duplicated feature check for CMT from cmt_resctrl_val().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: <stable@vger.kernel.org>
---
 tools/testing/selftests/resctrl/cmt_test.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c
index cf2f5e92dea6..50bdbce9fba9 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -80,9 +80,6 @@ int cmt_resctrl_val(int cpu_no, int n, const char * const *benchmark_cmd)
 	size_t span;
 	int ret, i;
 
-	if (!validate_resctrl_feature_request(CMT_STR))
-		return -1;
-
 	ret = get_cbm_mask("L3", cbm_mask);
 	if (ret)
 		return ret;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name
  2023-09-11 11:19 [PATCH 0/5] selftests/resctrl: Fixes to failing tests Ilpo Järvinen
  2023-09-11 11:19 ` [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal Ilpo Järvinen
  2023-09-11 11:19 ` [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test Ilpo Järvinen
@ 2023-09-11 11:19 ` Ilpo Järvinen
  2023-09-12 22:09   ` Reinette Chatre
  2023-09-11 11:19 ` [PATCH 4/5] selftests/resctrl: Fix feature checks Ilpo Järvinen
  2023-09-11 11:19 ` [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests Ilpo Järvinen
  4 siblings, 1 reply; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-11 11:19 UTC (permalink / raw)
  To: Reinette Chatre, Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman
  Cc: linux-kernel, Shaopeng Tan, stable, Ilpo Järvinen

Feature check in validate_resctrl_feature_request() takes in the test
name string and maps that to what to check per test.

Pass resource and feature names to validate_resctrl_feature_request()
directly rather than deriving them from the test name inside the
function which makes the feature check easier to extend for new test
cases.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: <stable@vger.kernel.org>
---
 tools/testing/selftests/resctrl/resctrl.h     |  6 +-
 .../testing/selftests/resctrl/resctrl_tests.c | 10 +--
 tools/testing/selftests/resctrl/resctrlfs.c   | 69 ++++++++-----------
 3 files changed, 34 insertions(+), 51 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index dd07463cdf48..89ced4152933 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -28,10 +28,6 @@
 #define RESCTRL_PATH		"/sys/fs/resctrl"
 #define PHYS_ID_PATH		"/sys/devices/system/cpu/cpu"
 #define INFO_PATH		"/sys/fs/resctrl/info"
-#define L3_PATH			"/sys/fs/resctrl/info/L3"
-#define MB_PATH			"/sys/fs/resctrl/info/MB"
-#define L3_MON_PATH		"/sys/fs/resctrl/info/L3_MON"
-#define L3_MON_FEATURES_PATH	"/sys/fs/resctrl/info/L3_MON/mon_features"
 
 #define ARCH_INTEL     1
 #define ARCH_AMD       2
@@ -88,7 +84,7 @@ int get_resource_id(int cpu_no, int *resource_id);
 int mount_resctrlfs(void);
 int umount_resctrlfs(void);
 int validate_bw_report_request(char *bw_report);
-bool validate_resctrl_feature_request(const char *resctrl_val);
+bool validate_resctrl_feature_request(const char *resource, const char *feature);
 char *fgrep(FILE *inf, const char *str);
 int taskset_benchmark(pid_t bm_pid, int cpu_no);
 void run_benchmark(int signum, siginfo_t *info, void *ucontext);
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 3d66fbdc2df3..3052394ca884 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -84,7 +84,9 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
 		return;
 	}
 
-	if (!validate_resctrl_feature_request(MBM_STR) || (get_vendor() != ARCH_INTEL)) {
+	if (!validate_resctrl_feature_request("L3_MON", "mbm_total_bytes") ||
+	    !validate_resctrl_feature_request("L3_MON", "mbm_local_bytes") ||
+	    (get_vendor() != ARCH_INTEL)) {
 		ksft_test_result_skip("Hardware does not support MBM or MBM is disabled\n");
 		goto umount;
 	}
@@ -116,7 +118,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
 		return;
 	}
 
-	if (!validate_resctrl_feature_request(MBA_STR) || (get_vendor() != ARCH_INTEL)) {
+	if (!validate_resctrl_feature_request("MB", NULL) || (get_vendor() != ARCH_INTEL)) {
 		ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n");
 		goto umount;
 	}
@@ -146,7 +148,7 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no)
 		return;
 	}
 
-	if (!validate_resctrl_feature_request(CMT_STR)) {
+	if (!validate_resctrl_feature_request("L3_MON", "llc_occupancy")) {
 		ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n");
 		goto umount;
 	}
@@ -178,7 +180,7 @@ static void run_cat_test(int cpu_no, int no_of_bits)
 		return;
 	}
 
-	if (!validate_resctrl_feature_request(CAT_STR)) {
+	if (!validate_resctrl_feature_request("L3", NULL)) {
 		ksft_test_result_skip("Hardware does not support CAT or CAT is disabled\n");
 		goto umount;
 	}
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index bd36ee206602..bd547a10791c 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -10,6 +10,8 @@
  */
 #include "resctrl.h"
 
+#include <limits.h>
+
 static int find_resctrl_mount(char *buffer)
 {
 	FILE *mounts;
@@ -604,63 +606,46 @@ char *fgrep(FILE *inf, const char *str)
 
 /*
  * validate_resctrl_feature_request - Check if requested feature is valid.
- * @resctrl_val:	Requested feature
+ * @resource:	Required resource (e.g., MB, L3, L2, L3_MON, etc.)
+ * @feature:	Feature to be checked under resource (can be NULL). This path
+ *		is relative to the resource path.
  *
- * Return: True if the feature is supported, else false. False is also
- *         returned if resctrl FS is not mounted.
+ * Return: True if the resource/feature is supported, else false. False is
+ *         also returned if resctrl FS is not mounted.
  */
-bool validate_resctrl_feature_request(const char *resctrl_val)
+bool validate_resctrl_feature_request(const char *resource, const char *feature)
 {
 	struct stat statbuf;
-	bool found = false;
+	char res_path[PATH_MAX];
 	char *res;
 	FILE *inf;
 	int ret;
 
-	if (!resctrl_val)
+	if (!resource)
 		return false;
 
 	ret = find_resctrl_mount(NULL);
 	if (ret)
 		return false;
 
-	if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) {
-		if (!stat(L3_PATH, &statbuf))
-			return true;
-	} else if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
-		if (!stat(MB_PATH, &statbuf))
-			return true;
-	} else if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) ||
-		   !strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
-		if (!stat(L3_MON_PATH, &statbuf)) {
-			inf = fopen(L3_MON_FEATURES_PATH, "r");
-			if (!inf)
-				return false;
-
-			if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
-				res = fgrep(inf, "llc_occupancy");
-				if (res) {
-					found = true;
-					free(res);
-				}
-			}
-
-			if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR))) {
-				res = fgrep(inf, "mbm_total_bytes");
-				if (res) {
-					free(res);
-					res = fgrep(inf, "mbm_local_bytes");
-					if (res) {
-						found = true;
-						free(res);
-					}
-				}
-			}
-			fclose(inf);
-		}
-	}
+	snprintf(res_path, sizeof(res_path), "%s/%s", INFO_PATH, resource);
+
+	if (stat(res_path, &statbuf))
+		return false;
+
+	if (!feature)
+		return true;
+
+	snprintf(res_path, sizeof(res_path), "%s/%s/mon_features", INFO_PATH, resource);
+	inf = fopen(res_path, "r");
+	if (!inf)
+		return false;
+
+	res = fgrep(inf, feature);
+	free(res);
+	fclose(inf);
 
-	return found;
+	return res;
 }
 
 int filter_dmesg(void)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/5] selftests/resctrl: Fix feature checks
  2023-09-11 11:19 [PATCH 0/5] selftests/resctrl: Fixes to failing tests Ilpo Järvinen
                   ` (2 preceding siblings ...)
  2023-09-11 11:19 ` [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name Ilpo Järvinen
@ 2023-09-11 11:19 ` Ilpo Järvinen
  2023-09-11 11:19 ` [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests Ilpo Järvinen
  4 siblings, 0 replies; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-11 11:19 UTC (permalink / raw)
  To: Reinette Chatre, Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman
  Cc: linux-kernel, Shaopeng Tan, stable, Ilpo Järvinen

The MBA and CMT tests expect support of other features to be able to
run.

When platform only supports MBA but not MBM, MBA test will fail with:
Failed to open total bw file: No such file or directory

When platform only supports CMT but not CAT, CMT test will fail with:
Failed to open bit mask file '/sys/fs/resctrl/info/L3/cbm_mask': No such file or directory

Extend feature checks to cover these two conditions.

Fixes: ee0415681eb6 ("selftests/resctrl: Use resctrl/info for feature detection")
Cc: <stable@vger.kernel.org> # selftests/resctrl: Refactor feature check to use resource and feature name
Cc: <stable@vger.kernel.org> # selftests/resctrl: Remove duplicate feature check from CMT test
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 tools/testing/selftests/resctrl/resctrl_tests.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 3052394ca884..4e9b392d28dc 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -118,7 +118,9 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
 		return;
 	}
 
-	if (!validate_resctrl_feature_request("MB", NULL) || (get_vendor() != ARCH_INTEL)) {
+	if (!validate_resctrl_feature_request("MB", NULL) ||
+	    !validate_resctrl_feature_request("L3_MON", "mbm_local_bytes") ||
+	    (get_vendor() != ARCH_INTEL)) {
 		ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n");
 		goto umount;
 	}
@@ -148,7 +150,8 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no)
 		return;
 	}
 
-	if (!validate_resctrl_feature_request("L3_MON", "llc_occupancy")) {
+	if (!validate_resctrl_feature_request("L3_MON", "llc_occupancy") ||
+	    !validate_resctrl_feature_request("L3", NULL)) {
 		ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n");
 		goto umount;
 	}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests
  2023-09-11 11:19 [PATCH 0/5] selftests/resctrl: Fixes to failing tests Ilpo Järvinen
                   ` (3 preceding siblings ...)
  2023-09-11 11:19 ` [PATCH 4/5] selftests/resctrl: Fix feature checks Ilpo Järvinen
@ 2023-09-11 11:19 ` Ilpo Järvinen
  2023-09-12 22:10   ` Reinette Chatre
  4 siblings, 1 reply; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-11 11:19 UTC (permalink / raw)
  To: Reinette Chatre, Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman
  Cc: linux-kernel, Shaopeng Tan, stable, Ilpo Järvinen

5% difference upper bound for success is a bit on the low side for the
MBA and MBM tests. Some platforms produce outliers that are slightly
above that, typically 6-7%.

Relaxing the MBA/MBM success bound to 8% removes most of the failures
due those frequent outliers.

Fixes: 06bd03a57f8c ("selftests/resctrl: Fix MBA/MBM results reporting format")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 tools/testing/selftests/resctrl/mba_test.c | 2 +-
 tools/testing/selftests/resctrl/mbm_test.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index cf8284dadcb2..d3bf4368341e 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -12,7 +12,7 @@
 
 #define RESULT_FILE_NAME	"result_mba"
 #define NUM_OF_RUNS		5
-#define MAX_DIFF_PERCENT	5
+#define MAX_DIFF_PERCENT	8
 #define ALLOCATION_MAX		100
 #define ALLOCATION_MIN		10
 #define ALLOCATION_STEP		10
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 1ae131a2e246..d3c0d30c676a 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -11,7 +11,7 @@
 #include "resctrl.h"
 
 #define RESULT_FILE_NAME	"result_mbm"
-#define MAX_DIFF_PERCENT	5
+#define MAX_DIFF_PERCENT	8
 #define NUM_OF_RUNS		5
 
 static int
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal
  2023-09-11 11:19 ` [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal Ilpo Järvinen
@ 2023-09-12 22:06   ` Reinette Chatre
  2023-09-13 10:01     ` Ilpo Järvinen
  0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2023-09-12 22:06 UTC (permalink / raw)
  To: Ilpo Järvinen, Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman
  Cc: linux-kernel, Shaopeng Tan, stable

Hi Ilpo,

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> Unmounting resctrl FS has been moved into the per test functions in
> resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move
> resctrl FS mount/umount to higher level"). In case a signal (SIGINT,
> SIGTERM, or SIGHUP) is received, the running selftest is aborted by
> ctrlc_handler() which then unmounts resctrl fs before exiting. The
> current section between signal_handler_register() and
> signal_handler_unregister(), however, does not cover the entire
> duration when resctrl FS is mounted.
> 
> Move signal_handler_register() and signal_handler_unregister() call
> into the test functions in resctrl_tests.c to properly unmount resctrl
> fs. Adjust child process kill() call in ctrlc_handler() to only be
> invoked if the child was already forked.

Thank you for catching this.

> 
> Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level")
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Cc: <stable@vger.kernel.org>
> ---
>  tools/testing/selftests/resctrl/cat_test.c    |  8 -------
>  .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++
>  tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++---------
>  3 files changed, 34 insertions(+), 20 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
> index 97b87285ab2a..224ba8544d8a 100644
> --- a/tools/testing/selftests/resctrl/cat_test.c
> +++ b/tools/testing/selftests/resctrl/cat_test.c
> @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>  		strcpy(param.filename, RESULT_FILE_NAME1);
>  		param.num_of_runs = 0;
>  		param.cpu_no = sibling_cpu_no;
> -	} else {
> -		ret = signal_handler_register();
> -		if (ret) {
> -			kill(bm_pid, SIGKILL);
> -			goto out;
> -		}
>  	}
>  
>  	remove(param.filename);
> @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>  		}
>  		close(pipefd[0]);
>  		kill(bm_pid, SIGKILL);
> -		signal_handler_unregister();
>  	}
>  
> -out:
>  	cat_test_cleanup();
>  
>  	return ret;
> diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
> index 823672a20a43..3d66fbdc2df3 100644
> --- a/tools/testing/selftests/resctrl/resctrl_tests.c
> +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
> @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
>  
>  	ksft_print_msg("Starting MBM BW change ...\n");
>  
> +	res = signal_handler_register();
> +	if (res)
> +		return;
> +
>  	res = mount_resctrlfs();
>  	if (res) {
> +		signal_handler_unregister();
>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
>  		return;
>  	}
> @@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
>  
>  umount:
>  	umount_resctrlfs();
> +	signal_handler_unregister();
>  }
>  
>  static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
> @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>  
>  	ksft_print_msg("Starting MBA Schemata change ...\n");
>  
> +	res = signal_handler_register();
> +	if (res)
> +		return;
> +
>  	res = mount_resctrlfs();
>  	if (res) {
> +		signal_handler_unregister();
>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
>  		return;
>  	}
> @@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>  
>  umount:
>  	umount_resctrlfs();
> +	signal_handler_unregister();
>  }
>  

This adds more duplicated code for every test. Have you considered a
single test setup function that can be used to mount resctrl FS and setup
the signal handler paired with a single test teardown function?

Reinette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test
  2023-09-11 11:19 ` [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test Ilpo Järvinen
@ 2023-09-12 22:06   ` Reinette Chatre
  2023-09-13 11:11     ` Ilpo Järvinen
  0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2023-09-12 22:06 UTC (permalink / raw)
  To: Ilpo Järvinen, Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman
  Cc: linux-kernel, Shaopeng Tan, stable

Hi Ilpo,

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> The test runner run_cmt_test() in resctrl_tests.c checks for CMT
> feature and does not run cmt_resctrl_val() if CMT is not supported.
> Then cmt_resctrl_val() also check is CMT is supported.
> 
> Remove the duplicated feature check for CMT from cmt_resctrl_val().
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Cc: <stable@vger.kernel.org>

This does not look like stable material to me. 

Reinette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name
  2023-09-11 11:19 ` [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name Ilpo Järvinen
@ 2023-09-12 22:09   ` Reinette Chatre
  2023-09-13 11:02     ` Ilpo Järvinen
  0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2023-09-12 22:09 UTC (permalink / raw)
  To: Ilpo Järvinen, Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman
  Cc: linux-kernel, Shaopeng Tan, stable

Hi Ilpo,

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> Feature check in validate_resctrl_feature_request() takes in the test
> name string and maps that to what to check per test.
> 
> Pass resource and feature names to validate_resctrl_feature_request()
> directly rather than deriving them from the test name inside the
> function which makes the feature check easier to extend for new test
> cases.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Cc: <stable@vger.kernel.org>

This does not seem to be stable material.

> ---
>  tools/testing/selftests/resctrl/resctrl.h     |  6 +-
>  .../testing/selftests/resctrl/resctrl_tests.c | 10 +--
>  tools/testing/selftests/resctrl/resctrlfs.c   | 69 ++++++++-----------
>  3 files changed, 34 insertions(+), 51 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
> index dd07463cdf48..89ced4152933 100644
> --- a/tools/testing/selftests/resctrl/resctrl.h
> +++ b/tools/testing/selftests/resctrl/resctrl.h
> @@ -28,10 +28,6 @@
>  #define RESCTRL_PATH		"/sys/fs/resctrl"
>  #define PHYS_ID_PATH		"/sys/devices/system/cpu/cpu"
>  #define INFO_PATH		"/sys/fs/resctrl/info"
> -#define L3_PATH			"/sys/fs/resctrl/info/L3"
> -#define MB_PATH			"/sys/fs/resctrl/info/MB"
> -#define L3_MON_PATH		"/sys/fs/resctrl/info/L3_MON"
> -#define L3_MON_FEATURES_PATH	"/sys/fs/resctrl/info/L3_MON/mon_features"
>  
>  #define ARCH_INTEL     1
>  #define ARCH_AMD       2
> @@ -88,7 +84,7 @@ int get_resource_id(int cpu_no, int *resource_id);
>  int mount_resctrlfs(void);
>  int umount_resctrlfs(void);
>  int validate_bw_report_request(char *bw_report);
> -bool validate_resctrl_feature_request(const char *resctrl_val);
> +bool validate_resctrl_feature_request(const char *resource, const char *feature);
>  char *fgrep(FILE *inf, const char *str);
>  int taskset_benchmark(pid_t bm_pid, int cpu_no);
>  void run_benchmark(int signum, siginfo_t *info, void *ucontext);
> diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
> index 3d66fbdc2df3..3052394ca884 100644
> --- a/tools/testing/selftests/resctrl/resctrl_tests.c
> +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
> @@ -84,7 +84,9 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
>  		return;
>  	}
>  
> -	if (!validate_resctrl_feature_request(MBM_STR) || (get_vendor() != ARCH_INTEL)) {
> +	if (!validate_resctrl_feature_request("L3_MON", "mbm_total_bytes") ||
> +	    !validate_resctrl_feature_request("L3_MON", "mbm_local_bytes") ||
> +	    (get_vendor() != ARCH_INTEL)) {
>  		ksft_test_result_skip("Hardware does not support MBM or MBM is disabled\n");
>  		goto umount;
>  	}
> @@ -116,7 +118,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>  		return;
>  	}
>  
> -	if (!validate_resctrl_feature_request(MBA_STR) || (get_vendor() != ARCH_INTEL)) {
> +	if (!validate_resctrl_feature_request("MB", NULL) || (get_vendor() != ARCH_INTEL)) {
>  		ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n");
>  		goto umount;
>  	}
> @@ -146,7 +148,7 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no)
>  		return;
>  	}
>  
> -	if (!validate_resctrl_feature_request(CMT_STR)) {
> +	if (!validate_resctrl_feature_request("L3_MON", "llc_occupancy")) {
>  		ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n");
>  		goto umount;
>  	}
> @@ -178,7 +180,7 @@ static void run_cat_test(int cpu_no, int no_of_bits)
>  		return;
>  	}
>  
> -	if (!validate_resctrl_feature_request(CAT_STR)) {
> +	if (!validate_resctrl_feature_request("L3", NULL)) {
>  		ksft_test_result_skip("Hardware does not support CAT or CAT is disabled\n");
>  		goto umount;
>  	}
> diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
> index bd36ee206602..bd547a10791c 100644
> --- a/tools/testing/selftests/resctrl/resctrlfs.c
> +++ b/tools/testing/selftests/resctrl/resctrlfs.c
> @@ -10,6 +10,8 @@
>   */
>  #include "resctrl.h"
>  
> +#include <limits.h>
> +

Could you please include <limits.h> before the local resctrl.h?

>  static int find_resctrl_mount(char *buffer)
>  {
>  	FILE *mounts;
> @@ -604,63 +606,46 @@ char *fgrep(FILE *inf, const char *str)
>  
>  /*
>   * validate_resctrl_feature_request - Check if requested feature is valid.
> - * @resctrl_val:	Requested feature
> + * @resource:	Required resource (e.g., MB, L3, L2, L3_MON, etc.)
> + * @feature:	Feature to be checked under resource (can be NULL). This path
> + *		is relative to the resource path.

I do not think "this path" is accurate. @feature is not a path but an entry
within the mon_features file.

Also please note that mon_features only exists for L3_MON, none of the other
listed resources have an associated mon_features file in resctrl. This
function is created to be generic has specific requirements on what
valid (never checked) parameters should be. This may be ok with the usage
but it should not pretend to be generic.

>   *
> - * Return: True if the feature is supported, else false. False is also
> - *         returned if resctrl FS is not mounted.
> + * Return: True if the resource/feature is supported, else false. False is
> + *         also returned if resctrl FS is not mounted.
>   */
> -bool validate_resctrl_feature_request(const char *resctrl_val)
> +bool validate_resctrl_feature_request(const char *resource, const char *feature)
>  {
>  	struct stat statbuf;
> -	bool found = false;
> +	char res_path[PATH_MAX];

Please maintain reverse fir order.

>  	char *res;
>  	FILE *inf;
>  	int ret;
>  
> -	if (!resctrl_val)
> +	if (!resource)
>  		return false;
>  
>  	ret = find_resctrl_mount(NULL);
>  	if (ret)
>  		return false;
>  
> -	if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) {
> -		if (!stat(L3_PATH, &statbuf))
> -			return true;
> -	} else if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
> -		if (!stat(MB_PATH, &statbuf))
> -			return true;
> -	} else if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) ||
> -		   !strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
> -		if (!stat(L3_MON_PATH, &statbuf)) {
> -			inf = fopen(L3_MON_FEATURES_PATH, "r");
> -			if (!inf)
> -				return false;
> -
> -			if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
> -				res = fgrep(inf, "llc_occupancy");
> -				if (res) {
> -					found = true;
> -					free(res);
> -				}
> -			}
> -
> -			if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR))) {
> -				res = fgrep(inf, "mbm_total_bytes");
> -				if (res) {
> -					free(res);
> -					res = fgrep(inf, "mbm_local_bytes");
> -					if (res) {
> -						found = true;
> -						free(res);
> -					}
> -				}
> -			}
> -			fclose(inf);
> -		}
> -	}
> +	snprintf(res_path, sizeof(res_path), "%s/%s", INFO_PATH, resource);
> +
> +	if (stat(res_path, &statbuf))
> +		return false;
> +
> +	if (!feature)
> +		return true;
> +
> +	snprintf(res_path, sizeof(res_path), "%s/%s/mon_features", INFO_PATH, resource);
> +	inf = fopen(res_path, "r");
> +	if (!inf)
> +		return false;
> +
> +	res = fgrep(inf, feature);
> +	free(res);
> +	fclose(inf);
>  
> -	return found;
> +	return res;

This is unexpected. Function should return bool but instead returns a char * that
has been freed?

>  }
>  
>  int filter_dmesg(void)


Reinette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests
  2023-09-11 11:19 ` [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests Ilpo Järvinen
@ 2023-09-12 22:10   ` Reinette Chatre
  2023-09-13 11:43     ` Ilpo Järvinen
  0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2023-09-12 22:10 UTC (permalink / raw)
  To: Ilpo Järvinen, Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman
  Cc: linux-kernel, Shaopeng Tan, stable

Hi Ilpo,

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> 5% difference upper bound for success is a bit on the low side for the

"a bit on the low side" is very vague.

> MBA and MBM tests. Some platforms produce outliers that are slightly
> above that, typically 6-7%.
> 
> Relaxing the MBA/MBM success bound to 8% removes most of the failures
> due those frequent outliers.

This description needs more context on what issue is being solved here.
What does the % difference represent? How was new percentage determined?

Did you investigate why there are differences between platforms? From
what I understand these tests measure memory bandwidth using perf and
resctrl and then compare the difference. Are there interesting things
about the platforms on which the difference is higher than 5%? Could
those be systems with multiple sockets (and thus multiple PMUs that need
to be setup, reset, and read)? Can the reading of the counters be improved
instead of relaxing the success criteria? A quick comparison between
get_mem_bw_imc() and get_mem_bw_resctrl() makes me think that a difference
is not surprising ... note how the PMU counters are started and reset
(potentially on multiple sockets) at every iteration while the resctrl
counters keep rolling and new values are just subtracted from previous.

Reinette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal
  2023-09-12 22:06   ` Reinette Chatre
@ 2023-09-13 10:01     ` Ilpo Järvinen
  2023-09-13 20:58       ` Reinette Chatre
  0 siblings, 1 reply; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-13 10:01 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

[-- Attachment #1: Type: text/plain, Size: 4186 bytes --]

On Tue, 12 Sep 2023, Reinette Chatre wrote:
> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> > Unmounting resctrl FS has been moved into the per test functions in
> > resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move
> > resctrl FS mount/umount to higher level"). In case a signal (SIGINT,
> > SIGTERM, or SIGHUP) is received, the running selftest is aborted by
> > ctrlc_handler() which then unmounts resctrl fs before exiting. The
> > current section between signal_handler_register() and
> > signal_handler_unregister(), however, does not cover the entire
> > duration when resctrl FS is mounted.
> > 
> > Move signal_handler_register() and signal_handler_unregister() call
> > into the test functions in resctrl_tests.c to properly unmount resctrl
> > fs. Adjust child process kill() call in ctrlc_handler() to only be
> > invoked if the child was already forked.
> 
> Thank you for catching this.
> 
> > 
> > Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level")
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > Cc: <stable@vger.kernel.org>
> > ---
> >  tools/testing/selftests/resctrl/cat_test.c    |  8 -------
> >  .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++
> >  tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++---------
> >  3 files changed, 34 insertions(+), 20 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
> > index 97b87285ab2a..224ba8544d8a 100644
> > --- a/tools/testing/selftests/resctrl/cat_test.c
> > +++ b/tools/testing/selftests/resctrl/cat_test.c
> > @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
> >  		strcpy(param.filename, RESULT_FILE_NAME1);
> >  		param.num_of_runs = 0;
> >  		param.cpu_no = sibling_cpu_no;
> > -	} else {
> > -		ret = signal_handler_register();
> > -		if (ret) {
> > -			kill(bm_pid, SIGKILL);
> > -			goto out;
> > -		}
> >  	}
> >  
> >  	remove(param.filename);
> > @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
> >  		}
> >  		close(pipefd[0]);
> >  		kill(bm_pid, SIGKILL);
> > -		signal_handler_unregister();
> >  	}
> >  
> > -out:
> >  	cat_test_cleanup();
> >  
> >  	return ret;
> > diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
> > index 823672a20a43..3d66fbdc2df3 100644
> > --- a/tools/testing/selftests/resctrl/resctrl_tests.c
> > +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
> > @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
> >  
> >  	ksft_print_msg("Starting MBM BW change ...\n");
> >  
> > +	res = signal_handler_register();
> > +	if (res)
> > +		return;
> > +
> >  	res = mount_resctrlfs();
> >  	if (res) {
> > +		signal_handler_unregister();
> >  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
> >  		return;
> >  	}
> > @@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
> >  
> >  umount:
> >  	umount_resctrlfs();
> > +	signal_handler_unregister();
> >  }
> >  
> >  static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
> > @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
> >  
> >  	ksft_print_msg("Starting MBA Schemata change ...\n");
> >  
> > +	res = signal_handler_register();
> > +	if (res)
> > +		return;
> > +
> >  	res = mount_resctrlfs();
> >  	if (res) {
> > +		signal_handler_unregister();
> >  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
> >  		return;
> >  	}
> > @@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
> >  
> >  umount:
> >  	umount_resctrlfs();
> > +	signal_handler_unregister();
> >  }
> >  
> 
> This adds more duplicated code for every test. Have you considered a
> single test setup function that can be used to mount resctrl FS and setup
> the signal handler paired with a single test teardown function?

Yes. Consolidating all these is among my not-yet submitted patches.
I just had to do a backport-friendly Fixes patch first for this.

-- 
 i.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name
  2023-09-12 22:09   ` Reinette Chatre
@ 2023-09-13 11:02     ` Ilpo Järvinen
  2023-09-13 20:59       ` Reinette Chatre
  0 siblings, 1 reply; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-13 11:02 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

[-- Attachment #1: Type: text/plain, Size: 5802 bytes --]

On Tue, 12 Sep 2023, Reinette Chatre wrote:
> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> > Feature check in validate_resctrl_feature_request() takes in the test
> > name string and maps that to what to check per test.
> > 
> > Pass resource and feature names to validate_resctrl_feature_request()
> > directly rather than deriving them from the test name inside the
> > function which makes the feature check easier to extend for new test
> > cases.
> > 
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > Cc: <stable@vger.kernel.org>
> 
> This does not seem to be stable material.

Alone it isn't, but both 2/5 and this 3/5 are prerequisites for 4/5 as 
shown by the tags there.

> > ---
> >  tools/testing/selftests/resctrl/resctrl.h     |  6 +-
> >  .../testing/selftests/resctrl/resctrl_tests.c | 10 +--
> >  tools/testing/selftests/resctrl/resctrlfs.c   | 69 ++++++++-----------
> >  3 files changed, 34 insertions(+), 51 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
> > index dd07463cdf48..89ced4152933 100644
> > --- a/tools/testing/selftests/resctrl/resctrl.h
> > +++ b/tools/testing/selftests/resctrl/resctrl.h

> > diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
> > index bd36ee206602..bd547a10791c 100644
> > --- a/tools/testing/selftests/resctrl/resctrlfs.c
> > +++ b/tools/testing/selftests/resctrl/resctrlfs.c
> > @@ -10,6 +10,8 @@
> >   */
> >  #include "resctrl.h"
> >  
> > +#include <limits.h>
> > +
> 
> Could you please include <limits.h> before the local resctrl.h?

Believe me I tried that first but it did not work. So this intentionally 
in the current order as resctrl.h defines _GNU_SOURCE which is among 
things that tends to alter many things. If I reorder them, the build gives 
me these issues:

resctrlfs.c: In function ‘taskset_benchmark’:
resctrlfs.c:284:2: warning: implicit declaration of function ‘CPU_ZERO’; 
did you mean ‘FP_ZERO’? [-Wimplicit-function-declaration]
  284 |  CPU_ZERO(&my_set);
      |  ^~~~~~~~
      |  FP_ZERO
resctrlfs.c:285:2: warning: implicit declaration of function ‘CPU_SET’ 
[-Wimplicit-function-declaration]
  285 |  CPU_SET(cpu_no, &my_set);
      |  ^~~~~~~
resctrlfs.c:287:6: warning: implicit declaration of function 
‘sched_setaffinity’ [-Wimplicit-function-declaration]
  287 |  if (sched_setaffinity(bm_pid, sizeof(cpu_set_t), &my_set)) {
      |      ^~~~~~~~~~~~~~~~~

It might be useful to move _GNU_SOURCE define into Makefile though to 
avoid these kind of issues (but that's not material for this patch).

> >  static int find_resctrl_mount(char *buffer)
> >  {
> >  	FILE *mounts;
> > @@ -604,63 +606,46 @@ char *fgrep(FILE *inf, const char *str)
> >  
> >  /*
> >   * validate_resctrl_feature_request - Check if requested feature is valid.
> > - * @resctrl_val:	Requested feature
> > + * @resource:	Required resource (e.g., MB, L3, L2, L3_MON, etc.)
> > + * @feature:	Feature to be checked under resource (can be NULL). This path
> > + *		is relative to the resource path.
> 
> I do not think "this path" is accurate. @feature is not a path but an entry
> within the mon_features file.

Yes, agreed.

> Also please note that mon_features only exists for L3_MON, none of the other
> listed resources have an associated mon_features file in resctrl. This
> function is created to be generic has specific requirements on what
> valid (never checked) parameters should be. This may be ok with the usage
> but it should not pretend to be generic.

So are you recommending I split this function into two where the new one 
would do the mon_features check?

> >  	char *res;
> >  	FILE *inf;
> >  	int ret;
> >  
> > -	if (!resctrl_val)
> > +	if (!resource)
> >  		return false;
> >  
> >  	ret = find_resctrl_mount(NULL);
> >  	if (ret)
> >  		return false;
> >  
> > -	if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) {
> > -		if (!stat(L3_PATH, &statbuf))
> > -			return true;
> > -	} else if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
> > -		if (!stat(MB_PATH, &statbuf))
> > -			return true;
> > -	} else if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) ||
> > -		   !strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
> > -		if (!stat(L3_MON_PATH, &statbuf)) {
> > -			inf = fopen(L3_MON_FEATURES_PATH, "r");
> > -			if (!inf)
> > -				return false;
> > -
> > -			if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
> > -				res = fgrep(inf, "llc_occupancy");
> > -				if (res) {
> > -					found = true;
> > -					free(res);
> > -				}
> > -			}
> > -
> > -			if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR))) {
> > -				res = fgrep(inf, "mbm_total_bytes");
> > -				if (res) {
> > -					free(res);
> > -					res = fgrep(inf, "mbm_local_bytes");
> > -					if (res) {
> > -						found = true;
> > -						free(res);
> > -					}
> > -				}
> > -			}
> > -			fclose(inf);
> > -		}
> > -	}
> > +	snprintf(res_path, sizeof(res_path), "%s/%s", INFO_PATH, resource);
> > +
> > +	if (stat(res_path, &statbuf))
> > +		return false;
> > +
> > +	if (!feature)
> > +		return true;
> > +
> > +	snprintf(res_path, sizeof(res_path), "%s/%s/mon_features", INFO_PATH, resource);
> > +	inf = fopen(res_path, "r");
> > +	if (!inf)
> > +		return false;
> > +
> > +	res = fgrep(inf, feature);
> > +	free(res);
> > +	fclose(inf);
> >  
> > -	return found;
> > +	return res;
> 
> This is unexpected. Function should return bool but instead returns a char * that
> has been freed?

Okay, I understand it looks confusing when relying on implicit conversion 
to boolean (despite being functionally correct).

I can do this instead to make it more obvious the intention is to convert 
the result to boolean and not return a pointer:
	return !!res;

-- 
 i.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test
  2023-09-12 22:06   ` Reinette Chatre
@ 2023-09-13 11:11     ` Ilpo Järvinen
  2023-09-13 20:58       ` Reinette Chatre
  0 siblings, 1 reply; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-13 11:11 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

[-- Attachment #1: Type: text/plain, Size: 886 bytes --]

On Tue, 12 Sep 2023, Reinette Chatre wrote:
> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> > The test runner run_cmt_test() in resctrl_tests.c checks for CMT
> > feature and does not run cmt_resctrl_val() if CMT is not supported.
> > Then cmt_resctrl_val() also check is CMT is supported.
> > 
> > Remove the duplicated feature check for CMT from cmt_resctrl_val().
> > 
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > Cc: <stable@vger.kernel.org>
> 
> This does not look like stable material to me. 

I know but when constructing this series I had 2 options:

Either convert also this when changing validate_resctrl_feature_request() 
or remove this call entirely.

Given it's duplicate of the other CMT check, I chose to just remove it 
(which I'd do anyway). As patch 4/5 requires 3/5 which in turn requires 
this, this has to go stable if 4/5 goes too.

-- 
 i.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests
  2023-09-12 22:10   ` Reinette Chatre
@ 2023-09-13 11:43     ` Ilpo Järvinen
  2023-09-13 21:00       ` Reinette Chatre
  0 siblings, 1 reply; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-13 11:43 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

[-- Attachment #1: Type: text/plain, Size: 2744 bytes --]

On Tue, 12 Sep 2023, Reinette Chatre wrote:
> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> > 5% difference upper bound for success is a bit on the low side for the
> 
> "a bit on the low side" is very vague.

The commit that introduced that 5% bound plainly admitted it's "randomly 
chosen value". At least that wasn't vague, I guess. :-)

So what I'm trying to do here is to have "randomly chosen value" replaced 
with a value that seems to work well enough based on measurements on 
a large set of platforms.

Personally, I don't care much about this, I can just ignore the failures 
due to outliers (and also reports about failing MBA/MBM test if somebody 
ever sends one to me), but if I'd be one running automated tests it would 
be annoying to have a problem like this unaddressed.

> > MBA and MBM tests. Some platforms produce outliers that are slightly
> > above that, typically 6-7%.
> > 
> > Relaxing the MBA/MBM success bound to 8% removes most of the failures
> > due those frequent outliers.
> 
> This description needs more context on what issue is being solved here.
> What does the % difference represent? How was new percentage determined?
> 
> Did you investigate why there are differences between platforms? From
> what I understand these tests measure memory bandwidth using perf and
> resctrl and then compare the difference. Are there interesting things 
> about the platforms on which the difference is higher than 5%?

Not really I think. The number just isn't that stable to always remain 
below 5% (even if it usually does).

Only systematic thing I've come across is that if I play with the read 
pattern for defeating the hw prefetcher (you've seen a patch earlier and 
it will be among the series I'll send after this one), it has an impact 
which looks more systematic across all MBM/MBA tests. But it's not what 
I'm trying now address with this patch.

> Could
> those be systems with multiple sockets (and thus multiple PMUs that need
> to be setup, reset, and read)? Can the reading of the counters be improved
> instead of relaxing the success criteria? A quick comparison between
> get_mem_bw_imc() and get_mem_bw_resctrl() makes me think that a difference
> is not surprising ... note how the PMU counters are started and reset
> (potentially on multiple sockets) at every iteration while the resctrl
> counters keep rolling and new values are just subtracted from previous.

Perhaps, I can try to look into it (add to my todo list so I won't 
forget). But in the meantime, this new value is picked using a criteria 
that looks better than "randomly chosen value". If I ever manage to 
address the outliers, the bound could be lowered again.

I'll update the changelog to explain things better.


-- 
 i.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal
  2023-09-13 10:01     ` Ilpo Järvinen
@ 2023-09-13 20:58       ` Reinette Chatre
  2023-09-14 10:16         ` Ilpo Järvinen
  0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2023-09-13 20:58 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

Hi Ilpo,

On 9/13/2023 3:01 AM, Ilpo Järvinen wrote:
> On Tue, 12 Sep 2023, Reinette Chatre wrote:
>> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
>>> Unmounting resctrl FS has been moved into the per test functions in
>>> resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move
>>> resctrl FS mount/umount to higher level"). In case a signal (SIGINT,
>>> SIGTERM, or SIGHUP) is received, the running selftest is aborted by
>>> ctrlc_handler() which then unmounts resctrl fs before exiting. The
>>> current section between signal_handler_register() and
>>> signal_handler_unregister(), however, does not cover the entire
>>> duration when resctrl FS is mounted.
>>>
>>> Move signal_handler_register() and signal_handler_unregister() call
>>> into the test functions in resctrl_tests.c to properly unmount resctrl
>>> fs. Adjust child process kill() call in ctrlc_handler() to only be
>>> invoked if the child was already forked.
>>
>> Thank you for catching this.
>>
>>>
>>> Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level")
>>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>> Cc: <stable@vger.kernel.org>
>>> ---
>>>  tools/testing/selftests/resctrl/cat_test.c    |  8 -------
>>>  .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++
>>>  tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++---------
>>>  3 files changed, 34 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
>>> index 97b87285ab2a..224ba8544d8a 100644
>>> --- a/tools/testing/selftests/resctrl/cat_test.c
>>> +++ b/tools/testing/selftests/resctrl/cat_test.c
>>> @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>>>  		strcpy(param.filename, RESULT_FILE_NAME1);
>>>  		param.num_of_runs = 0;
>>>  		param.cpu_no = sibling_cpu_no;
>>> -	} else {
>>> -		ret = signal_handler_register();
>>> -		if (ret) {
>>> -			kill(bm_pid, SIGKILL);
>>> -			goto out;
>>> -		}
>>>  	}
>>>  
>>>  	remove(param.filename);
>>> @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>>>  		}
>>>  		close(pipefd[0]);
>>>  		kill(bm_pid, SIGKILL);
>>> -		signal_handler_unregister();
>>>  	}
>>>  
>>> -out:
>>>  	cat_test_cleanup();
>>>  
>>>  	return ret;
>>> diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
>>> index 823672a20a43..3d66fbdc2df3 100644
>>> --- a/tools/testing/selftests/resctrl/resctrl_tests.c
>>> +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
>>> @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
>>>  
>>>  	ksft_print_msg("Starting MBM BW change ...\n");
>>>  
>>> +	res = signal_handler_register();
>>> +	if (res)
>>> +		return;
>>> +
>>>  	res = mount_resctrlfs();
>>>  	if (res) {
>>> +		signal_handler_unregister();
>>>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
>>>  		return;
>>>  	}
>>> @@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
>>>  
>>>  umount:
>>>  	umount_resctrlfs();
>>> +	signal_handler_unregister();
>>>  }
>>>  
>>>  static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>>> @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>>>  
>>>  	ksft_print_msg("Starting MBA Schemata change ...\n");
>>>  
>>> +	res = signal_handler_register();
>>> +	if (res)
>>> +		return;
>>> +
>>>  	res = mount_resctrlfs();
>>>  	if (res) {
>>> +		signal_handler_unregister();
>>>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
>>>  		return;
>>>  	}
>>> @@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>>>  
>>>  umount:
>>>  	umount_resctrlfs();
>>> +	signal_handler_unregister();
>>>  }
>>>  
>>
>> This adds more duplicated code for every test. Have you considered a
>> single test setup function that can be used to mount resctrl FS and setup
>> the signal handler paired with a single test teardown function?
> 
> Yes. Consolidating all these is among my not-yet submitted patches.
> I just had to do a backport-friendly Fixes patch first for this.
> 

Could you please help me understand how the duplicate calls are more
backport friendly?

Reinette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test
  2023-09-13 11:11     ` Ilpo Järvinen
@ 2023-09-13 20:58       ` Reinette Chatre
  2023-09-14  9:58         ` Ilpo Järvinen
  0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2023-09-13 20:58 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

Hi Ilpo,

On 9/13/2023 4:11 AM, Ilpo Järvinen wrote:
> On Tue, 12 Sep 2023, Reinette Chatre wrote:
>> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
>>> The test runner run_cmt_test() in resctrl_tests.c checks for CMT
>>> feature and does not run cmt_resctrl_val() if CMT is not supported.
>>> Then cmt_resctrl_val() also check is CMT is supported.
>>>
>>> Remove the duplicated feature check for CMT from cmt_resctrl_val().
>>>
>>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>> Cc: <stable@vger.kernel.org>
>>
>> This does not look like stable material to me. 
> 
> I know but when constructing this series I had 2 options:
> 
> Either convert also this when changing validate_resctrl_feature_request() 
> or remove this call entirely.
> 
> Given it's duplicate of the other CMT check, I chose to just remove it 
> (which I'd do anyway). As patch 4/5 requires 3/5 which in turn requires 
> this, this has to go stable if 4/5 goes too.
> 

Understood. This makes it a dependency of an actual fix, which is addressed
in 4/5's sign-off area. This notation is new to me but it is not clear to me
that the dependency should also be tagged as stable material (without a 
fixes tag). Since it is not an actual fix by itself yet is sent to @stable
I think it may cause confusion. Is just listing it as a dependency of the
actual fix not sufficient (as you already do in 4/5)? Perhaps as compromise
this patch can also get a note to the stable team. Something like:

	Cc: <stable@vger.kernel.org> # dependency of "selftests/resctrl: Fix feature checks"

I am not sure though - I would like to avoid confusion and not burden
the stable team. If this is a flow you have used before successfully I'd
defer to your experience.

Reinette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name
  2023-09-13 11:02     ` Ilpo Järvinen
@ 2023-09-13 20:59       ` Reinette Chatre
  2023-09-14 11:06         ` Ilpo Järvinen
  0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2023-09-13 20:59 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

Hi Ilpo,

On 9/13/2023 4:02 AM, Ilpo Järvinen wrote:
> On Tue, 12 Sep 2023, Reinette Chatre wrote:
>> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
>>> Feature check in validate_resctrl_feature_request() takes in the test
>>> name string and maps that to what to check per test.
>>>
>>> Pass resource and feature names to validate_resctrl_feature_request()
>>> directly rather than deriving them from the test name inside the
>>> function which makes the feature check easier to extend for new test
>>> cases.
>>>
>>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>> Cc: <stable@vger.kernel.org>
>>
>> This does not seem to be stable material.
> 
> Alone it isn't, but both 2/5 and this 3/5 are prerequisites for 4/5 as 
> shown by the tags there.
> 
>>> ---
>>>  tools/testing/selftests/resctrl/resctrl.h     |  6 +-
>>>  .../testing/selftests/resctrl/resctrl_tests.c | 10 +--
>>>  tools/testing/selftests/resctrl/resctrlfs.c   | 69 ++++++++-----------
>>>  3 files changed, 34 insertions(+), 51 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
>>> index dd07463cdf48..89ced4152933 100644
>>> --- a/tools/testing/selftests/resctrl/resctrl.h
>>> +++ b/tools/testing/selftests/resctrl/resctrl.h
> 
>>> diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
>>> index bd36ee206602..bd547a10791c 100644
>>> --- a/tools/testing/selftests/resctrl/resctrlfs.c
>>> +++ b/tools/testing/selftests/resctrl/resctrlfs.c
>>> @@ -10,6 +10,8 @@
>>>   */
>>>  #include "resctrl.h"
>>>  
>>> +#include <limits.h>
>>> +
>>
>> Could you please include <limits.h> before the local resctrl.h?
> 
> Believe me I tried that first but it did not work. So this intentionally 
> in the current order as resctrl.h defines _GNU_SOURCE which is among 
> things that tends to alter many things. If I reorder them, the build gives 
> me these issues:
> 
> resctrlfs.c: In function ‘taskset_benchmark’:
> resctrlfs.c:284:2: warning: implicit declaration of function ‘CPU_ZERO’; 
> did you mean ‘FP_ZERO’? [-Wimplicit-function-declaration]
>   284 |  CPU_ZERO(&my_set);
>       |  ^~~~~~~~
>       |  FP_ZERO
> resctrlfs.c:285:2: warning: implicit declaration of function ‘CPU_SET’ 
> [-Wimplicit-function-declaration]
>   285 |  CPU_SET(cpu_no, &my_set);
>       |  ^~~~~~~
> resctrlfs.c:287:6: warning: implicit declaration of function 
> ‘sched_setaffinity’ [-Wimplicit-function-declaration]
>   287 |  if (sched_setaffinity(bm_pid, sizeof(cpu_set_t), &my_set)) {
>       |      ^~~~~~~~~~~~~~~~~
> 
> It might be useful to move _GNU_SOURCE define into Makefile though to 
> avoid these kind of issues (but that's not material for this patch).

How about a #define _GNU_SOURCE in this file as an intermediate step?
I did see your patch making this change but cannot see how it is
coordinated with fixing the include order in this file.

> 
>>>  static int find_resctrl_mount(char *buffer)
>>>  {
>>>  	FILE *mounts;
>>> @@ -604,63 +606,46 @@ char *fgrep(FILE *inf, const char *str)
>>>  
>>>  /*
>>>   * validate_resctrl_feature_request - Check if requested feature is valid.
>>> - * @resctrl_val:	Requested feature
>>> + * @resource:	Required resource (e.g., MB, L3, L2, L3_MON, etc.)
>>> + * @feature:	Feature to be checked under resource (can be NULL). This path
>>> + *		is relative to the resource path.
>>
>> I do not think "this path" is accurate. @feature is not a path but an entry
>> within the mon_features file.
> 
> Yes, agreed.
> 
>> Also please note that mon_features only exists for L3_MON, none of the other
>> listed resources have an associated mon_features file in resctrl. This
>> function is created to be generic has specific requirements on what
>> valid (never checked) parameters should be. This may be ok with the usage
>> but it should not pretend to be generic.
> 
> So are you recommending I split this function into two where the new one 
> would do the mon_features check?

No need to split the function. That seems overkill considering its
captive usage. I think a snippet making its usage clear will be helpful.
Something like:

	@feature: <description>. Can only be set for L3_MON. Must be
		  NULL for all other resources.

Please feel free to improve.


> 
>>>  	char *res;
>>>  	FILE *inf;
>>>  	int ret;
>>>  
>>> -	if (!resctrl_val)
>>> +	if (!resource)
>>>  		return false;
>>>  
>>>  	ret = find_resctrl_mount(NULL);
>>>  	if (ret)
>>>  		return false;
>>>  
>>> -	if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) {
>>> -		if (!stat(L3_PATH, &statbuf))
>>> -			return true;
>>> -	} else if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
>>> -		if (!stat(MB_PATH, &statbuf))
>>> -			return true;
>>> -	} else if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) ||
>>> -		   !strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
>>> -		if (!stat(L3_MON_PATH, &statbuf)) {
>>> -			inf = fopen(L3_MON_FEATURES_PATH, "r");
>>> -			if (!inf)
>>> -				return false;
>>> -
>>> -			if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
>>> -				res = fgrep(inf, "llc_occupancy");
>>> -				if (res) {
>>> -					found = true;
>>> -					free(res);
>>> -				}
>>> -			}
>>> -
>>> -			if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR))) {
>>> -				res = fgrep(inf, "mbm_total_bytes");
>>> -				if (res) {
>>> -					free(res);
>>> -					res = fgrep(inf, "mbm_local_bytes");
>>> -					if (res) {
>>> -						found = true;
>>> -						free(res);
>>> -					}
>>> -				}
>>> -			}
>>> -			fclose(inf);
>>> -		}
>>> -	}
>>> +	snprintf(res_path, sizeof(res_path), "%s/%s", INFO_PATH, resource);
>>> +
>>> +	if (stat(res_path, &statbuf))
>>> +		return false;
>>> +
>>> +	if (!feature)
>>> +		return true;
>>> +
>>> +	snprintf(res_path, sizeof(res_path), "%s/%s/mon_features", INFO_PATH, resource);
>>> +	inf = fopen(res_path, "r");
>>> +	if (!inf)
>>> +		return false;
>>> +
>>> +	res = fgrep(inf, feature);
>>> +	free(res);
>>> +	fclose(inf);
>>>  
>>> -	return found;
>>> +	return res;
>>
>> This is unexpected. Function should return bool but instead returns a char * that
>> has been freed?
> 
> Okay, I understand it looks confusing when relying on implicit conversion 
> to boolean (despite being functionally correct).
> 
> I can do this instead to make it more obvious the intention is to convert 
> the result to boolean and not return a pointer:
> 	return !!res;
> 

Looks good to me. 

Reinette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests
  2023-09-13 11:43     ` Ilpo Järvinen
@ 2023-09-13 21:00       ` Reinette Chatre
  0 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-09-13 21:00 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

Hi Ilpo,

On 9/13/2023 4:43 AM, Ilpo Järvinen wrote:
> On Tue, 12 Sep 2023, Reinette Chatre wrote:
>> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
>>> 5% difference upper bound for success is a bit on the low side for the
>>
>> "a bit on the low side" is very vague.
> 
> The commit that introduced that 5% bound plainly admitted it's "randomly 
> chosen value". At least that wasn't vague, I guess. :-)
> 
> So what I'm trying to do here is to have "randomly chosen value" replaced 
> with a value that seems to work well enough based on measurements on 
> a large set of platforms.

Already a better motivation for this change. Your cover letter also hints
at this but this description does not mention that this is not just
another number pulled from the air but indeed one that is based on
significant testing on a large variety of systems. This description can
surely mention all the work you did that ended with proposing this new
number, no?

> 
> Personally, I don't care much about this, I can just ignore the failures 
> due to outliers (and also reports about failing MBA/MBM test if somebody 
> ever sends one to me), but if I'd be one running automated tests it would 
> be annoying to have a problem like this unaddressed.

In no way was I suggesting that this should not be addressed.

> 
>>> MBA and MBM tests. Some platforms produce outliers that are slightly
>>> above that, typically 6-7%.
>>>
>>> Relaxing the MBA/MBM success bound to 8% removes most of the failures
>>> due those frequent outliers.
>>
>> This description needs more context on what issue is being solved here.
>> What does the % difference represent? How was new percentage determined?
>>
>> Did you investigate why there are differences between platforms? From
>> what I understand these tests measure memory bandwidth using perf and
>> resctrl and then compare the difference. Are there interesting things 
>> about the platforms on which the difference is higher than 5%?
> 
> Not really I think. The number just isn't that stable to always remain 
> below 5% (even if it usually does).
> 
> Only systematic thing I've come across is that if I play with the read 
> pattern for defeating the hw prefetcher (you've seen a patch earlier and 
> it will be among the series I'll send after this one), it has an impact 
> which looks more systematic across all MBM/MBA tests. But it's not what 
> I'm trying now address with this patch.
> 
>> Could
>> those be systems with multiple sockets (and thus multiple PMUs that need
>> to be setup, reset, and read)? Can the reading of the counters be improved
>> instead of relaxing the success criteria? A quick comparison between
>> get_mem_bw_imc() and get_mem_bw_resctrl() makes me think that a difference
>> is not surprising ... note how the PMU counters are started and reset
>> (potentially on multiple sockets) at every iteration while the resctrl
>> counters keep rolling and new values are just subtracted from previous.
> 
> Perhaps, I can try to look into it (add to my todo list so I won't 
> forget). But in the meantime, this new value is picked using a criteria 
> that looks better than "randomly chosen value". If I ever manage to 
> address the outliers, the bound could be lowered again.
> 
> I'll update the changelog to explain things better.
> 
> 
ok, thank you.

Reinette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test
  2023-09-13 20:58       ` Reinette Chatre
@ 2023-09-14  9:58         ` Ilpo Järvinen
  2023-09-14 15:04           ` Reinette Chatre
  0 siblings, 1 reply; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-14  9:58 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

[-- Attachment #1: Type: text/plain, Size: 2651 bytes --]

On Wed, 13 Sep 2023, Reinette Chatre wrote:
> On 9/13/2023 4:11 AM, Ilpo Järvinen wrote:
> > On Tue, 12 Sep 2023, Reinette Chatre wrote:
> >> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> >>> The test runner run_cmt_test() in resctrl_tests.c checks for CMT
> >>> feature and does not run cmt_resctrl_val() if CMT is not supported.
> >>> Then cmt_resctrl_val() also check is CMT is supported.
> >>>
> >>> Remove the duplicated feature check for CMT from cmt_resctrl_val().
> >>>
> >>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> >>> Cc: <stable@vger.kernel.org>
> >>
> >> This does not look like stable material to me. 
> > 
> > I know but when constructing this series I had 2 options:
> > 
> > Either convert also this when changing validate_resctrl_feature_request() 
> > or remove this call entirely.
> > 
> > Given it's duplicate of the other CMT check, I chose to just remove it 
> > (which I'd do anyway). As patch 4/5 requires 3/5 which in turn requires 
> > this, this has to go stable if 4/5 goes too.
> > 
> 
> Understood. This makes it a dependency of an actual fix, which is addressed
> in 4/5's sign-off area. This notation is new to me but it is not clear to me
> that the dependency should also be tagged as stable material (without a 
> fixes tag). Since it is not an actual fix by itself yet is sent to @stable
> I think it may cause confusion. Is just listing it as a dependency of the
> actual fix not sufficient (as you already do in 4/5)? Perhaps as compromise
> this patch can also get a note to the stable team. Something like:
> 
> 	Cc: <stable@vger.kernel.org> # dependency of "selftests/resctrl: Fix feature checks"
> 
> I am not sure though - I would like to avoid confusion and not burden
> the stable team. If this is a flow you have used before successfully I'd
> defer to your experience.

I came across that dependency format when Greg KH replied to somebody how 
to deal with the cases where there isn't yet a commit id 
(the cases mentioned in Documentation/process/stable-kernel-rules.rst 
assumes there is already a commit id). Unfortunately it's long time ago 
so I cannot easily find the link.

Documentation/process/stable-kernel-rules.rst doesn't state that the 
stable address should be only used for the patches with Fixes. In general, 
I believe this doesn't matter much because whether something is Cc'ed or 
not to stable@vger.kernel.org doesn't seems to impact the decision if a 
patch goes into stable or not (even if even some maintainers seem to 
pretend leaving it out makes a difference so I tend to play along and 
smile myself how incorrect that assumption is :-)). 


-- 
 i.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal
  2023-09-13 20:58       ` Reinette Chatre
@ 2023-09-14 10:16         ` Ilpo Järvinen
  2023-09-14 15:04           ` Reinette Chatre
  0 siblings, 1 reply; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-14 10:16 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

[-- Attachment #1: Type: text/plain, Size: 6672 bytes --]

On Wed, 13 Sep 2023, Reinette Chatre wrote:
> On 9/13/2023 3:01 AM, Ilpo Järvinen wrote:
> > On Tue, 12 Sep 2023, Reinette Chatre wrote:
> >> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> >>> Unmounting resctrl FS has been moved into the per test functions in
> >>> resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move
> >>> resctrl FS mount/umount to higher level"). In case a signal (SIGINT,
> >>> SIGTERM, or SIGHUP) is received, the running selftest is aborted by
> >>> ctrlc_handler() which then unmounts resctrl fs before exiting. The
> >>> current section between signal_handler_register() and
> >>> signal_handler_unregister(), however, does not cover the entire
> >>> duration when resctrl FS is mounted.
> >>>
> >>> Move signal_handler_register() and signal_handler_unregister() call
> >>> into the test functions in resctrl_tests.c to properly unmount resctrl
> >>> fs. Adjust child process kill() call in ctrlc_handler() to only be
> >>> invoked if the child was already forked.
> >>
> >> Thank you for catching this.
> >>
> >>>
> >>> Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level")
> >>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> >>> Cc: <stable@vger.kernel.org>
> >>> ---
> >>>  tools/testing/selftests/resctrl/cat_test.c    |  8 -------
> >>>  .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++
> >>>  tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++---------
> >>>  3 files changed, 34 insertions(+), 20 deletions(-)
> >>>
> >>> diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
> >>> index 97b87285ab2a..224ba8544d8a 100644
> >>> --- a/tools/testing/selftests/resctrl/cat_test.c
> >>> +++ b/tools/testing/selftests/resctrl/cat_test.c
> >>> @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
> >>>  		strcpy(param.filename, RESULT_FILE_NAME1);
> >>>  		param.num_of_runs = 0;
> >>>  		param.cpu_no = sibling_cpu_no;
> >>> -	} else {
> >>> -		ret = signal_handler_register();
> >>> -		if (ret) {
> >>> -			kill(bm_pid, SIGKILL);
> >>> -			goto out;
> >>> -		}
> >>>  	}
> >>>  
> >>>  	remove(param.filename);
> >>> @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
> >>>  		}
> >>>  		close(pipefd[0]);
> >>>  		kill(bm_pid, SIGKILL);
> >>> -		signal_handler_unregister();
> >>>  	}
> >>>  
> >>> -out:
> >>>  	cat_test_cleanup();
> >>>  
> >>>  	return ret;
> >>> diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
> >>> index 823672a20a43..3d66fbdc2df3 100644
> >>> --- a/tools/testing/selftests/resctrl/resctrl_tests.c
> >>> +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
> >>> @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
> >>>  
> >>>  	ksft_print_msg("Starting MBM BW change ...\n");
> >>>  
> >>> +	res = signal_handler_register();
> >>> +	if (res)
> >>> +		return;
> >>> +
> >>>  	res = mount_resctrlfs();
> >>>  	if (res) {
> >>> +		signal_handler_unregister();
> >>>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
> >>>  		return;
> >>>  	}
> >>> @@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
> >>>  
> >>>  umount:
> >>>  	umount_resctrlfs();
> >>> +	signal_handler_unregister();
> >>>  }
> >>>  
> >>>  static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
> >>> @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
> >>>  
> >>>  	ksft_print_msg("Starting MBA Schemata change ...\n");
> >>>  
> >>> +	res = signal_handler_register();
> >>> +	if (res)
> >>> +		return;
> >>> +
> >>>  	res = mount_resctrlfs();
> >>>  	if (res) {
> >>> +		signal_handler_unregister();
> >>>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
> >>>  		return;
> >>>  	}
> >>> @@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
> >>>  
> >>>  umount:
> >>>  	umount_resctrlfs();
> >>> +	signal_handler_unregister();
> >>>  }
> >>>  
> >>
> >> This adds more duplicated code for every test. Have you considered a
> >> single test setup function that can be used to mount resctrl FS and setup
> >> the signal handler paired with a single test teardown function?
> > 
> > Yes. Consolidating all these is among my not-yet submitted patches.
> > I just had to do a backport-friendly Fixes patch first for this.
> > 
> 
> Could you please help me understand how the duplicate calls are more
> backport friendly?

Hi,

It's simply because the refactoring that has to be done to be able to 
introduce the generalized test framework is much more invasive and far 
reaching than this patch. Essentially, all the call signatures of the test 
functions need to match and the feature checks need to be done in new per 
test functions too. This is the diffstat of those changes alone:

 tools/testing/selftests/resctrl/cat_test.c      |  21 +++--
 tools/testing/selftests/resctrl/cmt_test.c      |  26 +++--
 tools/testing/selftests/resctrl/mba_test.c      |  20 +++-
 tools/testing/selftests/resctrl/mbm_test.c      |  20 +++-
 tools/testing/selftests/resctrl/resctrl.h       |  43 ++++++++-
 tools/testing/selftests/resctrl/resctrl_tests.c | 220 +++++++++++++++----------------------------
 tools/testing/selftests/resctrl/resctrlfs.c     |   5 +

(tools/testing/selftests/resctrl/resctrl_tests.c --- part would 
be slightly less if I'd reorder this patch but that only 24 lines off as 
per diffstat of this patch).

But that's not all.... To be able to push the generalized test framework 
to stable, you need to also count in the benchmark cmd changes which 
worked towards making the call signatures identical. So here's the 
diffstat for that series for quick reference:

 tools/testing/selftests/resctrl/cache.c       |   5 +-
 tools/testing/selftests/resctrl/cat_test.c    |  13 +--
 tools/testing/selftests/resctrl/cmt_test.c    |  34 ++++--
 tools/testing/selftests/resctrl/mba_test.c    |   4 +-
 tools/testing/selftests/resctrl/mbm_test.c    |   7 +-
 tools/testing/selftests/resctrl/resctrl.h     |  16 +--
 .../testing/selftests/resctrl/resctrl_tests.c | 100 ++++++++----------
 tools/testing/selftests/resctrl/resctrl_val.c |  10 +-

That's ~500 lines changed vs ~50 so it's a magnitude worse and much less 
localized.

And rest assured, I did not like introducing the duplicated calls any more 
than you do (I did not write the generalized test framework for nothing, 
after all) but the way taken in this patch seemed the most reasonable 
option under these circumstances.

-- 
 i.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name
  2023-09-13 20:59       ` Reinette Chatre
@ 2023-09-14 11:06         ` Ilpo Järvinen
  0 siblings, 0 replies; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-14 11:06 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

[-- Attachment #1: Type: text/plain, Size: 5204 bytes --]

On Wed, 13 Sep 2023, Reinette Chatre wrote:
> On 9/13/2023 4:02 AM, Ilpo Järvinen wrote:
> > On Tue, 12 Sep 2023, Reinette Chatre wrote:
> >> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> >>> Feature check in validate_resctrl_feature_request() takes in the test
> >>> name string and maps that to what to check per test.
> >>>
> >>> Pass resource and feature names to validate_resctrl_feature_request()
> >>> directly rather than deriving them from the test name inside the
> >>> function which makes the feature check easier to extend for new test
> >>> cases.
> >>>
> >>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> >>> Cc: <stable@vger.kernel.org>
> >>
> >> This does not seem to be stable material.
> > 
> > Alone it isn't, but both 2/5 and this 3/5 are prerequisites for 4/5 as 
> > shown by the tags there.
> > 
> >>> ---
> >>>  tools/testing/selftests/resctrl/resctrl.h     |  6 +-
> >>>  .../testing/selftests/resctrl/resctrl_tests.c | 10 +--
> >>>  tools/testing/selftests/resctrl/resctrlfs.c   | 69 ++++++++-----------
> >>>  3 files changed, 34 insertions(+), 51 deletions(-)
> >>>
> >>> diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
> >>> index dd07463cdf48..89ced4152933 100644
> >>> --- a/tools/testing/selftests/resctrl/resctrl.h
> >>> +++ b/tools/testing/selftests/resctrl/resctrl.h
> > 
> >>> diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
> >>> index bd36ee206602..bd547a10791c 100644
> >>> --- a/tools/testing/selftests/resctrl/resctrlfs.c
> >>> +++ b/tools/testing/selftests/resctrl/resctrlfs.c
> >>> @@ -10,6 +10,8 @@
> >>>   */
> >>>  #include "resctrl.h"
> >>>  
> >>> +#include <limits.h>
> >>> +
> >>
> >> Could you please include <limits.h> before the local resctrl.h?
> > 
> > Believe me I tried that first but it did not work. So this intentionally 
> > in the current order as resctrl.h defines _GNU_SOURCE which is among 
> > things that tends to alter many things. If I reorder them, the build gives 
> > me these issues:
> > 
> > resctrlfs.c: In function ‘taskset_benchmark’:
> > resctrlfs.c:284:2: warning: implicit declaration of function ‘CPU_ZERO’; 
> > did you mean ‘FP_ZERO’? [-Wimplicit-function-declaration]
> >   284 |  CPU_ZERO(&my_set);
> >       |  ^~~~~~~~
> >       |  FP_ZERO
> > resctrlfs.c:285:2: warning: implicit declaration of function ‘CPU_SET’ 
> > [-Wimplicit-function-declaration]
> >   285 |  CPU_SET(cpu_no, &my_set);
> >       |  ^~~~~~~
> > resctrlfs.c:287:6: warning: implicit declaration of function 
> > ‘sched_setaffinity’ [-Wimplicit-function-declaration]
> >   287 |  if (sched_setaffinity(bm_pid, sizeof(cpu_set_t), &my_set)) {
> >       |      ^~~~~~~~~~~~~~~~~
> > 
> > It might be useful to move _GNU_SOURCE define into Makefile though to 
> > avoid these kind of issues (but that's not material for this patch).
> 
> How about a #define _GNU_SOURCE in this file as an intermediate step?
> I did see your patch making this change but cannot see how it is
> coordinated with fixing the include order in this file.

I'll just make that change part of this series and use also it as 
dependency. Making an intermediate step just for stable that is going to 
immediately removed in mainline would just causing the code to diverge 
unnecessarily, IMO.

There's also a small risk for some other bug that does not cause compile 
to fail due to differences because of a late define for _GNU_SOURCE. I 
don't find it very likely but seems possible due to differences in some 
constant values (not that the resctrl selftest code is very good at using 
those defined constants in the first place, there are plenty of literals 
still to cleanup).

> >>>  static int find_resctrl_mount(char *buffer)
> >>>  {
> >>>  	FILE *mounts;
> >>> @@ -604,63 +606,46 @@ char *fgrep(FILE *inf, const char *str)
> >>>  
> >>>  /*
> >>>   * validate_resctrl_feature_request - Check if requested feature is valid.
> >>> - * @resctrl_val:	Requested feature
> >>> + * @resource:	Required resource (e.g., MB, L3, L2, L3_MON, etc.)
> >>> + * @feature:	Feature to be checked under resource (can be NULL). This path
> >>> + *		is relative to the resource path.
> >>
> >> I do not think "this path" is accurate. @feature is not a path but an entry
> >> within the mon_features file.
> > 
> > Yes, agreed.
> > 
> >> Also please note that mon_features only exists for L3_MON, none of the other
> >> listed resources have an associated mon_features file in resctrl. This
> >> function is created to be generic has specific requirements on what
> >> valid (never checked) parameters should be. This may be ok with the usage
> >> but it should not pretend to be generic.
> > 
> > So are you recommending I split this function into two where the new one 
> > would do the mon_features check?
> 
> No need to split the function. That seems overkill considering its
> captive usage. I think a snippet making its usage clear will be helpful.
> Something like:
> 
> 	@feature: <description>. Can only be set for L3_MON. Must be
> 		  NULL for all other resources.
> 
> Please feel free to improve.

Thanks, I'll do that.

-- 
 i.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal
  2023-09-14 10:16         ` Ilpo Järvinen
@ 2023-09-14 15:04           ` Reinette Chatre
  2023-09-14 17:05             ` Ilpo Järvinen
  0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2023-09-14 15:04 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

Hi Ilpo,

On 9/14/2023 3:16 AM, Ilpo Järvinen wrote:
> On Wed, 13 Sep 2023, Reinette Chatre wrote:
>> On 9/13/2023 3:01 AM, Ilpo Järvinen wrote:
>>> On Tue, 12 Sep 2023, Reinette Chatre wrote:
>>>> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
>>>>> Unmounting resctrl FS has been moved into the per test functions in
>>>>> resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move
>>>>> resctrl FS mount/umount to higher level"). In case a signal (SIGINT,
>>>>> SIGTERM, or SIGHUP) is received, the running selftest is aborted by
>>>>> ctrlc_handler() which then unmounts resctrl fs before exiting. The
>>>>> current section between signal_handler_register() and
>>>>> signal_handler_unregister(), however, does not cover the entire
>>>>> duration when resctrl FS is mounted.
>>>>>
>>>>> Move signal_handler_register() and signal_handler_unregister() call
>>>>> into the test functions in resctrl_tests.c to properly unmount resctrl
>>>>> fs. Adjust child process kill() call in ctrlc_handler() to only be
>>>>> invoked if the child was already forked.
>>>>
>>>> Thank you for catching this.
>>>>
>>>>>
>>>>> Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level")
>>>>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>>>> Cc: <stable@vger.kernel.org>
>>>>> ---
>>>>>  tools/testing/selftests/resctrl/cat_test.c    |  8 -------
>>>>>  .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++
>>>>>  tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++---------
>>>>>  3 files changed, 34 insertions(+), 20 deletions(-)
>>>>>
>>>>> diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
>>>>> index 97b87285ab2a..224ba8544d8a 100644
>>>>> --- a/tools/testing/selftests/resctrl/cat_test.c
>>>>> +++ b/tools/testing/selftests/resctrl/cat_test.c
>>>>> @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>>>>>  		strcpy(param.filename, RESULT_FILE_NAME1);
>>>>>  		param.num_of_runs = 0;
>>>>>  		param.cpu_no = sibling_cpu_no;
>>>>> -	} else {
>>>>> -		ret = signal_handler_register();
>>>>> -		if (ret) {
>>>>> -			kill(bm_pid, SIGKILL);
>>>>> -			goto out;
>>>>> -		}
>>>>>  	}
>>>>>  
>>>>>  	remove(param.filename);
>>>>> @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>>>>>  		}
>>>>>  		close(pipefd[0]);
>>>>>  		kill(bm_pid, SIGKILL);
>>>>> -		signal_handler_unregister();
>>>>>  	}
>>>>>  
>>>>> -out:
>>>>>  	cat_test_cleanup();
>>>>>  
>>>>>  	return ret;
>>>>> diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
>>>>> index 823672a20a43..3d66fbdc2df3 100644
>>>>> --- a/tools/testing/selftests/resctrl/resctrl_tests.c
>>>>> +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
>>>>> @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
>>>>>  
>>>>>  	ksft_print_msg("Starting MBM BW change ...\n");
>>>>>  
>>>>> +	res = signal_handler_register();
>>>>> +	if (res)
>>>>> +		return;
>>>>> +
>>>>>  	res = mount_resctrlfs();
>>>>>  	if (res) {
>>>>> +		signal_handler_unregister();
>>>>>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
>>>>>  		return;
>>>>>  	}
>>>>> @@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
>>>>>  
>>>>>  umount:
>>>>>  	umount_resctrlfs();
>>>>> +	signal_handler_unregister();
>>>>>  }
>>>>>  
>>>>>  static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>>>>> @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>>>>>  
>>>>>  	ksft_print_msg("Starting MBA Schemata change ...\n");
>>>>>  
>>>>> +	res = signal_handler_register();
>>>>> +	if (res)
>>>>> +		return;
>>>>> +
>>>>>  	res = mount_resctrlfs();
>>>>>  	if (res) {
>>>>> +		signal_handler_unregister();
>>>>>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
>>>>>  		return;
>>>>>  	}
>>>>> @@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>>>>>  
>>>>>  umount:
>>>>>  	umount_resctrlfs();
>>>>> +	signal_handler_unregister();
>>>>>  }
>>>>>  
>>>>
>>>> This adds more duplicated code for every test. Have you considered a
>>>> single test setup function that can be used to mount resctrl FS and setup
>>>> the signal handler paired with a single test teardown function?
>>>
>>> Yes. Consolidating all these is among my not-yet submitted patches.
>>> I just had to do a backport-friendly Fixes patch first for this.
>>>
>>
>> Could you please help me understand how the duplicate calls are more
>> backport friendly?
> 
> Hi,
> 
> It's simply because the refactoring that has to be done to be able to 
> introduce the generalized test framework is much more invasive and far 
> reaching than this patch. Essentially, all the call signatures of the test 
> functions need to match and the feature checks need to be done in new per 
> test functions too. This is the diffstat of those changes alone:
> 
>  tools/testing/selftests/resctrl/cat_test.c      |  21 +++--
>  tools/testing/selftests/resctrl/cmt_test.c      |  26 +++--
>  tools/testing/selftests/resctrl/mba_test.c      |  20 +++-
>  tools/testing/selftests/resctrl/mbm_test.c      |  20 +++-
>  tools/testing/selftests/resctrl/resctrl.h       |  43 ++++++++-
>  tools/testing/selftests/resctrl/resctrl_tests.c | 220 +++++++++++++++----------------------------
>  tools/testing/selftests/resctrl/resctrlfs.c     |   5 +
> 
> (tools/testing/selftests/resctrl/resctrl_tests.c --- part would 
> be slightly less if I'd reorder this patch but that only 24 lines off as 
> per diffstat of this patch).
> 
> But that's not all.... To be able to push the generalized test framework 
> to stable, you need to also count in the benchmark cmd changes which 
> worked towards making the call signatures identical. So here's the 
> diffstat for that series for quick reference:
> 
>  tools/testing/selftests/resctrl/cache.c       |   5 +-
>  tools/testing/selftests/resctrl/cat_test.c    |  13 +--
>  tools/testing/selftests/resctrl/cmt_test.c    |  34 ++++--
>  tools/testing/selftests/resctrl/mba_test.c    |   4 +-
>  tools/testing/selftests/resctrl/mbm_test.c    |   7 +-
>  tools/testing/selftests/resctrl/resctrl.h     |  16 +--
>  .../testing/selftests/resctrl/resctrl_tests.c | 100 ++++++++----------
>  tools/testing/selftests/resctrl/resctrl_val.c |  10 +-
> 
> That's ~500 lines changed vs ~50 so it's a magnitude worse and much less 
> localized.
> 
> And rest assured, I did not like introducing the duplicated calls any more 
> than you do (I did not write the generalized test framework for nothing, 
> after all) but the way taken in this patch seemed the most reasonable 
> option under these circumstances.
> 

hmmm ... I did not expect that a total refactoring would be needed.

I was thinking about a change from this:


	testX(...) 
	{
	
		res = signal_handler_register();
		/* error handling */
		res = mount_resctrlfs();
		/* error handling */
		
		/* test */

		unmount_resctrlfs();
		signal_handler_register();

	}


to this:


	int test_setup(...)
	{
		res = signal_handler_register();
		/* error handling */
		res = mount_resctrlfs();
		/* error handling */
	}


	void test_cleanup(...)
	{
		unmount_resctrlfs();
		signal_handler_register();
	}


	testX(...)
	{

		res = test_setup(..);
		/* error handling */

		/* test */

		test_cleanup();
	}

I expect this to also support the bigger refactoring.

Reinette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test
  2023-09-14  9:58         ` Ilpo Järvinen
@ 2023-09-14 15:04           ` Reinette Chatre
  0 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-09-14 15:04 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

Hi Ilpo,

On 9/14/2023 2:58 AM, Ilpo Järvinen wrote:
> On Wed, 13 Sep 2023, Reinette Chatre wrote:
>> On 9/13/2023 4:11 AM, Ilpo Järvinen wrote:
>>> On Tue, 12 Sep 2023, Reinette Chatre wrote:
>>>> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
>>>>> The test runner run_cmt_test() in resctrl_tests.c checks for CMT
>>>>> feature and does not run cmt_resctrl_val() if CMT is not supported.
>>>>> Then cmt_resctrl_val() also check is CMT is supported.
>>>>>
>>>>> Remove the duplicated feature check for CMT from cmt_resctrl_val().
>>>>>
>>>>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>>>> Cc: <stable@vger.kernel.org>
>>>>
>>>> This does not look like stable material to me. 
>>>
>>> I know but when constructing this series I had 2 options:
>>>
>>> Either convert also this when changing validate_resctrl_feature_request() 
>>> or remove this call entirely.
>>>
>>> Given it's duplicate of the other CMT check, I chose to just remove it 
>>> (which I'd do anyway). As patch 4/5 requires 3/5 which in turn requires 
>>> this, this has to go stable if 4/5 goes too.
>>>
>>
>> Understood. This makes it a dependency of an actual fix, which is addressed
>> in 4/5's sign-off area. This notation is new to me but it is not clear to me
>> that the dependency should also be tagged as stable material (without a 
>> fixes tag). Since it is not an actual fix by itself yet is sent to @stable
>> I think it may cause confusion. Is just listing it as a dependency of the
>> actual fix not sufficient (as you already do in 4/5)? Perhaps as compromise
>> this patch can also get a note to the stable team. Something like:
>>
>> 	Cc: <stable@vger.kernel.org> # dependency of "selftests/resctrl: Fix feature checks"
>>
>> I am not sure though - I would like to avoid confusion and not burden
>> the stable team. If this is a flow you have used before successfully I'd
>> defer to your experience.
> 
> I came across that dependency format when Greg KH replied to somebody how 
> to deal with the cases where there isn't yet a commit id 
> (the cases mentioned in Documentation/process/stable-kernel-rules.rst 
> assumes there is already a commit id). Unfortunately it's long time ago 
> so I cannot easily find the link.

I see, thank you. I was not aware of this custom.

Reinette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal
  2023-09-14 15:04           ` Reinette Chatre
@ 2023-09-14 17:05             ` Ilpo Järvinen
  2023-09-14 17:29               ` Reinette Chatre
  0 siblings, 1 reply; 25+ messages in thread
From: Ilpo Järvinen @ 2023-09-14 17:05 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

[-- Attachment #1: Type: text/plain, Size: 8322 bytes --]

On Thu, 14 Sep 2023, Reinette Chatre wrote:
> On 9/14/2023 3:16 AM, Ilpo Järvinen wrote:
> > On Wed, 13 Sep 2023, Reinette Chatre wrote:
> >> On 9/13/2023 3:01 AM, Ilpo Järvinen wrote:
> >>> On Tue, 12 Sep 2023, Reinette Chatre wrote:
> >>>> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> >>>>> Unmounting resctrl FS has been moved into the per test functions in
> >>>>> resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move
> >>>>> resctrl FS mount/umount to higher level"). In case a signal (SIGINT,
> >>>>> SIGTERM, or SIGHUP) is received, the running selftest is aborted by
> >>>>> ctrlc_handler() which then unmounts resctrl fs before exiting. The
> >>>>> current section between signal_handler_register() and
> >>>>> signal_handler_unregister(), however, does not cover the entire
> >>>>> duration when resctrl FS is mounted.
> >>>>>
> >>>>> Move signal_handler_register() and signal_handler_unregister() call
> >>>>> into the test functions in resctrl_tests.c to properly unmount resctrl
> >>>>> fs. Adjust child process kill() call in ctrlc_handler() to only be
> >>>>> invoked if the child was already forked.
> >>>>
> >>>> Thank you for catching this.
> >>>>
> >>>>>
> >>>>> Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level")
> >>>>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> >>>>> Cc: <stable@vger.kernel.org>
> >>>>> ---
> >>>>>  tools/testing/selftests/resctrl/cat_test.c    |  8 -------
> >>>>>  .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++
> >>>>>  tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++---------
> >>>>>  3 files changed, 34 insertions(+), 20 deletions(-)
> >>>>>
> >>>>> diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
> >>>>> index 97b87285ab2a..224ba8544d8a 100644
> >>>>> --- a/tools/testing/selftests/resctrl/cat_test.c
> >>>>> +++ b/tools/testing/selftests/resctrl/cat_test.c
> >>>>> @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
> >>>>>  		strcpy(param.filename, RESULT_FILE_NAME1);
> >>>>>  		param.num_of_runs = 0;
> >>>>>  		param.cpu_no = sibling_cpu_no;
> >>>>> -	} else {
> >>>>> -		ret = signal_handler_register();
> >>>>> -		if (ret) {
> >>>>> -			kill(bm_pid, SIGKILL);
> >>>>> -			goto out;
> >>>>> -		}
> >>>>>  	}
> >>>>>  
> >>>>>  	remove(param.filename);
> >>>>> @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
> >>>>>  		}
> >>>>>  		close(pipefd[0]);
> >>>>>  		kill(bm_pid, SIGKILL);
> >>>>> -		signal_handler_unregister();
> >>>>>  	}
> >>>>>  
> >>>>> -out:
> >>>>>  	cat_test_cleanup();
> >>>>>  
> >>>>>  	return ret;
> >>>>> diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
> >>>>> index 823672a20a43..3d66fbdc2df3 100644
> >>>>> --- a/tools/testing/selftests/resctrl/resctrl_tests.c
> >>>>> +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
> >>>>> @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
> >>>>>  
> >>>>>  	ksft_print_msg("Starting MBM BW change ...\n");
> >>>>>  
> >>>>> +	res = signal_handler_register();
> >>>>> +	if (res)
> >>>>> +		return;
> >>>>> +
> >>>>>  	res = mount_resctrlfs();
> >>>>>  	if (res) {
> >>>>> +		signal_handler_unregister();
> >>>>>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
> >>>>>  		return;
> >>>>>  	}
> >>>>> @@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
> >>>>>  
> >>>>>  umount:
> >>>>>  	umount_resctrlfs();
> >>>>> +	signal_handler_unregister();
> >>>>>  }
> >>>>>  
> >>>>>  static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
> >>>>> @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
> >>>>>  
> >>>>>  	ksft_print_msg("Starting MBA Schemata change ...\n");
> >>>>>  
> >>>>> +	res = signal_handler_register();
> >>>>> +	if (res)
> >>>>> +		return;
> >>>>> +
> >>>>>  	res = mount_resctrlfs();
> >>>>>  	if (res) {
> >>>>> +		signal_handler_unregister();
> >>>>>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
> >>>>>  		return;
> >>>>>  	}
> >>>>> @@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
> >>>>>  
> >>>>>  umount:
> >>>>>  	umount_resctrlfs();
> >>>>> +	signal_handler_unregister();
> >>>>>  }
> >>>>>  
> >>>>
> >>>> This adds more duplicated code for every test. Have you considered a
> >>>> single test setup function that can be used to mount resctrl FS and setup
> >>>> the signal handler paired with a single test teardown function?
> >>>
> >>> Yes. Consolidating all these is among my not-yet submitted patches.
> >>> I just had to do a backport-friendly Fixes patch first for this.
> >>>
> >>
> >> Could you please help me understand how the duplicate calls are more
> >> backport friendly?
> > 
> > Hi,
> > 
> > It's simply because the refactoring that has to be done to be able to 
> > introduce the generalized test framework is much more invasive and far 
> > reaching than this patch. Essentially, all the call signatures of the test 
> > functions need to match and the feature checks need to be done in new per 
> > test functions too. This is the diffstat of those changes alone:
> > 
> >  tools/testing/selftests/resctrl/cat_test.c      |  21 +++--
> >  tools/testing/selftests/resctrl/cmt_test.c      |  26 +++--
> >  tools/testing/selftests/resctrl/mba_test.c      |  20 +++-
> >  tools/testing/selftests/resctrl/mbm_test.c      |  20 +++-
> >  tools/testing/selftests/resctrl/resctrl.h       |  43 ++++++++-
> >  tools/testing/selftests/resctrl/resctrl_tests.c | 220 +++++++++++++++----------------------------
> >  tools/testing/selftests/resctrl/resctrlfs.c     |   5 +
> > 
> > (tools/testing/selftests/resctrl/resctrl_tests.c --- part would 
> > be slightly less if I'd reorder this patch but that only 24 lines off as 
> > per diffstat of this patch).
> > 
> > But that's not all.... To be able to push the generalized test framework 
> > to stable, you need to also count in the benchmark cmd changes which 
> > worked towards making the call signatures identical. So here's the 
> > diffstat for that series for quick reference:
> > 
> >  tools/testing/selftests/resctrl/cache.c       |   5 +-
> >  tools/testing/selftests/resctrl/cat_test.c    |  13 +--
> >  tools/testing/selftests/resctrl/cmt_test.c    |  34 ++++--
> >  tools/testing/selftests/resctrl/mba_test.c    |   4 +-
> >  tools/testing/selftests/resctrl/mbm_test.c    |   7 +-
> >  tools/testing/selftests/resctrl/resctrl.h     |  16 +--
> >  .../testing/selftests/resctrl/resctrl_tests.c | 100 ++++++++----------
> >  tools/testing/selftests/resctrl/resctrl_val.c |  10 +-
> > 
> > That's ~500 lines changed vs ~50 so it's a magnitude worse and much less 
> > localized.
> > 
> > And rest assured, I did not like introducing the duplicated calls any more 
> > than you do (I did not write the generalized test framework for nothing, 
> > after all) but the way taken in this patch seemed the most reasonable 
> > option under these circumstances.
> > 
> 
> hmmm ... I did not expect that a total refactoring would be needed.
> 
> I was thinking about a change from this:
> 
> 
> 	testX(...) 
> 	{
> 	
> 		res = signal_handler_register();
> 		/* error handling */
> 		res = mount_resctrlfs();
> 		/* error handling */
> 		
> 		/* test */
> 
> 		unmount_resctrlfs();
> 		signal_handler_register();
> 
> 	}
> 
> 
> to this:
> 
> 
> 	int test_setup(...)
> 	{
> 		res = signal_handler_register();
> 		/* error handling */
> 		res = mount_resctrlfs();
> 		/* error handling */
> 	}
> 
> 
> 	void test_cleanup(...)
> 	{
> 		unmount_resctrlfs();
> 		signal_handler_register();
> 	}
> 
> 
> 	testX(...)
> 	{
> 
> 		res = test_setup(..);
> 		/* error handling */
> 
> 		/* test */
> 
> 		test_cleanup();
> 	}
> 
> I expect this to also support the bigger refactoring.

Okay, I'll do so then.

However, having already written the generic run_single_test() function 
that is part of the generic test framework, I definitely don't feel those 
helpers would be that helpful for it. It more feels like they'd make the 
flow less obvious by adding those two extra calls there but that's of 
course matter of taste.


-- 
 i.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal
  2023-09-14 17:05             ` Ilpo Järvinen
@ 2023-09-14 17:29               ` Reinette Chatre
  0 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-09-14 17:29 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Shuah Khan, Shuah Khan, linux-kselftest,
	Maciej Wieczór-Retman, LKML, Shaopeng Tan, stable

Hi Ilpo,

On 9/14/2023 10:05 AM, Ilpo Järvinen wrote:
> On Thu, 14 Sep 2023, Reinette Chatre wrote:
>> On 9/14/2023 3:16 AM, Ilpo Järvinen wrote:
>>> On Wed, 13 Sep 2023, Reinette Chatre wrote:
>>>> On 9/13/2023 3:01 AM, Ilpo Järvinen wrote:
>>>>> On Tue, 12 Sep 2023, Reinette Chatre wrote:
>>>>>> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
>>>>>>> Unmounting resctrl FS has been moved into the per test functions in
>>>>>>> resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move
>>>>>>> resctrl FS mount/umount to higher level"). In case a signal (SIGINT,
>>>>>>> SIGTERM, or SIGHUP) is received, the running selftest is aborted by
>>>>>>> ctrlc_handler() which then unmounts resctrl fs before exiting. The
>>>>>>> current section between signal_handler_register() and
>>>>>>> signal_handler_unregister(), however, does not cover the entire
>>>>>>> duration when resctrl FS is mounted.
>>>>>>>
>>>>>>> Move signal_handler_register() and signal_handler_unregister() call
>>>>>>> into the test functions in resctrl_tests.c to properly unmount resctrl
>>>>>>> fs. Adjust child process kill() call in ctrlc_handler() to only be
>>>>>>> invoked if the child was already forked.
>>>>>>
>>>>>> Thank you for catching this.
>>>>>>
>>>>>>>
>>>>>>> Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level")
>>>>>>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>>>>>> Cc: <stable@vger.kernel.org>
>>>>>>> ---
>>>>>>>  tools/testing/selftests/resctrl/cat_test.c    |  8 -------
>>>>>>>  .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++
>>>>>>>  tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++---------
>>>>>>>  3 files changed, 34 insertions(+), 20 deletions(-)
>>>>>>>
>>>>>>> diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
>>>>>>> index 97b87285ab2a..224ba8544d8a 100644
>>>>>>> --- a/tools/testing/selftests/resctrl/cat_test.c
>>>>>>> +++ b/tools/testing/selftests/resctrl/cat_test.c
>>>>>>> @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>>>>>>>  		strcpy(param.filename, RESULT_FILE_NAME1);
>>>>>>>  		param.num_of_runs = 0;
>>>>>>>  		param.cpu_no = sibling_cpu_no;
>>>>>>> -	} else {
>>>>>>> -		ret = signal_handler_register();
>>>>>>> -		if (ret) {
>>>>>>> -			kill(bm_pid, SIGKILL);
>>>>>>> -			goto out;
>>>>>>> -		}
>>>>>>>  	}
>>>>>>>  
>>>>>>>  	remove(param.filename);
>>>>>>> @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
>>>>>>>  		}
>>>>>>>  		close(pipefd[0]);
>>>>>>>  		kill(bm_pid, SIGKILL);
>>>>>>> -		signal_handler_unregister();
>>>>>>>  	}
>>>>>>>  
>>>>>>> -out:
>>>>>>>  	cat_test_cleanup();
>>>>>>>  
>>>>>>>  	return ret;
>>>>>>> diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
>>>>>>> index 823672a20a43..3d66fbdc2df3 100644
>>>>>>> --- a/tools/testing/selftests/resctrl/resctrl_tests.c
>>>>>>> +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
>>>>>>> @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
>>>>>>>  
>>>>>>>  	ksft_print_msg("Starting MBM BW change ...\n");
>>>>>>>  
>>>>>>> +	res = signal_handler_register();
>>>>>>> +	if (res)
>>>>>>> +		return;
>>>>>>> +
>>>>>>>  	res = mount_resctrlfs();
>>>>>>>  	if (res) {
>>>>>>> +		signal_handler_unregister();
>>>>>>>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
>>>>>>>  		return;
>>>>>>>  	}
>>>>>>> @@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
>>>>>>>  
>>>>>>>  umount:
>>>>>>>  	umount_resctrlfs();
>>>>>>> +	signal_handler_unregister();
>>>>>>>  }
>>>>>>>  
>>>>>>>  static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>>>>>>> @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>>>>>>>  
>>>>>>>  	ksft_print_msg("Starting MBA Schemata change ...\n");
>>>>>>>  
>>>>>>> +	res = signal_handler_register();
>>>>>>> +	if (res)
>>>>>>> +		return;
>>>>>>> +
>>>>>>>  	res = mount_resctrlfs();
>>>>>>>  	if (res) {
>>>>>>> +		signal_handler_unregister();
>>>>>>>  		ksft_exit_fail_msg("Failed to mount resctrl FS\n");
>>>>>>>  		return;
>>>>>>>  	}
>>>>>>> @@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
>>>>>>>  
>>>>>>>  umount:
>>>>>>>  	umount_resctrlfs();
>>>>>>> +	signal_handler_unregister();
>>>>>>>  }
>>>>>>>  
>>>>>>
>>>>>> This adds more duplicated code for every test. Have you considered a
>>>>>> single test setup function that can be used to mount resctrl FS and setup
>>>>>> the signal handler paired with a single test teardown function?
>>>>>
>>>>> Yes. Consolidating all these is among my not-yet submitted patches.
>>>>> I just had to do a backport-friendly Fixes patch first for this.
>>>>>
>>>>
>>>> Could you please help me understand how the duplicate calls are more
>>>> backport friendly?
>>>
>>> Hi,
>>>
>>> It's simply because the refactoring that has to be done to be able to 
>>> introduce the generalized test framework is much more invasive and far 
>>> reaching than this patch. Essentially, all the call signatures of the test 
>>> functions need to match and the feature checks need to be done in new per 
>>> test functions too. This is the diffstat of those changes alone:
>>>
>>>  tools/testing/selftests/resctrl/cat_test.c      |  21 +++--
>>>  tools/testing/selftests/resctrl/cmt_test.c      |  26 +++--
>>>  tools/testing/selftests/resctrl/mba_test.c      |  20 +++-
>>>  tools/testing/selftests/resctrl/mbm_test.c      |  20 +++-
>>>  tools/testing/selftests/resctrl/resctrl.h       |  43 ++++++++-
>>>  tools/testing/selftests/resctrl/resctrl_tests.c | 220 +++++++++++++++----------------------------
>>>  tools/testing/selftests/resctrl/resctrlfs.c     |   5 +
>>>
>>> (tools/testing/selftests/resctrl/resctrl_tests.c --- part would 
>>> be slightly less if I'd reorder this patch but that only 24 lines off as 
>>> per diffstat of this patch).
>>>
>>> But that's not all.... To be able to push the generalized test framework 
>>> to stable, you need to also count in the benchmark cmd changes which 
>>> worked towards making the call signatures identical. So here's the 
>>> diffstat for that series for quick reference:
>>>
>>>  tools/testing/selftests/resctrl/cache.c       |   5 +-
>>>  tools/testing/selftests/resctrl/cat_test.c    |  13 +--
>>>  tools/testing/selftests/resctrl/cmt_test.c    |  34 ++++--
>>>  tools/testing/selftests/resctrl/mba_test.c    |   4 +-
>>>  tools/testing/selftests/resctrl/mbm_test.c    |   7 +-
>>>  tools/testing/selftests/resctrl/resctrl.h     |  16 +--
>>>  .../testing/selftests/resctrl/resctrl_tests.c | 100 ++++++++----------
>>>  tools/testing/selftests/resctrl/resctrl_val.c |  10 +-
>>>
>>> That's ~500 lines changed vs ~50 so it's a magnitude worse and much less 
>>> localized.
>>>
>>> And rest assured, I did not like introducing the duplicated calls any more 
>>> than you do (I did not write the generalized test framework for nothing, 
>>> after all) but the way taken in this patch seemed the most reasonable 
>>> option under these circumstances.
>>>
>>
>> hmmm ... I did not expect that a total refactoring would be needed.
>>
>> I was thinking about a change from this:
>>
>>
>> 	testX(...) 
>> 	{
>> 	
>> 		res = signal_handler_register();
>> 		/* error handling */
>> 		res = mount_resctrlfs();
>> 		/* error handling */
>> 		
>> 		/* test */
>>
>> 		unmount_resctrlfs();
>> 		signal_handler_register();
>>
>> 	}
>>
>>
>> to this:
>>
>>
>> 	int test_setup(...)
>> 	{
>> 		res = signal_handler_register();
>> 		/* error handling */
>> 		res = mount_resctrlfs();
>> 		/* error handling */
>> 	}
>>
>>
>> 	void test_cleanup(...)
>> 	{
>> 		unmount_resctrlfs();
>> 		signal_handler_register();
>> 	}
>>
>>
>> 	testX(...)
>> 	{
>>
>> 		res = test_setup(..);
>> 		/* error handling */
>>
>> 		/* test */
>>
>> 		test_cleanup();
>> 	}
>>
>> I expect this to also support the bigger refactoring.
> 
> Okay, I'll do so then.
> 
> However, having already written the generic run_single_test() function 
> that is part of the generic test framework, I definitely don't feel those 
> helpers would be that helpful for it. It more feels like they'd make the 
> flow less obvious by adding those two extra calls there but that's of 
> course matter of taste.

Sounds like there is some room for improvement here, perhaps open coding
the test_setup() and test_cleanup() helpers within run_single_test().
This is purely speculation on my part as I have not seen the code.

Reinette
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-09-14 17:30 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-11 11:19 [PATCH 0/5] selftests/resctrl: Fixes to failing tests Ilpo Järvinen
2023-09-11 11:19 ` [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal Ilpo Järvinen
2023-09-12 22:06   ` Reinette Chatre
2023-09-13 10:01     ` Ilpo Järvinen
2023-09-13 20:58       ` Reinette Chatre
2023-09-14 10:16         ` Ilpo Järvinen
2023-09-14 15:04           ` Reinette Chatre
2023-09-14 17:05             ` Ilpo Järvinen
2023-09-14 17:29               ` Reinette Chatre
2023-09-11 11:19 ` [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test Ilpo Järvinen
2023-09-12 22:06   ` Reinette Chatre
2023-09-13 11:11     ` Ilpo Järvinen
2023-09-13 20:58       ` Reinette Chatre
2023-09-14  9:58         ` Ilpo Järvinen
2023-09-14 15:04           ` Reinette Chatre
2023-09-11 11:19 ` [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name Ilpo Järvinen
2023-09-12 22:09   ` Reinette Chatre
2023-09-13 11:02     ` Ilpo Järvinen
2023-09-13 20:59       ` Reinette Chatre
2023-09-14 11:06         ` Ilpo Järvinen
2023-09-11 11:19 ` [PATCH 4/5] selftests/resctrl: Fix feature checks Ilpo Järvinen
2023-09-11 11:19 ` [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests Ilpo Järvinen
2023-09-12 22:10   ` Reinette Chatre
2023-09-13 11:43     ` Ilpo Järvinen
2023-09-13 21:00       ` Reinette Chatre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).