linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
@ 2022-06-07 12:05 Kohei Tarumizu
  2022-06-07 12:05 ` [PATCH v5 1/6] soc: fujitsu: Add hardware prefetch control driver for A64FX Kohei Tarumizu
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Kohei Tarumizu @ 2022-06-07 12:05 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, gregkh, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, linus.walleij, nicolas.ferre,
	conor.dooley, arnd, ast, peter.chen, kuba, linux-kernel,
	linux-arm-kernel, linux-acpi
  Cc: tarumizu.kohei

This patch series add sysfs interface to control CPU's hardware
prefetch behavior for performance tuning from userspace for the
processor A64FX and x86 (on supported CPU).

Changes from v4:
  - remove core driver
  - fix to use attribute_group instead of attribute (patch 1, 4)
  - fix to store a pointer of CPU number in driver_data when the
    initialization of sysfs (patch 1, 4)
  - move #if in .c file to .h file (patch 3)
  - fix drivers to be loaded automatically (patch 1, 4)
    - for x86, it is tied to the discovery of a specific CPU model
    - for A64FX, it is tied to the discovery of a specific ACPI device
  - add Kconfig description (patch 2, 5)
https://lore.kernel.org/lkml/20220518063032.2377351-1-tarumizu.kohei@fujitsu.com/

[Background]
============
A64FX and some Intel processors have implementation-dependent register
for controlling CPU's hardware prefetch behavior. A64FX has
IMP_PF_STREAM_DETECT_CTRL_EL0[1], and Intel processors have MSR 0x1a4
(MSR_MISC_FEATURE_CONTROL)[2]. These registers cannot be accessed from
userspace.

[1]https://github.com/fujitsu/A64FX/tree/master/doc/
   A64FX_Specification_HPC_Extension_v1_EN.pdf

[2]https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
    Volume 4

The advantage of using these is improved performance. As an example of
performance improvements, the results of running the Stream benchmark
on the A64FX are described in section [Merit].

For MSR 0x1a4, it is also possible to change the value from userspace
via the MSR driver. However, using MSR driver is not recommended, so
it needs a proper kernel interface[3].

[3]https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about/

For these reasons, we provide a new proper kernel interface to control
both IMP_PF_STREAM_DETECT_CTRL_EL0 and MSR 0x1a4.

[Overall design]
================
This driver creates "prefetch_control" directory and some attributes
in every CPU's cache/indexX directory, if CPU supports hardware
prefetch control behavior.

Detailed description of this sysfs interface is in
Documentation/ABI/testing/sysfs-devices-system-cpu (patch6).

This driver needs cache sysfs directory and cache level/type
information. In ARM processor, these information can be obtained
from registers even without ACPI PPTT.
We add processing to create a cache/index directory using only the
information from the register if the machine does not support ACPI
PPTT and CONFIG_ALLOW_INCOMPLETE_CACHE_SYSFS is true in patch3.
This action caused a problem and is described in [Known problem].

[Examples]
==========
This section provides an example of using this sysfs interface at the
x86's model of INTEL_FAM6_BROADWELL_X.

This model has the following register specifications:

[0]    L2 Hardware Prefetcher Disable (R/W)
[1]    L2 Adjacent Cache Line Prefetcher Disable (R/W)
[2]    DCU Hardware Prefetcher Disable (R/W)
[3]    DCU IP Prefetcher Disable (R/W)
[63:4] Reserved

In this case, index0 (L1d cache) corresponds to bit[2,3] and index2
(L2 cache) corresponds to bit [0,1]. A list of attribute files of
index0 and index2 in CPU1 at BROADWELL_X is following:

```
# ls /sys/devices/system/cpu/cpu1/cache/index0/prefetch_control/

hardware_prefetcher_enable
ip_prefetcher_enable

# ls /sys/devices/system/cpu/cpu1/cache/index2/prefetch_control/

adjacent_cache_line_prefetcher_enable
hardware_prefetcher_enable
```

If user would like to disable the setting of "L2 Adjacent Cache Line
Prefetcher Disable (R/W)" in CPU1, do the following:

```
# echo 0 >
# /sys/devices/system/cpu/cpu1/cache/index2/prefetch_control/adjacent_cache_line_prefetcher_enable
```

In another example, a list of index0 at A64FX is following:

```
# ls /sys/devices/system/cpu/cpu1/cache/index0/prefetch_control/

stream_detect_prefetcher_dist
stream_detect_prefetcher_enable
stream_detect_prefetcher_strength
stream_detect_prefetcher_strength_available
```

[Patch organizations]
=====================
This patch series add hardware prefetch control driver for A64FX
and x86.

- patch1: Add hardware prefetch control driver for A64FX

  Adds module init/exit code to create sysfs attributes for A64FX with
  "stream_detect_prefetcher_enable", "stream_detect_prefetcher_strong"
  and "stream_detect_prefetcher_dist".

- patch2: Add Kconfig/Makefile to build driver for A64FX

- patch3: Create cache sysfs directory without ACPI PPTT for hardware
  prefetch control

  Hardware Prefetch control driver needs cache sysfs directory and cache
  level/type information. In ARM processor, these information can be
  obtained from register(CLIDR_EL1) even without PPTT. Therefore, we
  set the cpu_map_populated to true to create cache sysfs directory, if
  the machine doesn't have PPTT.

- patch4: Add hardware prefetch control driver for x86

  Adds module init/exit code to create sysfs attributes for x86 with
  "hardware_prefetcher_enable", "ip_prefetcher_enable" and
  "adjacent_cache_line_prefetcher_enable".

- patch5: Add Kconfig/Makefile to build driver for x86

- patch6: Add documentation for the new sysfs interface

[Known problem]
===============
- `lscpu` command terminates with -ENOENT because cache/index directory
  is exists but shared_cpu_map file does not exist. This is due to
  patch5, which creates a cache/index directory containing only level
  and type without ACPI PPTT.

[Merit]
=======
For reference, here is the result of STREAM Triad when tuning with
the "s file in L1 and L2 cache on A64FX.

| dist combination  | Pattern A   | Pattern B   |
|-------------------|-------------|-------------|
| L1:256,  L2:1024  | 234505.2144 | 114600.0801 |
| L1:1536, L2:1024  | 279172.8742 | 118979.4542 |
| L1:256,  L2:10240 | 247716.7757 | 127364.1533 |
| L1:1536, L2:10240 | 283675.6625 | 125950.6847 |

In pattern A, we set the size of the array to 174720, which is about
half the size of the L1d cache. In pattern B, we set the size of the
array to 10485120, which is about twice the size of the L2 cache.

In pattern A, a change of dist at L1 has a larger effect. On the other
hand, in pattern B, the change of dist at L2 has a larger effect.
As described above, the optimal dist combination depends on the
characteristics of the application. Therefore, such a sysfs interface
is useful for performance tuning.

Best regards,
Kohei Tarumizu

Kohei Tarumizu (6):
  soc: fujitsu: Add hardware prefetch control driver for A64FX
  soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control
    driver
  arm64: Create cache sysfs directory without ACPI PPTT for hardware
    prefetch control
  x86: Add hardware prefetch control driver for x86
  x86: Add Kconfig/Makefile to build hardware prefetch control driver
  docs: ABI: Add sysfs documentation interface of hardware prefetch
    control driver

 .../ABI/testing/sysfs-devices-system-cpu      |  98 ++++
 MAINTAINERS                                   |   6 +
 arch/arm64/kernel/cacheinfo.c                 |  27 +
 arch/x86/Kconfig                              |  17 +
 arch/x86/kernel/cpu/Makefile                  |   2 +
 arch/x86/kernel/cpu/x86-pfctl.c               | 363 +++++++++++++
 drivers/acpi/pptt.c                           |  18 +
 drivers/soc/Kconfig                           |   1 +
 drivers/soc/Makefile                          |   1 +
 drivers/soc/fujitsu/Kconfig                   |  26 +
 drivers/soc/fujitsu/Makefile                  |   2 +
 drivers/soc/fujitsu/a64fx-pfctl.c             | 484 ++++++++++++++++++
 include/linux/acpi.h                          |   5 +
 include/linux/cacheinfo.h                     |  10 +
 14 files changed, 1060 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/x86-pfctl.c
 create mode 100644 drivers/soc/fujitsu/Kconfig
 create mode 100644 drivers/soc/fujitsu/Makefile
 create mode 100644 drivers/soc/fujitsu/a64fx-pfctl.c

-- 
2.27.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v5 1/6] soc: fujitsu: Add hardware prefetch control driver for A64FX
  2022-06-07 12:05 [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
@ 2022-06-07 12:05 ` Kohei Tarumizu
  2022-06-10 13:20   ` Greg KH
  2022-06-07 12:05 ` [PATCH v5 2/6] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Kohei Tarumizu @ 2022-06-07 12:05 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, gregkh, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, linus.walleij, nicolas.ferre,
	conor.dooley, arnd, ast, peter.chen, kuba, linux-kernel,
	linux-arm-kernel, linux-acpi
  Cc: tarumizu.kohei

Adds module init/exit code to create sysfs attributes for A64FX with
"stream_detect_prefetcher_enable", "stream_detect_prefetcher_strong"
and "stream_detect_prefetcher_dist".

This driver works only if part number is FUJITSU_CPU_PART_A64FX. The
details of the registers to be read and written in this patch are
described below.

"https://github.com/fujitsu/A64FX/tree/master/doc/"
    A64FX_Specification_HPC_Extension_v1_EN.pdf

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 drivers/soc/fujitsu/a64fx-pfctl.c | 484 ++++++++++++++++++++++++++++++
 1 file changed, 484 insertions(+)
 create mode 100644 drivers/soc/fujitsu/a64fx-pfctl.c

diff --git a/drivers/soc/fujitsu/a64fx-pfctl.c b/drivers/soc/fujitsu/a64fx-pfctl.c
new file mode 100644
index 000000000000..1bb45a365c0f
--- /dev/null
+++ b/drivers/soc/fujitsu/a64fx-pfctl.c
@@ -0,0 +1,484 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2022 FUJITSU LIMITED
+ *
+ * A64FX Hardware Prefetch Control support
+ */
+#include <asm/cputype.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/smp.h>
+#include <linux/sysfs.h>
+
+/*
+ * Constants for these add the "A64FX_SDPF" prefix to the name described in
+ * section "1.3.4.2. IMP_PF_STREAM_DETECT_CTRL_EL0" of "A64FX specification".
+ * (https://github.com/fujitsu/A64FX/tree/master/doc/A64FX_Specification_HPC_Extension_v1_EN.pdf")
+ * See this document for register specification details.
+ */
+#define A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0	sys_reg(3, 3, 11, 4, 0)
+#define A64FX_SDPF_V					BIT_ULL(63)
+#define A64FX_SDPF_L1PF_DIS				BIT_ULL(59)
+#define A64FX_SDPF_L2PF_DIS				BIT_ULL(58)
+#define A64FX_SDPF_L1W					BIT_ULL(55)
+#define A64FX_SDPF_L2W					BIT_ULL(54)
+#define A64FX_SDPF_L1_DIST				GENMASK_ULL(27, 24)
+#define A64FX_SDPF_L2_DIST				GENMASK_ULL(19, 16)
+
+#define PFCTL_MIN_L1_DIST				256
+#define PFCTL_MIN_L2_DIST				1024
+#define PFCTL_DIST_AUTO_VAL				0
+#define PFCTL_STRONG_VAL				0
+#define PFCTL_WEAK_VAL					1
+
+/*
+ * Define bitfield access macros for non-constant mask, because macros such as
+ * FIELD_FIT defined in include/linux/bitfield.h require constant mask.
+ */
+#define NC_FIELD_FIT(_mask, _val)					\
+	!((((typeof(_mask))_val) << __bf_shf(_mask)) & ~(_mask))
+
+#define NC_FIELD_PREP(_mask, _val)					\
+	(((typeof(_mask))(_val) << __bf_shf(_mask)) & (_mask))
+
+#define NC_FIELD_GET(_mask, _reg)					\
+	((typeof(_mask))(((_reg) & (_mask)) >> __bf_shf(_mask)))
+
+struct a64fx_pfctl_attr {
+	struct device_attribute	attr;
+	u64			mask;
+	void			*data;
+};
+
+struct pfctl_group {
+	unsigned int			level;
+	enum cache_type			type;
+	const struct attribute_group	**groups;
+};
+
+enum cpuhp_state hp_online;
+
+static const char strength_strong_string[] = "strong";
+static const char strength_weak_string[] = "weak";
+static const char dist_auto_string[] = "auto";
+
+static inline unsigned int pfctl_dev_get_cpu(struct device *pfctl_dev)
+{
+	return *(u32 *)dev_get_drvdata(pfctl_dev);
+}
+
+/* Enable the value written to the A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0 */
+static inline void enable_verify(u64 *reg)
+{
+	*reg |= NC_FIELD_PREP(A64FX_SDPF_V, 1);
+}
+
+static void _pfctl_read_mask(void *_reg)
+{
+	u64 *reg = (u64 *)_reg;
+
+	*reg = read_sysreg_s(A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
+}
+
+static u64 pfctl_read_mask(unsigned int cpu, u64 mask)
+{
+	u64 reg;
+
+	smp_call_function_single(cpu, _pfctl_read_mask, &reg, true);
+
+	return NC_FIELD_GET(mask, reg);
+}
+
+struct write_info {
+	u64 mask;
+	u64 val;
+};
+
+static void _pfctl_write_mask(void *info)
+{
+	struct write_info *winfo = info;
+	u64 reg;
+
+	reg = read_sysreg_s(A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+	reg &= ~winfo->mask;
+	reg |= NC_FIELD_PREP(winfo->mask, winfo->val);
+
+	enable_verify(&reg);
+
+	write_sysreg_s(reg, A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
+}
+
+static int pfctl_write_mask(unsigned int cpu, u64 mask, u64 val)
+{
+	struct write_info info = {
+		.mask	= mask,
+		.val	= val,
+	};
+
+	if (!NC_FIELD_FIT(mask, val))
+		return -EINVAL;
+
+	smp_call_function_single(cpu, _pfctl_write_mask, &info, true);
+
+	return 0;
+}
+
+static ssize_t
+pfctl_enable_show(struct device *pfctl_dev, struct device_attribute *attr,
+		char *buf)
+{
+	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
+	struct a64fx_pfctl_attr *aa;
+	bool val;
+
+	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
+
+	val = (bool)pfctl_read_mask(cpu, aa->mask);
+
+	return sysfs_emit(buf, "%d\n", val ? 0 : 1);
+}
+
+static ssize_t
+pfctl_enable_store(struct device *pfctl_dev, struct device_attribute *attr,
+		 const char *buf, size_t size)
+{
+	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
+	struct a64fx_pfctl_attr *aa;
+	bool val;
+	int ret;
+
+	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
+
+	if (strtobool(buf, &val) < 0)
+		return -EINVAL;
+
+	ret = pfctl_write_mask(cpu, aa->mask, val ? 0 : 1);
+	if (ret < 0)
+		return ret;
+
+	return size;
+}
+
+#define A64FX_PFCTL_ENABLE_ATTR(_level, _mask)				\
+	struct a64fx_pfctl_attr attr_l##_level##_enable = {		\
+		.attr = __ATTR(stream_detect_prefetcher_enable, 0600,	\
+			       pfctl_enable_show, pfctl_enable_store),	\
+		.mask = _mask, }
+
+
+static A64FX_PFCTL_ENABLE_ATTR(1, A64FX_SDPF_L1PF_DIS);
+static A64FX_PFCTL_ENABLE_ATTR(2, A64FX_SDPF_L2PF_DIS);
+
+static ssize_t
+pfctl_dist_show(struct device *pfctl_dev, struct device_attribute *attr,
+		char *buf)
+{
+	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
+	struct a64fx_pfctl_attr *aa;
+	u64 val, min_dist;
+
+	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
+	min_dist = (u64)aa->data;
+
+	val = pfctl_read_mask(cpu, aa->mask);
+
+	if (val == PFCTL_DIST_AUTO_VAL)
+		return sysfs_emit(buf, "%s\n", dist_auto_string);
+	else
+		return sysfs_emit(buf, "%llu\n", val * min_dist);
+}
+
+static ssize_t
+pfctl_dist_store(struct device *pfctl_dev, struct device_attribute *attr,
+		 const char *buf, size_t size)
+{
+	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
+	struct a64fx_pfctl_attr *aa;
+	u64 min_dist, val;
+	int ret;
+
+	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
+	min_dist = (u64)aa->data;
+
+	if (sysfs_streq(buf, dist_auto_string)) {
+		val = PFCTL_DIST_AUTO_VAL;
+	} else {
+		ret = kstrtoull(buf, 10, &val);
+		if (ret < 0 || val < 1)
+			return -EINVAL;
+	}
+
+	val = roundup(val, min_dist) / min_dist;
+
+	if (!NC_FIELD_FIT(aa->mask, val))
+		return -EINVAL;
+
+	ret = pfctl_write_mask(cpu, aa->mask, val);
+	if (ret < 0)
+		return ret;
+
+	return size;
+}
+
+#define PFCTL_DIST_ATTR(_level, _mask, _min_dist)			\
+	struct a64fx_pfctl_attr attr_l##_level##_dist = {		\
+		.attr = __ATTR(stream_detect_prefetcher_dist, 0600,	\
+			       pfctl_dist_show, pfctl_dist_store),	\
+		.mask = _mask,						\
+		.data = (void *)(u64)_min_dist, }
+
+static PFCTL_DIST_ATTR(1, A64FX_SDPF_L1_DIST, PFCTL_MIN_L1_DIST);
+static PFCTL_DIST_ATTR(2, A64FX_SDPF_L2_DIST, PFCTL_MIN_L2_DIST);
+
+static ssize_t
+pfctl_strength_show(struct device *pfctl_dev, struct device_attribute *attr,
+		    char *buf)
+{
+	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
+	struct a64fx_pfctl_attr *aa;
+	u64 val;
+
+	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
+
+	val = pfctl_read_mask(cpu, aa->mask);
+
+	switch (val) {
+	case PFCTL_STRONG_VAL:
+		return sysfs_emit(buf, "%s\n", strength_strong_string);
+	case PFCTL_WEAK_VAL:
+		return sysfs_emit(buf, "%s\n", strength_weak_string);
+	default:
+		return -EINVAL;
+	}
+}
+
+static ssize_t
+pfctl_strength_store(struct device *pfctl_dev, struct device_attribute *attr,
+		     const char *buf, size_t size)
+{
+	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
+	struct a64fx_pfctl_attr *aa;
+	u64 val;
+	int ret;
+
+	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
+
+	if (sysfs_streq(buf, strength_strong_string))
+		val = PFCTL_STRONG_VAL;
+	else if (sysfs_streq(buf, strength_weak_string))
+		val = PFCTL_WEAK_VAL;
+	else
+		return -EINVAL;
+
+	ret = pfctl_write_mask(cpu, aa->mask, val);
+	if (ret < 0)
+		return ret;
+
+	return size;
+}
+
+#define PFCTL_STRENGTH_ATTR(_level, _mask)				\
+	struct a64fx_pfctl_attr attr_l##_level##_strength = {		\
+		.attr = __ATTR(stream_detect_prefetcher_strength, 0600, \
+			       pfctl_strength_show,			\
+			       pfctl_strength_store),			\
+		.mask = _mask, }
+
+static PFCTL_STRENGTH_ATTR(1, A64FX_SDPF_L1W);
+static PFCTL_STRENGTH_ATTR(2, A64FX_SDPF_L2W);
+
+static ssize_t
+pfctl_strength_available_show(struct device *dev, struct device_attribute *attr,
+			      char *buf)
+{
+	return sysfs_emit(buf, "%s %s\n", strength_strong_string,
+			  strength_weak_string);
+}
+
+/*
+ * A64FX has same kind of available strength for any caches, so define only one
+ * attribute.
+ */
+struct a64fx_pfctl_attr attr_strength_available = {
+	.attr = __ATTR(stream_detect_prefetcher_strength_available, 0400,
+		       pfctl_strength_available_show, NULL), };
+
+static struct attribute *l1_attrs[] = {
+	&attr_l1_enable.attr.attr,
+	&attr_l1_dist.attr.attr,
+	&attr_l1_strength.attr.attr,
+	&attr_strength_available.attr.attr,
+	NULL,
+};
+
+static struct attribute *l2_attrs[] = {
+	&attr_l2_enable.attr.attr,
+	&attr_l2_dist.attr.attr,
+	&attr_l2_strength.attr.attr,
+	&attr_strength_available.attr.attr,
+	NULL,
+};
+
+static struct attribute_group l1_group = {
+	.attrs = l1_attrs,
+};
+
+static struct attribute_group l2_group = {
+	.attrs = l2_attrs,
+};
+
+static const struct attribute_group *l1_groups[] = {
+	&l1_group,
+	NULL,
+};
+
+static const struct attribute_group *l2_groups[] = {
+	&l2_group,
+	NULL,
+};
+
+static const struct pfctl_group pfctl_groups[] = {
+	{
+		.level = 1,
+		.type = CACHE_TYPE_DATA,
+		.groups = l1_groups,
+	},
+	{
+		.level = 2,
+		.type = CACHE_TYPE_UNIFIED,
+		.groups = l2_groups,
+	},
+	{
+		.groups = NULL,
+	},
+};
+
+static const struct attribute_group **
+get_pfctl_attribute_groups(unsigned int level, enum cache_type type)
+{
+	int i;
+
+	for (i = 0; pfctl_groups[i].groups; i++)
+		if ((level == pfctl_groups[i].level) &&
+		    (type == pfctl_groups[i].type))
+			return pfctl_groups[i].groups;
+
+	return NULL;
+}
+
+static int create_pfctl_attr(struct device *index_dev, void *data)
+{
+	struct cacheinfo *leaf = dev_get_drvdata(index_dev);
+	const struct attribute_group **groups;
+	struct device *pfctl_dev;
+
+	groups = get_pfctl_attribute_groups(leaf->level, leaf->type);
+	if (!groups)
+		return 0;
+
+	pfctl_dev = cpu_device_create(index_dev, data, groups,
+				      "prefetch_control");
+	if (IS_ERR(pfctl_dev))
+		return PTR_ERR(pfctl_dev);
+
+	return 0;
+}
+
+static int remove_pfctl_attr(struct device *index_dev, void *data)
+{
+	struct device *pfctl_dev;
+
+	pfctl_dev = device_find_child_by_name(index_dev, "prefetch_control");
+	if (!pfctl_dev)
+		return 0;
+
+	device_unregister(pfctl_dev);
+	put_device(pfctl_dev);
+
+	return 0;
+}
+
+static int pfctl_online(unsigned int cpu)
+{
+	struct device *cpu_dev = get_cpu_device(cpu);
+	struct device *cache_dev;
+	int ret;
+
+	cache_dev = device_find_child_by_name(cpu_dev, "cache");
+	if (!cache_dev)
+		return -ENODEV;
+
+	ret = device_for_each_child(cache_dev, &cpu_dev->id, create_pfctl_attr);
+
+	put_device(cache_dev);
+
+	return ret;
+}
+
+static int pfctl_prepare_down(unsigned int cpu)
+{
+	struct device *cpu_dev = get_cpu_device(cpu);
+	struct device *cache_dev;
+
+	cache_dev = device_find_child_by_name(cpu_dev, "cache");
+	if (!cache_dev)
+		return 0;
+
+	device_for_each_child(cache_dev, NULL, remove_pfctl_attr);
+
+	put_device(cache_dev);
+
+	return 0;
+}
+
+static const struct midr_range pfctl_list[] __initconst = {
+	MIDR_ALL_VERSIONS(MIDR_FUJITSU_A64FX),
+};
+
+/*
+ * This driver returns a negative value if it does not support the Hardware
+ * Prefetch Control or if it is running on a VM guest.
+ */
+static int __init a64fx_pfctl_init(void)
+{
+	int ret;
+
+	if (!is_kernel_in_hyp_mode())
+		return -EINVAL;
+
+	if (!is_midr_in_range_list(read_cpuid_id(), pfctl_list))
+		return -ENODEV;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "soc/a64fx-pfctl:online",
+				pfctl_online, pfctl_prepare_down);
+	if (ret < 0) {
+		pr_err("failed to register hotplug callbacks\n");
+		return ret;
+	}
+
+	hp_online = ret;
+
+	return 0;
+}
+
+static void __exit a64fx_pfctl_exit(void)
+{
+	cpuhp_remove_state(hp_online);
+}
+
+late_initcall(a64fx_pfctl_init);
+module_exit(a64fx_pfctl_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("FUJITSU LIMITED");
+MODULE_DESCRIPTION("A64FX Hardware Prefetch Control Driver");
+
+static const struct acpi_device_id fujitsu_acpi_match[] = {
+	{ "FUJI2006", },
+	{}
+};
+MODULE_DEVICE_TABLE(acpi, fujitsu_acpi_match);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 2/6] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver
  2022-06-07 12:05 [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
  2022-06-07 12:05 ` [PATCH v5 1/6] soc: fujitsu: Add hardware prefetch control driver for A64FX Kohei Tarumizu
@ 2022-06-07 12:05 ` Kohei Tarumizu
  2022-06-07 14:40   ` Randy Dunlap
  2022-06-07 12:05 ` [PATCH v5 3/6] arm64: Create cache sysfs directory without ACPI PPTT for hardware prefetch control Kohei Tarumizu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Kohei Tarumizu @ 2022-06-07 12:05 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, gregkh, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, linus.walleij, nicolas.ferre,
	conor.dooley, arnd, ast, peter.chen, kuba, linux-kernel,
	linux-arm-kernel, linux-acpi
  Cc: tarumizu.kohei

Adds Kconfig/Makefile to build hardware prefetch control driver for
A64FX support, and also adds a MAINTAINERS entry.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 MAINTAINERS                  |  5 +++++
 drivers/soc/Kconfig          |  1 +
 drivers/soc/Makefile         |  1 +
 drivers/soc/fujitsu/Kconfig  | 26 ++++++++++++++++++++++++++
 drivers/soc/fujitsu/Makefile |  2 ++
 5 files changed, 35 insertions(+)
 create mode 100644 drivers/soc/fujitsu/Kconfig
 create mode 100644 drivers/soc/fujitsu/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index d6d879cb0afd..b3e920a8a044 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8620,6 +8620,11 @@ F:	include/linux/hwmon*.h
 F:	include/trace/events/hwmon*.h
 K:	(devm_)?hwmon_device_(un)?register(|_with_groups|_with_info)
 
+HARDWARE PREFETCH CONTROL DRIVERS
+M:	Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
+S:	Maintained
+F:	drivers/soc/fujitsu/a64fx-pfctl.c
+
 HARDWARE RANDOM NUMBER GENERATOR CORE
 M:	Matt Mackall <mpm@selenic.com>
 M:	Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
index c5aae42673d3..d87754799d90 100644
--- a/drivers/soc/Kconfig
+++ b/drivers/soc/Kconfig
@@ -9,6 +9,7 @@ source "drivers/soc/atmel/Kconfig"
 source "drivers/soc/bcm/Kconfig"
 source "drivers/soc/canaan/Kconfig"
 source "drivers/soc/fsl/Kconfig"
+source "drivers/soc/fujitsu/Kconfig"
 source "drivers/soc/imx/Kconfig"
 source "drivers/soc/ixp4xx/Kconfig"
 source "drivers/soc/litex/Kconfig"
diff --git a/drivers/soc/Makefile b/drivers/soc/Makefile
index 904eec2a7871..6c8ff1792cda 100644
--- a/drivers/soc/Makefile
+++ b/drivers/soc/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_SOC_CANAAN)	+= canaan/
 obj-$(CONFIG_ARCH_DOVE)		+= dove/
 obj-$(CONFIG_MACH_DOVE)		+= dove/
 obj-y				+= fsl/
+obj-y				+= fujitsu/
 obj-$(CONFIG_ARCH_GEMINI)	+= gemini/
 obj-y				+= imx/
 obj-y				+= ixp4xx/
diff --git a/drivers/soc/fujitsu/Kconfig b/drivers/soc/fujitsu/Kconfig
new file mode 100644
index 000000000000..6c5990b75cc5
--- /dev/null
+++ b/drivers/soc/fujitsu/Kconfig
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menu "Fujitsu SoC drivers"
+
+config ALLOW_INCOMPLETE_CACHE_SYSFS
+	bool
+
+config A64FX_HWPF_CONTROL
+	tristate "A64FX Hardware Prefetch Control driver"
+	depends on ARM64
+	select ALLOW_INCOMPLETE_CACHE_SYSFS
+	help
+	  This provides a sysfs interface to control the Hardware Prefetch
+	  behavior for A64FX.
+
+	  A64FX has IMP_PF_STREAM_DETECT_CTRL_EL0, which can control the
+	  hardware prefech behavior. If the processor supports this, the
+	  module can be loaded with the name a64fx-pfctl.
+
+	  Depending on the characteristics of the application, this register
+	  parameters improve or degrade performance.
+
+	  Please see Documentation/ABI/testing/sysfs-devices-system-cpu for
+	  more information.
+
+endmenu
diff --git a/drivers/soc/fujitsu/Makefile b/drivers/soc/fujitsu/Makefile
new file mode 100644
index 000000000000..35e284a548bb
--- /dev/null
+++ b/drivers/soc/fujitsu/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_A64FX_HWPF_CONTROL)	+= a64fx-pfctl.o
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 3/6] arm64: Create cache sysfs directory without ACPI PPTT for hardware prefetch control
  2022-06-07 12:05 [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
  2022-06-07 12:05 ` [PATCH v5 1/6] soc: fujitsu: Add hardware prefetch control driver for A64FX Kohei Tarumizu
  2022-06-07 12:05 ` [PATCH v5 2/6] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
@ 2022-06-07 12:05 ` Kohei Tarumizu
  2022-06-07 12:05 ` [PATCH v5 4/6] x86: Add hardware prefetch control driver for x86 Kohei Tarumizu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Kohei Tarumizu @ 2022-06-07 12:05 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, gregkh, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, linus.walleij, nicolas.ferre,
	conor.dooley, arnd, ast, peter.chen, kuba, linux-kernel,
	linux-arm-kernel, linux-acpi
  Cc: tarumizu.kohei

Create a cache sysfs directory without ACPI PPTT if the CPU model is
A64FX and CONFIG_ALLOW_INCOMPLETE_CACHE_SYSFS is true. Currentry,
CONFIG_ALLOW_INCOMPLETE_CACHE_SYSFS is set only when
CONFIG_A64FX_HWPF_CONTROL is enabled.

Hardware prefetch control driver need cache sysfs directory and cache
level/type information. In ARM processor, these information can be
obtained from the register even without PPTT.

This patch set the cpu_map_populated to true if the machine doesn't
have PPTT. It use only the level/type information obtained from
CLIDR_EL1, and don't use CCSIDR information.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 arch/arm64/kernel/cacheinfo.c | 27 +++++++++++++++++++++++++++
 drivers/acpi/pptt.c           | 18 ++++++++++++++++++
 include/linux/acpi.h          |  5 +++++
 include/linux/cacheinfo.h     | 10 ++++++++++
 4 files changed, 60 insertions(+)

diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 587543c6c51c..3b337a425acb 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -43,6 +43,30 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
 	this_leaf->type = type;
 }
 
+static const struct midr_range allow_list[] = {
+	MIDR_ALL_VERSIONS(MIDR_FUJITSU_A64FX),
+};
+
+/*
+ * This function works only for FUJITSU A64FX processor.
+ * If CONFIG_ALLOW_INCOMPLETE_CACHE_SYSFS is not defined, do nothing.
+ *
+ * Hardware prefetch functions need cache sysfs directory and cache
+ * level/type information. In ARM processor, these information can be
+ * obtained from registers even without PPTT. Therefore, we set the
+ * cpu_map_populated to true to create cache sysfs directory, if the
+ * machine doesn't have PPTT.
+ */
+static void allow_incomplete_cache_sysfs(struct cpu_cacheinfo *cpu_ci)
+{
+	if (!is_midr_in_range_list(read_cpuid_id(), allow_list))
+		return;
+
+	if (!acpi_disabled)
+		if (!acpi_has_pptt())
+			enable_cpu_map_populated(cpu_ci);
+}
+
 int init_cache_level(unsigned int cpu)
 {
 	unsigned int ctype, level, leaves, fw_level;
@@ -95,5 +119,8 @@ int populate_cache_leaves(unsigned int cpu)
 			ci_leaf_init(this_leaf++, type, level);
 		}
 	}
+
+	allow_incomplete_cache_sysfs(this_cpu_ci);
+
 	return 0;
 }
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 701f61c01359..fcd7295b81d7 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -838,3 +838,21 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
 					  ACPI_PPTT_ACPI_IDENTICAL);
 }
+
+/**
+ * acpi_has_pptt() - Determine if ACPI has PPTT table or not
+ *
+ * Return: true if ACPI has PPTT, false if ACPI doesn't have PPTT.
+ */
+bool acpi_has_pptt(void)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	acpi_put_table(table);
+	return true;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index d7136d13aa44..9daea11edad7 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1391,6 +1391,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 int find_acpi_cpu_cache_topology(unsigned int cpu, int level);
+bool acpi_has_pptt(void);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1416,6 +1417,10 @@ static inline int find_acpi_cpu_cache_topology(unsigned int cpu, int level)
 {
 	return -EINVAL;
 }
+static inline bool acpi_has_pptt(void)
+{
+	return false;
+}
 #endif
 
 #ifdef CONFIG_ACPI_PCC
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 4ff37cb763ae..71f89da6792b 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -99,6 +99,16 @@ static inline int acpi_find_last_cache_level(unsigned int cpu)
 int acpi_find_last_cache_level(unsigned int cpu);
 #endif
 
+#ifndef CONFIG_ALLOW_INCOMPLETE_CACHE_SYSFS
+#define enable_cpu_map_populated(cpu_ci)
+#else
+static inline void enable_cpu_map_populated(struct cpu_cacheinfo *cpu_ci)
+{
+	cpu_ci->cpu_map_populated = true;
+}
+#endif
+
+
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
 
 /*
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 4/6] x86: Add hardware prefetch control driver for x86
  2022-06-07 12:05 [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
                   ` (2 preceding siblings ...)
  2022-06-07 12:05 ` [PATCH v5 3/6] arm64: Create cache sysfs directory without ACPI PPTT for hardware prefetch control Kohei Tarumizu
@ 2022-06-07 12:05 ` Kohei Tarumizu
  2022-06-07 12:05 ` [PATCH v5 5/6] x86: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Kohei Tarumizu @ 2022-06-07 12:05 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, gregkh, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, linus.walleij, nicolas.ferre,
	conor.dooley, arnd, ast, peter.chen, kuba, linux-kernel,
	linux-arm-kernel, linux-acpi
  Cc: tarumizu.kohei

Adds module init/exit code to create sysfs attributes for x86 with
"hardware_prefetcher_enable", "ip_prefetcher_enable" and
"adjacent_cache_line_prefetcher_enable".

This driver works only if a CPU model is mapped to type of register
specification(e.g. TYPE_L12_BASE) in pfctl_match[].

The details of the registers(MSR_MISC_FEATURE_CONTROL) to be read and
written in this patch are described below:

"https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html"
    Volume 4

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 arch/x86/kernel/cpu/x86-pfctl.c | 363 ++++++++++++++++++++++++++++++++
 1 file changed, 363 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/x86-pfctl.c

diff --git a/arch/x86/kernel/cpu/x86-pfctl.c b/arch/x86/kernel/cpu/x86-pfctl.c
new file mode 100644
index 000000000000..154e927d092c
--- /dev/null
+++ b/arch/x86/kernel/cpu/x86-pfctl.c
@@ -0,0 +1,363 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2022 FUJITSU LIMITED
+ *
+ * x86 Hardware Prefetch Control support
+ */
+
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/sysfs.h>
+#include <asm/cpu_device_id.h>
+#include <asm/intel-family.h>
+#include <asm/msr.h>
+
+/*
+ * MSR_MISC_FEATURE_CONTROL has three type of register specifications.
+ *
+ * The register specification of TYPE_L12_BASE is as follow:
+ * [0]    L2 Hardware Prefetcher Disable (R/W)
+ * [1]    Reserved
+ * [2]    DCU Hardware Prefetcher Disable (R/W)
+ * [63:3] Reserved
+ *
+ * The register specification of TYPE_L12_PLUS is as follow:
+ * [0]    L2 Hardware Prefetcher Disable (R/W)
+ * [1]    L2 Adjacent Cache Line Prefetcher Disable (R/W)
+ * [2]    DCU Hardware Prefetcher Disable (R/W)
+ * [3]    DCU IP Prefetcher Disable (R/W)
+ * [63:4] Reserved
+ *
+ * The register specification of TYPE_L12_XPHI is as follow:
+ * [0]    L2 Hardware Prefetcher Disable (R/W)
+ * [1]    DCU Hardware Prefetcher Disable (R/W)
+ * [63:2] Reserved
+ *
+ * See "Intel 64 and IA-32 Architectures Software Developer's Manual"
+ * (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html)
+ * for register specification details.
+ */
+enum {
+	TYPE_L12_BASE,
+	TYPE_L12_PLUS,
+	TYPE_L12_XPHI,
+};
+
+struct x86_pfctl_attr {
+	struct device_attribute		attr;
+	u64				mask;
+};
+
+struct pfctl_group {
+	unsigned int			level;
+	enum cache_type			type;
+	const struct attribute_group	**groups;
+};
+
+enum cpuhp_state hp_online;
+
+static inline unsigned int pfctl_dev_get_cpu(struct device *pfctl_dev)
+{
+	return *(u32 *)dev_get_drvdata(pfctl_dev);
+}
+
+static ssize_t
+pfctl_show(struct device *pfctl_dev, struct device_attribute *attr, char *buf)
+{
+	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
+	struct x86_pfctl_attr *xa;
+	u64 val;
+
+	xa = container_of(attr, struct x86_pfctl_attr, attr);
+
+	rdmsrl_on_cpu(cpu, MSR_MISC_FEATURE_CONTROL, &val);
+	return sysfs_emit(buf, "%d\n", val & xa->mask ? 0 : 1);
+}
+
+struct write_info {
+	u64 mask;
+	bool enable;
+};
+
+/*
+ * wrmsrl() in this patch is only done inside of an interrupt-disabled region
+ * to avoid a conflict of write access from other drivers,
+ */
+static void pfctl_write(void *info)
+{
+	struct write_info *winfo = info;
+	u64 reg;
+
+	reg = 0;
+	rdmsrl(MSR_MISC_FEATURE_CONTROL, reg);
+
+	if (winfo->enable)
+		reg &= ~winfo->mask;
+	else
+		reg |= winfo->mask;
+
+	wrmsrl(MSR_MISC_FEATURE_CONTROL, reg);
+}
+
+/*
+ * MSR_MISC_FEATURE_CONTROL has "core" scope, so define the lock to avoid a
+ * conflict of write access from different logical processors in the same core.
+ */
+static DEFINE_MUTEX(pfctl_mutex);
+
+static ssize_t
+pfctl_store(struct device *pfctl_dev, struct device_attribute *attr,
+	    const char *buf, size_t size)
+{
+	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
+	struct x86_pfctl_attr *xa;
+	struct write_info info;
+
+	xa = container_of(attr, struct x86_pfctl_attr, attr);
+	info.mask = xa->mask;
+
+	if (strtobool(buf, &info.enable) < 0)
+		return -EINVAL;
+
+	mutex_lock(&pfctl_mutex);
+	smp_call_function_single(cpu, pfctl_write, &info, true);
+	mutex_unlock(&pfctl_mutex);
+
+	return size;
+}
+
+#define PFCTL_ATTR(_name, _level, _bit)					\
+	struct x86_pfctl_attr attr_l##_level##_##_name = {		\
+		.attr = __ATTR(_name, 0600, pfctl_show, pfctl_store),	\
+		.mask = BIT_ULL(_bit), }
+
+static PFCTL_ATTR(hardware_prefetcher_enable,			1, 2);
+static PFCTL_ATTR(hardware_prefetcher_enable,			2, 0);
+static PFCTL_ATTR(ip_prefetcher_enable,				1, 3);
+static PFCTL_ATTR(adjacent_cache_line_prefetcher_enable,	2, 1);
+
+static struct attribute *l1_attrs[] = {
+	&attr_l1_hardware_prefetcher_enable.attr.attr,
+	&attr_l1_ip_prefetcher_enable.attr.attr,
+	NULL,
+};
+
+static struct attribute *l2_attrs[] = {
+	&attr_l2_hardware_prefetcher_enable.attr.attr,
+	&attr_l2_adjacent_cache_line_prefetcher_enable.attr.attr,
+	NULL,
+};
+
+static struct attribute_group l1_group = {
+	.attrs = l1_attrs,
+};
+
+static struct attribute_group l2_group = {
+	.attrs = l2_attrs,
+};
+
+static const struct attribute_group *l1_groups[] = {
+	&l1_group,
+	NULL,
+};
+
+static const struct attribute_group *l2_groups[] = {
+	&l2_group,
+	NULL,
+};
+
+static const struct pfctl_group pfctl_groups[] = {
+	{
+		.level = 1,
+		.type = CACHE_TYPE_DATA,
+		.groups = l1_groups,
+	},
+	{
+		.level = 2,
+		.type = CACHE_TYPE_UNIFIED,
+		.groups = l2_groups,
+	},
+	{
+		.groups = NULL,
+	},
+};
+
+static const struct attribute_group **
+get_pfctl_attribute_groups(unsigned int level, enum cache_type type)
+{
+	int i;
+
+	for (i = 0; pfctl_groups[i].groups; i++)
+		if ((level == pfctl_groups[i].level) &&
+		    (type == pfctl_groups[i].type))
+			return pfctl_groups[i].groups;
+
+	return NULL;
+}
+
+static int remove_pfctl_attr(struct device *index_dev, void *data)
+{
+	struct device *pfctl_dev;
+
+	pfctl_dev = device_find_child_by_name(index_dev, "prefetch_control");
+	if (!pfctl_dev)
+		return 0;
+
+	device_unregister(pfctl_dev);
+	put_device(pfctl_dev);
+
+	return 0;
+}
+
+static int create_pfctl_attr(struct device *index_dev, void *data)
+{
+	struct cacheinfo *leaf = dev_get_drvdata(index_dev);
+	const struct attribute_group **groups;
+	struct device *pfctl_dev;
+
+	groups = get_pfctl_attribute_groups(leaf->level, leaf->type);
+	if (!groups)
+		return 0;
+
+	pfctl_dev = cpu_device_create(index_dev, data, groups,
+				      "prefetch_control");
+	if (IS_ERR(pfctl_dev))
+		return PTR_ERR(pfctl_dev);
+
+	return 0;
+}
+
+static int pfctl_online(unsigned int cpu)
+{
+	struct device *cpu_dev = get_cpu_device(cpu);
+	struct device *cache_dev;
+	int ret;
+
+	cache_dev = device_find_child_by_name(cpu_dev, "cache");
+	if (!cache_dev)
+		return -ENODEV;
+
+	ret = device_for_each_child(cache_dev, &cpu_dev->id, create_pfctl_attr);
+
+	put_device(cache_dev);
+
+	return ret;
+}
+
+static int pfctl_prepare_down(unsigned int cpu)
+{
+	struct device *cpu_dev = get_cpu_device(cpu);
+	struct device *cache_dev;
+
+	cache_dev = device_find_child_by_name(cpu_dev, "cache");
+	if (!cache_dev)
+		return 0;
+
+	device_for_each_child(cache_dev, NULL, remove_pfctl_attr);
+
+	put_device(cache_dev);
+
+	return 0;
+}
+
+/*
+ * Only BROADWELL_X has been tested in the actual machine at this point. Other
+ * models were defined based on the information in the "Intel 64 and IA-32
+ * Architectures Software Developer's Manual"
+ */
+static const struct x86_cpu_id pfctl_match[] __initconst = {
+	X86_MATCH_INTEL_FAM6_MODEL(ATOM_SILVERMONT_D,	TYPE_L12_BASE),
+	X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT,	TYPE_L12_BASE),
+	X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT_PLUS,	TYPE_L12_BASE),
+	X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D,	TYPE_L12_BASE),
+	X86_MATCH_INTEL_FAM6_MODEL(NEHALEM,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(NEHALEM_G,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(NEHALEM_EP,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(NEHALEM_EX,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE_X,	TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(IVYBRIDGE,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(IVYBRIDGE_X,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(HASWELL,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(HASWELL_X,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(HASWELL_L,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(HASWELL_G,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(BROADWELL,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_G,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_D,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_L,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE_L,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE_L,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(CANNONLAKE_L,	TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_L,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE,		TYPE_L12_PLUS),
+	X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL,	TYPE_L12_XPHI),
+	X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM,	TYPE_L12_XPHI),
+	{},
+};
+MODULE_DEVICE_TABLE(x86cpu, pfctl_match);
+
+static int __init x86_pfctl_init(void)
+{
+	const struct x86_cpu_id *m;
+	int ret;
+
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		return -ENODEV;
+
+	m = x86_match_cpu(pfctl_match);
+	if (!m)
+		return -ENODEV;
+
+	switch (m->driver_data) {
+	case TYPE_L12_BASE:
+		l1_attrs[1] = NULL;
+		l2_attrs[1] = NULL;
+		break;
+	case TYPE_L12_PLUS:
+		break;
+	case TYPE_L12_XPHI:
+		attr_l1_hardware_prefetcher_enable.mask = BIT_ULL(1);
+		l1_attrs[1] = NULL;
+		l2_attrs[1] = NULL;
+		break;
+	default:
+		return -ENODEV;
+	};
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/x86-pfctl:online",
+				pfctl_online, pfctl_prepare_down);
+	if (ret < 0) {
+		pr_err("failed to register hotplug callbacks\n");
+		return ret;
+	}
+
+	hp_online = ret;
+
+	return 0;
+}
+
+static void __exit x86_pfctl_exit(void)
+{
+	cpuhp_remove_state(hp_online);
+}
+
+late_initcall(x86_pfctl_init);
+module_exit(x86_pfctl_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("FUJITSU LIMITED");
+MODULE_DESCRIPTION("x86 Hardware Prefetch Control Driver");
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 5/6] x86: Add Kconfig/Makefile to build hardware prefetch control driver
  2022-06-07 12:05 [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
                   ` (3 preceding siblings ...)
  2022-06-07 12:05 ` [PATCH v5 4/6] x86: Add hardware prefetch control driver for x86 Kohei Tarumizu
@ 2022-06-07 12:05 ` Kohei Tarumizu
  2022-06-07 14:40   ` Randy Dunlap
  2022-06-07 12:05 ` [PATCH v5 6/6] docs: ABI: Add sysfs documentation interface of " Kohei Tarumizu
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Kohei Tarumizu @ 2022-06-07 12:05 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, gregkh, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, linus.walleij, nicolas.ferre,
	conor.dooley, arnd, ast, peter.chen, kuba, linux-kernel,
	linux-arm-kernel, linux-acpi
  Cc: tarumizu.kohei

This adds Kconfig/Makefile to build hardware prefetch control driver
for x86 support. This also adds a MAINTAINERS entry.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 MAINTAINERS                  |  1 +
 arch/x86/Kconfig             | 17 +++++++++++++++++
 arch/x86/kernel/cpu/Makefile |  2 ++
 3 files changed, 20 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b3e920a8a044..4f44bbef2614 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8623,6 +8623,7 @@ K:	(devm_)?hwmon_device_(un)?register(|_with_groups|_with_info)
 HARDWARE PREFETCH CONTROL DRIVERS
 M:	Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
 S:	Maintained
+F:	arch/x86/kernel/cpu/x86-pfctl.c
 F:	drivers/soc/fujitsu/a64fx-pfctl.c
 
 HARDWARE RANDOM NUMBER GENERATOR CORE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4bed3abf444d..3ee173483f9f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1359,6 +1359,23 @@ config X86_CPUID
 	  with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
 	  /dev/cpu/31/cpuid.
 
+config X86_HWPF_CONTROL
+	tristate "x86 Hardware Prefetch Control support"
+	depends on X86_64
+	help
+	  This provides a sysfs interface to control the Hardware Prefetch
+	  behavior for X86.
+
+	  Some Intel processors have MSR 0x1a4 (MSR_MISC_FEATURE_CONTROL),
+	  which can control the hardware prefech behavior. If the processor
+	  supports this, the module can be loaded with the name x86-pfctl.
+
+	  Depending on the characteristics of the application, this register
+	  parameters improve or degrade performance.
+
+	  Please see Documentation/ABI/testing/sysfs-devices-system-cpu for
+	  more information.
+
 choice
 	prompt "High Memory Support"
 	default HIGHMEM4G
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 9661e3e802be..1aa13dad17a3 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -56,6 +56,8 @@ obj-$(CONFIG_X86_LOCAL_APIC)		+= perfctr-watchdog.o
 obj-$(CONFIG_HYPERVISOR_GUEST)		+= vmware.o hypervisor.o mshyperv.o
 obj-$(CONFIG_ACRN_GUEST)		+= acrn.o
 
+obj-$(CONFIG_X86_HWPF_CONTROL)		+= x86-pfctl.o
+
 ifdef CONFIG_X86_FEATURE_NAMES
 quiet_cmd_mkcapflags = MKCAP   $@
       cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $@ $^
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 6/6] docs: ABI: Add sysfs documentation interface of hardware prefetch control driver
  2022-06-07 12:05 [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
                   ` (4 preceding siblings ...)
  2022-06-07 12:05 ` [PATCH v5 5/6] x86: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
@ 2022-06-07 12:05 ` Kohei Tarumizu
  2022-06-10 13:07 ` [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Greg KH
  2022-06-10 13:48 ` Linus Walleij
  7 siblings, 0 replies; 26+ messages in thread
From: Kohei Tarumizu @ 2022-06-07 12:05 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, gregkh, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, linus.walleij, nicolas.ferre,
	conor.dooley, arnd, ast, peter.chen, kuba, linux-kernel,
	linux-arm-kernel, linux-acpi
  Cc: tarumizu.kohei

This describes the sysfs interface implemented by the hardware prefetch
control driver.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 .../ABI/testing/sysfs-devices-system-cpu      | 98 +++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 2ad01cad7f1c..0da4c1bac51e 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -688,3 +688,101 @@ Description:
 		(RO) the list of CPUs that are isolated and don't
 		participate in load balancing. These CPUs are set by
 		boot parameter "isolcpus=".
+
+What:		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/hardware_prefetcher_enable
+		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/ip_prefetcher_enable
+		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/adjacent_cache_line_prefetcher_enable
+Date:		March 2022
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Parameters for some Intel CPU's hardware prefetch control
+
+		This sysfs interface provides Hardware Prefetch control
+		attribute for some Intel processors. Attributes are only
+		present if the particular cache implements the relevant
+		prefetcher controls.
+
+		*_prefetcher_enable:
+		    (RW) control this prefetcher's enablement state.
+		    Read returns current status:
+			0: this prefetcher is disabled
+			1: this prefetcher is enabled
+
+		- Attribute mapping
+
+		    Some Intel processors have MSR 0x1a4. This register has several
+		    specifications depending on the model. This interface provides
+		    a one-to-one attribute file to control all the tunable
+		    parameters the CPU provides of the following.
+
+			- "* Hardware Prefetcher Disable (R/W)"
+			    corresponds to the "hardware_prefetcher_enable"
+
+			- "* Adjacent Cache Line Prefetcher Disable (R/W)"
+			    corresponds to the "adjacent_cache_line_prefetcher_enable"
+
+			- "* IP Prefetcher Disable (R/W)"
+			    corresponds to the "ip_prefetcher_enable"
+
+What:		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_enable
+		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_strength
+		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_strength_available
+		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_dist
+Date:		March 2022
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Parameters for A64FX's hardware prefetch control
+
+		This sysfs interface provides Hardware Prefetch control
+		attribute for the processor A64FX. Attributes are only
+		present if the particular cache implements the relevant
+		prefetcher controls.
+
+		stream_detect_prefetcher_enable:
+		    (RW) control the prefetcher's enablement state.
+		    Read returns current status:
+			0: this prefetcher is disabled
+			1: this prefetcher is enabled
+
+		stream_detect_prefetcher_strength:
+		    (RW) control the prefetcher operation's strongness state.
+		    Read returns current status:
+			weak: prefetch operation is weak
+			strong: prefetch operation is strong
+
+		    Strong prefetch operation is surely executed, if there is
+		    no corresponding data in cache.
+		    Weak prefetch operation allows the hardware not to execute
+		    operation depending on hardware state.
+
+
+		stream_detect_prefetcher_strength_available:
+		    (RO) displays a space separated list of available strongness
+		    state.
+
+		stream_detect_prefetcher_dist:
+		    (RW) control the prefetcher distance value.
+		    Read return current prefetcher distance value in bytes
+		    or the string "auto".
+
+		    Write either a value in byte or the string "auto" to this
+		    parameter. If you write a value less than multiples of a
+		    specific value, it is rounded up.
+
+		    The string "auto" have a special meaning. This means that
+		    instead of setting dist to a user-specified value, it
+		    operates using hardware-specific values.
+
+		- Attribute mapping
+
+		    The processor A64FX has register IMP_PF_STREAM_DETECT_CTRL_EL0
+		    for Hardware Prefetch Control. This attribute maps each
+		    specification to the following.
+
+			- "L*PF_DIS": enablement of hardware prefetcher
+			    corresponds to the "stream_detect_prefetcher_enable"
+
+			- "L*W": strongness of hardware prefetcher
+			    corresponds to "stream_detect_prefetcher_strength"
+			    and "stream_detect_prefetcher_strength_available"
+
+			- "L*_DIST": distance of hardware prefetcher
+			    corresponds to the "stream_detect_prefetcher_dist"
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 5/6] x86: Add Kconfig/Makefile to build hardware prefetch control driver
  2022-06-07 12:05 ` [PATCH v5 5/6] x86: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
@ 2022-06-07 14:40   ` Randy Dunlap
  2022-06-08  6:11     ` tarumizu.kohei
  0 siblings, 1 reply; 26+ messages in thread
From: Randy Dunlap @ 2022-06-07 14:40 UTC (permalink / raw)
  To: Kohei Tarumizu, catalin.marinas, will, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, lenb, gregkh, mchehab+huawei,
	eugenis, tony.luck, pcc, peterz, marcos, marcan, linus.walleij,
	nicolas.ferre, conor.dooley, arnd, ast, peter.chen, kuba,
	linux-kernel, linux-arm-kernel, linux-acpi



On 6/7/22 05:05, Kohei Tarumizu wrote:
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 4bed3abf444d..3ee173483f9f 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1359,6 +1359,23 @@ config X86_CPUID
>  	  with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
>  	  /dev/cpu/31/cpuid.
>  
> +config X86_HWPF_CONTROL
> +	tristate "x86 Hardware Prefetch Control support"
> +	depends on X86_64
> +	help
> +	  This provides a sysfs interface to control the Hardware Prefetch
> +	  behavior for X86.
> +
> +	  Some Intel processors have MSR 0x1a4 (MSR_MISC_FEATURE_CONTROL),
> +	  which can control the hardware prefech behavior. If the processor

	                                 prefetch

> +	  supports this, the module can be loaded with the name x86-pfctl.
> +
> +	  Depending on the characteristics of the application, this register
> +	  parameters improve or degrade performance.
> +
> +	  Please see Documentation/ABI/testing/sysfs-devices-system-cpu for
> +	  more information.
> +

-- 
~Randy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/6] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver
  2022-06-07 12:05 ` [PATCH v5 2/6] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
@ 2022-06-07 14:40   ` Randy Dunlap
  0 siblings, 0 replies; 26+ messages in thread
From: Randy Dunlap @ 2022-06-07 14:40 UTC (permalink / raw)
  To: Kohei Tarumizu, catalin.marinas, will, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, lenb, gregkh, mchehab+huawei,
	eugenis, tony.luck, pcc, peterz, marcos, marcan, linus.walleij,
	nicolas.ferre, conor.dooley, arnd, ast, peter.chen, kuba,
	linux-kernel, linux-arm-kernel, linux-acpi



On 6/7/22 05:05, Kohei Tarumizu wrote:
> diff --git a/drivers/soc/fujitsu/Kconfig b/drivers/soc/fujitsu/Kconfig
> new file mode 100644
> index 000000000000..6c5990b75cc5
> --- /dev/null
> +++ b/drivers/soc/fujitsu/Kconfig
> @@ -0,0 +1,26 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +menu "Fujitsu SoC drivers"
> +
> +config ALLOW_INCOMPLETE_CACHE_SYSFS
> +	bool
> +
> +config A64FX_HWPF_CONTROL
> +	tristate "A64FX Hardware Prefetch Control driver"
> +	depends on ARM64
> +	select ALLOW_INCOMPLETE_CACHE_SYSFS
> +	help
> +	  This provides a sysfs interface to control the Hardware Prefetch
> +	  behavior for A64FX.
> +
> +	  A64FX has IMP_PF_STREAM_DETECT_CTRL_EL0, which can control the
> +	  hardware prefech behavior. If the processor supports this, the

	           prefetch

> +	  module can be loaded with the name a64fx-pfctl.
> +
> +	  Depending on the characteristics of the application, this register
> +	  parameters improve or degrade performance.
> +
> +	  Please see Documentation/ABI/testing/sysfs-devices-system-cpu for
> +	  more information.

-- 
~Randy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v5 5/6] x86: Add Kconfig/Makefile to build hardware prefetch control driver
  2022-06-07 14:40   ` Randy Dunlap
@ 2022-06-08  6:11     ` tarumizu.kohei
  0 siblings, 0 replies; 26+ messages in thread
From: tarumizu.kohei @ 2022-06-08  6:11 UTC (permalink / raw)
  To: 'Randy Dunlap',
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, gregkh, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, linus.walleij, nicolas.ferre,
	conor.dooley, arnd, ast, peter.chen, kuba, linux-kernel,
	linux-arm-kernel, linux-acpi

> > +	  which can control the hardware prefech behavior. If the processor
> 
> 	                                 prefetch
> 

I'll fix it, thank you.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-07 12:05 [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
                   ` (5 preceding siblings ...)
  2022-06-07 12:05 ` [PATCH v5 6/6] docs: ABI: Add sysfs documentation interface of " Kohei Tarumizu
@ 2022-06-10 13:07 ` Greg KH
  2022-06-14 11:55   ` tarumizu.kohei
  2022-06-10 13:48 ` Linus Walleij
  7 siblings, 1 reply; 26+ messages in thread
From: Greg KH @ 2022-06-10 13:07 UTC (permalink / raw)
  To: Kohei Tarumizu
  Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, mchehab+huawei, eugenis, tony.luck, pcc, peterz,
	marcos, marcan, linus.walleij, nicolas.ferre, conor.dooley, arnd,
	ast, peter.chen, kuba, linux-kernel, linux-arm-kernel,
	linux-acpi

On Tue, Jun 07, 2022 at 09:05:24PM +0900, Kohei Tarumizu wrote:
> This patch series add sysfs interface to control CPU's hardware
> prefetch behavior for performance tuning from userspace for the
> processor A64FX and x86 (on supported CPU).

Why does userspace want to even do this?

How will they do this?

What programs will do this?

And why isn't just automatic and why does this hardware require manual
intervention to work properly?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/6] soc: fujitsu: Add hardware prefetch control driver for A64FX
  2022-06-07 12:05 ` [PATCH v5 1/6] soc: fujitsu: Add hardware prefetch control driver for A64FX Kohei Tarumizu
@ 2022-06-10 13:20   ` Greg KH
  0 siblings, 0 replies; 26+ messages in thread
From: Greg KH @ 2022-06-10 13:20 UTC (permalink / raw)
  To: Kohei Tarumizu
  Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, mchehab+huawei, eugenis, tony.luck, pcc, peterz,
	marcos, marcan, linus.walleij, nicolas.ferre, conor.dooley, arnd,
	ast, peter.chen, kuba, linux-kernel, linux-arm-kernel,
	linux-acpi

On Tue, Jun 07, 2022 at 09:05:25PM +0900, Kohei Tarumizu wrote:
> Adds module init/exit code to create sysfs attributes for A64FX with
> "stream_detect_prefetcher_enable", "stream_detect_prefetcher_strong"
> and "stream_detect_prefetcher_dist".
> 
> This driver works only if part number is FUJITSU_CPU_PART_A64FX. The
> details of the registers to be read and written in this patch are
> described below.
> 
> "https://github.com/fujitsu/A64FX/tree/master/doc/"
>     A64FX_Specification_HPC_Extension_v1_EN.pdf
> 
> Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
> ---
>  drivers/soc/fujitsu/a64fx-pfctl.c | 484 ++++++++++++++++++++++++++++++
>  1 file changed, 484 insertions(+)
>  create mode 100644 drivers/soc/fujitsu/a64fx-pfctl.c
> 
> diff --git a/drivers/soc/fujitsu/a64fx-pfctl.c b/drivers/soc/fujitsu/a64fx-pfctl.c
> new file mode 100644
> index 000000000000..1bb45a365c0f
> --- /dev/null
> +++ b/drivers/soc/fujitsu/a64fx-pfctl.c
> @@ -0,0 +1,484 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2022 FUJITSU LIMITED
> + *
> + * A64FX Hardware Prefetch Control support
> + */
> +#include <asm/cputype.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/mod_devicetable.h>
> +#include <linux/smp.h>
> +#include <linux/sysfs.h>
> +
> +/*
> + * Constants for these add the "A64FX_SDPF" prefix to the name described in
> + * section "1.3.4.2. IMP_PF_STREAM_DETECT_CTRL_EL0" of "A64FX specification".
> + * (https://github.com/fujitsu/A64FX/tree/master/doc/A64FX_Specification_HPC_Extension_v1_EN.pdf")
> + * See this document for register specification details.
> + */
> +#define A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0	sys_reg(3, 3, 11, 4, 0)
> +#define A64FX_SDPF_V					BIT_ULL(63)
> +#define A64FX_SDPF_L1PF_DIS				BIT_ULL(59)
> +#define A64FX_SDPF_L2PF_DIS				BIT_ULL(58)
> +#define A64FX_SDPF_L1W					BIT_ULL(55)
> +#define A64FX_SDPF_L2W					BIT_ULL(54)
> +#define A64FX_SDPF_L1_DIST				GENMASK_ULL(27, 24)
> +#define A64FX_SDPF_L2_DIST				GENMASK_ULL(19, 16)
> +
> +#define PFCTL_MIN_L1_DIST				256
> +#define PFCTL_MIN_L2_DIST				1024
> +#define PFCTL_DIST_AUTO_VAL				0
> +#define PFCTL_STRONG_VAL				0
> +#define PFCTL_WEAK_VAL					1
> +
> +/*
> + * Define bitfield access macros for non-constant mask, because macros such as
> + * FIELD_FIT defined in include/linux/bitfield.h require constant mask.
> + */
> +#define NC_FIELD_FIT(_mask, _val)					\
> +	!((((typeof(_mask))_val) << __bf_shf(_mask)) & ~(_mask))
> +
> +#define NC_FIELD_PREP(_mask, _val)					\
> +	(((typeof(_mask))(_val) << __bf_shf(_mask)) & (_mask))
> +
> +#define NC_FIELD_GET(_mask, _reg)					\
> +	((typeof(_mask))(((_reg) & (_mask)) >> __bf_shf(_mask)))
> +
> +struct a64fx_pfctl_attr {
> +	struct device_attribute	attr;
> +	u64			mask;
> +	void			*data;
> +};
> +
> +struct pfctl_group {
> +	unsigned int			level;
> +	enum cache_type			type;
> +	const struct attribute_group	**groups;
> +};
> +
> +enum cpuhp_state hp_online;
> +
> +static const char strength_strong_string[] = "strong";
> +static const char strength_weak_string[] = "weak";
> +static const char dist_auto_string[] = "auto";
> +
> +static inline unsigned int pfctl_dev_get_cpu(struct device *pfctl_dev)
> +{
> +	return *(u32 *)dev_get_drvdata(pfctl_dev);
> +}
> +
> +/* Enable the value written to the A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0 */
> +static inline void enable_verify(u64 *reg)
> +{
> +	*reg |= NC_FIELD_PREP(A64FX_SDPF_V, 1);
> +}
> +
> +static void _pfctl_read_mask(void *_reg)
> +{
> +	u64 *reg = (u64 *)_reg;
> +
> +	*reg = read_sysreg_s(A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
> +}
> +
> +static u64 pfctl_read_mask(unsigned int cpu, u64 mask)
> +{
> +	u64 reg;
> +
> +	smp_call_function_single(cpu, _pfctl_read_mask, &reg, true);
> +
> +	return NC_FIELD_GET(mask, reg);
> +}
> +
> +struct write_info {
> +	u64 mask;
> +	u64 val;
> +};
> +
> +static void _pfctl_write_mask(void *info)
> +{
> +	struct write_info *winfo = info;
> +	u64 reg;
> +
> +	reg = read_sysreg_s(A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
> +
> +	reg &= ~winfo->mask;
> +	reg |= NC_FIELD_PREP(winfo->mask, winfo->val);
> +
> +	enable_verify(&reg);
> +
> +	write_sysreg_s(reg, A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
> +}
> +
> +static int pfctl_write_mask(unsigned int cpu, u64 mask, u64 val)
> +{
> +	struct write_info info = {
> +		.mask	= mask,
> +		.val	= val,
> +	};
> +
> +	if (!NC_FIELD_FIT(mask, val))
> +		return -EINVAL;
> +
> +	smp_call_function_single(cpu, _pfctl_write_mask, &info, true);
> +
> +	return 0;
> +}
> +
> +static ssize_t
> +pfctl_enable_show(struct device *pfctl_dev, struct device_attribute *attr,
> +		char *buf)
> +{
> +	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
> +	struct a64fx_pfctl_attr *aa;
> +	bool val;
> +
> +	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
> +
> +	val = (bool)pfctl_read_mask(cpu, aa->mask);
> +
> +	return sysfs_emit(buf, "%d\n", val ? 0 : 1);
> +}
> +
> +static ssize_t
> +pfctl_enable_store(struct device *pfctl_dev, struct device_attribute *attr,
> +		 const char *buf, size_t size)
> +{
> +	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
> +	struct a64fx_pfctl_attr *aa;
> +	bool val;
> +	int ret;
> +
> +	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
> +
> +	if (strtobool(buf, &val) < 0)
> +		return -EINVAL;
> +
> +	ret = pfctl_write_mask(cpu, aa->mask, val ? 0 : 1);
> +	if (ret < 0)
> +		return ret;
> +
> +	return size;
> +}
> +
> +#define A64FX_PFCTL_ENABLE_ATTR(_level, _mask)				\
> +	struct a64fx_pfctl_attr attr_l##_level##_enable = {		\
> +		.attr = __ATTR(stream_detect_prefetcher_enable, 0600,	\
> +			       pfctl_enable_show, pfctl_enable_store),	\
> +		.mask = _mask, }
> +
> +
> +static A64FX_PFCTL_ENABLE_ATTR(1, A64FX_SDPF_L1PF_DIS);
> +static A64FX_PFCTL_ENABLE_ATTR(2, A64FX_SDPF_L2PF_DIS);
> +
> +static ssize_t
> +pfctl_dist_show(struct device *pfctl_dev, struct device_attribute *attr,
> +		char *buf)
> +{
> +	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
> +	struct a64fx_pfctl_attr *aa;
> +	u64 val, min_dist;
> +
> +	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
> +	min_dist = (u64)aa->data;
> +
> +	val = pfctl_read_mask(cpu, aa->mask);
> +
> +	if (val == PFCTL_DIST_AUTO_VAL)
> +		return sysfs_emit(buf, "%s\n", dist_auto_string);
> +	else
> +		return sysfs_emit(buf, "%llu\n", val * min_dist);
> +}
> +
> +static ssize_t
> +pfctl_dist_store(struct device *pfctl_dev, struct device_attribute *attr,
> +		 const char *buf, size_t size)
> +{
> +	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
> +	struct a64fx_pfctl_attr *aa;
> +	u64 min_dist, val;
> +	int ret;
> +
> +	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
> +	min_dist = (u64)aa->data;
> +
> +	if (sysfs_streq(buf, dist_auto_string)) {
> +		val = PFCTL_DIST_AUTO_VAL;
> +	} else {
> +		ret = kstrtoull(buf, 10, &val);
> +		if (ret < 0 || val < 1)
> +			return -EINVAL;
> +	}
> +
> +	val = roundup(val, min_dist) / min_dist;
> +
> +	if (!NC_FIELD_FIT(aa->mask, val))
> +		return -EINVAL;
> +
> +	ret = pfctl_write_mask(cpu, aa->mask, val);
> +	if (ret < 0)
> +		return ret;
> +
> +	return size;
> +}
> +
> +#define PFCTL_DIST_ATTR(_level, _mask, _min_dist)			\
> +	struct a64fx_pfctl_attr attr_l##_level##_dist = {		\
> +		.attr = __ATTR(stream_detect_prefetcher_dist, 0600,	\
> +			       pfctl_dist_show, pfctl_dist_store),	\
> +		.mask = _mask,						\
> +		.data = (void *)(u64)_min_dist, }
> +
> +static PFCTL_DIST_ATTR(1, A64FX_SDPF_L1_DIST, PFCTL_MIN_L1_DIST);
> +static PFCTL_DIST_ATTR(2, A64FX_SDPF_L2_DIST, PFCTL_MIN_L2_DIST);
> +
> +static ssize_t
> +pfctl_strength_show(struct device *pfctl_dev, struct device_attribute *attr,
> +		    char *buf)
> +{
> +	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
> +	struct a64fx_pfctl_attr *aa;
> +	u64 val;
> +
> +	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
> +
> +	val = pfctl_read_mask(cpu, aa->mask);
> +
> +	switch (val) {
> +	case PFCTL_STRONG_VAL:
> +		return sysfs_emit(buf, "%s\n", strength_strong_string);
> +	case PFCTL_WEAK_VAL:
> +		return sysfs_emit(buf, "%s\n", strength_weak_string);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static ssize_t
> +pfctl_strength_store(struct device *pfctl_dev, struct device_attribute *attr,
> +		     const char *buf, size_t size)
> +{
> +	unsigned int cpu = pfctl_dev_get_cpu(pfctl_dev);
> +	struct a64fx_pfctl_attr *aa;
> +	u64 val;
> +	int ret;
> +
> +	aa = container_of(attr, struct a64fx_pfctl_attr, attr);
> +
> +	if (sysfs_streq(buf, strength_strong_string))
> +		val = PFCTL_STRONG_VAL;
> +	else if (sysfs_streq(buf, strength_weak_string))
> +		val = PFCTL_WEAK_VAL;
> +	else
> +		return -EINVAL;
> +
> +	ret = pfctl_write_mask(cpu, aa->mask, val);
> +	if (ret < 0)
> +		return ret;
> +
> +	return size;
> +}
> +
> +#define PFCTL_STRENGTH_ATTR(_level, _mask)				\
> +	struct a64fx_pfctl_attr attr_l##_level##_strength = {		\
> +		.attr = __ATTR(stream_detect_prefetcher_strength, 0600, \
> +			       pfctl_strength_show,			\
> +			       pfctl_strength_store),			\
> +		.mask = _mask, }
> +
> +static PFCTL_STRENGTH_ATTR(1, A64FX_SDPF_L1W);
> +static PFCTL_STRENGTH_ATTR(2, A64FX_SDPF_L2W);
> +
> +static ssize_t
> +pfctl_strength_available_show(struct device *dev, struct device_attribute *attr,
> +			      char *buf)
> +{
> +	return sysfs_emit(buf, "%s %s\n", strength_strong_string,
> +			  strength_weak_string);
> +}
> +
> +/*
> + * A64FX has same kind of available strength for any caches, so define only one
> + * attribute.
> + */
> +struct a64fx_pfctl_attr attr_strength_available = {
> +	.attr = __ATTR(stream_detect_prefetcher_strength_available, 0400,
> +		       pfctl_strength_available_show, NULL), };
> +
> +static struct attribute *l1_attrs[] = {
> +	&attr_l1_enable.attr.attr,
> +	&attr_l1_dist.attr.attr,
> +	&attr_l1_strength.attr.attr,
> +	&attr_strength_available.attr.attr,
> +	NULL,
> +};
> +
> +static struct attribute *l2_attrs[] = {
> +	&attr_l2_enable.attr.attr,
> +	&attr_l2_dist.attr.attr,
> +	&attr_l2_strength.attr.attr,
> +	&attr_strength_available.attr.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group l1_group = {
> +	.attrs = l1_attrs,
> +};
> +
> +static struct attribute_group l2_group = {
> +	.attrs = l2_attrs,
> +};
> +
> +static const struct attribute_group *l1_groups[] = {
> +	&l1_group,
> +	NULL,
> +};
> +
> +static const struct attribute_group *l2_groups[] = {
> +	&l2_group,
> +	NULL,
> +};
> +
> +static const struct pfctl_group pfctl_groups[] = {
> +	{
> +		.level = 1,
> +		.type = CACHE_TYPE_DATA,
> +		.groups = l1_groups,
> +	},
> +	{
> +		.level = 2,
> +		.type = CACHE_TYPE_UNIFIED,
> +		.groups = l2_groups,
> +	},
> +	{
> +		.groups = NULL,
> +	},
> +};
> +
> +static const struct attribute_group **
> +get_pfctl_attribute_groups(unsigned int level, enum cache_type type)
> +{
> +	int i;
> +
> +	for (i = 0; pfctl_groups[i].groups; i++)
> +		if ((level == pfctl_groups[i].level) &&
> +		    (type == pfctl_groups[i].type))
> +			return pfctl_groups[i].groups;
> +
> +	return NULL;
> +}
> +
> +static int create_pfctl_attr(struct device *index_dev, void *data)
> +{
> +	struct cacheinfo *leaf = dev_get_drvdata(index_dev);
> +	const struct attribute_group **groups;
> +	struct device *pfctl_dev;
> +
> +	groups = get_pfctl_attribute_groups(leaf->level, leaf->type);
> +	if (!groups)
> +		return 0;
> +
> +	pfctl_dev = cpu_device_create(index_dev, data, groups,
> +				      "prefetch_control");
> +	if (IS_ERR(pfctl_dev))
> +		return PTR_ERR(pfctl_dev);

So you create this device structure and then throw it away?

> +
> +	return 0;
> +}
> +
> +static int remove_pfctl_attr(struct device *index_dev, void *data)
> +{
> +	struct device *pfctl_dev;
> +
> +	pfctl_dev = device_find_child_by_name(index_dev, "prefetch_control");

That's bold, how do you know that your device really has that name?
That's not how you do device lifecycle management.  You have a list of
the devices you control and go from there.

But larger question is, why do you need a device at all?  Shouldn't the
attribute group name work properly for all of this?  The attributes
should be on the CPU device, not on a random child device, right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-07 12:05 [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
                   ` (6 preceding siblings ...)
  2022-06-10 13:07 ` [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Greg KH
@ 2022-06-10 13:48 ` Linus Walleij
  2022-06-17  9:06   ` tarumizu.kohei
  7 siblings, 1 reply; 26+ messages in thread
From: Linus Walleij @ 2022-06-10 13:48 UTC (permalink / raw)
  To: Kohei Tarumizu, Mel Gorman, Linux Memory Management List,
	Michal Hocko, Andrew Morton
  Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, gregkh, eugenis, tony.luck, pcc, peterz, marcos,
	marcan, nicolas.ferre, conor.dooley, arnd, ast, peter.chen, kuba,
	linux-kernel, linux-arm-kernel, linux-acpi, Paolo Valente

On Tue, Jun 7, 2022 at 2:07 PM Kohei Tarumizu
<tarumizu.kohei@fujitsu.com> wrote:

> This patch series add sysfs interface to control CPU's hardware
> prefetch behavior for performance tuning from userspace for the
> processor A64FX and x86 (on supported CPU).

OK

> A64FX and some Intel processors have implementation-dependent register
> for controlling CPU's hardware prefetch behavior. A64FX has
> IMP_PF_STREAM_DETECT_CTRL_EL0[1], and Intel processors have MSR 0x1a4
> (MSR_MISC_FEATURE_CONTROL)[2].

Hardware prefetch (I guess of memory contents) is a memory hierarchy feature.

Linux has a memory hierarchy manager, conveniently named "mm",
developed by some of the smartest people I know. The main problem
addressed by that is paging, but prefetching into the CPU from the
next lowest level in the memory hierarchy is just another memory
hierarchy hardware feature, such as hard disks, primary RAM etc.

> These registers cannot be accessed from userspace.

Good. The kernel managed hardware. If the memory hierarchy people have
userspace now doing stuff behind their back, through some special
interface, that makes their world more complicated.

This looks like it needs information from the generic memory manager,
from the scheduler, and possibly all the way down from the block
layer to do the right thing, so it has no business in userspace.
Have you seen mm/damon for example? Access to statistics for
memory access patterns seems really useful for tuning the behaviour
of this hardware. Just my €0.01.

If it does interact with userspace I suppose it should be using control
groups, like everything else of this type, see e.g. mm/memcontrol.c,
not custom sysfs files.

Just an example from one of the patches:

+                       - "* Adjacent Cache Line Prefetcher Disable (R/W)"
+                           corresponds to the
"adjacent_cache_line_prefetcher_enable"

I might only be on "a little knowledge is dangerous" on the memory
manager topics, but I know for sure that they at times adjust the members
of structs to fit nicely on cache lines. And now this? It looks really useful
for kernel machinery that know very well what needs to go into the cache
line next and when.

Talk to the people on linux-mm and memory maintainer Andrew Morton on
how to do this right, it's a really interesting feature! Also given
that people say
that the memory hierarchy is an important part in the performance of the Apple
M1 (M2) silicon, I expect that machine to have this too?

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-10 13:07 ` [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Greg KH
@ 2022-06-14 11:55   ` tarumizu.kohei
  2022-06-14 12:34     ` 'Greg KH'
  0 siblings, 1 reply; 26+ messages in thread
From: tarumizu.kohei @ 2022-06-14 11:55 UTC (permalink / raw)
  To: 'Greg KH'
  Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, mchehab+huawei, eugenis, tony.luck, pcc, peterz,
	marcos, marcan, linus.walleij, nicolas.ferre, conor.dooley, arnd,
	ast, peter.chen, kuba, linux-kernel, linux-arm-kernel,
	linux-acpi

Thanks for the comment.

> Why does userspace want to even do this?

This is because the optimal settings may differ from application to
application.

Examples of performance improvements for applications with simple
memory access characteristics are described in [merit] section.
However, some applications have complex characteristics, so it is
difficult to predict if an application will improve without actually
trying it out.

This is not necessary for all applications. However, I want to provide
as a minimal interface that can be used by those who want to improve
their application even a little.

> How will they do this?

I assume to be used to tune a specific core and execute an application
on that core. The execution example is as follows.

1) The user tunes the parameters of a specific core before executing
   the program.

```
# echo 1024 > /sys/devices/system/cpu/cpu12/cache/index0/prefetch_control/stream_detect_prefetcher_dist
# echo 1024 > /sys/devices/system/cpu/cpu12/cache/index2/prefetch_control/stream_detect_prefetcher_dist
# echo 1024 > /sys/devices/system/cpu/cpu13/cache/index0/prefetch_control/stream_detect_prefetcher_dist
# echo 1024 > /sys/devices/system/cpu/cpu13/cache/index2/prefetch_control/stream_detect_prefetcher_dist
```

2) Execute the program bound to the target core.

```
# taskset -c 12-13 a.out
```

If the interface is exposed, the user can develop a library to execute
1) and 2) operation instead.

> What programs will do this?

It is assumed to be used by programs that execute many continuous
memory access. It may be useful for other applications, but I can't
explain them in detail right away.

> And why isn't just automatic and why does this hardware require manual
> intervention to work properly?

It is difficult for the hardware to determine the optimal parameters
in advance. Therefore, I think that the register is provided to change
the behavior of the hardware.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-14 11:55   ` tarumizu.kohei
@ 2022-06-14 12:34     ` 'Greg KH'
  2022-06-17  9:20       ` tarumizu.kohei
  0 siblings, 1 reply; 26+ messages in thread
From: 'Greg KH' @ 2022-06-14 12:34 UTC (permalink / raw)
  To: tarumizu.kohei
  Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, mchehab+huawei, eugenis, tony.luck, pcc, peterz,
	marcos, marcan, linus.walleij, nicolas.ferre, conor.dooley, arnd,
	ast, peter.chen, kuba, linux-kernel, linux-arm-kernel,
	linux-acpi

On Tue, Jun 14, 2022 at 11:55:39AM +0000, tarumizu.kohei@fujitsu.com wrote:
> Thanks for the comment.
> 
> > Why does userspace want to even do this?
> 
> This is because the optimal settings may differ from application to
> application.

That's not ok.  Linux is a "general purpose" operating system and needs
to work well for all applications.  Doing application-specific-tuning
based on the specific hardware like this is a nightmare for users, and
will be for you as you will now have to support this specific model to
work correctly on all future kernel releases for the next 20+ years.
Are you willing to do that?

> Examples of performance improvements for applications with simple
> memory access characteristics are described in [merit] section.
> However, some applications have complex characteristics, so it is
> difficult to predict if an application will improve without actually
> trying it out.

Then perhaps it isn't anything that they should try out :)

Shouldn't the kernel know how the application works (based on the
resources it asks for) and tune itself based on that automatically?

If not, how is a user supposed to know how to do this?

> This is not necessary for all applications. However, I want to provide
> as a minimal interface that can be used by those who want to improve
> their application even a little.
> 
> > How will they do this?
> 
> I assume to be used to tune a specific core and execute an application
> on that core. The execution example is as follows.
> 
> 1) The user tunes the parameters of a specific core before executing
>    the program.
> 
> ```
> # echo 1024 > /sys/devices/system/cpu/cpu12/cache/index0/prefetch_control/stream_detect_prefetcher_dist
> # echo 1024 > /sys/devices/system/cpu/cpu12/cache/index2/prefetch_control/stream_detect_prefetcher_dist
> # echo 1024 > /sys/devices/system/cpu/cpu13/cache/index0/prefetch_control/stream_detect_prefetcher_dist
> # echo 1024 > /sys/devices/system/cpu/cpu13/cache/index2/prefetch_control/stream_detect_prefetcher_dist
> ```

What is "1024" here?  Where is any of this documented?  And why these
specific sysfs files and not others?

> 2) Execute the program bound to the target core.
> 
> ```
> # taskset -c 12-13 a.out
> ```
> 
> If the interface is exposed, the user can develop a library to execute
> 1) and 2) operation instead.

If you have no such user today, nor a library, how do you know any of
this works well?  And again, how will you support this going forward?
Or is this specific api only going to be for one specific piece of
hardware and never any future ones?

> > What programs will do this?
> 
> It is assumed to be used by programs that execute many continuous
> memory access. It may be useful for other applications, but I can't
> explain them in detail right away.

So you haven't tested this on any real applications?  We need real users
before being able to add new apis.  Otherwise we can just remove the
apis :)

> > And why isn't just automatic and why does this hardware require manual
> > intervention to work properly?
> 
> It is difficult for the hardware to determine the optimal parameters
> in advance. Therefore, I think that the register is provided to change
> the behavior of the hardware.

Kernel programming for a general purpose operating system is hard, but
it is possible :)

good luck!

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-10 13:48 ` Linus Walleij
@ 2022-06-17  9:06   ` tarumizu.kohei
  0 siblings, 0 replies; 26+ messages in thread
From: tarumizu.kohei @ 2022-06-17  9:06 UTC (permalink / raw)
  To: 'Linus Walleij',
	Mel Gorman, Linux Memory Management List, Michal Hocko,
	Andrew Morton
  Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, gregkh, eugenis, tony.luck, pcc, peterz, marcos,
	marcan, nicolas.ferre, conor.dooley, arnd, ast, peter.chen, kuba,
	linux-kernel, linux-arm-kernel, linux-acpi, Paolo Valente

Hi Linus,

Thanks for the comment.

> OK
> 
> > A64FX and some Intel processors have implementation-dependent register
> > for controlling CPU's hardware prefetch behavior. A64FX has
> > IMP_PF_STREAM_DETECT_CTRL_EL0[1], and Intel processors have MSR
> 0x1a4
> > (MSR_MISC_FEATURE_CONTROL)[2].
> 
> Hardware prefetch (I guess of memory contents) is a memory hierarchy feature.
> 
> Linux has a memory hierarchy manager, conveniently named "mm", developed
> by some of the smartest people I know. The main problem addressed by that is
> paging, but prefetching into the CPU from the next lowest level in the memory
> hierarchy is just another memory hierarchy hardware feature, such as hard
> disks, primary RAM etc.
> 
> > These registers cannot be accessed from userspace.
> 
> Good. The kernel managed hardware. If the memory hierarchy people have
> userspace now doing stuff behind their back, through some special interface,
> that makes their world more complicated.
> 
> This looks like it needs information from the generic memory manager, from the
> scheduler, and possibly all the way down from the block layer to do the right
> thing, so it has no business in userspace.
> Have you seen mm/damon for example? Access to statistics for memory
> access patterns seems really useful for tuning the behaviour of this hardware.
> Just my €0.01.

Thank you for the information. I will see if mm/damon statistics can
be used for tuning.

> If it does interact with userspace I suppose it should be using control groups,
> like everything else of this type, see e.g. mm/memcontrol.c, not custom sysfs
> files.

Hardware prefetch registers exist for each core, and the settings are
independent for each cache. Therefore, currently, I created it under
/sys/devices/system/cpu/cpu*/cache/index*.
However, when user actually configure it for an application, they may
want to set it on a per-process basis. Considering that, I think
control groups is suitable for this usage.

For example, is your idea of interface like the following?

```
  /sys/fs/cgroup/memory/memory.hardware_prefetcher.enable
```

Cpuset controller has information about which CPU a process belonging
to a group is bound to, so maybe cpuset controller is more appropriate.

Control groups has hierarchical structure, so it is necessary to
consider whether they can map hardware prefetch behavior well.
Currentry I have two concerns.
First, upper hierarchy contains the same CPU as the lower hierarchy.
In this case, it may not be possible to configure independent setting
in each hierarchy.
Next, context switch considerations. This function rewrites the
value of the register that exists for each core. Therefore, the
register value must be changed at the timing of the context switch
with a process belonging to a different group.

> Just an example from one of the patches:
> 
> +                       - "* Adjacent Cache Line Prefetcher Disable (R/W)"
> +                           corresponds to the
> "adjacent_cache_line_prefetcher_enable"
> 
> I might only be on "a little knowledge is dangerous" on the memory manager
> topics, but I know for sure that they at times adjust the members of structs to fit
> nicely on cache lines. And now this? It looks really useful for kernel machinery
> that know very well what needs to go into the cache line next and when.
> 
> Talk to the people on linux-mm and memory maintainer Andrew Morton on how
> to do this right, it's a really interesting feature! Also given that people say that
> the memory hierarchy is an important part in the performance of the Apple
> M1 (M2) silicon, I expect that machine to have this too?

I think this proposal will be useful for users, so I will proceed
with concrete studies and talk to the MM people.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-14 12:34     ` 'Greg KH'
@ 2022-06-17  9:20       ` tarumizu.kohei
  2022-06-27  9:36         ` Linus Walleij
  0 siblings, 1 reply; 26+ messages in thread
From: tarumizu.kohei @ 2022-06-17  9:20 UTC (permalink / raw)
  To: 'Greg KH'
  Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	rafael, lenb, mchehab+huawei, eugenis, tony.luck, pcc, peterz,
	marcos, marcan, linus.walleij, nicolas.ferre, conor.dooley, arnd,
	ast, peter.chen, kuba, linux-kernel, linux-arm-kernel,
	linux-acpi

Hi Greg,

> That's not ok.  Linux is a "general purpose" operating system and needs to
> work well for all applications.  Doing application-specific-tuning based on the
> specific hardware like this is a nightmare for users,

Hardware prefetch behavior is enabled by default in x86 and A64FX.
Many applications can perform well without changing the register
setting. Use this feature for some applications that want to be
improved performance.

In particular, A64FX's hardware prefetch behavior is used for HPC
applications. The user running HPC applications needs to improve
performance as much as possible. This feature is useful for such
users. Therefore, some A64FX machines have their own drivers that
control hardware prefetch behavior. It is built into the software
products for A64FX and cannot be used without purchase.

I want to make this feature available to people who want to improve
performance without purchase the product. This is limited in use and
depends on the characteristics of the application. Isn't this match
with "general purpose"?

> and will be for you as you
> will now have to support this specific model to work correctly on all future
> kernel releases for the next 20+ years.
> Are you willing to do that?

Rather than relying on a specific model of this API, I want to make it
generally available. However, it may not be so now. I am willing to
support this if I could make it a community-approved interface.

> Then perhaps it isn't anything that they should try out :)
> 
> Shouldn't the kernel know how the application works (based on the resources
> it asks for) and tune itself based on that automatically?
> 
> If not, how is a user supposed to know how to do this?

It is useful for users if it can be done automatically by the kernel.
I will consider if there is anything I can do using statistical
information.

> What is "1024" here?  Where is any of this documented?

This parameter means the difference in bytes between the memory
address the program is currently accessing and the memory address
accessed by the hardware prefetch. My document in
sysfs-devices-system-cpu does not specify what the distance means, so
I will add it.

For reference, the hardware prefetch details are described below.

"https://github.com/fujitsu/A64FX/tree/master/doc"
    A64FX_Microarchitecture_Manual_en_1.7.pdf

> And why these
> specific sysfs files and not others?

I wanted to show an example of changing only the hardware prefetch
distance. There is no special reason not to specify with other sysfs
files.

> If you have no such user today, nor a library, how do you know any of this works
> well?

The prefetch control function included in the software product for
A64FX does the similar operation, and it works well.

> And again, how will you support this going forward?
> Or is this specific api only going to be for one specific piece of hardware and
> never any future ones?

In order to make the interface widely usable in the future, I will
consider a different specification from the current one. For example
control groups which Linus proposaled is one of them.

> So you haven't tested this on any real applications?  We need real users before
> being able to add new apis.  Otherwise we can just remove the apis :)

At least, some A64FX users use this behavior. However, currently, I
don't have which applications and how much performance will be
improved. I will try to get the application actually used and confirm
that it is effective.

> Kernel programming for a general purpose operating system is hard, but it is
> possible :)

I will try to do kernel programming for a general purpose operating
system.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-17  9:20       ` tarumizu.kohei
@ 2022-06-27  9:36         ` Linus Walleij
  2022-06-28 13:55           ` tarumizu.kohei
  2022-06-28 15:46           ` Dave Hansen
  0 siblings, 2 replies; 26+ messages in thread
From: Linus Walleij @ 2022-06-27  9:36 UTC (permalink / raw)
  To: tarumizu.kohei
  Cc: Greg KH, catalin.marinas, will, tglx, mingo, bp, dave.hansen,
	x86, hpa, rafael, lenb, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, nicolas.ferre, conor.dooley, arnd, ast,
	peter.chen, kuba, linux-kernel, linux-arm-kernel, linux-acpi

On Fri, Jun 17, 2022 at 11:21 AM tarumizu.kohei@fujitsu.com
<tarumizu.kohei@fujitsu.com> wrote:

Jumping in here.

> Hi Greg,
>
> > That's not ok.  Linux is a "general purpose" operating system and needs to
> > work well for all applications.  Doing application-specific-tuning based on the
> > specific hardware like this is a nightmare for users,
>
> Hardware prefetch behavior is enabled by default in x86 and A64FX.
> Many applications can perform well without changing the register
> setting. Use this feature for some applications that want to be
> improved performance.

The right way to solve this is to make the Linux kernel contain the necessary
heuristics to identify which tasks and thus cores need this to improve
efficiency and then apply it automatically.

Putting it in userspace is making a human do a machines job which isn't
sustainable.

By putting the heuristics in kernelspace Linux will improve performance also
on workloads the human operator didn't think of as the machine will
detect them from
statictical or other behaviour patterns.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-27  9:36         ` Linus Walleij
@ 2022-06-28 13:55           ` tarumizu.kohei
  2022-06-28 16:39             ` Luck, Tony
  2022-06-28 15:46           ` Dave Hansen
  1 sibling, 1 reply; 26+ messages in thread
From: tarumizu.kohei @ 2022-06-28 13:55 UTC (permalink / raw)
  To: 'Linus Walleij'
  Cc: Greg KH, catalin.marinas, will, tglx, mingo, bp, dave.hansen,
	x86, hpa, rafael, lenb, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, nicolas.ferre, conor.dooley, arnd, ast,
	peter.chen, kuba, linux-kernel, linux-arm-kernel, linux-acpi

Hi Linus,

> The right way to solve this is to make the Linux kernel contain the necessary
> heuristics to identify which tasks and thus cores need this to improve efficiency
> and then apply it automatically.
> 
> Putting it in userspace is making a human do a machines job which isn't
> sustainable.
> 
> By putting the heuristics in kernelspace Linux will improve performance also on
> workloads the human operator didn't think of as the machine will detect them from
> statictical or other behaviour patterns.

In order to put the heuristics into kernelspace Linux, I think it
necessary to consider the following two points.

1) Which cores are tied with the process?
This is different from the core on which the process can run. It
probably need to combine some CPU resource limit to avoid affecting
non-target processes.

2) How to derive the value to set in the register?
It is necessary to verify whether an appropriate set value can be
derived using statistical information, etc. In addition, to prevent
the cost of automatic derivation from exceeding the value that would
be improved by it.

I don't have a prospect for resolving these issues yet. I will
continue these considerations.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-27  9:36         ` Linus Walleij
  2022-06-28 13:55           ` tarumizu.kohei
@ 2022-06-28 15:46           ` Dave Hansen
  2022-06-28 20:20             ` Linus Walleij
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2022-06-28 15:46 UTC (permalink / raw)
  To: Linus Walleij, tarumizu.kohei
  Cc: Greg KH, catalin.marinas, will, tglx, mingo, bp, dave.hansen,
	x86, hpa, rafael, lenb, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, nicolas.ferre, conor.dooley, arnd, ast,
	peter.chen, kuba, linux-kernel, linux-arm-kernel, linux-acpi

On 6/27/22 02:36, Linus Walleij wrote:
> The right way to solve this is to make the Linux kernel contain the
> necessary heuristics to identify which tasks and thus cores need this
> to improve efficiency and then apply it automatically.

I agree in theory.  But, I also want a pony in theory.

Any suggestions for how to do this in the real world?

Otherwise, I'm inclined to say that this series incrementally makes
things better in the real world by at least moving folks away from wrmsr(1).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-28 13:55           ` tarumizu.kohei
@ 2022-06-28 16:39             ` Luck, Tony
  2022-06-30  5:24               ` tarumizu.kohei
  0 siblings, 1 reply; 26+ messages in thread
From: Luck, Tony @ 2022-06-28 16:39 UTC (permalink / raw)
  To: tarumizu.kohei, 'Linus Walleij'
  Cc: Greg KH, catalin.marinas, will, tglx, mingo, bp, dave.hansen,
	x86, hpa, rafael, lenb, mchehab+huawei, eugenis, pcc, peterz,
	marcos, marcan, nicolas.ferre, conor.dooley, arnd, ast,
	peter.chen, kuba, linux-kernel, linux-arm-kernel, linux-acpi

>> The right way to solve this is to make the Linux kernel contain the necessary
>> heuristics to identify which tasks and thus cores need this to improve efficiency
>> and then apply it automatically.
>>
>> Putting it in userspace is making a human do a machines job which isn't
>> sustainable.
>>
>> By putting the heuristics in kernelspace Linux will improve performance also on
>> workloads the human operator didn't think of as the machine will detect them from
>> statictical or other behaviour patterns.
>
>In order to put the heuristics into kernelspace Linux, I think it
>necessary to consider the following two points.
>
>1) Which cores are tied with the process?
>This is different from the core on which the process can run. It
>probably need to combine some CPU resource limit to avoid affecting
>non-target processes.
>
>2) How to derive the value to set in the register?
>It is necessary to verify whether an appropriate set value can be
>derived using statistical information, etc. In addition, to prevent
>the cost of automatic derivation from exceeding the value that would
>be improved by it.
>
>I don't have a prospect for resolving these issues yet. I will
>continue these considerations.

Another approach would be to make the set of prefetch settings
a task attribute. Then set them in the context switch code when
the process is about to run on a CPU.

But that assumes you can cheaply change the attributes. If doing
so requires multiple MSR writes (on x86) it might be a non-starter.

-Tony

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-28 15:46           ` Dave Hansen
@ 2022-06-28 20:20             ` Linus Walleij
  2022-06-28 21:01               ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Walleij @ 2022-06-28 20:20 UTC (permalink / raw)
  To: Dave Hansen
  Cc: tarumizu.kohei, Greg KH, catalin.marinas, will, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, lenb, mchehab+huawei, eugenis,
	tony.luck, pcc, peterz, marcos, marcan, nicolas.ferre,
	conor.dooley, arnd, ast, peter.chen, kuba, linux-kernel,
	linux-arm-kernel, linux-acpi, Paolo Valente, Andrew Morton

On Tue, Jun 28, 2022 at 5:47 PM Dave Hansen <dave.hansen@intel.com> wrote:

> On 6/27/22 02:36, Linus Walleij wrote:

> > The right way to solve this is to make the Linux kernel contain the
> > necessary heuristics to identify which tasks and thus cores need this
> > to improve efficiency and then apply it automatically.
>
> I agree in theory.  But, I also want a pony in theory.
>
> Any suggestions for how to do this in the real world?

Well if the knobs are exposed to userspace, how do people using
these knobs know when to turn them? A profiler? perf? All that
data is available to the kernel too.

The memory access pattern statistics from mm/damon
was what I suggested as a starting point.

We have pretty elaborate heuristics in the kernel to identify the
behaviour of processes, one example is the BFQ block scheduler
which determines I/O priority weights of processed based on
how interactive they are.

If we can determine things like that I am pretty sure we can determine how
computing intense a task is for example, by using memory access
statistics and scheduler information: if the process is constantly
READY to run over a few context switches and PC also stays in a
certain rage of memory like two adjacent pages then it is probably
running a hard kernel, if that is what we need to know here. It
doesn't seem too far-fetched?

We have the performance counters as well. That should be possible to
utilize to get even more precise heuristics? Maybe that is what userspace
is using to determine this already.

I'm not saying there has to be a simple solution, but maybe there
is something like a really complicated solution? We have academic
researchers that like to look at things like this.

> Otherwise, I'm inclined to say that this series incrementally makes
> things better in the real world by at least moving folks away from wrmsr(1).

I don't know if yet another ABI that needs to be maintained helps
the situation much, it's just a contract that we will have to
maintain for no gain. However if userspace is messing with that
register behind our back and we know better, we
can just overwrite it with the policy we determine is better in the
kernel.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-28 20:20             ` Linus Walleij
@ 2022-06-28 21:01               ` Dave Hansen
  2022-06-28 21:17                 ` Linus Walleij
  2022-06-30  9:43                 ` tarumizu.kohei
  0 siblings, 2 replies; 26+ messages in thread
From: Dave Hansen @ 2022-06-28 21:01 UTC (permalink / raw)
  To: Linus Walleij
  Cc: tarumizu.kohei, Greg KH, catalin.marinas, will, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, lenb, mchehab+huawei, eugenis,
	tony.luck, pcc, peterz, marcos, marcan, nicolas.ferre,
	conor.dooley, arnd, ast, peter.chen, kuba, linux-kernel,
	linux-arm-kernel, linux-acpi, Paolo Valente, Andrew Morton

On 6/28/22 13:20, Linus Walleij wrote:
> On Tue, Jun 28, 2022 at 5:47 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> On 6/27/22 02:36, Linus Walleij wrote:
>>> The right way to solve this is to make the Linux kernel contain the
>>> necessary heuristics to identify which tasks and thus cores need this
>>> to improve efficiency and then apply it automatically.
>>
>> I agree in theory.  But, I also want a pony in theory.
>>
>> Any suggestions for how to do this in the real world?
> 
> Well if the knobs are exposed to userspace, how do people using
> these knobs know when to turn them? A profiler? perf? All that
> data is available to the kernel too.

They run their fortran app.  Change the MSRs.  Run it again.  See if it
simulated the nuclear weapon blast any faster or slower.  Rinse.  Repeat.

One thing that is missing from the changelog and cover letter here: On
x86, there's a 'wrmsr(1)' tool.  That took pokes at Model Specific
Registers (MSRs) via the /dev/cpu/X/msr interface.  That interface is a
very, very thinly-veiled wrapper around the WRMSR (WRite MSR) instruction.

In other words, on x86, our current interface allows userspace programs
to arbitrarily poke at our most sensitive hardware configuration
registers.  One of the most common reasons users have reported doing
this (we have pr_warn()ings about it) is controlling the prefetch hardware.

This interface would take a good chunk of the x86 wrmsr(1) audience and
convert them over to a less dangerous interface.  That's a win on x86.
We don't even *remotely* have line-of-sight for a generic solution for
the kernel to figure out a single "best" value for these registers.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-28 21:01               ` Dave Hansen
@ 2022-06-28 21:17                 ` Linus Walleij
  2022-06-30  9:43                 ` tarumizu.kohei
  1 sibling, 0 replies; 26+ messages in thread
From: Linus Walleij @ 2022-06-28 21:17 UTC (permalink / raw)
  To: Dave Hansen
  Cc: tarumizu.kohei, Greg KH, catalin.marinas, will, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, lenb, mchehab+huawei, eugenis,
	tony.luck, pcc, peterz, marcos, marcan, nicolas.ferre,
	conor.dooley, arnd, ast, peter.chen, kuba, linux-kernel,
	linux-arm-kernel, linux-acpi, Paolo Valente, Andrew Morton

On Tue, Jun 28, 2022 at 11:02 PM Dave Hansen <dave.hansen@intel.com> wrote:
> On 6/28/22 13:20, Linus Walleij wrote:
> >
> > Well if the knobs are exposed to userspace, how do people using
> > these knobs know when to turn them? A profiler? perf? All that
> > data is available to the kernel too.
>
> They run their fortran app.  Change the MSRs.  Run it again.  See if it
> simulated the nuclear weapon blast any faster or slower.  Rinse.  Repeat.

That sounds like a schoolbook definition of the trial-and-error method.
https://en.wikipedia.org/wiki/Trial_and_error

That's fair. But these people really need a better hammer.

> This interface would take a good chunk of the x86 wrmsr(1) audience and
> convert them over to a less dangerous interface.  That's a win on x86.
> We don't even *remotely* have line-of-sight for a generic solution for
> the kernel to figure out a single "best" value for these registers.

Maybe less dangerous for them, but maybe more dangerous for the kernel
community who signs up to maintain the behaviour of that interface
perpetually.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-28 16:39             ` Luck, Tony
@ 2022-06-30  5:24               ` tarumizu.kohei
  0 siblings, 0 replies; 26+ messages in thread
From: tarumizu.kohei @ 2022-06-30  5:24 UTC (permalink / raw)
  To: 'Luck, Tony', 'Linus Walleij'
  Cc: Greg KH, catalin.marinas, will, tglx, mingo, bp, dave.hansen,
	x86, hpa, rafael, lenb, mchehab+huawei, eugenis, pcc, peterz,
	marcos, marcan, nicolas.ferre, conor.dooley, arnd, ast,
	peter.chen, kuba, linux-kernel, linux-arm-kernel, linux-acpi

Hi Tony,

Thanks for the comment.

> Another approach would be to make the set of prefetch settings a task attribute.
> Then set them in the context switch code when the process is about to run on
> a CPU.
> 
> But that assumes you can cheaply change the attributes. If doing so requires
> multiple MSR writes (on x86) it might be a non-starter.

On the x86 and A64FX, each parameter for controlling hardware prefetch
is contained in one register. The current specification makes each
parameter a separate attribute, so we need to write as many times as
there are parameters to change. However it is possible to change the
attribute with one MSR write per core by changing multiple parameters
before the context switch.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86
  2022-06-28 21:01               ` Dave Hansen
  2022-06-28 21:17                 ` Linus Walleij
@ 2022-06-30  9:43                 ` tarumizu.kohei
  1 sibling, 0 replies; 26+ messages in thread
From: tarumizu.kohei @ 2022-06-30  9:43 UTC (permalink / raw)
  To: 'Dave Hansen', Linus Walleij
  Cc: Greg KH, catalin.marinas, will, tglx, mingo, bp, dave.hansen,
	x86, hpa, rafael, lenb, mchehab+huawei, eugenis, tony.luck, pcc,
	peterz, marcos, marcan, nicolas.ferre, conor.dooley, arnd, ast,
	peter.chen, kuba, linux-kernel, linux-arm-kernel, linux-acpi,
	Paolo Valente, Andrew Morton

Hi Dave,

> They run their fortran app.  Change the MSRs.  Run it again.  See if it
> simulated the nuclear weapon blast any faster or slower.  Rinse.  Repeat.
> 
> One thing that is missing from the changelog and cover letter here: On x86,
> there's a 'wrmsr(1)' tool.  That took pokes at Model Specific Registers (MSRs)
> via the /dev/cpu/X/msr interface.  That interface is a very, very thinly-veiled
> wrapper around the WRMSR (WRite MSR) instruction.
> 
> In other words, on x86, our current interface allows userspace programs to
> arbitrarily poke at our most sensitive hardware configuration registers.  One of
> the most common reasons users have reported doing this (we have
> pr_warn()ings about it) is controlling the prefetch hardware.
> 
> This interface would take a good chunk of the x86 wrmsr(1) audience and
> convert them over to a less dangerous interface.  That's a win on x86.
> We don't even *remotely* have line-of-sight for a generic solution for the kernel
> to figure out a single "best" value for these registers.

Thank you for mentioning wrmsr tool.

This is one of the reason why I want to add the sysfs interface. I
will add the description that this interface can be used instead of
wrmsr tool (or MSR driver) for hardware prefetch control usage to the
cover letter.

I read below that we should not accesse any MSR directly from
userspace without restriction.
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about/


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-06-30  9:44 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-07 12:05 [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
2022-06-07 12:05 ` [PATCH v5 1/6] soc: fujitsu: Add hardware prefetch control driver for A64FX Kohei Tarumizu
2022-06-10 13:20   ` Greg KH
2022-06-07 12:05 ` [PATCH v5 2/6] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
2022-06-07 14:40   ` Randy Dunlap
2022-06-07 12:05 ` [PATCH v5 3/6] arm64: Create cache sysfs directory without ACPI PPTT for hardware prefetch control Kohei Tarumizu
2022-06-07 12:05 ` [PATCH v5 4/6] x86: Add hardware prefetch control driver for x86 Kohei Tarumizu
2022-06-07 12:05 ` [PATCH v5 5/6] x86: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
2022-06-07 14:40   ` Randy Dunlap
2022-06-08  6:11     ` tarumizu.kohei
2022-06-07 12:05 ` [PATCH v5 6/6] docs: ABI: Add sysfs documentation interface of " Kohei Tarumizu
2022-06-10 13:07 ` [PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86 Greg KH
2022-06-14 11:55   ` tarumizu.kohei
2022-06-14 12:34     ` 'Greg KH'
2022-06-17  9:20       ` tarumizu.kohei
2022-06-27  9:36         ` Linus Walleij
2022-06-28 13:55           ` tarumizu.kohei
2022-06-28 16:39             ` Luck, Tony
2022-06-30  5:24               ` tarumizu.kohei
2022-06-28 15:46           ` Dave Hansen
2022-06-28 20:20             ` Linus Walleij
2022-06-28 21:01               ` Dave Hansen
2022-06-28 21:17                 ` Linus Walleij
2022-06-30  9:43                 ` tarumizu.kohei
2022-06-10 13:48 ` Linus Walleij
2022-06-17  9:06   ` tarumizu.kohei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).