linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86
@ 2022-04-20  3:02 Kohei Tarumizu
  2022-04-20  3:02 ` [PATCH v3 1/9] drivers: base: Add hardware prefetch control core driver Kohei Tarumizu
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Kohei Tarumizu @ 2022-04-20  3:02 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre
  Cc: tarumizu.kohei

This patch series add sysfs interface to control CPU's hardware
prefetch behavior for performance tuning from userspace for the
processor A64FX and x86 (on supported CPU).

Changes from v2:
  - move arm64 driver (arch/arm64) to A64FX only (drivers/soc/fujitsu)
  - prohibit writing 0 to stream_detect_prefetcher_dist
  - change the type of strongness state handled from bool to string
    (e.g. "strong"), and rename to stream_detect_prefetcher_strength
  - change x86 code to work correctly with resctrl's pseudo lock
    - read and write registers in one smp_call_function_single() to
      prevent context switch when writing registers in x86-pfctl.c
    - restore to original value when re-enabling the register in
      pseudo_lock.c
  - add prefix to driver's name for A64FX(a64fx-) and x86(x86-)
  - modify the document
    - split the description of pfctl into blocks for x86 and A64FX
    - remove unnecessary descriptions
https://lore.kernel.org/lkml/20220311101940.3403607-1-tarumizu.kohei@fujitsu.com/

[Background]
============
A64FX and some Intel processors have implementation-dependent register
for controlling CPU's hardware prefetch behavior. A64FX has
IMP_PF_STREAM_DETECT_CTRL_EL0[1], and Intel processors have MSR 0x1a4
(MSR_MISC_FEATURE_CONTROL)[2]. These registers cannot be accessed from
userspace.

[1]https://github.com/fujitsu/A64FX/tree/master/doc/
   A64FX_Specification_HPC_Extension_v1_EN.pdf

[2]https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
    Volume 4

The advantage of using this is improved performance. As an example of
performance improvements, the results of running the Stream benchmark
on the A64FX are described in section [Merit].

For MSR 0x1a4, it is also possible to change the value from userspace
via the MSR driver. However, using MSR driver is not recommended, so
it needs a proper kernel interface[3].

[3]https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about/

For these reasons, we provide a new proper kernel interface to control
both IMP_PF_STREAM_DETECT_CTRL_EL0 and MSR 0x1a4.

[Overall design]
================
The source code for this driver is divided into common parts
(driver/base/pfctl.c) and architecture parts (arch/XXX/XXX/pfctl.c).
Common parts is described architecture-independent processing, such as
creating sysfs.
Architecture parts is described architecture-dependent processing. It
must contain at least the what type of hardware prefetcher is supported
and how to read/write to the register. These information are set
through registration function in common parts.

This driver creates "prefetch_control" directory and some attribute
files in every CPU's cache/index[0,2] directory, if CPU supports
hardware prefetch control behavior. Each attribute file corresponds to
the cache level of the parent index directory.

Detailed description of this sysfs interface is in
Documentation/ABI/testing/sysfs-devices-system-cpu (patch8).

This driver needs cache sysfs directory and cache level/type
information. In ARM processor, these information can be obtained
from registers even without ACPI PPTT.
We add processing to create a cache/index directory using only the
information from the register if the machine does not support ACPI
PPTT and Kconfig for hardware prefetch control (CONFIG_HWPF_CONTROL)
is true in patch5.
This action caused a problem and is described in [Known problem].

[Examples]
==========
This section provides an example of using this sysfs interface at the
x86's model of INTEL_FAM6_BROADWELL_X.

This model has the following register specifications:

[0]    L2 Hardware Prefetcher Disable (R/W)
[1]    L2 Adjacent Cache Line Prefetcher Disable (R/W)
[2]    DCU Hardware Prefetcher Disable (R/W)
[3]    DCU IP Prefetcher Disable (R/W)
[63:4] Reserved

In this case, index0 (L1d cache) corresponds to bit[2,3] and index2
(L2 cache) corresponds to bit [0,1]. A list of attribute files of
index0 and index2 in CPU1 at BROADWELL_X is following:

```
# ls /sys/devices/system/cpu/cpu1/cache/index0/prefetch_control/

hardware_prefetcher_enable
ip_prefetcher_enable

# ls /sys/devices/system/cpu/cpu1/cache/index2/prefetch_control/

adjacent_cache_line_prefetcher_enable
hardware_prefetcher_enable
```

If user would like to disable the setting of "L2 Adjacent Cache Line
Prefetcher Disable (R/W)" in CPU1, do the following:

```
# echo 0 > /sys/devices/system/cpu/cpu1/cache/index2/prefetch_control/adjacent_cache_line_prefetcher_enable
```

In another example, a list of index0 at A64FX is following:

```
# ls /sys/devices/system/cpu/cpu1/cache/index0/prefetch_control/

stream_detect_prefetcher_dist
stream_detect_prefetcher_enable
stream_detect_prefetcher_strength
stream_detect_prefetcher_strength_available
```

[Patch organizations]
=====================
This patch series add hardware prefetch control core driver for A64FX
and x86. Also, we add support for A64FX and BROADWELL_X at x86.

- patch1: Add hardware prefetch core driver

  This driver provides a register/unregister function to create the
  "prefetch_control" directory and some attribute files in every CPU's
  cache/index[0,2] directory.
  If the architecture has control of the CPU's hardware prefetch
  behavior, use this function to create sysfs. When registering, it
  is necessary to provide what type of Hardware Prefetcher is
  supported and how to read/write to the register.

- patch2: Add Kconfig/Makefile to build hardware prefetch control core
  driver

- patch3: Add support for A64FX

  This adds module init/exit code, and creates sysfs attribute file
  "stream_detect_prefetcher_enable", "stream_detect_prefetcher_strong"
  and "stream_detect_prefetcher_dist" for A64FX. This driver works only
  if part number is FUJITSU_CPU_PART_A64FX at this point.

- patch4: Add Kconfig/Makefile to build driver for A64FX

- patch5: Create cache sysfs directory without ACPI PPTT for hardware
  prefetch control

  Hardware Prefetch control driver needs cache sysfs directory and cache
  level/type information. In ARM processor, these information can be
  obtained from registers(CLIDR_EL1) even without PPTT. Therefore, we
  set the cpu_map_populated to true to create cache sysfs directory, if
  the machine doesn't have PPTT.

- patch6: Fix to restore to original value when re-enabling hardware
  prefetch register in pseudo_lock.c

  The current pseudo_lock.c code overwrittes the value of the
  MSR_MISC_FEATURE_CONTROL to 0 even if the original value is not 0.
  Therefore, modify it to save and restore the original values.

- patch7: Add support for x86

  This adds module init/exit code, and creates sysfs attribute file
  "hardware_prefetcher_enable", "ip_prefetcher_enable" and
  "adjacent_cache_line_prefetcher_enable" for x86. This driver works
  only if the model is INTEL_FAM6_BROADWELL_X at this point.

- patch8: Add Kconfig/Makefile to build driver for x86

- patch9: Add documentation for the new sysfs interface

[Known problem]
===============
- `lscpu` command terminates with -ENOENT because cache/index directory
  is exists but shared_cpu_map file does not exist. This is due to
  patch5, which creates a cache/index directory containing only level
  and type without ACPI PPTT.

[Merit]
=======
For reference, here is the result of STREAM Triad when tuning with
the "s file in L1 and L2 cache on A64FX.

| dist combination  | Pattern A   | Pattern B   |
|-------------------|-------------|-------------|
| L1:256,  L2:1024  | 234505.2144 | 114600.0801 |
| L1:1536, L2:1024  | 279172.8742 | 118979.4542 |
| L1:256,  L2:10240 | 247716.7757 | 127364.1533 |
| L1:1536, L2:10240 | 283675.6625 | 125950.6847 |

In pattern A, we set the size of the array to 174720, which is about
half the size of the L1d cache. In pattern B, we set the size of the
array to 10485120, which is about twice the size of the L2 cache.

In pattern A, a change of dist at L1 has a larger effect. On the other
hand, in pattern B, the change of dist at L2 has a larger effect.
As described above, the optimal dist combination depends on the
characteristics of the application. Therefore, such a sysfs interface
is useful for performance tuning.

Best regards,
Kohei Tarumizu

Kohei Tarumizu (9):
  drivers: base: Add hardware prefetch control core driver
  drivers: base: Add Kconfig/Makefile to build hardware prefetch control
    core driver
  soc: fujitsu: Add hardware prefetch control support for A64FX
  soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control
    driver
  arm64: Create cache sysfs directory without ACPI PPTT for hardware
    prefetch control
  x86: resctrl: pseudo_lock: Fix to restore to original value when
    re-enabling hardware prefetch register
  x86: Add hardware prefetch control support for x86
  x86: Add Kconfig/Makefile to build hardware prefetch control driver
  docs: ABI: Add sysfs documentation interface of hardware prefetch
    control driver

 .../ABI/testing/sysfs-devices-system-cpu      |  98 ++++
 MAINTAINERS                                   |   8 +
 arch/arm64/kernel/cacheinfo.c                 |  29 ++
 arch/x86/Kconfig                              |   6 +
 arch/x86/kernel/cpu/Makefile                  |   2 +
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c     |  12 +-
 arch/x86/kernel/cpu/x86-pfctl.c               | 347 +++++++++++++
 drivers/base/Kconfig                          |   9 +
 drivers/base/Makefile                         |   1 +
 drivers/base/pfctl.c                          | 458 ++++++++++++++++++
 drivers/soc/Kconfig                           |   1 +
 drivers/soc/Makefile                          |   1 +
 drivers/soc/fujitsu/Kconfig                   |  11 +
 drivers/soc/fujitsu/Makefile                  |   2 +
 drivers/soc/fujitsu/a64fx-pfctl.c             | 356 ++++++++++++++
 include/linux/pfctl.h                         |  49 ++
 16 files changed, 1387 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/x86-pfctl.c
 create mode 100644 drivers/base/pfctl.c
 create mode 100644 drivers/soc/fujitsu/Kconfig
 create mode 100644 drivers/soc/fujitsu/Makefile
 create mode 100644 drivers/soc/fujitsu/a64fx-pfctl.c
 create mode 100644 include/linux/pfctl.h

-- 
2.27.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 1/9] drivers: base: Add hardware prefetch control core driver
  2022-04-20  3:02 [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
@ 2022-04-20  3:02 ` Kohei Tarumizu
  2022-04-20 21:40   ` Thomas Gleixner
  2022-04-21  6:18   ` Greg KH
  2022-04-20  3:02 ` [PATCH v3 2/9] drivers: base: Add Kconfig/Makefile to build " Kohei Tarumizu
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 21+ messages in thread
From: Kohei Tarumizu @ 2022-04-20  3:02 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre
  Cc: tarumizu.kohei

This driver adds the register/unregister function to create the
"prefetch_control" directory and some attribute files. Attributes are
only present if the particular cache implements the relevant
prefetcher controls

If the architecture has control of the CPU's hardware prefetcher
behavior, use this function to create sysfs. When registering, it is
necessary to provide what type of hardware prefetcher is supported
and how to read/write to the register.

Following patches add support for A64FX and x86.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 drivers/base/pfctl.c  | 458 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pfctl.h |  49 +++++
 2 files changed, 507 insertions(+)
 create mode 100644 drivers/base/pfctl.c
 create mode 100644 include/linux/pfctl.h

diff --git a/drivers/base/pfctl.c b/drivers/base/pfctl.c
new file mode 100644
index 000000000000..035d51c64e33
--- /dev/null
+++ b/drivers/base/pfctl.c
@@ -0,0 +1,458 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2022 FUJITSU LIMITED
+ *
+ * This driver provides tunable sysfs interface for Hardware Prefetch Control.
+ * See Documentation/ABI/testing/sysfs-devices-system-cpu for more information.
+ *
+ * This code provides architecture-independent functions such as create and
+ * remove attribute file.
+ * The implementation of reads and writes to the Hardware Prefetch Control
+ * register is architecture-dependent. Therefore, each architecture register
+ * a callback to read and write the register via pfctl_register_driver().
+ */
+
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/device.h>
+#include <linux/pfctl.h>
+#include <linux/parser.h>
+#include <linux/slab.h>
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+static DEFINE_PER_CPU(struct device *, cache_device_pcpu);
+#define per_cpu_cache_device(cpu) (per_cpu(cache_device_pcpu, cpu))
+
+struct pfctl_driver *pdriver;
+enum cpuhp_state hp_online;
+
+static const char dist_auto_string[] = "auto";
+static const char strength_strong_string[] = "strong";
+static const char strength_weak_string[] = "weak";
+
+static bool prefetcher_is_available(unsigned int level, enum cache_type type,
+				    int prefetcher)
+{
+	if ((level == 1) && (type == CACHE_TYPE_DATA)) {
+		if (pdriver->supported_l1d_prefetcher & prefetcher)
+			return true;
+	} else if ((level == 2) && (type == CACHE_TYPE_UNIFIED)) {
+		if (pdriver->supported_l2_prefetcher & prefetcher)
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * Returns the cpu number of the cpu_device(/sys/devices/system/cpu/cpuX)
+ * in the ancestor directory of prefetch_control.
+ *
+ * When initializing this driver, it is verified that the cache directory exists
+ * under cpu_device. Therefore, the third level up from prefetch_control is
+ * cpu_device as shown below.
+ *
+ * /sys/devices/system/cpu/cpuX/cache/indexX/prefetch_control
+ */
+static inline unsigned int get_pfctl_device_cpunum(struct device *pfctl_dev)
+{
+	return pfctl_dev->parent->parent->parent->id;
+}
+
+#define pfctl_enable_show(prefetcher, pattr)				\
+static ssize_t								\
+prefetcher##_enable_show(struct device *dev,				\
+			 struct device_attribute *attr, char *buf)	\
+{									\
+	int ret;							\
+	u64 val;							\
+	unsigned int cpu;						\
+	struct cacheinfo *this_leaf = dev_get_drvdata(dev->parent);	\
+									\
+	cpu = get_pfctl_device_cpunum(dev);				\
+									\
+	ret = pdriver->read_pfreg(pattr, cpu, this_leaf->level, &val);	\
+	if (ret < 0)							\
+		return ret;						\
+									\
+	if ((val == PFCTL_ENABLE_VAL) || (val == PFCTL_DISABLE_VAL))	\
+		return sysfs_emit(buf, "%llu\n", val);			\
+	else								\
+		return -EINVAL;						\
+}
+
+pfctl_enable_show(hardware_prefetcher, HWPF_ENABLE);
+pfctl_enable_show(ip_prefetcher, IPPF_ENABLE);
+pfctl_enable_show(adjacent_cache_line_prefetcher, ACLPF_ENABLE);
+pfctl_enable_show(stream_detect_prefetcher, SDPF_ENABLE);
+
+static ssize_t
+stream_detect_prefetcher_strength_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	int ret;
+	u64 val;
+	unsigned int cpu;
+	struct cacheinfo *this_leaf = dev_get_drvdata(dev->parent);
+
+	cpu = get_pfctl_device_cpunum(dev);
+
+	ret = pdriver->read_pfreg(SDPF_STRENGTH, cpu, this_leaf->level, &val);
+	if (ret < 0)
+		return ret;
+
+	if (val == PFCTL_STRONG_VAL)
+		return sysfs_emit(buf, "%s\n", strength_strong_string);
+	if (val == PFCTL_WEAK_VAL)
+		return sysfs_emit(buf, "%s\n", strength_weak_string);
+	else
+		return -EINVAL;
+}
+
+static ssize_t
+stream_detect_prefetcher_strength_available_show(struct device *dev,
+						 struct device_attribute *attr,
+						 char *buf)
+{
+	ssize_t len = 0;
+
+	if (pdriver->supported_sdpf_strength & STRENGTH_STRONG)
+		len += sysfs_emit_at(buf, len, "%s ", strength_strong_string);
+
+	if (pdriver->supported_sdpf_strength & STRENGTH_WEAK)
+		len += sysfs_emit_at(buf, len, "%s ", strength_weak_string);
+
+	len += sysfs_emit_at(buf, len, "\n");
+
+	return len;
+}
+
+static ssize_t
+stream_detect_prefetcher_dist_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	int ret;
+	u64 val;
+	unsigned int cpu;
+	struct cacheinfo *this_leaf = dev_get_drvdata(dev->parent);
+
+	cpu = get_pfctl_device_cpunum(dev);
+
+	ret = pdriver->read_pfreg(SDPF_DIST, cpu, this_leaf->level, &val);
+	if (ret < 0)
+		return ret;
+
+	if (val == PFCTL_DIST_AUTO_VAL)
+		return sysfs_emit(buf, "%s\n", dist_auto_string);
+	else
+		return sysfs_emit(buf, "%llu\n", val);
+}
+
+#define pfctl_enable_store(prefetcher, pattr)				\
+static ssize_t								\
+prefetcher##_enable_store(struct device *dev,				\
+			  struct device_attribute *attr,		\
+			  const char *buf, size_t count)		\
+{									\
+	int ret;							\
+	u64 val;							\
+	unsigned int cpu;						\
+	struct cacheinfo *this_leaf = dev_get_drvdata(dev->parent);	\
+									\
+	ret = kstrtoull(buf, 10, &val);					\
+	if (ret < 0)							\
+		return -EINVAL;						\
+									\
+	if ((val != PFCTL_ENABLE_VAL) && (val != PFCTL_DISABLE_VAL))	\
+		return -EINVAL;						\
+									\
+	cpu = get_pfctl_device_cpunum(dev);				\
+									\
+	ret = pdriver->write_pfreg(pattr, cpu, this_leaf->level, val);	\
+	if (ret < 0)							\
+		return ret;						\
+									\
+	return count;							\
+}
+
+pfctl_enable_store(hardware_prefetcher, HWPF_ENABLE);
+pfctl_enable_store(ip_prefetcher, IPPF_ENABLE);
+pfctl_enable_store(adjacent_cache_line_prefetcher, ACLPF_ENABLE);
+pfctl_enable_store(stream_detect_prefetcher, SDPF_ENABLE);
+
+static ssize_t
+stream_detect_prefetcher_strength_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	int ret;
+	u64 val;
+	unsigned int cpu;
+	struct cacheinfo *this_leaf = dev_get_drvdata(dev->parent);
+
+	if (sysfs_streq(buf, strength_strong_string))
+		val = PFCTL_STRONG_VAL;
+	else if (sysfs_streq(buf, strength_weak_string))
+		val = PFCTL_WEAK_VAL;
+	else
+		return -EINVAL;
+
+	cpu = get_pfctl_device_cpunum(dev);
+
+	ret = pdriver->write_pfreg(SDPF_STRENGTH, cpu, this_leaf->level, val);
+	if (ret < 0)
+		return ret;
+
+	return count;
+}
+
+static ssize_t
+stream_detect_prefetcher_dist_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	int ret;
+	u64 val;
+	unsigned int cpu;
+	struct cacheinfo *this_leaf = dev_get_drvdata(dev->parent);
+
+	if (sysfs_streq(buf, dist_auto_string)) {
+		val = PFCTL_DIST_AUTO_VAL;
+	} else {
+		ret = kstrtoull(buf, 10, &val);
+		if (ret < 0 || val < 1)
+			return -EINVAL;
+	}
+
+	cpu = get_pfctl_device_cpunum(dev);
+
+	ret = pdriver->write_pfreg(SDPF_DIST, cpu, this_leaf->level, val);
+	if (ret < 0)
+		return ret;
+
+	return count;
+}
+
+static DEVICE_ATTR_ADMIN_RW(hardware_prefetcher_enable);
+static DEVICE_ATTR_ADMIN_RW(ip_prefetcher_enable);
+static DEVICE_ATTR_ADMIN_RW(adjacent_cache_line_prefetcher_enable);
+static DEVICE_ATTR_ADMIN_RW(stream_detect_prefetcher_enable);
+static DEVICE_ATTR_ADMIN_RW(stream_detect_prefetcher_strength);
+static DEVICE_ATTR_ADMIN_RO(stream_detect_prefetcher_strength_available);
+static DEVICE_ATTR_ADMIN_RW(stream_detect_prefetcher_dist);
+
+static umode_t
+pfctl_attrs_is_visible(struct kobject *kobj, struct attribute *attr, int unused)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct cacheinfo *this_leaf = dev_get_drvdata(dev->parent);
+	umode_t mode = attr->mode;
+
+	if ((attr == &dev_attr_hardware_prefetcher_enable.attr) &&
+	    (prefetcher_is_available(this_leaf->level, this_leaf->type, HWPF)))
+		return mode;
+
+	if ((attr == &dev_attr_ip_prefetcher_enable.attr) &&
+	    (prefetcher_is_available(this_leaf->level, this_leaf->type, IPPF)))
+		return mode;
+
+	if ((attr == &dev_attr_adjacent_cache_line_prefetcher_enable.attr) &&
+	    (prefetcher_is_available(this_leaf->level, this_leaf->type, ACLPF)))
+		return mode;
+
+	if (((attr == &dev_attr_stream_detect_prefetcher_enable.attr) ||
+	     (attr == &dev_attr_stream_detect_prefetcher_strength.attr) ||
+	     (attr == &dev_attr_stream_detect_prefetcher_strength_available.attr)
+	     || (attr == &dev_attr_stream_detect_prefetcher_dist.attr)) &&
+	    (prefetcher_is_available(this_leaf->level, this_leaf->type, SDPF)))
+		return mode;
+
+	return 0;
+}
+
+static struct attribute *pfctl_attrs[] = {
+	&dev_attr_hardware_prefetcher_enable.attr,
+	&dev_attr_ip_prefetcher_enable.attr,
+	&dev_attr_adjacent_cache_line_prefetcher_enable.attr,
+	&dev_attr_stream_detect_prefetcher_enable.attr,
+	&dev_attr_stream_detect_prefetcher_strength.attr,
+	&dev_attr_stream_detect_prefetcher_strength_available.attr,
+	&dev_attr_stream_detect_prefetcher_dist.attr,
+	NULL,
+};
+
+static const struct attribute_group pfctl_group = {
+	.attrs = pfctl_attrs,
+	.is_visible = pfctl_attrs_is_visible,
+};
+
+static const struct attribute_group *pfctl_groups[] = {
+	&pfctl_group,
+	NULL,
+};
+
+static int find_cache_device(unsigned int cpu)
+{
+	struct device *cpu_dev = get_cpu_device(cpu);
+	struct device *cache_dev;
+
+	cache_dev = device_find_child_by_name(cpu_dev, "cache");
+	if (!cache_dev)
+		return -ENODEV;
+
+	per_cpu_cache_device(cpu) = cache_dev;
+	return 0;
+}
+
+static int _remove_pfctl_attr(struct device *dev, void *data)
+{
+	struct cacheinfo *leaf = dev_get_drvdata(dev);
+	struct device *pfctl_dev;
+
+	if (!prefetcher_is_available(leaf->level, leaf->type, ANYPF))
+		return 0;
+
+	pfctl_dev = device_find_child_by_name(dev, "prefetch_control");
+	if (!pfctl_dev)
+		return 0;
+
+	device_unregister(pfctl_dev);
+	put_device(pfctl_dev);
+	return 0;
+}
+
+static void remove_pfctl_attr(unsigned int cpu)
+{
+	struct device *cache_dev = per_cpu_cache_device(cpu);
+
+	if (!cache_dev)
+		return;
+
+	device_for_each_child(cache_dev, NULL, _remove_pfctl_attr);
+	put_device(cache_dev);
+}
+
+static int _create_pfctl_attr(struct device *dev, void *data)
+{
+	struct cacheinfo *leaf = dev_get_drvdata(dev);
+	struct device *pfctl_dev;
+
+	if (!prefetcher_is_available(leaf->level, leaf->type, ANYPF))
+		return 0;
+
+	pfctl_dev = cpu_device_create(dev, NULL, pfctl_groups,
+				      "prefetch_control");
+	if (IS_ERR(pfctl_dev))
+		return PTR_ERR(pfctl_dev);
+
+	return 0;
+}
+
+static int create_pfctl_attr(unsigned int cpu)
+{
+	int ret;
+	struct device *cache_dev = per_cpu_cache_device(cpu);
+
+	if (!cache_dev)
+		return -ENODEV;
+
+	ret = device_for_each_child(cache_dev, NULL, _create_pfctl_attr);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static int pfctl_online(unsigned int cpu)
+{
+	int ret;
+
+	ret = find_cache_device(cpu);
+	if (ret < 0)
+		return ret;
+
+	ret = create_pfctl_attr(cpu);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static int pfctl_prepare_down(unsigned int cpu)
+{
+	remove_pfctl_attr(cpu);
+
+	return 0;
+}
+
+/**
+ * pfctl_register_driver - register a Hardware Prefetch Control driver
+ * @driver_data: struct pfctl_driver must contain the supported prefetcher type
+ *               and function pointer for reading and writing hardware prefetch
+ *               register. If these are not defined this function return error.
+ *
+ * Note: This function must be called after the cache device is initialized
+ * because it requires access to the cache device.
+ * (e.g. Call at the late_initcall)
+ *
+ * Context: Any context.
+ * Return: 0 on success, negative error code on failure.
+ */
+int pfctl_register_driver(struct pfctl_driver *driver_data)
+{
+	int ret;
+
+	if (pdriver)
+		return -EEXIST;
+
+	if (!(driver_data->supported_l1d_prefetcher & ANYPF) &&
+	    !(driver_data->supported_l2_prefetcher & ANYPF))
+		return -EINVAL;
+
+	if ((driver_data->supported_l1d_prefetcher & SDPF) ||
+	    (driver_data->supported_l2_prefetcher & SDPF))
+		if (!(driver_data->supported_sdpf_strength & ANY_STRENGTH))
+			return -EINVAL;
+
+	if (!driver_data->read_pfreg || !driver_data->write_pfreg)
+		return -EINVAL;
+
+	pdriver = driver_data;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "base/pfctl:online",
+				pfctl_online, pfctl_prepare_down);
+	if (ret < 0) {
+		pr_err("failed to register hotplug callbacks\n");
+		pdriver = NULL;
+		return ret;
+	}
+
+	hp_online = ret;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pfctl_register_driver);
+
+/**
+ * pfctl_unregister_driver - unregister the Hardware Prefetch Control driver
+ * @driver_data: Used to verify that this function is called by the driver that
+ *               called pfctl_register_driver by determining if driver_data is
+ *               the same.
+ *
+ * Context: Any context.
+ * Return: nothing.
+ */
+void pfctl_unregister_driver(struct pfctl_driver *driver_data)
+{
+	if (!pdriver || (driver_data != pdriver))
+		return;
+
+	cpuhp_remove_state(hp_online);
+
+	pdriver = NULL;
+}
+EXPORT_SYMBOL_GPL(pfctl_unregister_driver);
diff --git a/include/linux/pfctl.h b/include/linux/pfctl.h
new file mode 100644
index 000000000000..9340b23f3d03
--- /dev/null
+++ b/include/linux/pfctl.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PFCTL_H
+#define _LINUX_PFCTL_H
+
+#define PFCTL_ENABLE_VAL		1
+#define PFCTL_DISABLE_VAL		0
+#define PFCTL_STRONG_VAL		0
+#define PFCTL_WEAK_VAL			1
+#define PFCTL_DIST_AUTO_VAL		0
+
+enum pfctl_attr {
+	HWPF_ENABLE,
+	IPPF_ENABLE,
+	ACLPF_ENABLE,
+	SDPF_ENABLE,
+	SDPF_STRENGTH,
+	SDPF_DIST,
+};
+
+enum prefetcher {
+	HWPF	= BIT(0), /* Hardware Prefetcher */
+	IPPF	= BIT(1), /* IP Prefetcher */
+	ACLPF	= BIT(2), /* Adjacent Cache Line Prefetcher */
+	SDPF	= BIT(3), /* Stream Detect Prefetcher */
+	ANYPF	= HWPF|IPPF|ACLPF|SDPF,
+};
+
+enum strength {
+	STRENGTH_STRONG	= BIT(0),
+	STRENGTH_WEAK	= BIT(1),
+	ANY_STRENGTH	= STRENGTH_STRONG|STRENGTH_WEAK,
+};
+
+struct pfctl_driver {
+	unsigned int supported_l1d_prefetcher;
+	unsigned int supported_l2_prefetcher;
+
+	unsigned int supported_sdpf_strength; /* Set this if support SDPF */
+
+	int (*read_pfreg)(enum pfctl_attr pattr, unsigned int cpu,
+			  unsigned int level, u64 *val);
+	int (*write_pfreg)(enum pfctl_attr pattr, unsigned int cpu,
+			   unsigned int level, u64 val);
+};
+
+int pfctl_register_driver(struct pfctl_driver *driver_data);
+void pfctl_unregister_driver(struct pfctl_driver *driver_data);
+
+#endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 2/9] drivers: base: Add Kconfig/Makefile to build hardware prefetch control core driver
  2022-04-20  3:02 [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
  2022-04-20  3:02 ` [PATCH v3 1/9] drivers: base: Add hardware prefetch control core driver Kohei Tarumizu
@ 2022-04-20  3:02 ` Kohei Tarumizu
  2022-04-20  3:02 ` [PATCH v3 3/9] soc: fujitsu: Add hardware prefetch control support for A64FX Kohei Tarumizu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Kohei Tarumizu @ 2022-04-20  3:02 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre
  Cc: tarumizu.kohei

This adds Kconfig/Makefile to build hardware prefetch control core
driver. This also adds a MAINTAINERS entry.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 MAINTAINERS           | 6 ++++++
 drivers/base/Kconfig  | 9 +++++++++
 drivers/base/Makefile | 1 +
 3 files changed, 16 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 40fa1955ca3f..f6640dc053c0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8615,6 +8615,12 @@ F:	include/linux/hwmon*.h
 F:	include/trace/events/hwmon*.h
 K:	(devm_)?hwmon_device_(un)?register(|_with_groups|_with_info)
 
+HARDWARE PREFETCH CONTROL DRIVERS
+M:	Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
+S:	Maintained
+F:	drivers/base/pfctl.c
+F:	include/linux/pfctl.h
+
 HARDWARE RANDOM NUMBER GENERATOR CORE
 M:	Matt Mackall <mpm@selenic.com>
 M:	Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 6f04b831a5c0..8f8a69e7f645 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -230,4 +230,13 @@ config GENERIC_ARCH_NUMA
 	  Enable support for generic NUMA implementation. Currently, RISC-V
 	  and ARM64 use it.
 
+config HWPF_CONTROL
+	bool "Hardware Prefetch Control driver"
+	help
+	  This driver allows user to control CPU's Hardware Prefetch behavior.
+	  If the machine supports this behavior, it provides a sysfs interface.
+
+	  See Documentation/ABI/testing/sysfs-devices-system-cpu for more
+	  information.
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 02f7f1358e86..13f3a0ddf3d1 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o
 obj-$(CONFIG_GENERIC_MSI_IRQ_DOMAIN) += platform-msi.o
 obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
 obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
+obj-$(CONFIG_HWPF_CONTROL)	+= pfctl.o
 
 obj-y			+= test/
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 3/9] soc: fujitsu: Add hardware prefetch control support for A64FX
  2022-04-20  3:02 [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
  2022-04-20  3:02 ` [PATCH v3 1/9] drivers: base: Add hardware prefetch control core driver Kohei Tarumizu
  2022-04-20  3:02 ` [PATCH v3 2/9] drivers: base: Add Kconfig/Makefile to build " Kohei Tarumizu
@ 2022-04-20  3:02 ` Kohei Tarumizu
  2022-04-20  3:02 ` [PATCH v3 4/9] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Kohei Tarumizu @ 2022-04-20  3:02 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre
  Cc: tarumizu.kohei

This adds module init/exit code, and creates sysfs attribute files for
"stream_detect_prefetcher_enable", "stream_detect_prefetcher_strong"
and "stream_detect_prefetcher_dist". This driver works only if part
number is FUJITSU_CPU_PART_A64FX at this point. The details of the
registers to be read and written in this patch are described below.

"https://github.com/fujitsu/A64FX/tree/master/doc/"
    A64FX_Specification_HPC_Extension_v1_EN.pdf

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 drivers/soc/fujitsu/a64fx-pfctl.c | 356 ++++++++++++++++++++++++++++++
 1 file changed, 356 insertions(+)
 create mode 100644 drivers/soc/fujitsu/a64fx-pfctl.c

diff --git a/drivers/soc/fujitsu/a64fx-pfctl.c b/drivers/soc/fujitsu/a64fx-pfctl.c
new file mode 100644
index 000000000000..0e072592be73
--- /dev/null
+++ b/drivers/soc/fujitsu/a64fx-pfctl.c
@@ -0,0 +1,356 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2022 FUJITSU LIMITED
+ *
+ * A64FX Hardware Prefetch Control support
+ */
+
+#include <asm/cputype.h>
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/module.h>
+#include <linux/pfctl.h>
+#include <linux/parser.h>
+
+struct pfctl_driver a64fx_pfctl_driver;
+
+/*
+ * Constants for these add the "A64FX_SDPF" prefix to the name described in
+ * section "1.3.4.2. IMP_PF_STREAM_DETECT_CTRL_EL0" of "A64FX specification".
+ * (https://github.com/fujitsu/A64FX/tree/master/doc/A64FX_Specification_HPC_Extension_v1_EN.pdf")
+ * See this document for register specification details.
+ */
+#define A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0	sys_reg(3, 3, 11, 4, 0)
+#define A64FX_SDPF_V					BIT_ULL(63)
+#define A64FX_SDPF_L1PF_DIS				BIT_ULL(59)
+#define A64FX_SDPF_L2PF_DIS				BIT_ULL(58)
+#define A64FX_SDPF_L1W					BIT_ULL(55)
+#define A64FX_SDPF_L2W					BIT_ULL(54)
+#define A64FX_SDPF_L1_DIST				GENMASK_ULL(27, 24)
+#define A64FX_SDPF_L2_DIST				GENMASK_ULL(19, 16)
+
+#define A64FX_SDPF_MIN_DIST_L1				256
+#define A64FX_SDPF_MIN_DIST_L2				1024
+
+struct a64fx_read_info {
+	enum pfctl_attr pattr;
+	u64 val;
+	unsigned int level;
+	int ret;
+};
+
+struct a64fx_write_info {
+	enum pfctl_attr pattr;
+	u64 val;
+	unsigned int level;
+	int ret;
+};
+
+static int a64fx_get_sdpf_enable(u64 reg, unsigned int level)
+{
+	u64 val;
+
+	switch (level) {
+	case 1:
+		val = FIELD_GET(A64FX_SDPF_L1PF_DIS, reg);
+		break;
+	case 2:
+		val = FIELD_GET(A64FX_SDPF_L2PF_DIS, reg);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (val == 0)
+		return PFCTL_ENABLE_VAL;
+	else if (val == 1)
+		return PFCTL_DISABLE_VAL;
+	else
+		return -EINVAL;
+}
+
+static int a64fx_modify_sdpf_enable(u64 *reg, unsigned int level, u64 val)
+{
+	if (val == PFCTL_ENABLE_VAL)
+		val = 0;
+	else
+		val = 1;
+
+	switch (level) {
+	case 1:
+		*reg &= ~A64FX_SDPF_L1PF_DIS;
+		*reg |= FIELD_PREP(A64FX_SDPF_L1PF_DIS, val);
+		break;
+	case 2:
+		*reg &= ~A64FX_SDPF_L2PF_DIS;
+		*reg |= FIELD_PREP(A64FX_SDPF_L2PF_DIS, val);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int a64fx_get_sdpf_strength(u64 reg, unsigned int level)
+{
+	u64 val;
+
+	switch (level) {
+	case 1:
+		val = FIELD_GET(A64FX_SDPF_L1W, reg);
+		break;
+	case 2:
+		val = FIELD_GET(A64FX_SDPF_L2W, reg);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (val == 0)
+		return PFCTL_STRONG_VAL;
+	else if (val == 1)
+		return PFCTL_WEAK_VAL;
+	else
+		return -EINVAL;
+}
+
+static int a64fx_modify_sdpf_strength(u64 *reg, unsigned int level, u64 val)
+{
+	if (val == PFCTL_STRONG_VAL)
+		val = 0;
+	else
+		val = 1;
+
+	switch (level) {
+	case 1:
+		*reg &= ~A64FX_SDPF_L1W;
+		*reg |= FIELD_PREP(A64FX_SDPF_L1W, val);
+		break;
+	case 2:
+		*reg &= ~A64FX_SDPF_L2W;
+		*reg |= FIELD_PREP(A64FX_SDPF_L2W, val);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int a64fx_get_sdpf_dist(u64 reg, unsigned int level)
+{
+	switch (level) {
+	case 1:
+		return FIELD_GET(A64FX_SDPF_L1_DIST, reg) *
+			A64FX_SDPF_MIN_DIST_L1;
+	case 2:
+		return FIELD_GET(A64FX_SDPF_L2_DIST, reg) *
+			A64FX_SDPF_MIN_DIST_L2;
+	default:
+		return -EINVAL;
+	}
+}
+
+static int a64fx_modify_sdpf_dist(u64 *reg, unsigned int level, u64 val)
+{
+	switch (level) {
+	case 1:
+		val = roundup(val, A64FX_SDPF_MIN_DIST_L1) /
+			A64FX_SDPF_MIN_DIST_L1;
+		if (!FIELD_FIT(A64FX_SDPF_L1_DIST, val))
+			return -EINVAL;
+		*reg &= ~A64FX_SDPF_L1_DIST;
+		*reg |= FIELD_PREP(A64FX_SDPF_L1_DIST, val);
+		break;
+	case 2:
+		val = roundup(val, A64FX_SDPF_MIN_DIST_L2) /
+			A64FX_SDPF_MIN_DIST_L2;
+		if (!FIELD_FIT(A64FX_SDPF_L2_DIST, val))
+			return -EINVAL;
+		*reg &= ~A64FX_SDPF_L2_DIST;
+		*reg |= FIELD_PREP(A64FX_SDPF_L2_DIST, val);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void a64fx_enable_sdpf_verify(u64 *reg)
+{
+	*reg |= FIELD_PREP(A64FX_SDPF_V, 1);
+}
+
+static int a64fx_get_sdpf_params(enum pfctl_attr pattr, u64 reg,
+				 unsigned int level, u64 *val)
+{
+	int ret;
+
+	switch (pattr) {
+	case SDPF_ENABLE:
+		ret = a64fx_get_sdpf_enable(reg, level);
+		break;
+	case SDPF_STRENGTH:
+		ret = a64fx_get_sdpf_strength(reg, level);
+		break;
+	case SDPF_DIST:
+		ret = a64fx_get_sdpf_dist(reg, level);
+		break;
+	default:
+		return -ENOENT;
+	}
+
+	if (ret < 0)
+		return ret;
+	*val = ret;
+
+	return 0;
+}
+
+static int a64fx_modify_pfreg_val(enum pfctl_attr pattr, u64 *reg,
+				  unsigned int level, u64 val)
+{
+	int ret;
+
+	switch (pattr) {
+	case SDPF_ENABLE:
+		ret = a64fx_modify_sdpf_enable(reg, level, val);
+		break;
+	case SDPF_STRENGTH:
+		ret = a64fx_modify_sdpf_strength(reg, level, val);
+		break;
+	case SDPF_DIST:
+		ret = a64fx_modify_sdpf_dist(reg, level, val);
+		break;
+	default:
+		return -ENOENT;
+	}
+
+	if (ret < 0)
+		return ret;
+
+	a64fx_enable_sdpf_verify(reg);
+
+	return 0;
+}
+
+static void _a64fx_read_pfreg(void *info)
+{
+	u64 reg;
+	struct a64fx_read_info *rinfo = info;
+
+	reg = read_sysreg_s(A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+	rinfo->ret = a64fx_get_sdpf_params(rinfo->pattr, reg, rinfo->level,
+					   &rinfo->val);
+}
+
+static int a64fx_read_pfreg(enum pfctl_attr pattr, unsigned int cpu,
+			    unsigned int level, u64 *val)
+{
+	struct a64fx_read_info info = {
+		.level = level,
+		.pattr = pattr,
+	};
+
+	smp_call_function_single(cpu, _a64fx_read_pfreg, &info, true);
+	if (info.ret < 0)
+		return info.ret;
+
+	*val = info.val;
+	return 0;
+}
+
+static void _a64fx_write_pfreg(void *info)
+{
+	u64 reg;
+	struct a64fx_write_info *winfo = info;
+
+	reg = read_sysreg_s(A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+	winfo->ret = a64fx_modify_pfreg_val(winfo->pattr, &reg, winfo->level,
+					    winfo->val);
+	if (winfo->ret < 0)
+		return;
+
+	write_sysreg_s(reg, A64FX_SDPF_IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+	winfo->ret = 0;
+}
+
+static int a64fx_write_pfreg(enum pfctl_attr pattr, unsigned int cpu,
+			     unsigned int level, u64 val)
+{
+	struct a64fx_write_info info = {
+		.level = level,
+		.pattr = pattr,
+		.val = val,
+	};
+
+	smp_call_function_single(cpu, _a64fx_write_pfreg, &info, true);
+	return info.ret;
+}
+
+/*
+ * This driver returns a negative value if it does not support the Hardware
+ * Prefetch Control or if it is running on a VM guest.
+ */
+static int __init setup_pfctl_driver_params(void)
+{
+	unsigned long implementor = read_cpuid_implementor();
+	unsigned long part_number = read_cpuid_part_number();
+
+	if (!is_kernel_in_hyp_mode())
+		return -EINVAL;
+
+	if (implementor != ARM_CPU_IMP_FUJITSU)
+		return -ENODEV;
+
+	switch (part_number) {
+	case FUJITSU_CPU_PART_A64FX:
+		/* A64FX register requires EL2 access */
+		if (!has_vhe())
+			return -EINVAL;
+
+		a64fx_pfctl_driver.supported_l1d_prefetcher = SDPF;
+		a64fx_pfctl_driver.supported_l2_prefetcher = SDPF;
+		a64fx_pfctl_driver.supported_sdpf_strength =
+			STRENGTH_STRONG|STRENGTH_WEAK;
+		a64fx_pfctl_driver.read_pfreg = a64fx_read_pfreg;
+		a64fx_pfctl_driver.write_pfreg = a64fx_write_pfreg;
+		break;
+	default:
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int __init a64fx_pfctl_init(void)
+{
+	int ret;
+
+	ret = setup_pfctl_driver_params();
+	if (ret < 0)
+		return ret;
+
+	ret = pfctl_register_driver(&a64fx_pfctl_driver);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static void __exit a64fx_pfctl_exit(void)
+{
+	pfctl_unregister_driver(&a64fx_pfctl_driver);
+}
+
+late_initcall(a64fx_pfctl_init);
+module_exit(a64fx_pfctl_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("FUJITSU LIMITED");
+MODULE_DESCRIPTION("A64FX Hardware Prefetch Control Driver");
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 4/9] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver
  2022-04-20  3:02 [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
                   ` (2 preceding siblings ...)
  2022-04-20  3:02 ` [PATCH v3 3/9] soc: fujitsu: Add hardware prefetch control support for A64FX Kohei Tarumizu
@ 2022-04-20  3:02 ` Kohei Tarumizu
  2022-04-20 22:14   ` Thomas Gleixner
  2022-04-20  3:02 ` [PATCH v3 5/9] arm64: Create cache sysfs directory without ACPI PPTT for hardware prefetch control Kohei Tarumizu
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Kohei Tarumizu @ 2022-04-20  3:02 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre
  Cc: tarumizu.kohei

This adds Kconfig/Makefile to build hardware prefetch control driver
for A64FX support. This also adds a MAINTAINERS entry.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 MAINTAINERS                  |  1 +
 drivers/soc/Kconfig          |  1 +
 drivers/soc/Makefile         |  1 +
 drivers/soc/fujitsu/Kconfig  | 11 +++++++++++
 drivers/soc/fujitsu/Makefile |  2 ++
 5 files changed, 16 insertions(+)
 create mode 100644 drivers/soc/fujitsu/Kconfig
 create mode 100644 drivers/soc/fujitsu/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index f6640dc053c0..b359dcc38be3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8619,6 +8619,7 @@ HARDWARE PREFETCH CONTROL DRIVERS
 M:	Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
 S:	Maintained
 F:	drivers/base/pfctl.c
+F:	drivers/soc/fujitsu/a64fx-pfctl.c
 F:	include/linux/pfctl.h
 
 HARDWARE RANDOM NUMBER GENERATOR CORE
diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
index c5aae42673d3..d87754799d90 100644
--- a/drivers/soc/Kconfig
+++ b/drivers/soc/Kconfig
@@ -9,6 +9,7 @@ source "drivers/soc/atmel/Kconfig"
 source "drivers/soc/bcm/Kconfig"
 source "drivers/soc/canaan/Kconfig"
 source "drivers/soc/fsl/Kconfig"
+source "drivers/soc/fujitsu/Kconfig"
 source "drivers/soc/imx/Kconfig"
 source "drivers/soc/ixp4xx/Kconfig"
 source "drivers/soc/litex/Kconfig"
diff --git a/drivers/soc/Makefile b/drivers/soc/Makefile
index 904eec2a7871..6c8ff1792cda 100644
--- a/drivers/soc/Makefile
+++ b/drivers/soc/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_SOC_CANAAN)	+= canaan/
 obj-$(CONFIG_ARCH_DOVE)		+= dove/
 obj-$(CONFIG_MACH_DOVE)		+= dove/
 obj-y				+= fsl/
+obj-y				+= fujitsu/
 obj-$(CONFIG_ARCH_GEMINI)	+= gemini/
 obj-y				+= imx/
 obj-y				+= ixp4xx/
diff --git a/drivers/soc/fujitsu/Kconfig b/drivers/soc/fujitsu/Kconfig
new file mode 100644
index 000000000000..d9db05d5055d
--- /dev/null
+++ b/drivers/soc/fujitsu/Kconfig
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menu "Fujitsu SoC drivers"
+
+config A64FX_HWPF_CONTROL
+	tristate "A64FX Hardware Prefetch Control driver"
+	depends on ARM64 || HWPF_CONTROL
+	help
+	  This adds Hardware Prefetch driver control support for A64FX.
+
+endmenu
diff --git a/drivers/soc/fujitsu/Makefile b/drivers/soc/fujitsu/Makefile
new file mode 100644
index 000000000000..35e284a548bb
--- /dev/null
+++ b/drivers/soc/fujitsu/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_A64FX_HWPF_CONTROL)	+= a64fx-pfctl.o
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 5/9] arm64: Create cache sysfs directory without ACPI PPTT for hardware prefetch control
  2022-04-20  3:02 [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
                   ` (3 preceding siblings ...)
  2022-04-20  3:02 ` [PATCH v3 4/9] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
@ 2022-04-20  3:02 ` Kohei Tarumizu
  2022-04-20  3:02 ` [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register Kohei Tarumizu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Kohei Tarumizu @ 2022-04-20  3:02 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre
  Cc: tarumizu.kohei

This patch create a cache sysfs directory without ACPI PPTT if the
CONFIG_HWPF_CONTROL is true. This patch use only the level/type
information obtained from CLIDR_EL1, and don't use CCSIDR information.

Hardware prefetch control driver need cache sysfs directory and cache
level/type information. In ARM processor, these information can be
obtained from the register even without PPTT. Therefore, we set the
cpu_map_populated to true to create cache sysfs directory if the
machine doesn't have PPTT.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 arch/arm64/kernel/cacheinfo.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 587543c6c51c..039ec32d0b3d 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -43,6 +43,21 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
 	this_leaf->type = type;
 }
 
+#if defined(CONFIG_HWPF_CONTROL)
+static bool acpi_has_pptt(void)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	acpi_put_table(table);
+	return true;
+}
+#endif
+
 int init_cache_level(unsigned int cpu)
 {
 	unsigned int ctype, level, leaves, fw_level;
@@ -95,5 +110,19 @@ int populate_cache_leaves(unsigned int cpu)
 			ci_leaf_init(this_leaf++, type, level);
 		}
 	}
+
+#if defined(CONFIG_HWPF_CONTROL)
+	/*
+	 * Hardware prefetch functions need cache sysfs directory and cache
+	 * level/type information. In ARM processor, these information can be
+	 * obtained from registers even without PPTT. Therefore, we set the
+	 * cpu_map_populated to true to create cache sysfs directory, if the
+	 * machine doesn't have PPTT.
+	 **/
+	if (!acpi_disabled)
+		if (!acpi_has_pptt())
+			this_cpu_ci->cpu_map_populated = true;
+#endif
+
 	return 0;
 }
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register
  2022-04-20  3:02 [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
                   ` (4 preceding siblings ...)
  2022-04-20  3:02 ` [PATCH v3 5/9] arm64: Create cache sysfs directory without ACPI PPTT for hardware prefetch control Kohei Tarumizu
@ 2022-04-20  3:02 ` Kohei Tarumizu
  2022-04-20 20:56   ` Dave Hansen
  2022-04-25 23:17   ` Reinette Chatre
  2022-04-20  3:02 ` [PATCH v3 7/9] x86: Add hardware prefetch control support for x86 Kohei Tarumizu
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 21+ messages in thread
From: Kohei Tarumizu @ 2022-04-20  3:02 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre
  Cc: tarumizu.kohei

The current pseudo_lock.c code overwrittes the value of the
MSR_MISC_FEATURE_CONTROL to 0 even if the original value is not 0.
Therefore, modify it to save and restore the original values.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index db813f819ad6..2d713c20f55f 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -420,6 +420,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
 	struct pseudo_lock_region *plr = rdtgrp->plr;
 	u32 rmid_p, closid_p;
 	unsigned long i;
+	u64 saved_msr;
 #ifdef CONFIG_KASAN
 	/*
 	 * The registers used for local register variables are also used
@@ -463,6 +464,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
 	 * the buffer and evict pseudo-locked memory read earlier from the
 	 * cache.
 	 */
+	saved_msr = __rdmsr(MSR_MISC_FEATURE_CONTROL);
 	__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
 	closid_p = this_cpu_read(pqr_state.cur_closid);
 	rmid_p = this_cpu_read(pqr_state.cur_rmid);
@@ -514,7 +516,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
 	__wrmsr(IA32_PQR_ASSOC, rmid_p, closid_p);
 
 	/* Re-enable the hardware prefetcher(s) */
-	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+	wrmsrl(MSR_MISC_FEATURE_CONTROL, saved_msr);
 	local_irq_enable();
 
 	plr->thread_done = 1;
@@ -873,12 +875,14 @@ static int measure_cycles_lat_fn(void *_plr)
 	struct pseudo_lock_region *plr = _plr;
 	unsigned long i;
 	u64 start, end;
+	u32 saved_low, saved_high;
 	void *mem_r;
 
 	local_irq_disable();
 	/*
 	 * Disable hardware prefetchers.
 	 */
+	rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
 	wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
 	mem_r = READ_ONCE(plr->kmem);
 	/*
@@ -895,7 +899,7 @@ static int measure_cycles_lat_fn(void *_plr)
 		end = rdtsc_ordered();
 		trace_pseudo_lock_mem_latency((u32)(end - start));
 	}
-	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+	wrmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
 	local_irq_enable();
 	plr->thread_done = 1;
 	wake_up_interruptible(&plr->lock_thread_wq);
@@ -945,6 +949,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
 	unsigned long i;
 	void *mem_r;
 	u64 tmp;
+	u32 saved_low, saved_high;
 
 	miss_event = perf_event_create_kernel_counter(miss_attr, plr->cpu,
 						      NULL, NULL, NULL);
@@ -973,6 +978,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
 	/*
 	 * Disable hardware prefetchers.
 	 */
+	rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
 	wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
 
 	/* Initialize rest of local variables */
@@ -1031,7 +1037,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
 	 */
 	rmb();
 	/* Re-enable hardware prefetchers */
-	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+	wrmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
 	local_irq_enable();
 out_hit:
 	perf_event_release_kernel(hit_event);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 7/9] x86: Add hardware prefetch control support for x86
  2022-04-20  3:02 [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
                   ` (5 preceding siblings ...)
  2022-04-20  3:02 ` [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register Kohei Tarumizu
@ 2022-04-20  3:02 ` Kohei Tarumizu
  2022-04-20 22:27   ` Thomas Gleixner
  2022-04-20  3:02 ` [PATCH v3 8/9] x86: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
  2022-04-20  3:02 ` [PATCH v3 9/9] docs: ABI: Add sysfs documentation interface of " Kohei Tarumizu
  8 siblings, 1 reply; 21+ messages in thread
From: Kohei Tarumizu @ 2022-04-20  3:02 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre
  Cc: tarumizu.kohei

This adds module init/exit code, and creates sysfs attribute file
"hardware_prefetcher_enable", "ip_prefetcher_enable" and
"adjacent_cache_line_prefetcher_enable" for x86. This driver works
only if the model is INTEL_FAM6_BROADWELL_X at this point.

If you would like to support a new model with the same register
specifications as INTEL_FAM6_BROADWELL_X, it is possible to add the
model settings to array of broadwell_cpu_ids[].

The details of the registers to be read and written in this patch are
described below:

"https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html"
    Volume 4

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 arch/x86/kernel/cpu/x86-pfctl.c | 347 ++++++++++++++++++++++++++++++++
 1 file changed, 347 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/x86-pfctl.c

diff --git a/arch/x86/kernel/cpu/x86-pfctl.c b/arch/x86/kernel/cpu/x86-pfctl.c
new file mode 100644
index 000000000000..153b7a46ba80
--- /dev/null
+++ b/arch/x86/kernel/cpu/x86-pfctl.c
@@ -0,0 +1,347 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2022 FUJITSU LIMITED
+ *
+ * x86 Hardware Prefetch Control support
+ */
+
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/pfctl.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <asm/cpu_device_id.h>
+#include <asm/intel-family.h>
+#include <asm/msr.h>
+
+struct pfctl_driver x86_pfctl_driver;
+
+/**************************************
+ * Intle BROADWELL support
+ **************************************/
+
+/*
+ * The register specification for each bits of Intel BROADWELL is as
+ * follow:
+ *
+ * [0]    L2 Hardware Prefetcher Disable (R/W)
+ * [1]    L2 Adjacent Cache Line Prefetcher Disable (R/W)
+ * [2]    DCU Hardware Prefetcher Disable (R/W)
+ * [3]    DCU IP Prefetcher Disable (R/W)
+ * [63:4] Reserved
+ *
+ * See "Intel 64 and IA-32 Architectures Software Developer's Manual"
+ * (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html)
+ * for register specification details.
+ */
+#define BROADWELL_L2_HWPF_FIELD		BIT_ULL(0)
+#define BROADWELL_L2_ACLPF_FIELD	BIT_ULL(1)
+#define BROADWELL_DCU_HWPF_FIELD	BIT_ULL(2)
+#define BROADWELL_DCU_IPPF_FIELD	BIT_ULL(3)
+
+struct broadwell_read_info {
+	enum pfctl_attr pattr;
+	u64 val;
+	unsigned int level;
+	int ret;
+};
+
+struct broadwell_write_info {
+	enum pfctl_attr pattr;
+	u64 val;
+	unsigned int level;
+	int ret;
+};
+
+static int broadwell_get_hwpf_enable(u64 reg, unsigned int level)
+{
+	u64 val;
+
+	switch (level) {
+	case 1:
+		val = FIELD_GET(BROADWELL_DCU_HWPF_FIELD, reg);
+		break;
+	case 2:
+		val = FIELD_GET(BROADWELL_L2_HWPF_FIELD, reg);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (val == 0)
+		return PFCTL_ENABLE_VAL;
+	else if (val == 1)
+		return PFCTL_DISABLE_VAL;
+	else
+		return -EINVAL;
+}
+
+static int broadwell_modify_hwpf_enable(u64 *reg, unsigned int level, u64 val)
+{
+	if (val == PFCTL_ENABLE_VAL)
+		val = 0;
+	else
+		val = 1;
+
+	switch (level) {
+	case 1:
+		*reg &= ~BROADWELL_DCU_HWPF_FIELD;
+		*reg |= FIELD_PREP(BROADWELL_DCU_HWPF_FIELD, val);
+		break;
+	case 2:
+		*reg &= ~BROADWELL_L2_HWPF_FIELD;
+		*reg |= FIELD_PREP(BROADWELL_L2_HWPF_FIELD, val);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int broadwell_get_ippf_enable(u64 reg, unsigned int level)
+{
+	u64 val;
+
+	switch (level) {
+	case 1:
+		val = FIELD_GET(BROADWELL_DCU_IPPF_FIELD, reg);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (val == 0)
+		return PFCTL_ENABLE_VAL;
+	else if (val == 1)
+		return PFCTL_DISABLE_VAL;
+	else
+		return -EINVAL;
+}
+
+static int broadwell_modify_ippf_enable(u64 *reg, unsigned int level, u64 val)
+{
+	if (val == PFCTL_ENABLE_VAL)
+		val = 0;
+	else
+		val = 1;
+
+	switch (level) {
+	case 1:
+		*reg &= ~BROADWELL_DCU_IPPF_FIELD;
+		*reg |= FIELD_PREP(BROADWELL_DCU_IPPF_FIELD, val);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int broadwell_get_aclpf_enable(u64 reg, unsigned int level)
+{
+	u64 val;
+
+	switch (level) {
+	case 2:
+		val = FIELD_GET(BROADWELL_L2_ACLPF_FIELD, reg);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (val == 0)
+		return PFCTL_ENABLE_VAL;
+	else if (val == 1)
+		return PFCTL_DISABLE_VAL;
+	else
+		return -EINVAL;
+}
+
+static int broadwell_modify_aclpf_enable(u64 *reg, unsigned int level, u64 val)
+{
+	if (val == PFCTL_ENABLE_VAL)
+		val = 0;
+	else
+		val = 1;
+
+	switch (level) {
+	case 2:
+		*reg &= ~BROADWELL_L2_ACLPF_FIELD;
+		*reg |= FIELD_PREP(BROADWELL_L2_ACLPF_FIELD, val);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int broadwell_get_pfctl_params(enum pfctl_attr pattr, u64 reg,
+				      unsigned int level, u64 *val)
+{
+	int ret;
+
+	switch (pattr) {
+	case HWPF_ENABLE:
+		ret = broadwell_get_hwpf_enable(reg, level);
+		break;
+	case IPPF_ENABLE:
+		ret = broadwell_get_ippf_enable(reg, level);
+		break;
+	case ACLPF_ENABLE:
+		ret = broadwell_get_aclpf_enable(reg, level);
+		break;
+	default:
+		return -ENOENT;
+	}
+
+	if (ret < 0)
+		return ret;
+	*val = ret;
+
+	return 0;
+}
+
+static int broadwell_modify_pfreg(enum pfctl_attr pattr, u64 *reg,
+				  unsigned int level, u64 val)
+{
+	int ret;
+
+	switch (pattr) {
+	case HWPF_ENABLE:
+		ret = broadwell_modify_hwpf_enable(reg, level, val);
+		break;
+	case IPPF_ENABLE:
+		ret = broadwell_modify_ippf_enable(reg, level, val);
+		break;
+	case ACLPF_ENABLE:
+		ret = broadwell_modify_aclpf_enable(reg, level, val);
+		break;
+	default:
+		return -ENOENT;
+	}
+
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static void _broadwell_read_pfreg(void *info)
+{
+	u64 reg;
+	struct broadwell_read_info *rinfo = info;
+
+	rdmsrl(MSR_MISC_FEATURE_CONTROL, reg);
+
+	rinfo->ret = broadwell_get_pfctl_params(rinfo->pattr, reg, rinfo->level,
+						&rinfo->val);
+	if (rinfo->ret < 0)
+		return;
+}
+
+static int broadwell_read_pfreg(enum pfctl_attr pattr, unsigned int cpu,
+				unsigned int level, u64 *val)
+{
+	struct broadwell_read_info info = {
+		.level = level,
+		.pattr = pattr,
+	};
+
+	smp_call_function_single(cpu, _broadwell_read_pfreg, &info, true);
+	if (info.ret < 0)
+		return info.ret;
+
+	*val = info.val;
+	return 0;
+}
+
+static void _broadwell_write_pfreg(void *info)
+{
+	u64 reg;
+	struct broadwell_write_info *winfo = info;
+
+	rdmsrl(MSR_MISC_FEATURE_CONTROL, reg);
+
+	winfo->ret = broadwell_modify_pfreg(winfo->pattr, &reg, winfo->level,
+					    winfo->val);
+	if (winfo->ret < 0)
+		return;
+
+	wrmsrl(MSR_MISC_FEATURE_CONTROL, reg);
+}
+
+static int broadwell_write_pfreg(enum pfctl_attr pattr, unsigned int cpu,
+				 unsigned int level, u64 val)
+{
+	struct broadwell_write_info info = {
+		.level = level,
+		.pattr = pattr,
+		.val = val,
+	};
+
+	smp_call_function_single(cpu, _broadwell_write_pfreg, &info, true);
+	return info.ret;
+}
+
+/*
+ * In addition to BROADWELL_X, NEHALEM and others have same register
+ * specifications as those represented by BROADWELL_XXX_FIELD.
+ * If you want to add support for these processor, add the new target model
+ * here.
+ */
+static const struct x86_cpu_id broadwell_cpu_ids[] = {
+	X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, NULL),
+	{}
+};
+
+/***** end of Intel BROADWELL support *****/
+
+/*
+ * This driver returns a negative value if it does not support the Hardware
+ * Prefetch Control or if it is running on a VM guest.
+ */
+static int __init setup_pfctl_driver_params(void)
+{
+	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+		return -EINVAL;
+
+	if (x86_match_cpu(broadwell_cpu_ids)) {
+		x86_pfctl_driver.supported_l1d_prefetcher = HWPF|IPPF;
+		x86_pfctl_driver.supported_l2_prefetcher = HWPF|ACLPF;
+		x86_pfctl_driver.read_pfreg = broadwell_read_pfreg;
+		x86_pfctl_driver.write_pfreg = broadwell_write_pfreg;
+	} else {
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int __init x86_pfctl_init(void)
+{
+	int ret;
+
+	ret = setup_pfctl_driver_params();
+	if (ret < 0)
+		return ret;
+
+	ret = pfctl_register_driver(&x86_pfctl_driver);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static void __exit x86_pfctl_exit(void)
+{
+	pfctl_unregister_driver(&x86_pfctl_driver);
+}
+
+late_initcall(x86_pfctl_init);
+module_exit(x86_pfctl_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("FUJITSU LIMITED");
+MODULE_DESCRIPTION("x86 Hardware Prefetch Control Driver");
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 8/9] x86: Add Kconfig/Makefile to build hardware prefetch control driver
  2022-04-20  3:02 [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
                   ` (6 preceding siblings ...)
  2022-04-20  3:02 ` [PATCH v3 7/9] x86: Add hardware prefetch control support for x86 Kohei Tarumizu
@ 2022-04-20  3:02 ` Kohei Tarumizu
  2022-04-20  3:02 ` [PATCH v3 9/9] docs: ABI: Add sysfs documentation interface of " Kohei Tarumizu
  8 siblings, 0 replies; 21+ messages in thread
From: Kohei Tarumizu @ 2022-04-20  3:02 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre
  Cc: tarumizu.kohei

This adds Kconfig/Makefile to build hardware prefetch control driver
for x86 support. This also adds a MAINTAINERS entry.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 MAINTAINERS                  | 1 +
 arch/x86/Kconfig             | 6 ++++++
 arch/x86/kernel/cpu/Makefile | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b359dcc38be3..4de219599e52 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8618,6 +8618,7 @@ K:	(devm_)?hwmon_device_(un)?register(|_with_groups|_with_info)
 HARDWARE PREFETCH CONTROL DRIVERS
 M:	Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
 S:	Maintained
+F:	arch/x86/kernel/cpu/x86-pfctl.c
 F:	drivers/base/pfctl.c
 F:	drivers/soc/fujitsu/a64fx-pfctl.c
 F:	include/linux/pfctl.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b0142e01002e..1ef47c29c338 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1359,6 +1359,12 @@ config X86_CPUID
 	  with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
 	  /dev/cpu/31/cpuid.
 
+config X86_HWPF_CONTROL
+	tristate "x86 Hardware Prefetch Control support"
+	depends on HWPF_CONTROL
+	help
+	  This adds Hardware Prefetch driver control support for X86.
+
 choice
 	prompt "High Memory Support"
 	default HIGHMEM4G
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 9661e3e802be..1aa13dad17a3 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -56,6 +56,8 @@ obj-$(CONFIG_X86_LOCAL_APIC)		+= perfctr-watchdog.o
 obj-$(CONFIG_HYPERVISOR_GUEST)		+= vmware.o hypervisor.o mshyperv.o
 obj-$(CONFIG_ACRN_GUEST)		+= acrn.o
 
+obj-$(CONFIG_X86_HWPF_CONTROL)		+= x86-pfctl.o
+
 ifdef CONFIG_X86_FEATURE_NAMES
 quiet_cmd_mkcapflags = MKCAP   $@
       cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $@ $^
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 9/9] docs: ABI: Add sysfs documentation interface of hardware prefetch control driver
  2022-04-20  3:02 [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
                   ` (7 preceding siblings ...)
  2022-04-20  3:02 ` [PATCH v3 8/9] x86: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
@ 2022-04-20  3:02 ` Kohei Tarumizu
  8 siblings, 0 replies; 21+ messages in thread
From: Kohei Tarumizu @ 2022-04-20  3:02 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre
  Cc: tarumizu.kohei

This describes the sysfs interface implemented by the hardware prefetch
control driver.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 .../ABI/testing/sysfs-devices-system-cpu      | 98 +++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 2ad01cad7f1c..0da4c1bac51e 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -688,3 +688,101 @@ Description:
 		(RO) the list of CPUs that are isolated and don't
 		participate in load balancing. These CPUs are set by
 		boot parameter "isolcpus=".
+
+What:		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/hardware_prefetcher_enable
+		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/ip_prefetcher_enable
+		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/adjacent_cache_line_prefetcher_enable
+Date:		March 2022
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Parameters for some Intel CPU's hardware prefetch control
+
+		This sysfs interface provides Hardware Prefetch control
+		attribute for some Intel processors. Attributes are only
+		present if the particular cache implements the relevant
+		prefetcher controls.
+
+		*_prefetcher_enable:
+		    (RW) control this prefetcher's enablement state.
+		    Read returns current status:
+			0: this prefetcher is disabled
+			1: this prefetcher is enabled
+
+		- Attribute mapping
+
+		    Some Intel processors have MSR 0x1a4. This register has several
+		    specifications depending on the model. This interface provides
+		    a one-to-one attribute file to control all the tunable
+		    parameters the CPU provides of the following.
+
+			- "* Hardware Prefetcher Disable (R/W)"
+			    corresponds to the "hardware_prefetcher_enable"
+
+			- "* Adjacent Cache Line Prefetcher Disable (R/W)"
+			    corresponds to the "adjacent_cache_line_prefetcher_enable"
+
+			- "* IP Prefetcher Disable (R/W)"
+			    corresponds to the "ip_prefetcher_enable"
+
+What:		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_enable
+		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_strength
+		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_strength_available
+		/sys/devices/system/cpu/cpu*/cache/index*/prefetch_control/stream_detect_prefetcher_dist
+Date:		March 2022
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Parameters for A64FX's hardware prefetch control
+
+		This sysfs interface provides Hardware Prefetch control
+		attribute for the processor A64FX. Attributes are only
+		present if the particular cache implements the relevant
+		prefetcher controls.
+
+		stream_detect_prefetcher_enable:
+		    (RW) control the prefetcher's enablement state.
+		    Read returns current status:
+			0: this prefetcher is disabled
+			1: this prefetcher is enabled
+
+		stream_detect_prefetcher_strength:
+		    (RW) control the prefetcher operation's strongness state.
+		    Read returns current status:
+			weak: prefetch operation is weak
+			strong: prefetch operation is strong
+
+		    Strong prefetch operation is surely executed, if there is
+		    no corresponding data in cache.
+		    Weak prefetch operation allows the hardware not to execute
+		    operation depending on hardware state.
+
+
+		stream_detect_prefetcher_strength_available:
+		    (RO) displays a space separated list of available strongness
+		    state.
+
+		stream_detect_prefetcher_dist:
+		    (RW) control the prefetcher distance value.
+		    Read return current prefetcher distance value in bytes
+		    or the string "auto".
+
+		    Write either a value in byte or the string "auto" to this
+		    parameter. If you write a value less than multiples of a
+		    specific value, it is rounded up.
+
+		    The string "auto" have a special meaning. This means that
+		    instead of setting dist to a user-specified value, it
+		    operates using hardware-specific values.
+
+		- Attribute mapping
+
+		    The processor A64FX has register IMP_PF_STREAM_DETECT_CTRL_EL0
+		    for Hardware Prefetch Control. This attribute maps each
+		    specification to the following.
+
+			- "L*PF_DIS": enablement of hardware prefetcher
+			    corresponds to the "stream_detect_prefetcher_enable"
+
+			- "L*W": strongness of hardware prefetcher
+			    corresponds to "stream_detect_prefetcher_strength"
+			    and "stream_detect_prefetcher_strength_available"
+
+			- "L*_DIST": distance of hardware prefetcher
+			    corresponds to the "stream_detect_prefetcher_dist"
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register
  2022-04-20  3:02 ` [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register Kohei Tarumizu
@ 2022-04-20 20:56   ` Dave Hansen
  2022-04-22 12:01     ` tarumizu.kohei
  2022-04-25 23:17   ` Reinette Chatre
  1 sibling, 1 reply; 21+ messages in thread
From: Dave Hansen @ 2022-04-20 20:56 UTC (permalink / raw)
  To: Kohei Tarumizu, catalin.marinas, will, tglx, mingo, bp,
	dave.hansen, x86, hpa, linux-arm-kernel, linux-kernel,
	fenghua.yu, reinette.chatre

On 4/19/22 20:02, Kohei Tarumizu wrote:
> The current pseudo_lock.c code overwrittes the value of the
> MSR_MISC_FEATURE_CONTROL to 0 even if the original value is not 0.
> Therefore, modify it to save and restore the original values.

It would be nice to mention that the wrmsr()'s in this patch are only
done inside of an interrupt-disabled region.  Without that little tidbit
of information, it's not obviously correct that the smp_call_function()
in patch 8/9 is correct.

The x86 code here looks reasonable otherwise.  It's a little bit of a
shame that this is *ONLY* for BROADWELL_X for now, but I assume you were
just conservative about it because that's all you have to test on.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/9] drivers: base: Add hardware prefetch control core driver
  2022-04-20  3:02 ` [PATCH v3 1/9] drivers: base: Add hardware prefetch control core driver Kohei Tarumizu
@ 2022-04-20 21:40   ` Thomas Gleixner
  2022-04-22 12:10     ` tarumizu.kohei
  2022-04-21  6:18   ` Greg KH
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2022-04-20 21:40 UTC (permalink / raw)
  To: Kohei Tarumizu, catalin.marinas, will, mingo, bp, dave.hansen,
	x86, hpa, linux-arm-kernel, linux-kernel, fenghua.yu,
	reinette.chatre
  Cc: tarumizu.kohei, Greg Kroah-Hartman, Rafael J. Wysocki

Kohei,

Cc+: driver core maintainers

On Wed, Apr 20 2022 at 12:02, Kohei Tarumizu wrote:
> +static const char dist_auto_string[] = "auto";
> +static const char strength_strong_string[] = "strong";
> +static const char strength_weak_string[] = "weak";

This is A64FX specific.

> +pfctl_enable_show(hardware_prefetcher, HWPF_ENABLE);
> +pfctl_enable_show(ip_prefetcher, IPPF_ENABLE);
> +pfctl_enable_show(adjacent_cache_line_prefetcher, ACLPF_ENABLE);

This is x86 specific.

> +pfctl_enable_show(stream_detect_prefetcher, SDPF_ENABLE);

This is A64FX specific.

So why is this in generic code and why needs x86 to populate the A64FX
bits and make them invisible? Same the other way round.

Now imagine a few other [sub]architectures come around and add their
specific prefetcher control knobs, strings and whatever. That's going to
be unmaintainable in no time.

This is not comparable to the cache attributes where the architectures
share a significant amount of subsets. You just demonstrated that X86
and A64FX share not even a single entry.

The core code should provide infrastructure to manage the
[sub]architecture specific control files at different cache levels.

Not more not less.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 4/9] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver
  2022-04-20  3:02 ` [PATCH v3 4/9] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
@ 2022-04-20 22:14   ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2022-04-20 22:14 UTC (permalink / raw)
  To: Kohei Tarumizu, catalin.marinas, will, mingo, bp, dave.hansen,
	x86, hpa, linux-arm-kernel, linux-kernel, fenghua.yu,
	reinette.chatre
  Cc: tarumizu.kohei

On Wed, Apr 20 2022 at 12:02, Kohei Tarumizu wrote:
> +
> +menu "Fujitsu SoC drivers"
> +
> +config A64FX_HWPF_CONTROL
> +	tristate "A64FX Hardware Prefetch Control driver"
> +	depends on ARM64 || HWPF_CONTROL

  && HWPF_CONTROL

No point in enabling this on x86.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 7/9] x86: Add hardware prefetch control support for x86
  2022-04-20  3:02 ` [PATCH v3 7/9] x86: Add hardware prefetch control support for x86 Kohei Tarumizu
@ 2022-04-20 22:27   ` Thomas Gleixner
  2022-04-22 12:16     ` tarumizu.kohei
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2022-04-20 22:27 UTC (permalink / raw)
  To: Kohei Tarumizu, catalin.marinas, will, mingo, bp, dave.hansen,
	x86, hpa, linux-arm-kernel, linux-kernel, fenghua.yu,
	reinette.chatre
  Cc: tarumizu.kohei

Kohei!

On Wed, Apr 20 2022 at 12:02, Kohei Tarumizu wrote:
> +#define BROADWELL_L2_HWPF_FIELD		BIT_ULL(0)
> +#define BROADWELL_L2_ACLPF_FIELD	BIT_ULL(1)
> +#define BROADWELL_DCU_HWPF_FIELD	BIT_ULL(2)
> +#define BROADWELL_DCU_IPPF_FIELD	BIT_ULL(3)

Prefetch control is supported on a lot of CPU models and all except Xeon
PHI have the same MSR layout. ATOMs do not support L2_ACL and DCU_IP,
but that's it. So there is absolutely nothing broadwell specific here.

> +static int broadwell_get_hwpf_enable(u64 reg, unsigned int level)
> +static int broadwell_modify_hwpf_enable(u64 *reg, unsigned int level, u64 val)
> +static int broadwell_get_ippf_enable(u64 reg, unsigned int level)
> +static int broadwell_modify_ippf_enable(u64 *reg, unsigned int level, u64 val)
> +static int broadwell_get_aclpf_enable(u64 reg, unsigned int level)
> +static int broadwell_modify_aclpf_enable(u64 *reg, unsigned int level, u64 val)

Why do we need six explicit functions, which are pretty much copy and paste?

The only difference is the bit they operate on. It's just a matter of
proper wrapper structs.

/* Global */
struct pfctl_attr {
	unsigned int		level;
	unsigned int		type;
	struct device_attribute	**attr;
};

/* x86 */
enum {
	TYPE_L12_BASE,
	TYPE_L12_PLUS,
	TYPE_L12_XPHI,
};

struct x86_pfctl_attr {
	struct device_attribute		attr;
	u64				mask;
};

static ssize_t pfctl_show(struct device *pfctl_dev, struct device_attribute *attr, char *buf)
{
	struct x86_pfctl_attr *xa = container_of(attr, struct x86_pfctl_attr, attr);
	int cpu = pfctl_dev_get_cpu(pfctl_dev);
	u64 val;

	rdmsrl_on_cpu(cpu, MSR_MISC_FEATURE_CONTROL, &val);
	return sprintf(buf, "%d\n", val & xa->mask ? 0 : 1);
}

static DEFINE_MUTEX(pfctl_mutex);

static ssize_t pfctl_store(struct device *cache_dev, struct device_attribute *attr,
			   const char *buf, size_t size)
{
	struct x86_pfctl_attr *xa = container_of(attr, struct x86_pfctl_attr, attr);
	int cpu = pfctl_dev_get_cpu(pfctl_dev);
	bool enable;
	u64 val;

	if (strtobool(buf, &enable) < 0)
		return -EINVAL;

	/* MSR_MISC_FEATURE_CONTROL is per core, not per thread */
	mutex_lock(&pfctl_mutex);
	rdmsrl_on_cpu(cpu, MSR_MISC_FEATURE_CONTROL, &val);
	if (enable)
		val &= ~xa->mask;
	else
		val |= xa->mask;
	wrmsrl_on_cpu(cpu, MSR_MISC_FEATURE_CONTROL, val);
	mutex_unlock(&pfctl_mutex);
	return 0;
}

#define PFCTL_ATTR(_name, _level, _type, _bit)				\
	struct x86_pfctl_attr attr_l##_level##_##_name = {		\
		.attr	= __ATTR(_name, 0600, pfctl_show, pfctl_store),	\
		.mask = BIT_ULL(_bit), }

static PFCTL_ATTR(prefetch,     1, CACHE_TYPE_DATA,    2);
static PFCTL_ATTR(prefetch,     2, CACHE_TYPE_UNIFIED, 0);
static PFCTL_ATTR(prefetch_ip,  1, CACHE_TYPE_DATA,    3);
static PFCTL_ATTR(prefetch_acl, 2, CACHE_TYPE_UNIFIED, 1);

static struct device_attribute *l1_attrs[] __ro_after_init = {
	&attr_l1_prefetch.attr,
	&attr_l1_prefetch_ip.attr,
	NULL,
};

static struct device_attribute *l2_attrs[] __ro_after_init = {
	&attr_l2_prefetch.attr,
	&attr_l2_prefetch_acl.attr,
	NULL,
};

static const struct pfctl_group pfctl_groups[] = {
	{
		.level = 1,
		.type = CACHE_TYPE_DATA,
		.attrs = l1_attrs,
	},
	{
		.level = 2,
		.type = CACHE_TYPE_UNIFIED,
		.attrs = l2_attrs,
	},
	{
		.attrs = NULL,
	},
};

static const __initconst struct x86_cpu_id pfctl_match[] = {
	X86_MATCH_INTEL_FAM6_MODEL(ATOM_SILVERMONT_D,	TYPE_L12_BASE),
	X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT,	TYPE_L12_BASE),
	X86_MATCH_INTEL_FAM6_MODEL(NEHALEM,		TYPE_L12_PLUS),
	X86_MATCH_INTEL_FAM6_MODEL(BROADWELL,		TYPE_L12_PLUS),
	X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL,	TYPE_L12_XPHI),
        { },
};

static __init int pfctl_init(void)
{
	const struct x86_cpu_id *m;

	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
		return -ENODEV;

	m = x86_match_cpu(pfctl_match);
	if (!m)
		return -ENODEV;

	switch (m->driver_data) {
	case TYPE_L12_BASE:
		l1_attrs[1] = NULL;
		l2_attrs[1] = NULL;
		break;
	case TYPE_L12_PLUS:
		break;
	case TYPE_L12_XPHI:
		attr_l1_prefetch.mask = BIT_ULL(0);
		attr_l2_prefetch.mask = BIT_ULL(1);
		break;
	default:
		return -ENODEV;
	};

	return pfctl_register_attrs(pfctl_groups);
}
late_initcall(pfctl_init);

See? One show() and one store() function which operates directly at the
attribute level and supports all known variants of bits in the control
MSR. No magic global constants, no visible management, no hardcoded
type/level relations... Simple and straight forward.

All what the core code needs to do is to populate the attributes in the
sys/.../cache/index$N/ directories when level and type matches.

I'm pretty sure you can simplify the A64FX code in a very similar way.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/9] drivers: base: Add hardware prefetch control core driver
  2022-04-20  3:02 ` [PATCH v3 1/9] drivers: base: Add hardware prefetch control core driver Kohei Tarumizu
  2022-04-20 21:40   ` Thomas Gleixner
@ 2022-04-21  6:18   ` Greg KH
  2022-04-22 12:30     ` tarumizu.kohei
  1 sibling, 1 reply; 21+ messages in thread
From: Greg KH @ 2022-04-21  6:18 UTC (permalink / raw)
  To: Kohei Tarumizu
  Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre

On Wed, Apr 20, 2022 at 12:02:15PM +0900, Kohei Tarumizu wrote:
> This driver adds the register/unregister function to create the
> "prefetch_control" directory and some attribute files. Attributes are
> only present if the particular cache implements the relevant
> prefetcher controls
> 
> If the architecture has control of the CPU's hardware prefetcher
> behavior, use this function to create sysfs. When registering, it is
> necessary to provide what type of hardware prefetcher is supported
> and how to read/write to the register.
> 
> Following patches add support for A64FX and x86.
> 
> Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
> ---
>  drivers/base/pfctl.c  | 458 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pfctl.h |  49 +++++
>  2 files changed, 507 insertions(+)
>  create mode 100644 drivers/base/pfctl.c
>  create mode 100644 include/linux/pfctl.h

Thanks to Thomas for pointing this change out to me.

Why did you not use get_maintainer.pl on your patch?  You are adding
files here that you want _me_ to maintain for the next 25+ years, yet
not asking for my review?  That's not nice, and for that reason alone I
would not accept this change.

Also, this is very hardware-specific, which is not a good thing for code
in drivers/base/  See the mess we have in the topology driver core code
for examples of that mess :(

greg k-h

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register
  2022-04-20 20:56   ` Dave Hansen
@ 2022-04-22 12:01     ` tarumizu.kohei
  0 siblings, 0 replies; 21+ messages in thread
From: tarumizu.kohei @ 2022-04-22 12:01 UTC (permalink / raw)
  To: 'Dave Hansen',
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre

Thanks for the comment.

> It would be nice to mention that the wrmsr()'s in this patch are only done inside of
> an interrupt-disabled region.  Without that little tidbit of information, it's not
> obviously correct that the smp_call_function() in patch 8/9 is correct.

I would like to add description that the wrmsr()'s in this patch are
only done inside of an interrupt-disabled region.
 
> The x86 code here looks reasonable otherwise.  It's a little bit of a shame that
> this is *ONLY* for BROADWELL_X for now, but I assume you were just
> conservative about it because that's all you have to test on.

That's right. It is possible to implement for other models based on
information provided by the Intel SDM. However, I didn't implement it
because I can test immediately with BROADWELL_X only.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v3 1/9] drivers: base: Add hardware prefetch control core driver
  2022-04-20 21:40   ` Thomas Gleixner
@ 2022-04-22 12:10     ` tarumizu.kohei
  0 siblings, 0 replies; 21+ messages in thread
From: tarumizu.kohei @ 2022-04-22 12:10 UTC (permalink / raw)
  To: 'Thomas Gleixner',
	catalin.marinas, will, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki

Thanks for the comment.

> This is A64FX specific.
> This is x86 specific.
> This is A64FX specific.
> 
> So why is this in generic code and why needs x86 to populate the A64FX bits and
> make them invisible? Same the other way round.

As you commented, current generic code includes things that are not
needed outside of specific hardware.
 
> Now imagine a few other [sub]architectures come around and add their specific
> prefetcher control knobs, strings and whatever. That's going to be unmaintainable
> in no time.
> 
> This is not comparable to the cache attributes where the architectures share a
> significant amount of subsets. You just demonstrated that X86 and A64FX share
> not even a single entry.
> 
> The core code should provide infrastructure to manage the [sub]architecture
> specific control files at different cache levels.
> 
> Not more not less.

I understand the risks of the current implementation.

I would like to fix the core code to provide infrastructure to manage
the [sub]architecture specific control files at different cache levels
by also referring to the comments I received at patch 7/9.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v3 7/9] x86: Add hardware prefetch control support for x86
  2022-04-20 22:27   ` Thomas Gleixner
@ 2022-04-22 12:16     ` tarumizu.kohei
  0 siblings, 0 replies; 21+ messages in thread
From: tarumizu.kohei @ 2022-04-22 12:16 UTC (permalink / raw)
  To: 'Thomas Gleixner',
	catalin.marinas, will, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre

> Prefetch control is supported on a lot of CPU models and all except Xeon PHI have
> the same MSR layout. ATOMs do not support L2_ACL and DCU_IP, but that's it. So
> there is absolutely nothing broadwell specific here.
> 
> Why do we need six explicit functions, which are pretty much copy and paste?

There is no special reason for them. My design was not good.

> The only difference is the bit they operate on. It's just a matter of proper wrapper
> structs.
> 
> See? One show() and one store() function which operates directly at the attribute
> level and supports all known variants of bits in the control MSR. No magic global
> constants, no visible management, no hardcoded type/level relations... Simple
> and straight forward.
> 
> All what the core code needs to do is to populate the attributes in the
> sys/.../cache/index$N/ directories when level and type matches.
> 
> I'm pretty sure you can simplify the A64FX code in a very similar way.

Thank you for showing me a concrete example. The implementation image
is now clear. With this in mind, I would like to simplify the core,
x86 and A64FX codes.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v3 1/9] drivers: base: Add hardware prefetch control core driver
  2022-04-21  6:18   ` Greg KH
@ 2022-04-22 12:30     ` tarumizu.kohei
  0 siblings, 0 replies; 21+ messages in thread
From: tarumizu.kohei @ 2022-04-22 12:30 UTC (permalink / raw)
  To: 'Greg KH'
  Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu, reinette.chatre

Thanks for the comment.

> Thanks to Thomas for pointing this change out to me.
> 
> Why did you not use get_maintainer.pl on your patch?  You are adding files here
> that you want _me_ to maintain for the next 25+ years, yet not asking for my
> review?  That's not nice, and for that reason alone I would not accept this change.

I apologize for my mistake. I did not specify some patch files when I
executed get_maintainer.pl. I would like to use it correctly when I
send the next version patch.

> Also, this is very hardware-specific, which is not a good thing for code in
> drivers/base/  See the mess we have in the topology driver core code for
> examples of that mess :(

I would like to try to remove hardware-specific code from core code,
or move to other place.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register
  2022-04-20  3:02 ` [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register Kohei Tarumizu
  2022-04-20 20:56   ` Dave Hansen
@ 2022-04-25 23:17   ` Reinette Chatre
  2022-04-27  2:51     ` tarumizu.kohei
  1 sibling, 1 reply; 21+ messages in thread
From: Reinette Chatre @ 2022-04-25 23:17 UTC (permalink / raw)
  To: Kohei Tarumizu, catalin.marinas, will, tglx, mingo, bp,
	dave.hansen, x86, hpa, linux-arm-kernel, linux-kernel,
	fenghua.yu

Hi Kohei,

Thank you very much for catching this issue. This fix is not specific to
or required by the driver you are creating in this series so you could also
extract this patch and submit it separately as a fix to resctrl.

When you do resubmit there are a few style related points that I highlight here,
the fix itself looks good.

For the subject, please use "x86/resctrl:" prefix in the subject. 

On 4/19/2022 8:02 PM, Kohei Tarumizu wrote:
> The current pseudo_lock.c code overwrittes the value of the

overwrittes -> overwrites

> MSR_MISC_FEATURE_CONTROL to 0 even if the original value is not 0.
> Therefore, modify it to save and restore the original values.
> 

This needs a Fixes tag. A few patches are impacted by this fix:
 
Fixes: 018961ae5579 ("x86/intel_rdt: Pseudo-lock region creation/removal core")
Fixes: 443810fe6160 ("x86/intel_rdt: Create debugfs files for pseudo-locking testing")
Fixes: 8a2fc0e1bc0c ("x86/intel_rdt: More precise L2 hit/miss measurements")

> Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
> ---
>  arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> index db813f819ad6..2d713c20f55f 100644
> --- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> +++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> @@ -420,6 +420,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
>  	struct pseudo_lock_region *plr = rdtgrp->plr;
>  	u32 rmid_p, closid_p;
>  	unsigned long i;
> +	u64 saved_msr;
>  #ifdef CONFIG_KASAN
>  	/*
>  	 * The registers used for local register variables are also used
> @@ -463,6 +464,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
>  	 * the buffer and evict pseudo-locked memory read earlier from the
>  	 * cache.
>  	 */
> +	saved_msr = __rdmsr(MSR_MISC_FEATURE_CONTROL);
>  	__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
>  	closid_p = this_cpu_read(pqr_state.cur_closid);
>  	rmid_p = this_cpu_read(pqr_state.cur_rmid);
> @@ -514,7 +516,7 @@ static int pseudo_lock_fn(void *_rdtgrp)
>  	__wrmsr(IA32_PQR_ASSOC, rmid_p, closid_p);
>  
>  	/* Re-enable the hardware prefetcher(s) */
> -	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
> +	wrmsrl(MSR_MISC_FEATURE_CONTROL, saved_msr);
>  	local_irq_enable();
>  
>  	plr->thread_done = 1;
> @@ -873,12 +875,14 @@ static int measure_cycles_lat_fn(void *_plr)
>  	struct pseudo_lock_region *plr = _plr;
>  	unsigned long i;
>  	u64 start, end;
> +	u32 saved_low, saved_high;
>  	void *mem_r;

Please do follow the current style of using "reverse fir tree order".
More information in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/maintainer-tip.rst#n587

>  
>  	local_irq_disable();
>  	/*
>  	 * Disable hardware prefetchers.
>  	 */
> +	rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
>  	wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
>  	mem_r = READ_ONCE(plr->kmem);
>  	/*
> @@ -895,7 +899,7 @@ static int measure_cycles_lat_fn(void *_plr)
>  		end = rdtsc_ordered();
>  		trace_pseudo_lock_mem_latency((u32)(end - start));
>  	}
> -	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
> +	wrmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
>  	local_irq_enable();
>  	plr->thread_done = 1;
>  	wake_up_interruptible(&plr->lock_thread_wq);
> @@ -945,6 +949,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
>  	unsigned long i;
>  	void *mem_r;
>  	u64 tmp;
> +	u32 saved_low, saved_high;

Same as above.

>  
>  	miss_event = perf_event_create_kernel_counter(miss_attr, plr->cpu,
>  						      NULL, NULL, NULL);
> @@ -973,6 +978,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
>  	/*
>  	 * Disable hardware prefetchers.
>  	 */
> +	rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
>  	wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
>  
>  	/* Initialize rest of local variables */
> @@ -1031,7 +1037,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
>  	 */
>  	rmb();
>  	/* Re-enable hardware prefetchers */
> -	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
> +	wrmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
>  	local_irq_enable();
>  out_hit:
>  	perf_event_release_kernel(hit_event);

Thank you very much.

Reinette

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register
  2022-04-25 23:17   ` Reinette Chatre
@ 2022-04-27  2:51     ` tarumizu.kohei
  0 siblings, 0 replies; 21+ messages in thread
From: tarumizu.kohei @ 2022-04-27  2:51 UTC (permalink / raw)
  To: 'Reinette Chatre',
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel, fenghua.yu

Thanks for the comment.

> Thank you very much for catching this issue. This fix is not specific to or required
> by the driver you are creating in this series so you could also extract this patch and
> submit it separately as a fix to resctrl.

I would like to send this patch separated from this series next time.

> When you do resubmit there are a few style related points that I highlight here, the
> fix itself looks good.
> 
> For the subject, please use "x86/resctrl:" prefix in the subject.

> This needs a Fixes tag. A few patches are impacted by this fix:
> 
> Fixes: 018961ae5579 ("x86/intel_rdt: Pseudo-lock region creation/removal
> core")
> Fixes: 443810fe6160 ("x86/intel_rdt: Create debugfs files for pseudo-locking
> testing")
> Fixes: 8a2fc0e1bc0c ("x86/intel_rdt: More precise L2 hit/miss measurements")

I would like to use this prefix and add Fixes tag for the next patch.

> Please do follow the current style of using "reverse fir tree order".
> More information in:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Docume
> ntation/process/maintainer-tip.rst#n587

> Same as above.

I check the URL to fix style problem.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-04-27  2:52 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-20  3:02 [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86 Kohei Tarumizu
2022-04-20  3:02 ` [PATCH v3 1/9] drivers: base: Add hardware prefetch control core driver Kohei Tarumizu
2022-04-20 21:40   ` Thomas Gleixner
2022-04-22 12:10     ` tarumizu.kohei
2022-04-21  6:18   ` Greg KH
2022-04-22 12:30     ` tarumizu.kohei
2022-04-20  3:02 ` [PATCH v3 2/9] drivers: base: Add Kconfig/Makefile to build " Kohei Tarumizu
2022-04-20  3:02 ` [PATCH v3 3/9] soc: fujitsu: Add hardware prefetch control support for A64FX Kohei Tarumizu
2022-04-20  3:02 ` [PATCH v3 4/9] soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
2022-04-20 22:14   ` Thomas Gleixner
2022-04-20  3:02 ` [PATCH v3 5/9] arm64: Create cache sysfs directory without ACPI PPTT for hardware prefetch control Kohei Tarumizu
2022-04-20  3:02 ` [PATCH v3 6/9] x86: resctrl: pseudo_lock: Fix to restore to original value when re-enabling hardware prefetch register Kohei Tarumizu
2022-04-20 20:56   ` Dave Hansen
2022-04-22 12:01     ` tarumizu.kohei
2022-04-25 23:17   ` Reinette Chatre
2022-04-27  2:51     ` tarumizu.kohei
2022-04-20  3:02 ` [PATCH v3 7/9] x86: Add hardware prefetch control support for x86 Kohei Tarumizu
2022-04-20 22:27   ` Thomas Gleixner
2022-04-22 12:16     ` tarumizu.kohei
2022-04-20  3:02 ` [PATCH v3 8/9] x86: Add Kconfig/Makefile to build hardware prefetch control driver Kohei Tarumizu
2022-04-20  3:02 ` [PATCH v3 9/9] docs: ABI: Add sysfs documentation interface of " Kohei Tarumizu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).