linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors
@ 2021-11-04  5:21 Kohei Tarumizu
  2021-11-04  5:21 ` [RFC PATCH v2 1/5] driver: hwpf: Add hardware prefetch core driver register/unregister functions Kohei Tarumizu
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Kohei Tarumizu @ 2021-11-04  5:21 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel
  Cc: tarumizu.kohei

This patch series add hardware prefetch driver register/unregister
function. The purpose of this driver is to provide an interface to
control the hardware prefetch mechanism depending on the application
characteristics.

An earlier RFC[1], we were suggested that we create a hardware
prefetch directory under /sys/devices/system/cpu/[CPUNUM]/cache.
Hardware prefetch is a cache-related feature, but it does not require
cache sysfs feature. Therefore, we decided to isolate the code.
Specifically, create a directory under cpu/[CPUNUM].

[1]https://lore.kernel.org/lkml/OSBPR01MB2037D114B11153F00F233F8780389@OSBPR01MB2037.jpnprd01.prod.outlook.com/

Changes since v1:
- Add Intel hardware prefetch support
- Fix typo

This version adds Intel Hardware Prefetch support by Proposal A that
proposed in v1 RFC PATCH[2], and the proposal is also described in the
[RFC & Future plan] section of this letter.
This is the first step to supporting Intel processors, so we add
support only for INTEL_FAM6_BROADWELL_X.

[2]https://lore.kernel.org/lkml/20211011043952.995856-1-tarumizu.kohei@fujitsu.com/

Patch organizations are as follows:

- patch1: Add hardware prefetch core driver
  This adds register/unregister function to create the sysfs interface
  with attribute "enable", "dist", and "strong".  Detailed description
  of these are in Documentation/ABI/testing/sysfs-devices-system-cpu.

- patch2: Add support for A64FX
  This adds module init/exit code for A64FX.

- patch3: Add support for Intel

- patch4: Add Kconfig/Makefile to build module

- patch5: Add documentation for the new sysfs interface

We tested this driver and measured its performance by STREAM benchmark
on our x86 machine. The results are as follows:

| Hardware Prefetch status | Triad      |
|--------------------------|------------|
| Enabled                  | 40300.4600 |
| Disabled                 | 31694.6333 |

The performance is better with Enabled, which is an expected result.
We also measured the performance on our A64FX machine and showed the
results in v1 RFC PATCH.

[RFC & Future plan]
We plan to support Intel processors that have MSR 0x1A4(1A4H)[3].
We would appreciate it if you could give us a comment on how we should
handle multiple hardware prefetch types in enable attribute file for
Intel processor. Detailed description will be described later.

[3]https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
    Volume 4

There are some cases where MSR 0x1A4 has different specifications
depending on the model. One of the specification of MSR 0x1A4 for each
bits is as follows:

[0]    L2 Hardware Prefetcher Disable (R/W)
[1]    L2 Adjacent Cache Line Prefetcher Disable (R/W)
[2]    DCU Hardware Prefetcher Disable (R/W)
[3]    DCU IP Prefetcher Disable (R/W)
[63:4] Reserved

If it supports enabling two types of hardware prefetches for each
cache, as in the specification above, we should consider how to
handle them.

We would like to assign these features to an enable attribute file
(i.e. Map l1/enable to bit[2:3] and l2/enable to bit[0:1]), and
consider the two proposals:

A) The enable file handles only one bit, and changes affect the multiple
   hardware prefetch types at a certain cache level.

B) The enable file handles one or more bit, and changes to a single bit
   affect a corresponding single hardware prefetch type.

For each proposal, an example of the result of writing to the enable
file when all bits of the MSR 0x1A4 are 0 is shown below.

| Value to write          | bit[0] | bit[1] | bit[2] | bit[3] |
|-------------------------|--------|--------|--------|--------|
| A) write 1 to l1/enable | 0      | 0      | 1      | 1      |
| A) write 1 to l2/enable | 1      | 1      | 0      | 0      |
| B) write 1 to l1/enable | 0      | 0      | 1      | 0      |
| B) write 2 to l1/enable | 0      | 0      | 0      | 1      |
| B) write 3 to l2/enable | 1      | 1      | 0      | 0      |

Proposal A is simple, it uniformly controls the enablement of the
hardware prefetch type at a certain cache level. In this case, it is
easy to provide the same interface as the A64FX. However, it cannot
allow the detailed tuning(e.g. Write 1 to only bit[1]).

Proposal B allows the same tuning as direct register access. However,
user needs to know the hardware specifications (e.g. Number of features
that can be enabled via register) to use interface.

We think proposal A is better for providing a standard interface, but it
is a concern that it cannot provide all the features of the register.
Do you have any comments on these proposals?

Best regards,
Kohei Tarumizu

Kohei Tarumizu (5):
  driver: hwpf: Add hardware prefetch core driver register/unregister
    functions
  driver: hwpf: Add support for A64FX to hardware prefetch driver
  driver: hwpf: Add support for Intel to hardware prefetch driver
  driver: hwpf: Add Kconfig/Makefile to build hardware prefetch driver
  docs: ABI: Add sysfs documentation interface of hardware prefetch
    driver

 .../ABI/testing/sysfs-devices-system-cpu      |  58 +++
 MAINTAINERS                                   |   7 +
 arch/arm64/Kconfig.platforms                  |   6 +
 arch/x86/Kconfig                              |  12 +
 drivers/Kconfig                               |   2 +
 drivers/Makefile                              |   1 +
 drivers/hwpf/Kconfig                          |  24 +
 drivers/hwpf/Makefile                         |   9 +
 drivers/hwpf/fujitsu_hwpf.c                   | 460 ++++++++++++++++++
 drivers/hwpf/hwpf.c                           | 452 +++++++++++++++++
 drivers/hwpf/intel_hwpf.c                     | 219 +++++++++
 include/linux/hwpf.h                          |  38 ++
 12 files changed, 1288 insertions(+)
 create mode 100644 drivers/hwpf/Kconfig
 create mode 100644 drivers/hwpf/Makefile
 create mode 100644 drivers/hwpf/fujitsu_hwpf.c
 create mode 100644 drivers/hwpf/hwpf.c
 create mode 100644 drivers/hwpf/intel_hwpf.c
 create mode 100644 include/linux/hwpf.h

-- 
2.27.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH v2 1/5] driver: hwpf: Add hardware prefetch core driver register/unregister functions
  2021-11-04  5:21 [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Kohei Tarumizu
@ 2021-11-04  5:21 ` Kohei Tarumizu
  2021-11-04  5:21 ` [RFC PATCH v2 2/5] driver: hwpf: Add support for A64FX to hardware prefetch driver Kohei Tarumizu
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Kohei Tarumizu @ 2021-11-04  5:21 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel
  Cc: tarumizu.kohei

This adds hardware prefetch core driver register/unregister EXPORT_SYMBOL.
We use this symbol to support the new environment.
Following commits will add A64FX and Intel support.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 drivers/hwpf/hwpf.c  | 452 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/hwpf.h |  38 ++++
 2 files changed, 490 insertions(+)
 create mode 100644 drivers/hwpf/hwpf.c
 create mode 100644 include/linux/hwpf.h

diff --git a/drivers/hwpf/hwpf.c b/drivers/hwpf/hwpf.c
new file mode 100644
index 000000000..1c86fd0cf
--- /dev/null
+++ b/drivers/hwpf/hwpf.c
@@ -0,0 +1,452 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 FUJITSU LIMITED
+ *
+ * This driver provides Hardware Prefetch (HWPF) tunable sysfs interface
+ * by using implementation defined registers.
+ */
+
+#include <linux/cpu.h>
+#include <linux/hwpf.h>
+#include <linux/slab.h>
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+struct hwpf_dir {
+	struct kobject *kobj;
+	struct list_head level_dir_list; /* linked list of hwpf_level_dir */
+	unsigned int cpu;
+};
+
+struct hwpf_level_dir {
+	struct hwpf_dir *parent;
+	struct kobject kobj;
+	struct list_head entry; /* Membership in level_dir_list */
+	unsigned int level;
+};
+
+static DEFINE_PER_CPU(struct hwpf_dir *, hwpf_dir_pcpu);
+#define per_cpu_hwpf_dir(cpu)  (per_cpu(hwpf_dir_pcpu, cpu))
+
+static struct hwpf_driver *hwpf_driver;
+
+enum cpuhp_state hp_online;
+
+static bool hwpf_enable_support;
+static bool hwpf_dist_support;
+static bool hwpf_strong_support;
+
+#define kobj_to_hwpf_level_dir(k) container_of(k, struct hwpf_level_dir, kobj)
+
+static int create_hwpf_dir(unsigned int cpu)
+{
+	struct device *cpu_dev = get_cpu_device(cpu);
+	struct kobject *kobj;
+	struct hwpf_dir *hwpf_dir;
+
+	if (unlikely(!cpu_dev))
+		return -ENODEV;
+
+	kobj = kobject_create_and_add("hwpf", &cpu_dev->kobj);
+	if (!kobj)
+		return -EINVAL;
+
+	hwpf_dir = kzalloc(sizeof(*hwpf_dir), GFP_KERNEL);
+	if (!hwpf_dir) {
+		kobject_put(kobj);
+		return -ENOMEM;
+	}
+
+	hwpf_dir->kobj = kobj;
+	hwpf_dir->cpu = cpu;
+	INIT_LIST_HEAD(&hwpf_dir->level_dir_list);
+
+	per_cpu_hwpf_dir(cpu) = hwpf_dir;
+
+	return 0;
+}
+
+static void remove_hwpf_dir(unsigned int cpu)
+{
+	struct hwpf_dir *hwpf_dir;
+
+	hwpf_dir = per_cpu_hwpf_dir(cpu);
+	if (!hwpf_dir)
+		return;
+
+	kobject_put(hwpf_dir->kobj);
+	kfree(hwpf_dir);
+
+	hwpf_dir = NULL;
+}
+
+#define hwpf_show(name)							\
+static ssize_t name##_show(struct kobject *kobj,			\
+			   struct kobj_attribute *attr, char *buf)	\
+{									\
+	int ret;							\
+	struct hwpf_level_dir *level_dir;				\
+									\
+	level_dir = kobj_to_hwpf_level_dir(kobj);			\
+									\
+	ret = hwpf_driver->get_##name(level_dir->parent->cpu,		\
+				      level_dir->level);		\
+	if (ret < 0)							\
+		return ret;						\
+									\
+	return sprintf(buf, "%u\n", ret);				\
+}
+
+hwpf_show(enable);
+hwpf_show(strong);
+
+#define hwpf_store(name)						\
+static ssize_t name##_store(struct kobject *kobj,			\
+			    struct kobj_attribute *attr,		\
+			    const char *buf, size_t count)		\
+{									\
+	int ret, arg;							\
+	struct hwpf_level_dir *level_dir;				\
+									\
+	ret = kstrtoint(buf, 10, &arg);					\
+	if (ret < 0)							\
+		return -EINVAL;						\
+									\
+	if (arg < 0)							\
+		return -EINVAL;						\
+									\
+	level_dir = kobj_to_hwpf_level_dir(kobj);			\
+									\
+	ret = hwpf_driver->set_##name(level_dir->parent->cpu,		\
+				      level_dir->level, arg);		\
+	if (ret < 0)							\
+		return ret;						\
+									\
+	return count;							\
+}
+
+hwpf_store(enable);
+hwpf_store(strong);
+
+static const char dist_auto_string[] = "auto";
+
+static ssize_t dist_show(struct kobject *kobj, struct kobj_attribute *attr,
+			 char *buf)
+{
+	int ret;
+	struct hwpf_level_dir *level_dir;
+
+	level_dir = kobj_to_hwpf_level_dir(kobj);
+
+	ret = hwpf_driver->get_dist(level_dir->parent->cpu,
+				    level_dir->level);
+	if (ret < 0)
+		return ret;
+
+	if (ret == DIST_AUTO_VALUE)
+		return sprintf(buf, "%s\n", dist_auto_string);
+	else
+		return sprintf(buf, "%u\n", ret);
+}
+
+static ssize_t dist_store(struct kobject *kobj, struct kobj_attribute *attr,
+			  const char *buf, size_t count)
+{
+	int ret, arg;
+	struct hwpf_level_dir *level_dir;
+
+	if (sysfs_streq(buf, dist_auto_string)) {
+		arg = DIST_AUTO_VALUE;
+	} else {
+		ret = kstrtoint(buf, 10, &arg);
+		if (ret < 0)
+			return -EINVAL;
+	}
+
+	if (arg < 0)
+		return -EINVAL;
+
+	level_dir = kobj_to_hwpf_level_dir(kobj);
+
+	ret = hwpf_driver->set_dist(level_dir->parent->cpu,
+				    level_dir->level, arg);
+	if (ret < 0)
+		return ret;
+
+	return count;
+}
+
+static struct dist_table *get_dist_table(struct list_head dist_list,
+					  unsigned int level)
+{
+	struct dist_table *table;
+
+	list_for_each_entry(table, &dist_list, entry)
+		if (table->level == level)
+			return table;
+
+	return NULL;
+}
+
+static ssize_t available_dist_show(struct kobject *kobj,
+				   struct kobj_attribute *attr, char *buf)
+{
+	struct dist_table *table;
+	struct hwpf_level_dir *level_dir;
+	int i;
+	ssize_t count = 0;
+
+	level_dir = kobj_to_hwpf_level_dir(kobj);
+
+	table = get_dist_table(hwpf_driver->dist_list, level_dir->level);
+	if (!table)
+		return -EINVAL;
+
+	for (i = 0; i < table->table_size; i++) {
+		int available_dist = table->available_dist[i];
+
+		if (available_dist == DIST_AUTO_VALUE)
+			count += sprintf(&buf[count], "%s ", dist_auto_string);
+		else
+			count += sprintf(&buf[count], "%d ", available_dist);
+	}
+	count += sprintf(&buf[count], "\n");
+
+	return count;
+}
+
+#define HWPF_ATTR_RW(name)					\
+	static struct kobj_attribute hwpf_##name##_attribute =	\
+	__ATTR(name, 0664, name##_show, name##_store)
+
+HWPF_ATTR_RW(enable);
+HWPF_ATTR_RW(dist);
+HWPF_ATTR_RW(strong);
+
+#define HWPF_ATTR_RO(name)					\
+	static struct kobj_attribute hwpf_##name##_attribute =	\
+	__ATTR(name, 0664, name##_show, NULL)
+
+HWPF_ATTR_RO(available_dist);
+
+static struct attribute *hwpf_attrs[] = {
+	&hwpf_enable_attribute.attr,
+	&hwpf_dist_attribute.attr,
+	&hwpf_strong_attribute.attr,
+	&hwpf_available_dist_attribute.attr,
+	NULL
+};
+
+static umode_t hwpf_attrs_is_visible(struct kobject *kobj,
+				     struct attribute *attr, int unused)
+{
+	umode_t mode = attr->mode;
+
+	if ((attr == &hwpf_enable_attribute.attr) && (hwpf_enable_support))
+		return mode;
+	if ((attr == &hwpf_dist_attribute.attr) && (hwpf_dist_support))
+		return mode;
+	if ((attr == &hwpf_available_dist_attribute.attr) && (hwpf_dist_support))
+		return mode;
+	if ((attr == &hwpf_strong_attribute.attr) && (hwpf_strong_support))
+		return mode;
+
+	return 0;
+}
+
+static const struct attribute_group hwpf_attr_group = {
+	.attrs = hwpf_attrs,
+	.is_visible = hwpf_attrs_is_visible,
+};
+
+static void level_dir_release(struct kobject *kobj)
+{
+	struct hwpf_level_dir *level_dir;
+
+	level_dir = kobj_to_hwpf_level_dir(kobj);
+
+	kfree(level_dir);
+}
+
+static struct kobj_type hwpf_level_type = {
+	.sysfs_ops = &kobj_sysfs_ops,
+	.release = level_dir_release,
+};
+
+static int create_level_dir(struct hwpf_dir *hwpf_dir, unsigned int level)
+{
+	struct hwpf_level_dir *level_dir;
+	int ret;
+
+	if (level >= hwpf_driver->cache_leaves)
+		return -EINVAL;
+
+	level_dir = kzalloc(sizeof(*level_dir), GFP_KERNEL);
+	if (!level_dir)
+		return -ENOMEM;
+
+	ret = kobject_init_and_add(&level_dir->kobj, &hwpf_level_type,
+				   hwpf_dir->kobj, "l%d", level + 1);
+	if (ret < 0) {
+		kfree(level_dir);
+		return ret;
+	}
+
+	ret = sysfs_create_group(&level_dir->kobj, &hwpf_attr_group);
+	if (ret) {
+		kobject_put(&level_dir->kobj);
+		kfree(level_dir);
+		return ret;
+	}
+
+	level_dir->level = level;
+	level_dir->parent = hwpf_dir;
+	list_add(&level_dir->entry, &hwpf_dir->level_dir_list);
+
+	return 0;
+}
+
+static void remove_level_dirs(unsigned int cpu)
+{
+	struct hwpf_dir *hwpf_dir;
+	struct hwpf_level_dir *dir, *tmp;
+
+	hwpf_dir = per_cpu_hwpf_dir(cpu);
+	if (!hwpf_dir)
+		return;
+
+	list_for_each_entry_safe(dir, tmp, &hwpf_dir->level_dir_list, entry) {
+		list_del(&dir->entry);
+		kobject_put(&dir->kobj);
+	}
+}
+
+static int create_level_dirs(unsigned int cpu)
+{
+	int level, ret;
+	struct hwpf_dir *hwpf_dir;
+
+	hwpf_dir = per_cpu_hwpf_dir(cpu);
+
+	for (level = 0; level < hwpf_driver->cache_leaves; level++) {
+		ret = create_level_dir(hwpf_dir, level);
+		if (ret < 0) {
+			remove_level_dirs(cpu);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int hwpf_online(unsigned int cpu)
+{
+	int ret;
+
+	if (hwpf_driver->cpuhp_online) {
+		ret = hwpf_driver->cpuhp_online(cpu);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = create_hwpf_dir(cpu);
+	if (ret < 0)
+		return ret;
+
+	ret = create_level_dirs(cpu);
+	if (ret < 0) {
+		remove_hwpf_dir(cpu);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int hwpf_prepare_down(unsigned int cpu)
+{
+	remove_level_dirs(cpu);
+
+	remove_hwpf_dir(cpu);
+
+	if (hwpf_driver->cpuhp_prepare_down)
+		hwpf_driver->cpuhp_prepare_down(cpu);
+
+	return 0;
+}
+
+/**
+ * hwpf_register_driver - register a Hardware Prefetch driver
+ * @driver_data: struct hwpf_driver contains a function pointer to create
+ *               attribute. If the function pointer is defined, corresponding
+ *               attribute is created.
+ *
+ * Context: Any context.
+ * Return: 0 on success, negative error code on failure.
+ */
+int hwpf_register_driver(struct hwpf_driver *driver_data)
+{
+	int ret;
+
+	if (hwpf_driver)
+		return -EEXIST;
+
+	if (!driver_data || driver_data->cache_leaves == 0)
+		return -EINVAL;
+
+	if (!driver_data->get_enable || !driver_data->set_enable)
+		hwpf_enable_support = false;
+	else
+		hwpf_enable_support = true;
+
+	if (!driver_data->get_dist || !driver_data->set_dist ||
+	    list_empty(&driver_data->dist_list))
+		hwpf_dist_support = false;
+	else
+		hwpf_dist_support = true;
+
+	if (!driver_data->get_strong || !driver_data->set_strong)
+		hwpf_strong_support = false;
+	else
+		hwpf_strong_support = true;
+
+	if (!hwpf_enable_support && !hwpf_dist_support &&
+	    !hwpf_strong_support)
+		return -EINVAL;
+
+	hwpf_driver = driver_data;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "hwpf/hwpf:online",
+				hwpf_online, hwpf_prepare_down);
+	if (ret < 0) {
+		pr_err("failed to register hotplug callbacks\n");
+		return ret;
+	}
+
+	hp_online = ret;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(hwpf_register_driver);
+
+/**
+ * hwpf_unregister_driver - unregister the Hardware Prefetch driver
+ * @driver_data: Used to verify that this function is called by the driver that
+ *               called hwpf_register_driver by determining if driver_data is
+ *               the same.
+ *
+ * Context: Any context.
+ * Return: nothing.
+ */
+void hwpf_unregister_driver(struct hwpf_driver *driver_data)
+{
+	if (!hwpf_driver || (driver_data != hwpf_driver))
+		return;
+
+	cpuhp_remove_state(hp_online);
+
+	hwpf_driver = NULL;
+}
+EXPORT_SYMBOL_GPL(hwpf_unregister_driver);
diff --git a/include/linux/hwpf.h b/include/linux/hwpf.h
new file mode 100644
index 000000000..6a87e4591
--- /dev/null
+++ b/include/linux/hwpf.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_HWPF_H
+#define _LINUX_HWPF_H
+
+#define DIST_AUTO_VALUE 0 /* Special value when dist is "auto" */
+
+struct dist_table {
+	unsigned int level;		/* Cache level */
+	unsigned int *available_dist;	/* Array of available dists */
+	size_t table_size;		/* Number of elements in array */
+	struct list_head entry;		/* Membership in dist_list */
+};
+
+struct hwpf_driver {
+	unsigned int cache_leaves; /* Number of hwpf/l* directories */
+
+	/* Set these function pointer if support enable attribute. */
+	int (*get_enable)(unsigned int cpu, unsigned int level);
+	int (*set_enable)(unsigned int cpu, unsigned int level, int val);
+
+	/* Set these function pointer and dist_list if support dist attribute. */
+	int (*get_dist)(unsigned int cpu, unsigned int level);
+	int (*set_dist)(unsigned int cpu, unsigned int level, int val);
+	struct list_head dist_list; /* linked list of dist_table */
+
+	/* Set these function pointer if support strong attribute. */
+	int (*get_strong)(unsigned int cpu, unsigned int level);
+	int (*set_strong)(unsigned int cpu, unsigned int level, int val);
+
+	/* Set these function pointer if register them in cpuhp_*_state. */
+	int (*cpuhp_online)(unsigned int cpu);
+	void (*cpuhp_prepare_down)(unsigned int cpu);
+};
+
+int hwpf_register_driver(struct hwpf_driver *driver_data);
+void hwpf_unregister_driver(struct hwpf_driver *driver_data);
+
+#endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v2 2/5] driver: hwpf: Add support for A64FX to hardware prefetch driver
  2021-11-04  5:21 [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Kohei Tarumizu
  2021-11-04  5:21 ` [RFC PATCH v2 1/5] driver: hwpf: Add hardware prefetch core driver register/unregister functions Kohei Tarumizu
@ 2021-11-04  5:21 ` Kohei Tarumizu
  2021-11-04  5:21 ` [RFC PATCH v2 3/5] driver: hwpf: Add support for Intel " Kohei Tarumizu
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Kohei Tarumizu @ 2021-11-04  5:21 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel
  Cc: tarumizu.kohei

This adds module init/exit code, and creates sysfs attribute files for
"enable", "dist", "available_dist" and "strong".

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 drivers/hwpf/fujitsu_hwpf.c | 460 ++++++++++++++++++++++++++++++++++++
 1 file changed, 460 insertions(+)
 create mode 100644 drivers/hwpf/fujitsu_hwpf.c

diff --git a/drivers/hwpf/fujitsu_hwpf.c b/drivers/hwpf/fujitsu_hwpf.c
new file mode 100644
index 000000000..70bd53cd0
--- /dev/null
+++ b/drivers/hwpf/fujitsu_hwpf.c
@@ -0,0 +1,460 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 FUJITSU LIMITED
+ *
+ * Fujitsu A64FX Hardware Prefetch support
+ */
+
+#include <asm/cputype.h>
+#include <linux/bitfield.h>
+#include <linux/hwpf.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+#define IMP_PF_STREAM_DETECT_CTRL_EL0	sys_reg(3, 3, 11, 4, 0)
+#define HWPF_V_FIELD			BIT_ULL(63)
+#define HWPF_L1PF_DIS_FIELD		BIT_ULL(59)
+#define HWPF_L2PF_DIS_FIELD		BIT_ULL(58)
+#define HWPF_L1W_FIELD			BIT_ULL(55)
+#define HWPF_L2W_FIELD			BIT_ULL(54)
+#define HWPF_L1_DIST_FIELD		GENMASK_ULL(27, 24)
+#define HWPF_L2_DIST_FIELD		GENMASK_ULL(19, 16)
+
+enum cache_level {
+	CACHE_LEVEL_L1 = 0,
+	CACHE_LEVEL_L2,
+	CACHE_LEVEL_NOCACHE,
+};
+
+struct parse_info {
+	enum cache_level level;
+	int val;
+	int ret;
+};
+
+/* The original register value when module_init is called */
+static DEFINE_PER_CPU(u64, orig_reg_pcpu);
+#define per_cpu_orig_reg(cpu)  (per_cpu(orig_reg_pcpu, cpu))
+
+static unsigned int cache_leaves;
+
+static int min_dist_l1;
+static int min_dist_l2;
+
+static int default_v;
+
+static enum cache_level get_cache_level(unsigned int level)
+{
+	if (level >= cache_leaves)
+		return CACHE_LEVEL_NOCACHE;
+
+	return level;
+}
+
+static void _get_enable(void *info)
+{
+	u64 reg;
+	struct parse_info *pinfo = info;
+
+	pinfo->ret = -EINVAL;
+	reg = read_sysreg_s(IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+	switch (pinfo->level) {
+	case CACHE_LEVEL_L1:
+		pinfo->val = FIELD_GET(HWPF_L1PF_DIS_FIELD, reg);
+		break;
+	case CACHE_LEVEL_L2:
+		pinfo->val = FIELD_GET(HWPF_L2PF_DIS_FIELD, reg);
+		break;
+	default:
+		return;
+	}
+
+	pinfo->ret = 0;
+}
+
+static void _set_enable(void *info)
+{
+	u64 reg;
+	struct parse_info *pinfo = info;
+
+	pinfo->ret = -EINVAL;
+	reg = read_sysreg_s(IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+	switch (pinfo->level) {
+	case CACHE_LEVEL_L1:
+		if (!FIELD_FIT(HWPF_L1PF_DIS_FIELD, pinfo->val))
+			return;
+		reg &= ~HWPF_L1PF_DIS_FIELD;
+		reg |= FIELD_PREP(HWPF_L1PF_DIS_FIELD, pinfo->val);
+		break;
+	case CACHE_LEVEL_L2:
+		if (!FIELD_FIT(HWPF_L2PF_DIS_FIELD, pinfo->val))
+			return;
+		reg &= ~HWPF_L2PF_DIS_FIELD;
+		reg |= FIELD_PREP(HWPF_L2PF_DIS_FIELD, pinfo->val);
+		break;
+	default:
+		return;
+	}
+
+	write_sysreg_s(reg, IMP_PF_STREAM_DETECT_CTRL_EL0);
+	pinfo->ret = 0;
+}
+
+static void _get_dist(void *info)
+{
+	u64 reg;
+	struct parse_info *pinfo = info;
+
+	pinfo->ret = -EINVAL;
+	reg = read_sysreg_s(IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+	switch (pinfo->level) {
+	case CACHE_LEVEL_L1:
+		pinfo->val = FIELD_GET(HWPF_L1_DIST_FIELD, reg);
+		pinfo->val = pinfo->val * min_dist_l1;
+		break;
+	case CACHE_LEVEL_L2:
+		pinfo->val = FIELD_GET(HWPF_L2_DIST_FIELD, reg);
+		pinfo->val = pinfo->val * min_dist_l2;
+		break;
+	default:
+		return;
+	}
+
+	if (pinfo->val == 0)
+		pinfo->val = DIST_AUTO_VALUE;
+
+	pinfo->ret = 0;
+}
+
+static void _set_dist(void *info)
+{
+	u64 reg;
+	struct parse_info *pinfo = info;
+
+	pinfo->ret = -EINVAL;
+	reg = read_sysreg_s(IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+	if (pinfo->val == DIST_AUTO_VALUE)
+		pinfo->val = 0;
+
+	switch (pinfo->level) {
+	case CACHE_LEVEL_L1:
+		pinfo->val = roundup(pinfo->val, min_dist_l1) / min_dist_l1;
+		if (!FIELD_FIT(HWPF_L1_DIST_FIELD, pinfo->val))
+			return;
+		reg &= ~HWPF_L1_DIST_FIELD;
+		reg |= FIELD_PREP(HWPF_L1_DIST_FIELD, pinfo->val);
+		break;
+	case CACHE_LEVEL_L2:
+		pinfo->val = roundup(pinfo->val, min_dist_l2) / min_dist_l2;
+		if (!FIELD_FIT(HWPF_L2_DIST_FIELD, pinfo->val))
+			return;
+		reg &= ~HWPF_L2_DIST_FIELD;
+		reg |= FIELD_PREP(HWPF_L2_DIST_FIELD, pinfo->val);
+		break;
+	default:
+		return;
+	}
+
+	write_sysreg_s(reg, IMP_PF_STREAM_DETECT_CTRL_EL0);
+	pinfo->ret = 0;
+}
+
+static void _get_strong(void *info)
+{
+	u64 reg;
+	struct parse_info *pinfo = info;
+
+	pinfo->ret = -EINVAL;
+	reg = read_sysreg_s(IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+	switch (pinfo->level) {
+	case CACHE_LEVEL_L1:
+		pinfo->val = FIELD_GET(HWPF_L1W_FIELD, reg);
+		break;
+	case CACHE_LEVEL_L2:
+		pinfo->val = FIELD_GET(HWPF_L2W_FIELD, reg);
+		break;
+	default:
+		return;
+	}
+
+	pinfo->ret = 0;
+}
+
+static void _set_strong(void *info)
+{
+	u64 reg;
+	struct parse_info *pinfo = info;
+
+	pinfo->ret = -EINVAL;
+	reg = read_sysreg_s(IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+	switch (pinfo->level) {
+	case CACHE_LEVEL_L1:
+		if (!FIELD_FIT(HWPF_L1W_FIELD, pinfo->val))
+			return;
+		reg &= ~HWPF_L1W_FIELD;
+		reg |= FIELD_PREP(HWPF_L1W_FIELD, pinfo->val);
+		break;
+	case CACHE_LEVEL_L2:
+		if (!FIELD_FIT(HWPF_L2W_FIELD, pinfo->val))
+			return;
+		reg &= ~HWPF_L2W_FIELD;
+		reg |= FIELD_PREP(HWPF_L2W_FIELD, pinfo->val);
+		break;
+	default:
+		return;
+	}
+
+	write_sysreg_s(reg, IMP_PF_STREAM_DETECT_CTRL_EL0);
+	pinfo->ret = 0;
+}
+
+#define DEFINE_SMP_CALL_ATTR_FUNCTION(name)				\
+static int get_##name(unsigned int cpu, unsigned int level)		\
+{									\
+	struct parse_info info = {					\
+		.level = get_cache_level(level),			\
+	};								\
+									\
+	if (info.level == CACHE_LEVEL_NOCACHE)				\
+		return -EINVAL;						\
+									\
+	smp_call_function_single(cpu, _get_##name, &info, true);	\
+	if (info.ret < 0)						\
+		return info.ret;					\
+									\
+	return info.val;						\
+}									\
+									\
+static int set_##name(unsigned int cpu, unsigned int level, int val)	\
+{									\
+	struct parse_info info = {					\
+		.level = get_cache_level(level),			\
+		.val = val,						\
+	};								\
+									\
+	if (info.level == CACHE_LEVEL_NOCACHE)				\
+		return -EINVAL;						\
+									\
+	smp_call_function_single(cpu, _set_##name, &info, true);	\
+	if (info.ret < 0)						\
+		return info.ret;					\
+									\
+	return info.ret;						\
+}
+
+DEFINE_SMP_CALL_ATTR_FUNCTION(enable);
+DEFINE_SMP_CALL_ATTR_FUNCTION(dist);
+DEFINE_SMP_CALL_ATTR_FUNCTION(strong);
+
+/*
+ * Initialize IMP_PF_STREAM_DETECT_CTRL_EL0 v[63] field to validate
+ * the register, and save the original register value.
+ */
+static int init_hwpf_register(unsigned int cpu)
+{
+	u64 reg;
+
+	reg = read_sysreg_s(IMP_PF_STREAM_DETECT_CTRL_EL0);
+	per_cpu_orig_reg(cpu) = reg;
+
+	reg &= ~HWPF_V_FIELD;
+	reg |= FIELD_PREP(HWPF_V_FIELD, default_v);
+
+	write_sysreg_s(reg, IMP_PF_STREAM_DETECT_CTRL_EL0);
+
+	return 0;
+}
+
+static void restore_hwpf_register(unsigned int cpu)
+{
+	write_sysreg_s(per_cpu_orig_reg(cpu), IMP_PF_STREAM_DETECT_CTRL_EL0);
+}
+
+struct hwpf_driver fujitsu_hwpf_driver = {
+	.get_enable		= get_enable,
+	.get_dist		= get_dist,
+	.get_strong		= get_strong,
+	.set_enable		= set_enable,
+	.set_dist		= set_dist,
+	.set_strong		= set_strong,
+	.cpuhp_online		= init_hwpf_register,
+	.cpuhp_prepare_down	= restore_hwpf_register,
+};
+
+static unsigned int l1_dist_table[] = {
+	DIST_AUTO_VALUE,
+	256,
+	512,
+	768,
+	1024,
+	1280,
+	1536,
+	1792,
+	2048,
+	2304,
+	2560,
+	2816,
+	3072,
+	3328,
+	3584,
+	3840,
+};
+
+static unsigned int l2_dist_table[] = {
+	DIST_AUTO_VALUE,
+	1024,
+	2048,
+	3072,
+	4096,
+	5120,
+	6144,
+	7168,
+	8192,
+	9216,
+	10240,
+	11264,
+	12288,
+	13312,
+	14336,
+	15360,
+};
+
+static int __init alloc_dist_table(unsigned int level)
+{
+	struct dist_table *dist_table;
+	enum cache_level cache_level;
+
+	cache_level = get_cache_level(level);
+	if (cache_level == CACHE_LEVEL_NOCACHE)
+		return -EINVAL;
+
+	dist_table = kzalloc(sizeof(*dist_table), GFP_KERNEL);
+	if (!dist_table)
+		return -ENOMEM;
+
+	switch (cache_level) {
+	case CACHE_LEVEL_L1:
+		dist_table->table_size = ARRAY_SIZE(l1_dist_table);
+		dist_table->available_dist = l1_dist_table;
+		break;
+	case CACHE_LEVEL_L2:
+		dist_table->table_size = ARRAY_SIZE(l2_dist_table);
+		dist_table->available_dist = l2_dist_table;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	dist_table->level = cache_level;
+	list_add(&dist_table->entry, &fujitsu_hwpf_driver.dist_list);
+
+	return 0;
+}
+
+static void free_dist_table(void)
+{
+	struct dist_table *table, *tmp;
+
+	list_for_each_entry_safe(table, tmp, &fujitsu_hwpf_driver.dist_list,
+				 entry) {
+		list_del(&table->entry);
+		kfree(table);
+	}
+}
+
+static int __init init_dist_table(void)
+{
+	unsigned int level;
+	int ret;
+
+	INIT_LIST_HEAD(&fujitsu_hwpf_driver.dist_list);
+
+	for (level = 0; level < cache_leaves; level++) {
+		ret = alloc_dist_table(level);
+		if (ret < 0) {
+			free_dist_table();
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int __init get_cpu_type(void)
+{
+	if (read_cpuid_implementor() != ARM_CPU_IMP_FUJITSU)
+		return -1;
+
+	return read_cpuid_part_number();
+}
+
+static int __init setup_hwpf_driver_params(void)
+{
+	int type, ret;
+
+	type = get_cpu_type();
+	if (type < 0)
+		return -ENODEV;
+
+	switch (type) {
+	case FUJITSU_CPU_PART_A64FX:
+		cache_leaves = 2;
+		min_dist_l1 = 256;
+		min_dist_l2 = 1024;
+		default_v = 1;
+		break;
+	default:
+		return -ENODEV;
+	}
+
+	fujitsu_hwpf_driver.cache_leaves = cache_leaves;
+
+	ret = init_dist_table();
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static void __exit cleanup_hwpf_driver_params(void)
+{
+	free_dist_table();
+}
+
+static int __init hwpf_init(void)
+{
+	int ret;
+
+	if (!has_vhe() || !is_kernel_in_hyp_mode())
+		return -EINVAL;
+
+	ret = setup_hwpf_driver_params();
+	if (ret < 0)
+		return ret;
+
+	ret = hwpf_register_driver(&fujitsu_hwpf_driver);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static void __exit hwpf_exit(void)
+{
+	hwpf_unregister_driver(&fujitsu_hwpf_driver);
+
+	cleanup_hwpf_driver_params();
+}
+
+module_init(hwpf_init);
+module_exit(hwpf_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("FUJITSU LIMITED");
+MODULE_DESCRIPTION("FUJITSU Hardware Prefetch Driver");
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v2 3/5] driver: hwpf: Add support for Intel to hardware prefetch driver
  2021-11-04  5:21 [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Kohei Tarumizu
  2021-11-04  5:21 ` [RFC PATCH v2 1/5] driver: hwpf: Add hardware prefetch core driver register/unregister functions Kohei Tarumizu
  2021-11-04  5:21 ` [RFC PATCH v2 2/5] driver: hwpf: Add support for A64FX to hardware prefetch driver Kohei Tarumizu
@ 2021-11-04  5:21 ` Kohei Tarumizu
  2021-11-08  1:51   ` Dave Hansen
  2021-11-04  5:21 ` [RFC PATCH v2 4/5] driver: hwpf: Add Kconfig/Makefile to build " Kohei Tarumizu
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Kohei Tarumizu @ 2021-11-04  5:21 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel
  Cc: tarumizu.kohei

This adds module init/exit code, and creates sysfs attribute files for
"enable". It works on only INTEL_FAM6_BROADWELL_X.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 drivers/hwpf/intel_hwpf.c | 219 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 219 insertions(+)
 create mode 100644 drivers/hwpf/intel_hwpf.c

diff --git a/drivers/hwpf/intel_hwpf.c b/drivers/hwpf/intel_hwpf.c
new file mode 100644
index 000000000..391a58200
--- /dev/null
+++ b/drivers/hwpf/intel_hwpf.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0.
+/*
+ * Copyright 2021 FUJITSU LIMITED
+ *
+ * Intel Hardware Prefetch support
+ */
+
+#include <linux/bitfield.h>
+#include <linux/hwpf.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <asm/intel-family.h>
+#include <asm/msr.h>
+
+enum cache_level {
+	CACHE_LEVEL_L1 = 0,
+	CACHE_LEVEL_L2,
+	CACHE_LEVEL_NOCACHE,
+};
+
+/*
+ * Classify the register specification of MSR_MISC_FEATURE_CONTROL depending on
+ * the target model.
+ *
+ * Note: This enum type is added for future extension. Currently, this driver
+ * supports only one TYPE. However, the register has several different
+ * specifications. For example, XEON_PHI series (06_57H, 06_85H) has different
+ * specification for each bits as follow:
+ * [0]    DCU Hardware Prefetcher Disable (R/W)
+ * [1]    L2 Hardware Prefetcher Disable (R/W)
+ * [63:2] Reserved
+ */
+enum register_type {
+	/*
+	 * The register specification for each bits of REGISTER_TYPE_BROADWELL
+	 * is as follow:
+	 * [0]    L2 Hardware Prefetcher Disable (R/W)
+	 * [1]    L2 Adjacent Cache Line Prefetcher Disable (R/W)
+	 * [2]    DCU Hardware Prefetcher Disable (R/W)
+	 * [3]    DCU IP Prefetcher Disable (R/W)
+	 * [63:4] Reserved
+	 */
+	REGISTER_TYPE_BROADWELL,
+	REGISTER_TYPE_NONE, /* Hardware prefetch is not supported */
+};
+
+#define TYPE_BROADWELL_L1_FIELD		GENMASK_ULL(3, 2)
+#define TYPE_BROADWELL_L2_FIELD		GENMASK_ULL(1, 0)
+
+static enum register_type hwpf_rtype;
+
+static unsigned int cache_leaves;
+
+static enum cache_level get_cache_level(unsigned int level)
+{
+	if (level >= cache_leaves)
+		return CACHE_LEVEL_NOCACHE;
+
+	return level;
+}
+
+/*
+ * Return: 0 if enabled, 1 if disabled, negative errno if an error occurs.
+ * If any bit in TYPE_XXX_FIELD is set to 1, it is considered disabled.
+ */
+static int get_enable(unsigned int cpu, unsigned int level)
+{
+	u64 reg;
+	int ret, val;
+	enum cache_level clevel;
+
+	clevel = get_cache_level(level);
+	if (clevel == CACHE_LEVEL_NOCACHE)
+		return -EINVAL;
+
+	ret = rdmsrl_on_cpu(cpu, MSR_MISC_FEATURE_CONTROL, &reg);
+	if (ret)
+		return ret;
+
+	switch (hwpf_rtype) {
+	case REGISTER_TYPE_BROADWELL:
+		switch (clevel) {
+		case CACHE_LEVEL_L1:
+			val = FIELD_GET(TYPE_BROADWELL_L1_FIELD, reg);
+			break;
+		case CACHE_LEVEL_L2:
+			val = FIELD_GET(TYPE_BROADWELL_L2_FIELD, reg);
+			break;
+		default:
+			return -EINVAL;
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (val > 0)
+		return 1;
+	else if (val == 0)
+		return 0;
+	else
+		return -EINVAL;
+}
+
+static int set_enable(unsigned int cpu, unsigned int level, int val)
+{
+	int ret;
+	u64 reg;
+	enum cache_level clevel;
+
+	clevel = get_cache_level(level);
+	if (clevel == CACHE_LEVEL_NOCACHE)
+		return -EINVAL;
+
+	ret = rdmsrl_on_cpu(cpu, MSR_MISC_FEATURE_CONTROL, &reg);
+	if (ret)
+		return ret;
+
+	switch (hwpf_rtype) {
+	case REGISTER_TYPE_BROADWELL:
+		if (val == 1)
+			val = 0x3;
+		else if (val == 0)
+			val = 0x0;
+		else
+			return -EINVAL;
+
+		switch (clevel) {
+		case CACHE_LEVEL_L1:
+			reg &= ~TYPE_BROADWELL_L1_FIELD;
+			reg |= FIELD_PREP(TYPE_BROADWELL_L1_FIELD, val);
+			break;
+		case CACHE_LEVEL_L2:
+			reg &= ~TYPE_BROADWELL_L2_FIELD;
+			reg |= FIELD_PREP(TYPE_BROADWELL_L2_FIELD, val);
+			break;
+		default:
+			return -EINVAL;
+		}
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	ret = wrmsrl_on_cpu(cpu, MSR_MISC_FEATURE_CONTROL, reg);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+
+struct hwpf_driver intel_hwpf_driver = {
+	.get_enable		= get_enable,
+	.set_enable		= set_enable,
+};
+
+enum register_type __init get_register_type(void)
+{
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return REGISTER_TYPE_NONE;
+
+	switch (boot_cpu_data.x86_model) {
+	/*
+	 * Note: In addition to BROADWELL, NEHALEM and others have same register
+	 * specifications as REGISTER_TYPE_BROADWELL. If you want to add support
+	 * for these processor, add the target model case here.
+	 */
+	case INTEL_FAM6_BROADWELL_X:
+		return REGISTER_TYPE_BROADWELL;
+	default:
+		return REGISTER_TYPE_NONE;
+	}
+}
+
+static int __init setup_hwpf_driver_params(void)
+{
+	hwpf_rtype = get_register_type();
+
+	switch (hwpf_rtype) {
+	case REGISTER_TYPE_BROADWELL:
+		cache_leaves = 2;
+		break;
+	default:
+		return -ENODEV;
+	}
+
+	intel_hwpf_driver.cache_leaves = cache_leaves;
+
+	return 0;
+}
+
+static int __init hwpf_init(void)
+{
+	int ret;
+
+	ret = setup_hwpf_driver_params();
+	if (ret < 0)
+		return ret;
+
+	ret = hwpf_register_driver(&intel_hwpf_driver);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static void __exit hwpf_exit(void)
+{
+	hwpf_unregister_driver(&intel_hwpf_driver);
+}
+
+module_init(hwpf_init);
+module_exit(hwpf_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("FUJITSU LIMITED");
+MODULE_DESCRIPTION("Intel Hardware Prefetch Driver");
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v2 4/5] driver: hwpf: Add Kconfig/Makefile to build hardware prefetch driver
  2021-11-04  5:21 [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Kohei Tarumizu
                   ` (2 preceding siblings ...)
  2021-11-04  5:21 ` [RFC PATCH v2 3/5] driver: hwpf: Add support for Intel " Kohei Tarumizu
@ 2021-11-04  5:21 ` Kohei Tarumizu
  2021-11-04  5:21 ` [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of " Kohei Tarumizu
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Kohei Tarumizu @ 2021-11-04  5:21 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel
  Cc: tarumizu.kohei

This adds kconfig/Makefile to build hardware prefetch driver for
A64FX and Intel support. This also add MAINTAINERS entry.

Note that this is the first time to add A64FX specific driver,
this adds A64FX entry in Kconfig.platforms of arm64 Kconfig.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 MAINTAINERS                  |  7 +++++++
 arch/arm64/Kconfig.platforms |  6 ++++++
 arch/x86/Kconfig             | 12 ++++++++++++
 drivers/Kconfig              |  2 ++
 drivers/Makefile             |  1 +
 drivers/hwpf/Kconfig         | 24 ++++++++++++++++++++++++
 drivers/hwpf/Makefile        |  9 +++++++++
 7 files changed, 61 insertions(+)
 create mode 100644 drivers/hwpf/Kconfig
 create mode 100644 drivers/hwpf/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index f26920f0f..29ad0e613 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1588,6 +1588,13 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git
 F:	arch/arm/mach-*/
 F:	arch/arm/plat-*/
 
+HARDWARE PREFETCH DRIVERS
+M:	Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
+L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+S:	Maintained
+F:	drivers/hwpf/
+F:	include/linux/hwpf.h
+
 ARM/ACTIONS SEMI ARCHITECTURE
 M:	Andreas Färber <afaerber@suse.de>
 M:	Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
index b0ce18d4c..8ecbcd0b7 100644
--- a/arch/arm64/Kconfig.platforms
+++ b/arch/arm64/Kconfig.platforms
@@ -20,6 +20,12 @@ config ARCH_SUNXI
 	help
 	  This enables support for Allwinner sunxi based SoCs like the A64.
 
+config ARCH_A64FX
+	bool "Fujitsu A64FX Platforms"
+	select ARCH_HAS_HARDWARE_PREFETCH
+	help
+	  This enables support for Fujitsu A64FX SoC family.
+
 config ARCH_ALPINE
 	bool "Annapurna Labs Alpine platform"
 	select ALPINE_MSI if PCI
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d9830e7e1..d60ec8eb7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1356,6 +1356,18 @@ config X86_CPUID
 	  with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
 	  /dev/cpu/31/cpuid.
 
+config INTEL_HARDWARE_PREFETCH
+	tristate "Intel Hardware Prefetch support"
+	select ARCH_HAS_HARDWARE_PREFETCH
+	select HARDWARE_PREFETCH
+	depends on X86_64
+	help
+	  This option enables a Hardware Prefetch sysfs interface.
+	  This requires an Intel processor that has MSR about Hardware Prefetch.
+
+	  See Documentation/ABI/testing/sysfs-devices-system-cpu for more
+	  information.
+
 choice
 	prompt "High Memory Support"
 	default HIGHMEM4G
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 0d399ddaa..c46702569 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -236,4 +236,6 @@ source "drivers/interconnect/Kconfig"
 source "drivers/counter/Kconfig"
 
 source "drivers/most/Kconfig"
+
+source "drivers/hwpf/Kconfig"
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index be5d40ae1..8cb2e42f6 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -188,3 +188,4 @@ obj-$(CONFIG_GNSS)		+= gnss/
 obj-$(CONFIG_INTERCONNECT)	+= interconnect/
 obj-$(CONFIG_COUNTER)		+= counter/
 obj-$(CONFIG_MOST)		+= most/
+obj-$(CONFIG_HARDWARE_PREFETCH)	+= hwpf/
diff --git a/drivers/hwpf/Kconfig b/drivers/hwpf/Kconfig
new file mode 100644
index 000000000..e011fa6e0
--- /dev/null
+++ b/drivers/hwpf/Kconfig
@@ -0,0 +1,24 @@
+config ARCH_HAS_HARDWARE_PREFETCH
+	bool
+
+menuconfig HARDWARE_PREFETCH
+	bool "Hardware Prefetch Control"
+	depends on ARCH_HAS_HARDWARE_PREFETCH
+	default y
+	help
+	  Hardware Prefetch Control Driver
+
+	  This driver allows you to control the Hardware Prefetch mechanism.
+	  If the hardware supports the mechanism, it provides a sysfs interface
+	  for changing the feature's enablement, prefetch distance and strongness.
+
+if HARDWARE_PREFETCH
+
+config A64FX_HARDWARE_PREFETCH
+	tristate "A64FX Hardware Prefetch support"
+	depends on ARCH_A64FX
+	default m
+	help
+	  This adds Hardware Prefetch driver support for A64FX SOCs.
+
+endif
diff --git a/drivers/hwpf/Makefile b/drivers/hwpf/Makefile
new file mode 100644
index 000000000..6790eb2d2
--- /dev/null
+++ b/drivers/hwpf/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+# Hardware prefetch core driver
+obj-$(CONFIG_HARDWARE_PREFETCH) += hwpf.o
+
+# FUJITSU SoC driver
+obj-$(CONFIG_A64FX_HARDWARE_PREFETCH) += fujitsu_hwpf.o
+
+# Intel SoC driver
+obj-$(CONFIG_INTEL_HARDWARE_PREFETCH) += intel_hwpf.o
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver
  2021-11-04  5:21 [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Kohei Tarumizu
                   ` (3 preceding siblings ...)
  2021-11-04  5:21 ` [RFC PATCH v2 4/5] driver: hwpf: Add Kconfig/Makefile to build " Kohei Tarumizu
@ 2021-11-04  5:21 ` Kohei Tarumizu
  2021-11-04 14:55   ` Dave Hansen
  2021-11-04 15:13 ` [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Borislav Petkov
  2021-11-04 17:10 ` Peter Zijlstra
  6 siblings, 1 reply; 22+ messages in thread
From: Kohei Tarumizu @ 2021-11-04  5:21 UTC (permalink / raw)
  To: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel
  Cc: tarumizu.kohei

This descrives the sysfs interface implemented on the hardware prefetch
driver.

Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
---
 .../ABI/testing/sysfs-devices-system-cpu      | 58 +++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index b46ef1476..caeefd320 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -666,3 +666,61 @@ Description:	Preferred MTE tag checking mode
 		================  ==============================================
 
 		See also: Documentation/arm64/memory-tagging-extension.rst
+
+What:		/sys/devices/system/cpu/cpu*/hwpf/l*/enable
+		/sys/devices/system/cpu/cpu*/hwpf/l*/available_dist
+		/sys/devices/system/cpu/cpu*/hwpf/l*/dist
+		/sys/devices/system/cpu/cpu*/hwpf/l*/reliable
+Date:		October 2021
+Contact:	Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
+		Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Parameters for the hardware prefetch driver
+
+		This sysfs interface provides Hardware Prefetch (HWPF) tunable
+		attribute files by using implementation defined registers.
+		These attribute files are corresponding to the cache level of
+		the parent directory.
+
+		enable:
+			Read/write interface to change hardware prefetch
+			enablement.
+			Read returns hardware prefetch enablement status:
+				0: hardware prefetch is enabled
+				1: hardware prefetch is disabled
+
+			Write '0' to enable Hardware Prefetch.
+			Write '1' to disable Hardware Prefetch.
+
+		available_dist:
+			Read only interface to get a list of values that can be
+			written to dist.
+
+		dist:
+			Read/write interface to specify the hardware prefetch
+			distance.
+			Read return the current hardware prefetch distance value
+			in bytes or the string "auto".
+
+			Write either a value in byte read from available_dist,
+			or the string "auto" to this attribuite. If you write
+			a value less than these, the value is rounded up.
+
+			The value 0 and the string "auto" are the same and have
+			a special meaning. This means that instead of setting
+			dist to a user-specified value, it operates using
+			hardware-specific values.
+
+		strong:
+			Read/write interface to change hardware prefetch
+			strongness.
+			Strong prefetch operation is surely executed, if there
+			is no corresponding data in cache.
+			Weak prefetch operation allows the hardware not to
+			execute operation depending on hardware state.
+
+			Read returns hardware prefetch strongness status:
+				0: hardware prefetch is generated strong
+				1: hardware prefetch is generated weak
+
+			Write '0' to hardware prefetch generate strong.
+			Write '1' to hardware prefetch generate weak.
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver
  2021-11-04  5:21 ` [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of " Kohei Tarumizu
@ 2021-11-04 14:55   ` Dave Hansen
  2021-11-08  1:29     ` tarumizu.kohei
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Hansen @ 2021-11-04 14:55 UTC (permalink / raw)
  To: Kohei Tarumizu, catalin.marinas, will, tglx, mingo, bp,
	dave.hansen, x86, hpa, linux-arm-kernel, linux-kernel

On 11/3/21 10:21 PM, Kohei Tarumizu wrote:
> +What:		/sys/devices/system/cpu/cpu*/hwpf/l*/enable
> +		/sys/devices/system/cpu/cpu*/hwpf/l*/available_dist
> +		/sys/devices/system/cpu/cpu*/hwpf/l*/dist
> +		/sys/devices/system/cpu/cpu*/hwpf/l*/reliable

How does this look in practice?

	# ls /sys/devices/system/cpu/cpu0/hwpf/
	l0
	l1
	l2
	...

?

Dumb question, but why don't we give these things names?  If the Intel
one is called "L2 Hardware Prefetcher Disable", couldn't the directory
be "l2_prefetch"?

BTW, your "reliable" is mismatched with the "strong" value in the docs.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors
  2021-11-04  5:21 [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Kohei Tarumizu
                   ` (4 preceding siblings ...)
  2021-11-04  5:21 ` [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of " Kohei Tarumizu
@ 2021-11-04 15:13 ` Borislav Petkov
  2021-11-08  2:17   ` tarumizu.kohei
  2021-11-04 17:10 ` Peter Zijlstra
  6 siblings, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2021-11-04 15:13 UTC (permalink / raw)
  To: Kohei Tarumizu
  Cc: catalin.marinas, will, tglx, mingo, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel

On Thu, Nov 04, 2021 at 02:21:17PM +0900, Kohei Tarumizu wrote:
> This patch series add hardware prefetch driver register/unregister
> function. The purpose of this driver is to provide an interface to
> control the hardware prefetch mechanism depending on the application
> characteristics.

This is all fine and dandy but what I'm missing in this pile of text -
at least I couldn't find it - is why do we need this in the upstream
kernel?

Is there some real-life use case that would benefit from software
fiddling with prefetchers or is this one of those, well, we have those
controls, lets expose them in the OS?

IOW, you need to sell this stuff properly first - then talk design.

>  create mode 100644 drivers/hwpf/Kconfig
>  create mode 100644 drivers/hwpf/Makefile
>  create mode 100644 drivers/hwpf/fujitsu_hwpf.c
>  create mode 100644 drivers/hwpf/hwpf.c
>  create mode 100644 drivers/hwpf/intel_hwpf.c
>  create mode 100644 include/linux/hwpf.h

I'm not sure about a wholly separate drivers/hwpf/ - it's not like there
are gazillion different hw prefetch drivers.

HTH.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors
  2021-11-04  5:21 [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Kohei Tarumizu
                   ` (5 preceding siblings ...)
  2021-11-04 15:13 ` [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Borislav Petkov
@ 2021-11-04 17:10 ` Peter Zijlstra
  2021-11-08  2:36   ` tarumizu.kohei
  6 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2021-11-04 17:10 UTC (permalink / raw)
  To: Kohei Tarumizu
  Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel

On Thu, Nov 04, 2021 at 02:21:17PM +0900, Kohei Tarumizu wrote:
> This patch series add hardware prefetch driver register/unregister
> function. The purpose of this driver is to provide an interface to
> control the hardware prefetch mechanism depending on the application
> characteristics.

Here you talk about applications..

> An earlier RFC[1], we were suggested that we create a hardware
> prefetch directory under /sys/devices/system/cpu/[CPUNUM]/cache.
> Hardware prefetch is a cache-related feature, but it does not require
> cache sysfs feature. Therefore, we decided to isolate the code.
> Specifically, create a directory under cpu/[CPUNUM].

Here you talk about CPUs..

How does that work?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver
  2021-11-04 14:55   ` Dave Hansen
@ 2021-11-08  1:29     ` tarumizu.kohei
  2021-11-08  1:49       ` Dave Hansen
  0 siblings, 1 reply; 22+ messages in thread
From: tarumizu.kohei @ 2021-11-08  1:29 UTC (permalink / raw)
  To: 'Dave Hansen',
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel

Hi,

Thanks for your comment.

> How does this look in practice?

It works on a x86 machine is shown below:

# find /sys/devices/system/cpu/cpu0/hwpf/
/sys/devices/system/cpu/cpu0/hwpf/
/sys/devices/system/cpu/cpu0/hwpf/l2
/sys/devices/system/cpu/cpu0/hwpf/l2/enable
/sys/devices/system/cpu/cpu0/hwpf/l1
/sys/devices/system/cpu/cpu0/hwpf/l1/enable

> Dumb question, but why don't we give these things names?  If the Intel one is
> called "L2 Hardware Prefetcher Disable", couldn't the directory be "l2_prefetch"?

There is no specific reason for directory names. We named it "l*"
because it is related to a certain cache level. We would change it,
if there is another suitable name.

> BTW, your "reliable" is mismatched with the "strong" value in the docs.

Sorry, that's our mistake. The "strong" is correct.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver
  2021-11-08  1:29     ` tarumizu.kohei
@ 2021-11-08  1:49       ` Dave Hansen
  2021-11-09  9:41         ` tarumizu.kohei
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Hansen @ 2021-11-08  1:49 UTC (permalink / raw)
  To: tarumizu.kohei, catalin.marinas, will, tglx, mingo, bp,
	dave.hansen, x86, hpa, linux-arm-kernel, linux-kernel

On 11/7/21 5:29 PM, tarumizu.kohei@fujitsu.com wrote:
>> How does this look in practice?
> It works on a x86 machine is shown below:
> 
> # find /sys/devices/system/cpu/cpu0/hwpf/
> /sys/devices/system/cpu/cpu0/hwpf/
> /sys/devices/system/cpu/cpu0/hwpf/l2
> /sys/devices/system/cpu/cpu0/hwpf/l2/enable
> /sys/devices/system/cpu/cpu0/hwpf/l1
> /sys/devices/system/cpu/cpu0/hwpf/l1/enable
> 
>> Dumb question, but why don't we give these things names?  If the Intel one is
>> called "L2 Hardware Prefetcher Disable", couldn't the directory be "l2_prefetch"?
> There is no specific reason for directory names. We named it "l*"
> because it is related to a certain cache level. We would change it,
> if there is another suitable name.

Ahh, so you really do intend the l2 directory to be for *all* the L2
prefetchers?  I guess that's OK, but will folks ever want to do "L2
Hardware Prefetcher Disable", but not "L2 Adjacent Cache Line Prefetcher
Disable"?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 3/5] driver: hwpf: Add support for Intel to hardware prefetch driver
  2021-11-04  5:21 ` [RFC PATCH v2 3/5] driver: hwpf: Add support for Intel " Kohei Tarumizu
@ 2021-11-08  1:51   ` Dave Hansen
  2021-11-09  9:44     ` tarumizu.kohei
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Hansen @ 2021-11-08  1:51 UTC (permalink / raw)
  To: Kohei Tarumizu, catalin.marinas, will, tglx, mingo, bp,
	dave.hansen, x86, hpa, linux-arm-kernel, linux-kernel

On 11/3/21 10:21 PM, Kohei Tarumizu wrote:
> +enum register_type __init get_register_type(void)
> +{
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> +		return REGISTER_TYPE_NONE;
> +
> +	switch (boot_cpu_data.x86_model) {
> +	/*
> +	 * Note: In addition to BROADWELL, NEHALEM and others have same register
> +	 * specifications as REGISTER_TYPE_BROADWELL. If you want to add support
> +	 * for these processor, add the target model case here.
> +	 */
> +	case INTEL_FAM6_BROADWELL_X:
> +		return REGISTER_TYPE_BROADWELL;
> +	default:
> +		return REGISTER_TYPE_NONE;
> +	}
> +}

Can you do this with a struct and x86_match_cpu() instead?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors
  2021-11-04 15:13 ` [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Borislav Petkov
@ 2021-11-08  2:17   ` tarumizu.kohei
  2021-11-10  8:34     ` Borislav Petkov
  0 siblings, 1 reply; 22+ messages in thread
From: tarumizu.kohei @ 2021-11-08  2:17 UTC (permalink / raw)
  To: 'Borislav Petkov'
  Cc: catalin.marinas, will, tglx, mingo, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel

Hi,

Thanks for your comment.

> This is all fine and dandy but what I'm missing in this pile of text - at least I couldn't
> find it - is why do we need this in the upstream kernel?
> 
> Is there some real-life use case that would benefit from software fiddling with
> prefetchers or is this one of those, well, we have those controls, lets expose them
> in the OS?
> 
> IOW, you need to sell this stuff properly first - then talk design.

A64FX and some Intel processors has implementation-dependent register
for controlling hardware prefetch. Intel has MSR_MISC_FEATURE_CONTROL,
and A64FX has IMP_PF_STREAM_DETECT_CTRL_EL0. These register cannot be
accessed from userspace, so we provide a proper kernel interface.

The advantage of using this interface from userspace is that we can
expect performance improvements.

The following performance improvements have been reported for some
Intel processors.
https://github.com/xmrig/xmrig/issues/1433#issuecomment-572126184

A64FX also has several applications that have actually been improved
performance. In most of these cases, we are tuning the parameter of
hardware prefetch distance. One of them is the Stream benchmark.

For reference, here is the result of STREAM Triad when tuning with
the dist attribute file in L1 and L2 cache on A64FX.

| dist combination  | Pattern A   | Pattern B   |
|-------------------|-------------|-------------|
| L1:256,  L2:1024  | 234505.2144 | 114600.0801 |
| L1:1536, L2:1024  | 279172.8742 | 118979.4542 |
| L1:256,  L2:10240 | 247716.7757 | 127364.1533 |
| L1:1536, L2:10240 | 283675.6625 | 125950.6847 |

In pattern A, we set the size of the array to 174720, which is about
half the size of the L1d cache. In pattern B, we set the size of the
array to 10485120, which is about twice the size of the L2 cache.

In pattern A, a change of dist at L1 has a larger effect. On the other
hand, in pattern B, the change of dist at L2 has a larger effect.
As described above, the optimal dist combination depends on the
characteristics of the application. Therefore, such a sysfs interface
is useful for performance tuning.

For these reasons, we would like to add this interface to the
upstream kernel.

> I'm not sure about a wholly separate drivers/hwpf/ - it's not like there are
> gazillion different hw prefetch drivers.

We created a new directory to lump multiple separate files into one
place. We don't think this is a good way. If there is any other
suitable way, we would like to change it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors
  2021-11-04 17:10 ` Peter Zijlstra
@ 2021-11-08  2:36   ` tarumizu.kohei
  0 siblings, 0 replies; 22+ messages in thread
From: tarumizu.kohei @ 2021-11-08  2:36 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel

Hi,

Thanks for your comment.

> Here you talk about applications..

> Here you talk about CPUs..
> 
> How does that work?

Does your question mean how users tune their applications? We intend
to use it as follows:
1.) User tunes the hardware prefetch parameters of a particular CPUs
    via sysfs interface.
2.) Execute the application bound to the specific CPUs set above.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver
  2021-11-08  1:49       ` Dave Hansen
@ 2021-11-09  9:41         ` tarumizu.kohei
  2021-11-09 17:44           ` Dave Hansen
  0 siblings, 1 reply; 22+ messages in thread
From: tarumizu.kohei @ 2021-11-09  9:41 UTC (permalink / raw)
  To: 'Dave Hansen',
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel

> Ahh, so you really do intend the l2 directory to be for *all* the L2
> prefetchers?

Yes, we intend to create the l2 directory for *all* the L2 prefetchers
(i.e. "L2 Hardware Prefetcher Disable" and "L2 Adjacent Cache Line
Prefetcher Disable). 

> I guess that's OK, but will folks ever want to do "L2
> Hardware Prefetcher Disable", but not "L2 Adjacent Cache Line Prefetcher
> Disable"?

There are people who actually tested the performance improvement[1].

[1]https://github.com/xmrig/xmrig/issues/1433#issuecomment-572126184

In this report, write 5 to MSR 0x1a4 (i.e. "L2 Hardware Prefetcher
Disable", but not "L2 Adjacent Cache Line Prefetcher Disable")
on i7-5930K for best performance. If such tuning is possible, it may
be useful for some people.

We describe how to deal these parameters in our sysfs interface at
"[RFC & Future plan]" section in the cover letter(0/5), but we can't
come up with any good ideas.

We thought that the sysfs interface should be generic and common,
and avoid showing architecture-dependent specifications.

We have considered the Proposal B that multiple hardware prefetch
types in one enable attribute file at above section. However, in
order to use it, we have to know the register specification, so we
think it is not appropriate.

Do you have any idea how to represent architecture-dependent
specifications in sysfs interface?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v2 3/5] driver: hwpf: Add support for Intel to hardware prefetch driver
  2021-11-08  1:51   ` Dave Hansen
@ 2021-11-09  9:44     ` tarumizu.kohei
  0 siblings, 0 replies; 22+ messages in thread
From: tarumizu.kohei @ 2021-11-09  9:44 UTC (permalink / raw)
  To: 'Dave Hansen',
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel

> Can you do this with a struct and x86_match_cpu() instead?

Yes, we would like to replace it in next version, thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver
  2021-11-09  9:41         ` tarumizu.kohei
@ 2021-11-09 17:44           ` Dave Hansen
  2021-11-10  9:25             ` tarumizu.kohei
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Hansen @ 2021-11-09 17:44 UTC (permalink / raw)
  To: tarumizu.kohei, catalin.marinas, will, tglx, mingo, bp,
	dave.hansen, x86, hpa, linux-arm-kernel, linux-kernel

On 11/9/21 1:41 AM, tarumizu.kohei@fujitsu.com wrote:
>> I guess that's OK, but will folks ever want to do "L2
>> Hardware Prefetcher Disable", but not "L2 Adjacent Cache Line Prefetcher
>> Disable"?
> There are people who actually tested the performance improvement[1].
> 
> [1]https://github.com/xmrig/xmrig/issues/1433#issuecomment-572126184
> 
> In this report, write 5 to MSR 0x1a4 (i.e. "L2 Hardware Prefetcher
> Disable", but not "L2 Adjacent Cache Line Prefetcher Disable")
> on i7-5930K for best performance. If such tuning is possible, it may
> be useful for some people.
> 
> We describe how to deal these parameters in our sysfs interface at
> "[RFC & Future plan]" section in the cover letter(0/5), but we can't
> come up with any good ideas.
> 
> We thought that the sysfs interface should be generic and common,
> and avoid showing architecture-dependent specifications.
> 
> We have considered the Proposal B that multiple hardware prefetch
> types in one enable attribute file at above section. However, in
> order to use it, we have to know the register specification, so we
> think it is not appropriate.
> 
> Do you have any idea how to represent architecture-dependent
> specifications in sysfs interface?

First, I'd give them real names.

Second, I'd link them to the level or levels of the cache that they effect.

Third, I'd make sure that it is clear what caches it affects.

We have a representation of the caches in:

	/sys/devices/system/cpu/cpu*/cache

It would be a shame to ignore those.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors
  2021-11-08  2:17   ` tarumizu.kohei
@ 2021-11-10  8:34     ` Borislav Petkov
  2021-11-18  6:14       ` tarumizu.kohei
  0 siblings, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2021-11-10  8:34 UTC (permalink / raw)
  To: tarumizu.kohei
  Cc: catalin.marinas, will, tglx, mingo, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel

On Mon, Nov 08, 2021 at 02:17:43AM +0000, tarumizu.kohei@fujitsu.com wrote:
> The following performance improvements have been reported for some
> Intel processors.
> https://github.com/xmrig/xmrig/issues/1433#issuecomment-572126184

Yes, I know about that use case.

> For these reasons, we would like to add this interface to the
> upstream kernel.

So put all those justifications at the beginning of your 0th message
when you send a patchset so that it is clear to reviewers *why* you're
doing this. The "why" is the most important - everything else comes
after.

> > I'm not sure about a wholly separate drivers/hwpf/ - it's not like there are
> > gazillion different hw prefetch drivers.
> 
> We created a new directory to lump multiple separate files into one
> place. We don't think this is a good way. If there is any other
> suitable way, we would like to change it.

Well, how many prefetcher drivers will be there?

On x86 there will be one per vendor, so 2-3 the most...

Also, as dhansen points out, we have already

  /sys/devices/system/cpu/cpu*/cache

so all those knobs belong there on x86.

Also, I think that shoehorning all these different cache architectures
and different prefetcher knobs which are available from each CPU, into a
common sysfs hierarchy is going to cause a lot of ugly ifdeffery if not
done right.

Some caches will have control A while others won't - they will have
control B so people will wonder why control A works on box B_a but not
on box B_b...

So we have to be very careful what we expose to userspace because it
becomes an ABI which we have to support for an indefinite time.

Also, if you're going to give the xmrig example, then we should involve
the xmrig people and ask them whether the stuff you're exposing to
userspace is good for their use case.

And so on and so on...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of hardware prefetch driver
  2021-11-09 17:44           ` Dave Hansen
@ 2021-11-10  9:25             ` tarumizu.kohei
  0 siblings, 0 replies; 22+ messages in thread
From: tarumizu.kohei @ 2021-11-10  9:25 UTC (permalink / raw)
  To: 'Dave Hansen',
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel

> First, I'd give them real names.
> 
> Second, I'd link them to the level or levels of the cache that they effect.
> 
> Third, I'd make sure that it is clear what caches it affects.
> 
> We have a representation of the caches in:
> 
> 	/sys/devices/system/cpu/cpu*/cache
> 
> It would be a shame to ignore those.

Thank you for your advice.
I try to reconsider the design.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors
  2021-11-10  8:34     ` Borislav Petkov
@ 2021-11-18  6:14       ` tarumizu.kohei
  2021-11-18  7:09         ` tarumizu.kohei
  0 siblings, 1 reply; 22+ messages in thread
From: tarumizu.kohei @ 2021-11-18  6:14 UTC (permalink / raw)
  To: 'Borislav Petkov'
  Cc: catalin.marinas, will, tglx, mingo, dave.hansen, x86, hpa,
	linux-arm-kernel, linux-kernel

I'm sorry for the late reply.

> So put all those justifications at the beginning of your 0th message
> when you send a patchset so that it is clear to reviewers *why* you're
> doing this. The "why" is the most important - everything else comes
> after.

I understand. The next time we send a patchset, put these
justifications at the beginning of our 0th message.

> Well, how many prefetcher drivers will be there?
> 
> On x86 there will be one per vendor, so 2-3 the most…

Currentry, we plan to support only two drivers for Intel and A64FX.
Even if we support other vendors, it will probably increase only a
little.

>> We don't think this is a good way. If there is any other suitable
>> way, we would like to change it.

This means that our way is not good. Therefore, we would like to
reconsider the file structure along with changes in the interface
specification.

> Also, as dhansen points out, we have already
> 
>   /sys/devices/system/cpu/cpu*/cache
> 
> so all those knobs belong there on x86.

Intel MSR and A64FX have hardware prefetcher that affect L1d cache and
L2 cache. Does it suit your intention to create a prefetcher directory
under the cache directory as below?

/sys/devices/system/cpu/cpu*/cache/
                index0/prefetcher/enable
                index2/prefetcher/enable

The above example presumes that the L1d cache is at index0 (level: 1,
type: Data) and the L2 cache is at index2 (level:2, type: Unified).

> Also, I think that shoehorning all these different cache architectures
> and different prefetcher knobs which are available from each CPU, into a
> common sysfs hierarchy is going to cause a lot of ugly ifdeffery if not
> done right.
> 
> Some caches will have control A while others won't - they will have
> control B so people will wonder why control A works on box B_a but not
> on box B_b...
> 
> So we have to be very careful what we expose to userspace because it
> becomes an ABI which we have to support for an indefinite time.

To avoid shoehorning different prefetchers in a common sysfs hierarchy,
we would like to represent these to different hierarchy. 

Intel MSR has three type of prefetchers, and we represent "Hardware
Prefethcer" as "hwpf", "Adjacent Cache Line Prefetcher" as "aclpf",
and "IP Prefetcher" as "ippf". These prefetcher have one controllable
parameter "disable".

A64FX has one type of prefetcher, and we represent it as "hwpf". This
prefetcher has three parameter "disable", "dist" and "strong".

The following table shows which caches are affected by the combination
of prefetcher and parameter.

| Cache affected | Combination ([prefecher]/[parameter]) |
|----------------|---------------------------------------|
| Intel MSR L1d  | hwpf/disable, ippf/disable            |
| Intel MSR L2   | hwpf/disable, aclpf/disable           |
| A64FX L1d      | hwpf/disable, hwpf/dist, hwpf/strong  |
| A64FX L2       | hwpf/disable, hwpf/dist, hwpf/strong  |

Does it make sense to create sysfs directories as below?

* For Intel MSR
/.../index0/prefetcher/hwpf/enable
/.../index0/prefetcher/ippf/enable
/.../index2/prefetcher/hwpf/enable
/.../index2/prefetcher/aclpf/enable

* For A64FX
/.../index[0,2]/prefetcher/hwpf/enable
/.../index[0,2]/prefetcher/hwpf/dist
/.../index[0,2]/prefetcher/hwpf/strong

> Also, if you're going to give the xmrig example, then we should involve
> the xmrig people and ask them whether the stuff you're exposing to
> userspace is good for their use case.

We would like to ask them when the interface specification is fixed to
some extent.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors
  2021-11-18  6:14       ` tarumizu.kohei
@ 2021-11-18  7:09         ` tarumizu.kohei
  2021-12-06  9:30           ` tarumizu.kohei
  0 siblings, 1 reply; 22+ messages in thread
From: tarumizu.kohei @ 2021-11-18  7:09 UTC (permalink / raw)
  To: 'Borislav Petkov'
  Cc: 'catalin.marinas@arm.com', 'will@kernel.org',
	'tglx@linutronix.de', 'mingo@redhat.com',
	'dave.hansen@linux.intel.com', 'x86@kernel.org',
	'hpa@zytor.com',
	'linux-arm-kernel@lists.infradead.org',
	'linux-kernel@vger.kernel.org'

> Does it make sense to create sysfs directories as below?
> 
> * For Intel MSR
> /.../index0/prefetcher/hwpf/enable
> /.../index0/prefetcher/ippf/enable
> /.../index2/prefetcher/hwpf/enable
> /.../index2/prefetcher/aclpf/enable
> 
> * For A64FX
> /.../index[0,2]/prefetcher/hwpf/enable
> /.../index[0,2]/prefetcher/hwpf/dist
> /.../index[0,2]/prefetcher/hwpf/strong

There was a mistake in the attribute file name. The following is
correct.

* For Intel MSR
/.../index0/prefetcher/hwpf/disable
/.../index0/prefetcher/ippf/disable
/.../index2/prefetcher/hwpf/disable
/.../index2/prefetcher/aclpf/disable

* For A64FX
/.../index[0,2]/prefetcher/hwpf/disable
/.../index[0,2]/prefetcher/hwpf/dist
/.../index[0,2]/prefetcher/hwpf/strong

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors
  2021-11-18  7:09         ` tarumizu.kohei
@ 2021-12-06  9:30           ` tarumizu.kohei
  0 siblings, 0 replies; 22+ messages in thread
From: tarumizu.kohei @ 2021-12-06  9:30 UTC (permalink / raw)
  To: 'Borislav Petkov'
  Cc: 'catalin.marinas@arm.com', 'will@kernel.org',
	'tglx@linutronix.de', 'mingo@redhat.com',
	'dave.hansen@linux.intel.com', 'x86@kernel.org',
	'hpa@zytor.com',
	'linux-arm-kernel@lists.infradead.org',
	'linux-kernel@vger.kernel.org'

>> Also, as dhansen points out, we have already
>> 
>>   /sys/devices/system/cpu/cpu*/cache
>> 
>> so all those knobs belong there on x86.
> 
> Intel MSR and A64FX have hardware prefetcher that affect L1d cache and
> L2 cache. Does it suit your intention to create a prefetcher directory
> under the cache directory as below?
> 
> /sys/devices/system/cpu/cpu*/cache/
>                 index0/prefetcher/enable
>                 index2/prefetcher/enable
> 
> The above example presumes that the L1d cache is at index0 (level: 1,
> type: Data) and the L2 cache is at index2 (level:2, type: Unified).

Any comment or suggestion would be much appreciated. In particular,
is our using cache/index directory above match your intent?

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-12-06  9:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-04  5:21 [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Kohei Tarumizu
2021-11-04  5:21 ` [RFC PATCH v2 1/5] driver: hwpf: Add hardware prefetch core driver register/unregister functions Kohei Tarumizu
2021-11-04  5:21 ` [RFC PATCH v2 2/5] driver: hwpf: Add support for A64FX to hardware prefetch driver Kohei Tarumizu
2021-11-04  5:21 ` [RFC PATCH v2 3/5] driver: hwpf: Add support for Intel " Kohei Tarumizu
2021-11-08  1:51   ` Dave Hansen
2021-11-09  9:44     ` tarumizu.kohei
2021-11-04  5:21 ` [RFC PATCH v2 4/5] driver: hwpf: Add Kconfig/Makefile to build " Kohei Tarumizu
2021-11-04  5:21 ` [RFC PATCH v2 5/5] docs: ABI: Add sysfs documentation interface of " Kohei Tarumizu
2021-11-04 14:55   ` Dave Hansen
2021-11-08  1:29     ` tarumizu.kohei
2021-11-08  1:49       ` Dave Hansen
2021-11-09  9:41         ` tarumizu.kohei
2021-11-09 17:44           ` Dave Hansen
2021-11-10  9:25             ` tarumizu.kohei
2021-11-04 15:13 ` [RFC PATCH v2 0/5] Add hardware prefetch driver for A64FX and Intel processors Borislav Petkov
2021-11-08  2:17   ` tarumizu.kohei
2021-11-10  8:34     ` Borislav Petkov
2021-11-18  6:14       ` tarumizu.kohei
2021-11-18  7:09         ` tarumizu.kohei
2021-12-06  9:30           ` tarumizu.kohei
2021-11-04 17:10 ` Peter Zijlstra
2021-11-08  2:36   ` tarumizu.kohei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).