From: Kohei Tarumizu <tarumizu.kohei@fujitsu.com>
To: catalin.marinas@arm.com, will@kernel.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, fenghua.yu@intel.com,
	reinette.chatre@intel.com
Cc: tarumizu.kohei@fujitsu.com
Subject: [PATCH v3 0/9] Add hardware prefetch control driver for A64FX and x86
Date: Wed, 20 Apr 2022 12:02:14 +0900
Message-ID: <20220420030223.689259-1-tarumizu.kohei@fujitsu.com>

This patch series adds a sysfs interface for controlling the CPU's
hardware prefetch behavior from userspace, for performance tuning on
the A64FX processor and on supported x86 CPUs.

Changes from v2:
  - move the arm64 driver (arch/arm64) to an A64FX-only driver
    (drivers/soc/fujitsu)
  - prohibit writing 0 to stream_detect_prefetcher_dist
  - change the type of the strength state from bool to string
    (e.g. "strong"), and rename the attribute to
    stream_detect_prefetcher_strength
  - change the x86 code to work correctly with resctrl's pseudo-locking
    - read and write the register within a single
      smp_call_function_single() in x86-pfctl.c, to prevent a context
      switch between reading and writing the register
    - restore the original value when re-enabling the prefetchers in
      pseudo_lock.c
  - add a prefix to the driver names for A64FX (a64fx-) and x86 (x86-)
  - update the documentation
    - split the description of pfctl into blocks for x86 and A64FX
    - remove unnecessary descriptions
https://lore.kernel.org/lkml/20220311101940.3403607-1-tarumizu.kohei@fujitsu.com/
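
The pseudo_lock.c item above amounts to saving the register value
before the prefetchers are disabled and writing that saved value back,
instead of 0, when they are re-enabled. A minimal sketch with a
simulated register follows; `fake_msr`, `read_msr()`, `write_msr()` and
`lock_region()` are hypothetical stand-ins for the
MSR_MISC_FEATURE_CONTROL accesses and locking work in the real code:

```c
#include <stdint.h>

/* Hypothetical stand-in for MSR_MISC_FEATURE_CONTROL. */
static uint64_t fake_msr;

static uint64_t read_msr(void)        { return fake_msr; }
static void     write_msr(uint64_t v) { fake_msr = v; }

/* Placeholder for the cache-locking work done while prefetchers are off. */
static void lock_region(void) { }

/* Old behaviour: re-enable by unconditionally writing 0. */
void pseudo_lock_old(uint64_t disable_bits)
{
	write_msr(disable_bits);
	lock_region();
	write_msr(0);		/* drops any bits the user had set */
}

/* Fixed behaviour: save the original value and restore it afterwards. */
void pseudo_lock_fixed(uint64_t disable_bits)
{
	uint64_t saved = read_msr();

	write_msr(disable_bits);
	lock_region();
	write_msr(saved);	/* user settings survive pseudo-locking */
}
```

With a user setting of 0x2 and disable bits 0xF, the old sequence
leaves 0 in the register, whereas the fixed sequence restores 0x2.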

[Background]
============
A64FX and some Intel processors have implementation-dependent registers
for controlling the CPU's hardware prefetch behavior: A64FX has
IMP_PF_STREAM_DETECT_CTRL_EL0[1], and Intel processors have MSR 0x1a4
(MSR_MISC_FEATURE_CONTROL)[2]. These registers cannot be accessed from
userspace.

[1]https://github.com/fujitsu/A64FX/tree/master/doc/A64FX_Specification_HPC_Extension_v1_EN.pdf

[2]https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
    Volume 4

The benefit of controlling these registers is improved performance. As
an example, the results of running the STREAM benchmark on A64FX are
shown in the [Merit] section.

MSR 0x1a4 can also be changed from userspace via the MSR driver.
However, using the MSR driver for this is not recommended, so a proper
kernel interface is needed[3].

[3]https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about/

For these reasons, we provide a new kernel interface to control both
IMP_PF_STREAM_DETECT_CTRL_EL0 and MSR 0x1a4.

[Overall design]
================
The source code for this driver is divided into a common part
(drivers/base/pfctl.c) and hardware-specific parts
(drivers/soc/fujitsu/a64fx-pfctl.c and arch/x86/kernel/cpu/x86-pfctl.c).
The common part implements architecture-independent processing, such as
creating the sysfs files. Each hardware-specific part implements the
architecture-dependent processing; it must at least specify which types
of hardware prefetcher are supported and how to read and write the
register. This information is passed to the common part through its
registration function.
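
To make this split concrete, here is a minimal userspace sketch of what
a hardware-specific part might hand to the common part. The names
`struct hwpf_ops`, `hwpf_register()`, `hwpf_unregister()` and the
prefetcher type flags are hypothetical and do not claim to match
include/linux/pfctl.h; the register is simulated by a plain array:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetcher types a hardware-specific part can advertise. */
enum hwpf_type {
	HWPF_HARDWARE      = 1u << 0,
	HWPF_IP            = 1u << 1,
	HWPF_ADJACENT_LINE = 1u << 2,
};

/* Hypothetical description passed to the common part at registration. */
struct hwpf_ops {
	unsigned int l1_types;	/* prefetchers exposed under index0 */
	unsigned int l2_types;	/* prefetchers exposed under index2 */
	int (*read)(unsigned int cpu, uint64_t *val);
	int (*write)(unsigned int cpu, uint64_t val);
};

static const struct hwpf_ops *registered_ops;

/* Common part: accept one backend (a real driver would create sysfs here). */
int hwpf_register(const struct hwpf_ops *ops)
{
	if (!ops || !ops->read || !ops->write)
		return -EINVAL;
	if (registered_ops)
		return -EBUSY;
	registered_ops = ops;
	return 0;
}

void hwpf_unregister(const struct hwpf_ops *ops)
{
	if (registered_ops == ops)
		registered_ops = NULL;
}

/* Hardware-specific part: a fake per-CPU register for 4 CPUs. */
static uint64_t fake_reg[4];

static int fake_read(unsigned int cpu, uint64_t *val)
{
	if (cpu >= 4)
		return -ENODEV;
	*val = fake_reg[cpu];
	return 0;
}

static int fake_write(unsigned int cpu, uint64_t val)
{
	if (cpu >= 4)
		return -ENODEV;
	fake_reg[cpu] = val;
	return 0;
}

const struct hwpf_ops fake_ops = {
	.l1_types = HWPF_HARDWARE | HWPF_IP,
	.l2_types = HWPF_HARDWARE | HWPF_ADJACENT_LINE,
	.read     = fake_read,
	.write    = fake_write,
};
```

The design point is that the common part never touches the register
itself; it only calls back through the read/write hooks it was given.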

If the CPU supports hardware prefetch control, this driver creates a
"prefetch_control" directory containing attribute files in every CPU's
cache/index[0,2] directory. Each attribute file controls a prefetcher
for the cache level of its parent index directory.

A detailed description of this sysfs interface is in
Documentation/ABI/testing/sysfs-devices-system-cpu (patch9).

This driver needs the cache sysfs directory and cache level/type
information. On ARM processors, this information can be obtained from
registers even without ACPI PPTT. Patch5 therefore creates the
cache/index directories using only the register information when the
machine does not support ACPI PPTT and the hardware prefetch control
Kconfig option (CONFIG_HWPF_CONTROL) is enabled. This causes a problem,
which is described in [Known problem].
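
For reference, the cache level/type information is encoded in
CLIDR_EL1 as one 3-bit Ctype field per level (Ctype1 in bits [2:0],
Ctype2 in bits [5:3], and so on up to level 7). The following sketch
decodes it; the helper names are hypothetical and this is not the code
added by patch5:

```c
#include <stdint.h>

/* Ctype encodings of CLIDR_EL1 (Arm Architecture Reference Manual). */
enum ctype {
	CTYPE_NONE     = 0,	/* no cache at this level */
	CTYPE_INST     = 1,	/* instruction cache only */
	CTYPE_DATA     = 2,	/* data cache only */
	CTYPE_SEPARATE = 3,	/* separate instruction and data caches */
	CTYPE_UNIFIED  = 4,	/* unified cache */
};

#define MAX_CACHE_LEVEL 7

/* Return the Ctype field for cache level 1..7. */
enum ctype clidr_ctype(uint64_t clidr, unsigned int level)
{
	return (enum ctype)((clidr >> (3 * (level - 1))) & 0x7);
}

/*
 * Count the cache leaves (cache/indexN directories) described by CLIDR:
 * a level with separate I and D caches yields two leaves. Reserved
 * encodings (5-7) are not handled specially in this sketch.
 */
unsigned int clidr_num_leaves(uint64_t clidr)
{
	unsigned int level, leaves = 0;

	for (level = 1; level <= MAX_CACHE_LEVEL; level++) {
		enum ctype t = clidr_ctype(clidr, level);

		if (t == CTYPE_NONE)
			break;
		leaves += (t == CTYPE_SEPARATE) ? 2 : 1;
	}
	return leaves;
}
```

For example, a CLIDR value of 0x23 (L1 separate, L2 unified) yields
three leaves: two for L1 and one for L2.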

[Examples]
==========
This section provides an example of using this sysfs interface on the
x86 model INTEL_FAM6_BROADWELL_X.

This model has the following register specifications:

[0]    L2 Hardware Prefetcher Disable (R/W)
[1]    L2 Adjacent Cache Line Prefetcher Disable (R/W)
[2]    DCU Hardware Prefetcher Disable (R/W)
[3]    DCU IP Prefetcher Disable (R/W)
[63:4] Reserved
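
Using the bit layout above, the split between cache levels can be
expressed as masks. This is an illustrative sketch; the macro names and
`hwpf_mask_for_index()` are hypothetical, not taken from x86-pfctl.c:

```c
#include <stdint.h>

/* Bits of MSR 0x1a4 on BROADWELL_X; a set bit DISABLES the prefetcher. */
#define L2_HW_PF_DISABLE        (1ULL << 0)
#define L2_ADJ_LINE_PF_DISABLE  (1ULL << 1)
#define DCU_HW_PF_DISABLE       (1ULL << 2)
#define DCU_IP_PF_DISABLE       (1ULL << 3)

/* Hypothetical helper: bits controlled from a given cache/indexN dir. */
uint64_t hwpf_mask_for_index(unsigned int index)
{
	switch (index) {
	case 0:		/* L1d cache */
		return DCU_HW_PF_DISABLE | DCU_IP_PF_DISABLE;
	case 2:		/* L2 cache */
		return L2_HW_PF_DISABLE | L2_ADJ_LINE_PF_DISABLE;
	default:	/* other indexes get no prefetch_control dir */
		return 0;
	}
}
```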

Here, index0 (L1d cache) corresponds to bits [2,3] and index2 (L2
cache) to bits [0,1]. The attribute files under index0 and index2 of
CPU1 on BROADWELL_X are as follows:

```
# ls /sys/devices/system/cpu/cpu1/cache/index0/prefetch_control/

hardware_prefetcher_enable
ip_prefetcher_enable

# ls /sys/devices/system/cpu/cpu1/cache/index2/prefetch_control/

adjacent_cache_line_prefetcher_enable
hardware_prefetcher_enable
```

To disable the L2 Adjacent Cache Line Prefetcher on CPU1 (that is, to
set bit 1, "L2 Adjacent Cache Line Prefetcher Disable"), do the
following:

```
# echo 0 > /sys/devices/system/cpu/cpu1/cache/index2/prefetch_control/adjacent_cache_line_prefetcher_enable
```
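
Note that the attribute files expose an *enable* state while the
register bits mean *disable*, so the written value is inverted. A
minimal sketch of the read-modify-write implied by the command above,
assuming the caller already holds the current MSR value; the helper
names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define L2_ADJ_LINE_PF_DISABLE (1ULL << 1)

/* Attribute read: the prefetcher is enabled when its disable bit is clear. */
bool hwpf_is_enabled(uint64_t msr, uint64_t disable_bit)
{
	return (msr & disable_bit) == 0;
}

/* Attribute write: compute the new MSR value, leaving other bits intact. */
uint64_t hwpf_apply_enable(uint64_t msr, uint64_t disable_bit, bool enable)
{
	return enable ? (msr & ~disable_bit) : (msr | disable_bit);
}
```

Writing 0 to adjacent_cache_line_prefetcher_enable thus corresponds to
`hwpf_apply_enable(old, L2_ADJ_LINE_PF_DISABLE, false)`, which sets
bit 1. Per the changelog, x86-pfctl.c performs the read and the write
within one smp_call_function_single() on the target CPU; this sketch
leaves that part out.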

As another example, the attribute files under index0 on A64FX are as follows:

```
# ls /sys/devices/system/cpu/cpu1/cache/index0/prefetch_control/

stream_detect_prefetcher_dist
stream_detect_prefetcher_enable
stream_detect_prefetcher_strength
stream_detect_prefetcher_strength_available
```
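
The changelog states that writing 0 to stream_detect_prefetcher_dist is
rejected. A userspace-style sketch of such input validation follows;
`parse_dist()` is hypothetical, and any other range or granularity
checks that a64fx-pfctl.c may apply are not modeled:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse a decimal dist value as written by `echo N > ...`; reject 0. */
int parse_dist(const char *buf, unsigned long *out)
{
	char *end;
	unsigned long val;

	if (strchr(buf, '-'))	/* strtoul would silently wrap negatives */
		return -EINVAL;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;
	if (*end == '\n')	/* echo appends a trailing newline */
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (val == 0)		/* writing 0 is prohibited */
		return -EINVAL;

	*out = val;
	return 0;
}
```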

[Patch organizations]
=====================
This patch series adds the hardware prefetch control core driver, as
well as support for A64FX and for the x86 model BROADWELL_X.

- patch1: Add hardware prefetch core driver

  This driver provides register/unregister functions that create the
  "prefetch_control" directory and its attribute files in every CPU's
  cache/index[0,2] directory.
  An architecture that can control the CPU's hardware prefetch behavior
  uses these functions to create the sysfs interface. When registering,
  it must specify which types of hardware prefetcher are supported and
  how to read and write the register.

- patch2: Add Kconfig/Makefile to build hardware prefetch control core
  driver

- patch3: Add support for A64FX

  This adds module init/exit code and creates the sysfs attribute files
  "stream_detect_prefetcher_enable", "stream_detect_prefetcher_strength",
  "stream_detect_prefetcher_strength_available" and
  "stream_detect_prefetcher_dist" for A64FX. At this point, the driver
  works only if the part number is FUJITSU_CPU_PART_A64FX.

- patch4: Add Kconfig/Makefile to build driver for A64FX

- patch5: Create cache sysfs directory without ACPI PPTT for hardware
  prefetch control

  The hardware prefetch control driver needs the cache sysfs directory
  and cache level/type information. On ARM processors, this information
  can be obtained from a register (CLIDR_EL1) even without PPTT.
  Therefore, if the machine doesn't have PPTT, we set cpu_map_populated
  to true so that the cache sysfs directory is created.

- patch6: Fix to restore to original value when re-enabling hardware
  prefetch register in pseudo_lock.c

  The current pseudo_lock.c code overwrites MSR_MISC_FEATURE_CONTROL
  with 0 when re-enabling the hardware prefetchers, even if the original
  value was not 0. Therefore, modify it to save and restore the original
  value.

- patch7: Add support for x86

  This adds module init/exit code and creates the sysfs attribute files
  "hardware_prefetcher_enable", "ip_prefetcher_enable" and
  "adjacent_cache_line_prefetcher_enable" for x86. At this point, the
  driver works only if the model is INTEL_FAM6_BROADWELL_X.

- patch8: Add Kconfig/Makefile to build driver for x86

- patch9: Add documentation for the new sysfs interface
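
Patch3 and patch7 restrict the drivers to one CPU each. A sketch of how
such a CPU can be identified from its ID register, assuming the kernel
constant values ARM_CPU_IMP_FUJITSU (0x46), FUJITSU_CPU_PART_A64FX
(0x001) and INTEL_FAM6_BROADWELL_X (0x4F); `midr_is_a64fx()` and
`cpuid_is_broadwell_x()` are hypothetical helpers, and this does not
reproduce how the drivers actually perform the check:

```c
#include <stdbool.h>
#include <stdint.h>

#define ARM_CPU_IMP_FUJITSU     0x46
#define FUJITSU_CPU_PART_A64FX  0x001
#define INTEL_FAM6_BROADWELL_X  0x4F

/* arm64 MIDR_EL1: Implementer[31:24] ... PartNum[15:4] Revision[3:0] */
bool midr_is_a64fx(uint32_t midr)
{
	unsigned int implementer = (midr >> 24) & 0xff;
	unsigned int partnum = (midr >> 4) & 0xfff;

	return implementer == ARM_CPU_IMP_FUJITSU &&
	       partnum == FUJITSU_CPU_PART_A64FX;
}

/* x86 CPUID.(EAX=1):EAX, decoded with the Intel SDM family/model rules. */
bool cpuid_is_broadwell_x(uint32_t eax)
{
	unsigned int family = (eax >> 8) & 0xf;
	unsigned int model = (eax >> 4) & 0xf;

	if (family == 0xf)
		family += (eax >> 20) & 0xff;		/* extended family */
	if (family == 0x6 || family >= 0xf)
		model += ((eax >> 16) & 0xf) << 4;	/* extended model */

	return family == 6 && model == INTEL_FAM6_BROADWELL_X;
}
```

For instance, a CPUID signature of 0x406F1 decodes to family 6, model
0x4F.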

[Known problem]
===============
- The `lscpu` command terminates with -ENOENT because the cache/index
  directory exists but the shared_cpu_map file does not. This is caused
  by patch5, which creates cache/index directories containing only level
  and type when ACPI PPTT is not available.

[Merit]
=======
For reference, here are the STREAM Triad results when tuning the
stream_detect_prefetcher_dist files of the L1 and L2 caches on A64FX.

| dist combination  | Pattern A   | Pattern B   |
|-------------------|-------------|-------------|
| L1:256,  L2:1024  | 234505.2144 | 114600.0801 |
| L1:1536, L2:1024  | 279172.8742 | 118979.4542 |
| L1:256,  L2:10240 | 247716.7757 | 127364.1533 |
| L1:1536, L2:10240 | 283675.6625 | 125950.6847 |

In pattern A, we set the size of the array to 174720, which is about
half the size of the L1d cache. In pattern B, we set the size of the
array to 10485120, which is about twice the size of the L2 cache.

In pattern A, changing the L1 dist has the larger effect, whereas in
pattern B, changing the L2 dist has the larger effect. The optimal dist
combination therefore depends on the characteristics of the
application, which is why such a sysfs interface is useful for
performance tuning.
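
To quantify the table, the relative change of each combination against
the L1:256, L2:1024 baseline can be computed directly from the reported
numbers. This is plain arithmetic on the results above, not additional
measurements:

```c
/* STREAM Triad results from the table above; row 0 is L1:256, L2:1024. */
const double pattern_a[4] = {
	234505.2144, 279172.8742, 247716.7757, 283675.6625,
};
const double pattern_b[4] = {
	114600.0801, 118979.4542, 127364.1533, 125950.6847,
};

/* Relative change, in percent, of row i against the baseline row 0. */
double gain_pct(const double *res, int i)
{
	return (res[i] / res[0] - 1.0) * 100.0;
}
```

Rounded, pattern A gains about 19.0% from the larger L1 dist alone,
5.6% from the larger L2 dist alone and 21.0% from both; pattern B gains
about 3.8%, 11.1% and 9.9% respectively.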

Best regards,
Kohei Tarumizu

Kohei Tarumizu (9):
  drivers: base: Add hardware prefetch control core driver
  drivers: base: Add Kconfig/Makefile to build hardware prefetch control
    core driver
  soc: fujitsu: Add hardware prefetch control support for A64FX
  soc: fujitsu: Add Kconfig/Makefile to build hardware prefetch control
    driver
  arm64: Create cache sysfs directory without ACPI PPTT for hardware
    prefetch control
  x86: resctrl: pseudo_lock: Fix to restore to original value when
    re-enabling hardware prefetch register
  x86: Add hardware prefetch control support for x86
  x86: Add Kconfig/Makefile to build hardware prefetch control driver
  docs: ABI: Add sysfs documentation interface of hardware prefetch
    control driver

 .../ABI/testing/sysfs-devices-system-cpu      |  98 ++++
 MAINTAINERS                                   |   8 +
 arch/arm64/kernel/cacheinfo.c                 |  29 ++
 arch/x86/Kconfig                              |   6 +
 arch/x86/kernel/cpu/Makefile                  |   2 +
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c     |  12 +-
 arch/x86/kernel/cpu/x86-pfctl.c               | 347 +++++++++++++
 drivers/base/Kconfig                          |   9 +
 drivers/base/Makefile                         |   1 +
 drivers/base/pfctl.c                          | 458 ++++++++++++++++++
 drivers/soc/Kconfig                           |   1 +
 drivers/soc/Makefile                          |   1 +
 drivers/soc/fujitsu/Kconfig                   |  11 +
 drivers/soc/fujitsu/Makefile                  |   2 +
 drivers/soc/fujitsu/a64fx-pfctl.c             | 356 ++++++++++++++
 include/linux/pfctl.h                         |  49 ++
 16 files changed, 1387 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/x86-pfctl.c
 create mode 100644 drivers/base/pfctl.c
 create mode 100644 drivers/soc/fujitsu/Kconfig
 create mode 100644 drivers/soc/fujitsu/Makefile
 create mode 100644 drivers/soc/fujitsu/a64fx-pfctl.c
 create mode 100644 include/linux/pfctl.h

-- 
2.27.0
