From: Haris Okanovic <harisokn@amazon.com>
To: <linux-kernel@vger.kernel.org>, <linux-pm@vger.kernel.org>,
<linux-assembly@vger.kernel.org>
Cc: <peterz@infradead.org>, Haris Okanovic <harisokn@amazon.com>,
Ali Saidi <alisaidi@amazon.com>,
Geoff Blake <blakgeof@amazon.com>,
Brian Silver <silverbr@amazon.com>
Subject: [PATCH 3/3] arm64: cpuidle: Add arm_poll_idle
Date: Mon, 1 Apr 2024 20:47:06 -0500
Message-ID: <20240402014706.3969151-3-harisokn@amazon.com>
In-Reply-To: <20240402014706.3969151-1-harisokn@amazon.com>
Add an arm64 cpuidle driver with two states: (1) poll for new runnable
tasks for up to 100 us (by default), then (2) fall back to a wfi idle
that is woken by interrupt (the current arm64 behavior). Polling lets
CPUs return from idle more quickly by avoiding the longer interrupt
wakeup path, which may require an EL1/EL2 transition in certain VM
scenarios.
Poll duration is optionally configured at load time via the poll_limit
module parameter.
The default 100 us duration was chosen experimentally by measuring QPS
(queries per second) of the MLPerf bert inference benchmark, which seems
particularly sensitive to this change; see procedure below. 100 us is
the point where QPS stopped growing across the range of tested values.
All results are from AWS m7g.16xlarge instances (Graviton3 SoC) with
dedicated tenancy (dedicated hardware).
| poll_limit | before | 10us | 25us | 50us | 100us | 125us | 150us | 200us | 300us |
| QPS        | 5.87   | 5.91 | 5.96 | 6.01 | 6.06  | 6.07  | 6.06  | 6.06  | 6.06  |
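One way to read the plateau off these numbers is sketched below. It
returns the first poll limit after which the next tested value gains less
than a chosen cutoff. The function name plateau_limit_us() and the 0.02
QPS cutoff used in the note are assumptions made for illustration, not
part of the patch or of the measurement procedure.

```c
#include <assert.h>
#include <stddef.h>

/* Poll limits (us) and measured QPS, copied from the table above. */
static const unsigned int limit_us[] = { 10, 25, 50, 100, 125, 150, 200, 300 };
static const double qps[] = { 5.91, 5.96, 6.01, 6.06, 6.07, 6.06, 6.06, 6.06 };

/*
 * Return the first poll limit after which the next step gains less than
 * min_gain QPS. min_gain is an assumed cutoff, not a value from the patch.
 */
static unsigned int plateau_limit_us(double min_gain)
{
	size_t n = sizeof(qps) / sizeof(qps[0]);
	size_t i;

	for (i = 0; i + 1 < n; i++) {
		if (qps[i + 1] - qps[i] < min_gain)
			return limit_us[i];
	}
	return limit_us[n - 1];
}
```

With a cutoff of 0.02 QPS, plateau_limit_us() returns 100, which agrees
with the default chosen here.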
Perf's scheduler benchmarks also improve with a range of poll_limit
values >= 10 us. Higher limits produce near identical results within a
3% noise margin. The following tables are `perf bench sched` results,
run times in seconds.
`perf bench sched messaging -l 80000`
| AWS instance | SoC | Before | After | % Change |
| c6g.16xl (VM) | Graviton2 | 18.974 | 18.400 | none |
| c7g.16xl (VM) | Graviton3 | 13.852 | 13.859 | none |
| c6g.metal | Graviton2 | 17.621 | 16.744 | none |
| c7g.metal | Graviton3 | 13.430 | 13.404 | none |
`perf bench sched pipe -l 2500000`
| AWS instance | SoC | Before | After | % Change |
| c6g.16xl (VM) | Graviton2 | 30.158 | 15.181 | -50% |
| c7g.16xl (VM) | Graviton3 | 18.289 | 12.067 | -34% |
| c6g.metal | Graviton2 | 17.609 | 15.170 | -14% |
| c7g.metal | Graviton3 | 14.103 | 12.304 | -13% |
`perf bench sched seccomp-notify -l 2500000`
| AWS instance | SoC | Before | After | % Change |
| c6g.16xl (VM) | Graviton2 | 28.784 | 13.754 | -52% |
| c7g.16xl (VM) | Graviton3 | 16.964 | 11.430 | -33% |
| c6g.metal | Graviton2 | 15.717 | 13.536 | -14% |
| c7g.metal | Graviton3 | 13.301 | 11.491 | -14% |
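The "% Change" columns can be reproduced from the Before and After run
times. The helper names pct_change() and pct_rounded() below are
hypothetical, and rounding to the nearest whole percent is an assumption
about how the tables were produced.

```c
#include <assert.h>

/* Percent change from before to after; negative means a shorter run time. */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Round to the nearest whole percent (assumed to match the tables). */
static long pct_rounded(double before, double after)
{
	double p = pct_change(before, after);

	return (long)(p < 0 ? p - 0.5 : p + 0.5);
}
```

For example, pct_rounded(30.158, 15.181) gives -50, matching the
c6g.16xl row of the pipe table.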
Steps to run MLPerf bert inference on Ubuntu 22.04:
  sudo apt install build-essential python3 python3-pip
  pip install "pybind11[global]" tensorflow transformers
  export TF_ENABLE_ONEDNN_OPTS=1
  export DNNL_DEFAULT_FPMATH_MODE=BF16
  git clone https://github.com/mlcommons/inference.git --recursive
  cd inference
  git checkout v2.0
  cd loadgen
  CFLAGS="-std=c++14" python3 setup.py bdist_wheel
  pip install dist/*.whl
  cd ../language/bert
  make setup
  python3 run.py --backend=tf --scenario=SingleStream
Suggested-by: Ali Saidi <alisaidi@amazon.com>
Reviewed-by: Ali Saidi <alisaidi@amazon.com>
Reviewed-by: Geoff Blake <blakgeof@amazon.com>
Cc: Brian Silver <silverbr@amazon.com>
Signed-off-by: Haris Okanovic <harisokn@amazon.com>
---
drivers/cpuidle/Kconfig.arm | 13 ++
drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/cpuidle-arm-polling.c | 171 ++++++++++++++++++++++++++
3 files changed, 185 insertions(+)
create mode 100644 drivers/cpuidle/cpuidle-arm-polling.c
diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
index a1ee475d180d..484666dda38d 100644
--- a/drivers/cpuidle/Kconfig.arm
+++ b/drivers/cpuidle/Kconfig.arm
@@ -14,6 +14,19 @@ config ARM_CPUIDLE
initialized by calling the CPU operations init idle hook
provided by architecture code.
+config ARM_POLL_CPUIDLE
+ bool "ARM64 CPU idle Driver with polling"
+ depends on ARM64
+ depends on ARM_ARCH_TIMER_EVTSTREAM
+ select CPU_IDLE_MULTIPLE_DRIVERS
+ help
+ Select this to enable a polling cpuidle driver for ARM64:
+ The first state polls TIF_NEED_RESCHED for best latency on short
+ sleep intervals. The second state falls back to arch_cpu_idle() to
+ wait for interrupt. This can be helpful in workloads that
+ frequently block/wake at short intervals or VMs where wakeup IPIs
+ are more expensive.
+
config ARM_PSCI_CPUIDLE
bool "PSCI CPU idle Driver"
depends on ARM_PSCI_FW
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index d103342b7cfc..23c21422792d 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_ARM_U8500_CPUIDLE) += cpuidle-ux500.o
obj-$(CONFIG_ARM_AT91_CPUIDLE) += cpuidle-at91.o
obj-$(CONFIG_ARM_EXYNOS_CPUIDLE) += cpuidle-exynos.o
obj-$(CONFIG_ARM_CPUIDLE) += cpuidle-arm.o
+obj-$(CONFIG_ARM_POLL_CPUIDLE) += cpuidle-arm-polling.o
obj-$(CONFIG_ARM_PSCI_CPUIDLE) += cpuidle-psci.o
obj-$(CONFIG_ARM_PSCI_CPUIDLE_DOMAIN) += cpuidle-psci-domain.o
obj-$(CONFIG_ARM_TEGRA_CPUIDLE) += cpuidle-tegra.o
diff --git a/drivers/cpuidle/cpuidle-arm-polling.c b/drivers/cpuidle/cpuidle-arm-polling.c
new file mode 100644
index 000000000000..bca128568114
--- /dev/null
+++ b/drivers/cpuidle/cpuidle-arm-polling.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM64 CPU idle driver using wfe polling
+ *
+ * Copyright 2024 Amazon.com, Inc. or its affiliates. All rights reserved.
+ *
+ * Authors:
+ * Haris Okanovic <harisokn@amazon.com>
+ * Brian Silver <silverbr@amazon.com>
+ *
+ * Based on cpuidle-arm.c
+ * Copyright (C) 2014 ARM Ltd.
+ * Author: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
+ */
+
+#include <linux/cpu.h>
+#include <linux/cpu_cooling.h>
+#include <linux/cpuidle.h>
+#include <linux/sched/clock.h>
+
+#include <asm/cpuidle.h>
+#include <asm/readex.h>
+
+#include "dt_idle_states.h"
+
+/* Max duration of the wfe() poll loop in us, before transitioning to
+ * arch_cpu_idle()/wfi() sleep.
+ */
+#define DEFAULT_POLL_LIMIT_US 100
+static unsigned int poll_limit __read_mostly = DEFAULT_POLL_LIMIT_US;
+
+/*
+ * arm_idle_wfe_poll - Polls state in wfe loop until reschedule is
+ * needed or timeout
+ */
+static int __cpuidle arm_idle_wfe_poll(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int idx)
+{
+ u64 time_start, time_limit;
+
+ time_start = local_clock();
+ dev->poll_time_limit = false;
+
+ local_irq_enable();
+
+ if (current_set_polling_and_test())
+ goto end;
+
+ time_limit = cpuidle_poll_time(drv, dev);
+
+ do {
+ // exclusive read arms the monitor for wfe
+ if (__READ_ONCE_EX(current_thread_info()->flags) & _TIF_NEED_RESCHED)
+ goto end;
+
+ // may exit prematurely, see ARM_ARCH_TIMER_EVTSTREAM
+ wfe();
+ } while (local_clock() - time_start < time_limit);
+
+ dev->poll_time_limit = true;
+
+end:
+ current_clr_polling();
+ return idx;
+}
+
+/*
+ * arm_idle_wfi - Places cpu in lower power state until interrupt,
+ * a fallback to polling
+ */
+static int __cpuidle arm_idle_wfi(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int idx)
+{
+ if (current_clr_polling_and_test()) {
+ local_irq_enable();
+ return idx;
+ }
+ arch_cpu_idle();
+ return idx;
+}
+
+static struct cpuidle_driver arm_poll_idle_driver __initdata = {
+ .name = "arm_poll_idle",
+ .owner = THIS_MODULE,
+ .states = {
+ {
+ .enter = arm_idle_wfe_poll,
+ .exit_latency = 0,
+ .target_residency = 0,
+ .exit_latency_ns = 0,
+ .power_usage = UINT_MAX,
+ .flags = CPUIDLE_FLAG_POLLING,
+ .name = "WFE",
+ .desc = "ARM WFE",
+ },
+ {
+ .enter = arm_idle_wfi,
+ .exit_latency = DEFAULT_POLL_LIMIT_US,
+ .target_residency = DEFAULT_POLL_LIMIT_US,
+ .power_usage = UINT_MAX,
+ .name = "WFI",
+ .desc = "ARM WFI",
+ },
+ },
+ .state_count = 2,
+};
+
+/*
+ * arm_poll_init_cpu - Initializes arm cpuidle polling driver for one cpu
+ */
+static int __init arm_poll_init_cpu(int cpu)
+{
+ int ret;
+ struct cpuidle_driver *drv;
+
+ drv = kmemdup(&arm_poll_idle_driver, sizeof(*drv), GFP_KERNEL);
+ if (!drv)
+ return -ENOMEM;
+
+ drv->cpumask = (struct cpumask *)cpumask_of(cpu);
+ drv->states[1].exit_latency = poll_limit;
+ drv->states[1].target_residency = poll_limit;
+
+ ret = cpuidle_register(drv, NULL);
+ if (ret) {
+ pr_err("failed to register driver: %d, cpu %d\n", ret, cpu);
+ goto out_kfree_drv;
+ }
+
+ pr_info("registered driver cpu %d\n", cpu);
+
+ cpuidle_cooling_register(drv);
+
+ return 0;
+
+out_kfree_drv:
+ kfree(drv);
+ return ret;
+}
+
+/*
+ * arm_poll_init - Initializes arm cpuidle polling driver
+ */
+static int __init arm_poll_init(void)
+{
+ int cpu, ret;
+ struct cpuidle_driver *drv;
+ struct cpuidle_device *dev;
+
+ for_each_possible_cpu(cpu) {
+ ret = arm_poll_init_cpu(cpu);
+ if (ret)
+ goto out_fail;
+ }
+
+ return 0;
+
+out_fail:
+ pr_info("de-register all\n");
+ while (--cpu >= 0) {
+ dev = per_cpu(cpuidle_devices, cpu);
+ drv = cpuidle_get_cpu_driver(dev);
+ cpuidle_unregister(drv);
+ kfree(drv);
+ }
+
+ return ret;
+}
+
+module_param(poll_limit, uint, 0444);
+device_initcall(arm_poll_init);
--
2.34.1