From: Andy Lutomirski <luto@kernel.org> To: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>, Keith Busch <keith.busch@intel.com> Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, Kai-Heng Feng <kai.heng.feng@canonical.com>, linux-nvme <linux-nvme@lists.infradead.org>, Andy Lutomirski <luto@kernel.org> Subject: [PATCH 1/2] nvme: Wait at least 6000ms before entering the deepest idle state Date: Wed, 24 May 2017 15:06:30 -0700 [thread overview] Message-ID: <6760ae9459ba19657f8009a9231b97a71114a1e5.1495663545.git.luto@kernel.org> (raw) In-Reply-To: <cover.1495663545.git.luto@kernel.org> In-Reply-To: <cover.1495663545.git.luto@kernel.org> This should at least make vendors less nervous about Linux's APST policy. I'm not aware of any concrete bugs it would fix (although I was hoping it would fix the Samsung/Dell quirk). Cc: stable@vger.kernel.org # v4.11 Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: Mario Limonciello <mario_limonciello@dell.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> --- drivers/nvme/host/core.c | 38 +++++++++++++++++++++++++++++++------- 1 file changed, 31 insertions(+), 7 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index d5e0906262ea..381e9f813385 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1325,13 +1325,7 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl) /* * APST (Autonomous Power State Transition) lets us program a * table of power state transitions that the controller will - * perform automatically. We configure it with a simple - * heuristic: we are willing to spend at most 2% of the time - * transitioning between power states. Therefore, when running - * in any given state, we will enter the next lower-power - * non-operational state after waiting 50 * (enlat + exlat) - * microseconds, as long as that state's total latency is under - * the requested maximum latency. + * perform automatically. * * We will not autonomously enter any non-operational state for * which the total latency exceeds ps_max_latency_us. Users @@ -1405,9 +1399,39 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl) /* * This state is good. Use it as the APST idle * target for higher power states. + * + * Intel RSTe supposedly uses the following algorithm: + * 60ms delay to transition to the first + * non-operational state and 1000*exlat to each + * additional state. This is problematic. 60ms is + * too short if the first non-operational state has + * high latency, and 1000*exlat into a state is + * absurdly slow. (exlat=22ms seems typical for the + * deepest state. A delay of 22 seconds to enter that + * state means that it will almost never be entered at + * all, wasting power and, worse, turning otherwise + * easy-to-detect hardware/firmware bugs into sporadic + * problems. + * + * Linux is willing to spend at most 2% of the time + * transitioning between power states. Therefore, + * when running in any given state, we will enter the + * next lower-power non-operational state after + * waiting 50 * (enlat + exlat) microseconds, as long + * as that state's total latency is under the + * requested maximum latency. */ transition_ms = total_latency_us + 19; do_div(transition_ms, 20); + + /* + * Some vendors have expressed nervousness about + * entering the deepest state after less than six + * seconds. + */ + if (state == ctrl->npss && transition_ms < 6000) + transition_ms = 6000; + if (transition_ms > (1 << 24) - 1) transition_ms = (1 << 24) - 1; -- 2.9.4
WARNING: multiple messages have this Message-ID (diff)
From: luto@kernel.org (Andy Lutomirski) Subject: [PATCH 1/2] nvme: Wait at least 6000ms before entering the deepest idle state Date: Wed, 24 May 2017 15:06:30 -0700 [thread overview] Message-ID: <6760ae9459ba19657f8009a9231b97a71114a1e5.1495663545.git.luto@kernel.org> (raw) In-Reply-To: <cover.1495663545.git.luto@kernel.org> This should at least make vendors less nervous about Linux's APST policy. I'm not aware of any concrete bugs it would fix (although I was hoping it would fix the Samsung/Dell quirk). Cc: stable at vger.kernel.org # v4.11 Cc: Kai-Heng Feng <kai.heng.feng at canonical.com> Cc: Mario Limonciello <mario_limonciello at dell.com> Signed-off-by: Andy Lutomirski <luto at kernel.org> --- drivers/nvme/host/core.c | 38 +++++++++++++++++++++++++++++++------- 1 file changed, 31 insertions(+), 7 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index d5e0906262ea..381e9f813385 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1325,13 +1325,7 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl) /* * APST (Autonomous Power State Transition) lets us program a * table of power state transitions that the controller will - * perform automatically. We configure it with a simple - * heuristic: we are willing to spend at most 2% of the time - * transitioning between power states. Therefore, when running - * in any given state, we will enter the next lower-power - * non-operational state after waiting 50 * (enlat + exlat) - * microseconds, as long as that state's total latency is under - * the requested maximum latency. + * perform automatically. * * We will not autonomously enter any non-operational state for * which the total latency exceeds ps_max_latency_us. Users @@ -1405,9 +1399,39 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl) /* * This state is good. Use it as the APST idle * target for higher power states. + * + * Intel RSTe supposedly uses the following algorithm: + * 60ms delay to transition to the first + * non-operational state and 1000*exlat to each + * additional state. This is problematic. 60ms is + * too short if the first non-operational state has + * high latency, and 1000*exlat into a state is + * absurdly slow. (exlat=22ms seems typical for the + * deepest state. A delay of 22 seconds to enter that + * state means that it will almost never be entered at + * all, wasting power and, worse, turning otherwise + * easy-to-detect hardware/firmware bugs into sporadic + * problems. + * + * Linux is willing to spend at most 2% of the time + * transitioning between power states. Therefore, + * when running in any given state, we will enter the + * next lower-power non-operational state after + * waiting 50 * (enlat + exlat) microseconds, as long + * as that state's total latency is under the + * requested maximum latency. */ transition_ms = total_latency_us + 19; do_div(transition_ms, 20); + + /* + * Some vendors have expressed nervousness about + * entering the deepest state after less than six + * seconds. + */ + if (state == ctrl->npss && transition_ms < 6000) + transition_ms = 6000; + if (transition_ms > (1 << 24) - 1) transition_ms = (1 << 24) - 1; -- 2.9.4
next prev parent reply other threads:[~2017-05-24 22:07 UTC|newest] Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-05-24 22:06 [PATCH 0/2] nvme APST fixes Andy Lutomirski 2017-05-24 22:06 ` Andy Lutomirski 2017-05-24 22:06 ` Andy Lutomirski [this message] 2017-05-24 22:06 ` [PATCH 1/2] nvme: Wait at least 6000ms before entering the deepest idle state Andy Lutomirski 2017-05-26 8:52 ` Christoph Hellwig 2017-05-26 8:52 ` Christoph Hellwig 2017-05-27 16:08 ` Andy Lutomirski 2017-05-27 16:08 ` Andy Lutomirski 2017-05-27 16:14 ` Linus Torvalds 2017-05-27 16:14 ` Linus Torvalds 2017-05-24 22:06 ` [PATCH 2/2] nvme: Quirk APST on Intel 600P/P3100 devices Andy Lutomirski 2017-05-24 22:06 ` Andy Lutomirski 2017-05-26 8:52 ` Christoph Hellwig 2017-05-26 8:52 ` Christoph Hellwig 2017-05-30 15:35 ` Keith Busch 2017-05-30 15:35 ` Keith Busch 2017-05-31 13:54 ` Andy Lutomirski 2017-05-31 13:54 ` Andy Lutomirski 2017-06-02 7:23 ` Nicholas Sielicki
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=6760ae9459ba19657f8009a9231b97a71114a1e5.1495663545.git.luto@kernel.org \ --to=luto@kernel.org \ --cc=axboe@kernel.dk \ --cc=hch@lst.de \ --cc=kai.heng.feng@canonical.com \ --cc=keith.busch@intel.com \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-nvme@lists.infradead.org \ --cc=sagi@grimberg.me \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.