All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Sagi Grimberg <sagi@grimberg.me>,
	Keith Busch <keith.busch@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	linux-nvme <linux-nvme@lists.infradead.org>,
	Andy Lutomirski <luto@kernel.org>
Subject: [PATCH 1/2] nvme: Wait at least 6000ms before entering the deepest idle state
Date: Wed, 24 May 2017 15:06:30 -0700	[thread overview]
Message-ID: <6760ae9459ba19657f8009a9231b97a71114a1e5.1495663545.git.luto@kernel.org> (raw)
In-Reply-To: <cover.1495663545.git.luto@kernel.org>
In-Reply-To: <cover.1495663545.git.luto@kernel.org>

This should at least make vendors less nervous about Linux's APST
policy.  I'm not aware of any concrete bugs it would fix (although I
was hoping it would fix the Samsung/Dell quirk).

Cc: stable@vger.kernel.org # v4.11
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 drivers/nvme/host/core.c | 38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d5e0906262ea..381e9f813385 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1325,13 +1325,7 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 	/*
 	 * APST (Autonomous Power State Transition) lets us program a
 	 * table of power state transitions that the controller will
-	 * perform automatically.  We configure it with a simple
-	 * heuristic: we are willing to spend at most 2% of the time
-	 * transitioning between power states.  Therefore, when running
-	 * in any given state, we will enter the next lower-power
-	 * non-operational state after waiting 50 * (enlat + exlat)
-	 * microseconds, as long as that state's total latency is under
-	 * the requested maximum latency.
+	 * perform automatically.
 	 *
 	 * We will not autonomously enter any non-operational state for
 	 * which the total latency exceeds ps_max_latency_us.  Users
@@ -1405,9 +1399,39 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 			/*
 			 * This state is good.  Use it as the APST idle
 			 * target for higher power states.
+			 *
+			 * Intel RSTe supposedly uses the following algorithm:
+			 * 60ms delay to transition to the first
+			 * non-operational state and 1000*exlat to each
+			 * additional state.  This is problematic.  60ms is
+			 * too short if the first non-operational state has
+			 * high latency, and 1000*exlat into a state is
+			 * absurdly slow.  (exlat=22ms seems typical for the
+			 * deepest state.  A delay of 22 seconds to enter that
+			 * state means that it will almost never be entered at
+			 * all, wasting power and, worse, turning otherwise
+			 * easy-to-detect hardware/firmware bugs into sporadic
+			 * problems.
+			 *
+			 * Linux is willing to spend at most 2% of the time
+			 * transitioning between power states.  Therefore,
+			 * when running in any given state, we will enter the
+			 * next lower-power non-operational state after
+			 * waiting 50 * (enlat + exlat) microseconds, as long
+			 * as that state's total latency is under the
+			 * requested maximum latency.
 			 */
 			transition_ms = total_latency_us + 19;
 			do_div(transition_ms, 20);
+
+			/*
+			 * Some vendors have expressed nervousness about
+			 * entering the deepest state after less than six
+			 * seconds.
+			 */
+			if (state == ctrl->npss && transition_ms < 6000)
+				transition_ms = 6000;
+
 			if (transition_ms > (1 << 24) - 1)
 				transition_ms = (1 << 24) - 1;
 
-- 
2.9.4

WARNING: multiple messages have this Message-ID (diff)
From: luto@kernel.org (Andy Lutomirski)
Subject: [PATCH 1/2] nvme: Wait at least 6000ms before entering the deepest idle state
Date: Wed, 24 May 2017 15:06:30 -0700	[thread overview]
Message-ID: <6760ae9459ba19657f8009a9231b97a71114a1e5.1495663545.git.luto@kernel.org> (raw)
In-Reply-To: <cover.1495663545.git.luto@kernel.org>

This should at least make vendors less nervous about Linux's APST
policy.  I'm not aware of any concrete bugs it would fix (although I
was hoping it would fix the Samsung/Dell quirk).

Cc: stable at vger.kernel.org # v4.11
Cc: Kai-Heng Feng <kai.heng.feng at canonical.com>
Cc: Mario Limonciello <mario_limonciello at dell.com>
Signed-off-by: Andy Lutomirski <luto at kernel.org>
---
 drivers/nvme/host/core.c | 38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d5e0906262ea..381e9f813385 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1325,13 +1325,7 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 	/*
 	 * APST (Autonomous Power State Transition) lets us program a
 	 * table of power state transitions that the controller will
-	 * perform automatically.  We configure it with a simple
-	 * heuristic: we are willing to spend at most 2% of the time
-	 * transitioning between power states.  Therefore, when running
-	 * in any given state, we will enter the next lower-power
-	 * non-operational state after waiting 50 * (enlat + exlat)
-	 * microseconds, as long as that state's total latency is under
-	 * the requested maximum latency.
+	 * perform automatically.
 	 *
 	 * We will not autonomously enter any non-operational state for
 	 * which the total latency exceeds ps_max_latency_us.  Users
@@ -1405,9 +1399,39 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 			/*
 			 * This state is good.  Use it as the APST idle
 			 * target for higher power states.
+			 *
+			 * Intel RSTe supposedly uses the following algorithm:
+			 * 60ms delay to transition to the first
+			 * non-operational state and 1000*exlat to each
+			 * additional state.  This is problematic.  60ms is
+			 * too short if the first non-operational state has
+			 * high latency, and 1000*exlat into a state is
+			 * absurdly slow.  (exlat=22ms seems typical for the
+			 * deepest state.  A delay of 22 seconds to enter that
+			 * state means that it will almost never be entered at
+			 * all, wasting power and, worse, turning otherwise
+			 * easy-to-detect hardware/firmware bugs into sporadic
+			 * problems.
+			 *
+			 * Linux is willing to spend at most 2% of the time
+			 * transitioning between power states.  Therefore,
+			 * when running in any given state, we will enter the
+			 * next lower-power non-operational state after
+			 * waiting 50 * (enlat + exlat) microseconds, as long
+			 * as that state's total latency is under the
+			 * requested maximum latency.
 			 */
 			transition_ms = total_latency_us + 19;
 			do_div(transition_ms, 20);
+
+			/*
+			 * Some vendors have expressed nervousness about
+			 * entering the deepest state after less than six
+			 * seconds.
+			 */
+			if (state == ctrl->npss && transition_ms < 6000)
+				transition_ms = 6000;
+
 			if (transition_ms > (1 << 24) - 1)
 				transition_ms = (1 << 24) - 1;
 
-- 
2.9.4

  reply	other threads:[~2017-05-24 22:07 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-24 22:06 [PATCH 0/2] nvme APST fixes Andy Lutomirski
2017-05-24 22:06 ` Andy Lutomirski
2017-05-24 22:06 ` Andy Lutomirski [this message]
2017-05-24 22:06   ` [PATCH 1/2] nvme: Wait at least 6000ms before entering the deepest idle state Andy Lutomirski
2017-05-26  8:52   ` Christoph Hellwig
2017-05-26  8:52     ` Christoph Hellwig
2017-05-27 16:08     ` Andy Lutomirski
2017-05-27 16:08       ` Andy Lutomirski
2017-05-27 16:14       ` Linus Torvalds
2017-05-27 16:14         ` Linus Torvalds
2017-05-24 22:06 ` [PATCH 2/2] nvme: Quirk APST on Intel 600P/P3100 devices Andy Lutomirski
2017-05-24 22:06   ` Andy Lutomirski
2017-05-26  8:52   ` Christoph Hellwig
2017-05-26  8:52     ` Christoph Hellwig
2017-05-30 15:35   ` Keith Busch
2017-05-30 15:35     ` Keith Busch
2017-05-31 13:54     ` Andy Lutomirski
2017-05-31 13:54       ` Andy Lutomirski
2017-06-02  7:23       ` Nicholas Sielicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6760ae9459ba19657f8009a9231b97a71114a1e5.1495663545.git.luto@kernel.org \
    --to=luto@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=kai.heng.feng@canonical.com \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.