All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Sagi Grimberg <sagi@grimberg.me>,
	Keith Busch <keith.busch@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	linux-nvme <linux-nvme@lists.infradead.org>,
	Andy Lutomirski <luto@kernel.org>
Subject: [PATCH 1/2] nvme: Wait at least 6000ms before entering the deepest idle state
Date: Wed, 24 May 2017 15:06:30 -0700	[thread overview]
Message-ID: <6760ae9459ba19657f8009a9231b97a71114a1e5.1495663545.git.luto@kernel.org> (raw)
In-Reply-To: <cover.1495663545.git.luto@kernel.org>
In-Reply-To: <cover.1495663545.git.luto@kernel.org>

This should at least make vendors less nervous about Linux's APST
policy.  I'm not aware of any concrete bugs it would fix (although I
was hoping it would fix the Samsung/Dell quirk).

Cc: stable@vger.kernel.org # v4.11
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 drivers/nvme/host/core.c | 38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d5e0906262ea..381e9f813385 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1325,13 +1325,7 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 	/*
 	 * APST (Autonomous Power State Transition) lets us program a
 	 * table of power state transitions that the controller will
-	 * perform automatically.  We configure it with a simple
-	 * heuristic: we are willing to spend at most 2% of the time
-	 * transitioning between power states.  Therefore, when running
-	 * in any given state, we will enter the next lower-power
-	 * non-operational state after waiting 50 * (enlat + exlat)
-	 * microseconds, as long as that state's total latency is under
-	 * the requested maximum latency.
+	 * perform automatically.
 	 *
 	 * We will not autonomously enter any non-operational state for
 	 * which the total latency exceeds ps_max_latency_us.  Users
@@ -1405,9 +1399,39 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 			/*
 			 * This state is good.  Use it as the APST idle
 			 * target for higher power states.
+			 *
+			 * Intel RSTe supposedly uses the following algorithm:
+			 * 60ms delay to transition to the first
+			 * non-operational state and 1000*exlat to each
+			 * additional state.  This is problematic.  60ms is
+			 * too short if the first non-operational state has
+			 * high latency, and 1000*exlat into a state is
+			 * absurdly slow.  (exlat=22ms seems typical for the
+			 * deepest state.  A delay of 22 seconds to enter that
+			 * state means that it will almost never be entered at
+			 * all, wasting power and, worse, turning otherwise
+			 * easy-to-detect hardware/firmware bugs into sporadic
+			 * problems.
+			 *
+			 * Linux is willing to spend at most 2% of the time
+			 * transitioning between power states.  Therefore,
+			 * when running in any given state, we will enter the
+			 * next lower-power non-operational state after
+			 * waiting 50 * (enlat + exlat) microseconds, as long
+			 * as that state's total latency is under the
+			 * requested maximum latency.
 			 */
 			transition_ms = total_latency_us + 19;
 			do_div(transition_ms, 20);
+
+			/*
+			 * Some vendors have expressed nervousness about
+			 * entering the deepest state after less than six
+			 * seconds.
+			 */
+			if (state == ctrl->npss && transition_ms < 6000)
+				transition_ms = 6000;
+
 			if (transition_ms > (1 << 24) - 1)
 				transition_ms = (1 << 24) - 1;
 
-- 
2.9.4

WARNING: multiple messages have this Message-ID
From: luto@kernel.org (Andy Lutomirski)
Subject: [PATCH 1/2] nvme: Wait at least 6000ms before entering the deepest idle state
Date: Wed, 24 May 2017 15:06:30 -0700	[thread overview]
Message-ID: <6760ae9459ba19657f8009a9231b97a71114a1e5.1495663545.git.luto@kernel.org> (raw)
In-Reply-To: <cover.1495663545.git.luto@kernel.org>

This should at least make vendors less nervous about Linux's APST
policy.  I'm not aware of any concrete bugs it would fix (although I
was hoping it would fix the Samsung/Dell quirk).

Cc: stable at vger.kernel.org # v4.11
Cc: Kai-Heng Feng <kai.heng.feng at canonical.com>
Cc: Mario Limonciello <mario_limonciello at dell.com>
Signed-off-by: Andy Lutomirski <luto at kernel.org>
---
 drivers/nvme/host/core.c | 38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d5e0906262ea..381e9f813385 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1325,13 +1325,7 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 	/*
 	 * APST (Autonomous Power State Transition) lets us program a
 	 * table of power state transitions that the controller will
-	 * perform automatically.  We configure it with a simple
-	 * heuristic: we are willing to spend at most 2% of the time
-	 * transitioning between power states.  Therefore, when running
-	 * in any given state, we will enter the next lower-power
-	 * non-operational state after waiting 50 * (enlat + exlat)
-	 * microseconds, as long as that state's total latency is under
-	 * the requested maximum latency.
+	 * perform automatically.
 	 *
 	 * We will not autonomously enter any non-operational state for
 	 * which the total latency exceeds ps_max_latency_us.  Users
@@ -1405,9 +1399,39 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 			/*
 			 * This state is good.  Use it as the APST idle
 			 * target for higher power states.
+			 *
+			 * Intel RSTe supposedly uses the following algorithm:
+			 * 60ms delay to transition to the first
+			 * non-operational state and 1000*exlat to each
+			 * additional state.  This is problematic.  60ms is
+			 * too short if the first non-operational state has
+			 * high latency, and 1000*exlat into a state is
+			 * absurdly slow.  (exlat=22ms seems typical for the
+			 * deepest state.  A delay of 22 seconds to enter that
+			 * state means that it will almost never be entered at
+			 * all, wasting power and, worse, turning otherwise
+			 * easy-to-detect hardware/firmware bugs into sporadic
+			 * problems.
+			 *
+			 * Linux is willing to spend at most 2% of the time
+			 * transitioning between power states.  Therefore,
+			 * when running in any given state, we will enter the
+			 * next lower-power non-operational state after
+			 * waiting 50 * (enlat + exlat) microseconds, as long
+			 * as that state's total latency is under the
+			 * requested maximum latency.
 			 */
 			transition_ms = total_latency_us + 19;
 			do_div(transition_ms, 20);
+
+			/*
+			 * Some vendors have expressed nervousness about
+			 * entering the deepest state after less than six
+			 * seconds.
+			 */
+			if (state == ctrl->npss && transition_ms < 6000)
+				transition_ms = 6000;
+
 			if (transition_ms > (1 << 24) - 1)
 				transition_ms = (1 << 24) - 1;
 
-- 
2.9.4

  reply	other threads:[~2017-05-24 22:07 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-24 22:06 [PATCH 0/2] nvme APST fixes Andy Lutomirski
2017-05-24 22:06 ` Andy Lutomirski
2017-05-24 22:06 ` Andy Lutomirski [this message]
2017-05-24 22:06   ` [PATCH 1/2] nvme: Wait at least 6000ms before entering the deepest idle state Andy Lutomirski
2017-05-26  8:52   ` Christoph Hellwig
2017-05-26  8:52     ` Christoph Hellwig
2017-05-27 16:08     ` Andy Lutomirski
2017-05-27 16:08       ` Andy Lutomirski
2017-05-27 16:14       ` Linus Torvalds
2017-05-27 16:14         ` Linus Torvalds
2017-05-24 22:06 ` [PATCH 2/2] nvme: Quirk APST on Intel 600P/P3100 devices Andy Lutomirski
2017-05-24 22:06   ` Andy Lutomirski
2017-05-26  8:52   ` Christoph Hellwig
2017-05-26  8:52     ` Christoph Hellwig
2017-05-30 15:35   ` Keith Busch
2017-05-30 15:35     ` Keith Busch
2017-05-31 13:54     ` Andy Lutomirski
2017-05-31 13:54       ` Andy Lutomirski
2017-06-02  7:23       ` Nicholas Sielicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6760ae9459ba19657f8009a9231b97a71114a1e5.1495663545.git.luto@kernel.org \
    --to=luto@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=kai.heng.feng@canonical.com \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    --subject='Re: [PATCH 1/2] nvme: Wait at least 6000ms before entering the deepest idle state' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.