linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
@ 2017-05-12  4:06 Andy Lutomirski
  2017-05-12 13:58 ` Mario.Limonciello
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Andy Lutomirski @ 2017-05-12  4:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-kernel, Kai-Heng Feng, linux-nvme, Christoph Hellwig,
	Sagi Grimberg, Keith Busch, Mario Limonciello, Andy Lutomirski

It seems like RSTe is much more conservative with transition timing
that we are.  According to Mario, RSTe programs APST to transition from
active states to the first idle state after 60ms and, thereafter, to
1000 * the exit latency of the target state.

This is IMO a terrible policy.  On my Samsung 950, RSTe would set a
transition time of 22 seconds to the deepest state.  I suspect that
this means that the deepest state will essentially never be used in
practice.  Unfortunately, since RSTe does this, it's what the
manufacturers tested, and hardware or firmware bugs seem to have crept
in.  I'm having a bit of trouble imagining what would go wrong if the
hardware goes to sleep after merely one second that isn't a problem
after 22 seconds.  Maybe some PLLs are extremely slow to stabilize and
the firmware doesn't have appropriate safeguards, or maybe there are
power-fail caps that behave oddly when charged and dischaged too
quickly, but this is just idle speculation on my part.

This patch is a bit more conservative than RSTe's algorithm.  I think
that, if the first idle state has total latency that's too large (even
10ms, say), then 60ms is too fast and risks hurting performance
excessively.  I set the latency to the greater of 60ms and 50 *
(enlat+exlat).

Some testing reports suggest that this will fix the issues we've
seen on Dell laptops.

Cc: stable@vger.kernel.org # v4.11
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

Jens et all, I think this might be reasonable to apply for 4.12.

 drivers/nvme/host/core.c | 62 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 51 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5249027a76ca..fea682db2176 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1267,13 +1267,7 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 	/*
 	 * APST (Autonomous Power State Transition) lets us program a
 	 * table of power state transitions that the controller will
-	 * perform automatically.  We configure it with a simple
-	 * heuristic: we are willing to spend at most 2% of the time
-	 * transitioning between power states.  Therefore, when running
-	 * in any given state, we will enter the next lower-power
-	 * non-operational state after waiting 50 * (enlat + exlat)
-	 * microseconds, as long as that state's total latency is under
-	 * the requested maximum latency.
+	 * perform automatically.
 	 *
 	 * We will not autonomously enter any non-operational state for
 	 * which the total latency exceeds ps_max_latency_us.  Users
@@ -1282,6 +1276,7 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 
 	unsigned apste;
 	struct nvme_feat_auto_pst *table;
+	int first_idle_state = -1;
 	u64 max_lat_us = 0;
 	int max_ps = -1;
 	int ret;
@@ -1311,10 +1306,23 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 		int state;
 
 		/*
+		 * Find the first non-operational state.  We'll need it
+		 * for the idle time calculation.  NPSS, despite the
+		 * name, is the index of the lowest-power state, not the
+		 * number of states.
+		 */
+		for (state = 0; state <= (int)ctrl->npss; state++) {
+			if (ctrl->psd[state].flags &
+			    NVME_PS_FLAGS_NON_OP_STATE) {
+				first_idle_state = state;
+				break;
+			}
+		}
+
+		/*
 		 * Walk through all states from lowest- to highest-power.
 		 * According to the spec, lower-numbered states use more
-		 * power.  NPSS, despite the name, is the index of the
-		 * lowest-power state, not the number of states.
+		 * power.
 		 */
 		for (state = (int)ctrl->npss; state >= 0; state--) {
 			u64 total_latency_us, transition_ms;
@@ -1348,8 +1356,40 @@ static void nvme_configure_apst(struct nvme_ctrl *ctrl)
 			 * This state is good.  Use it as the APST idle
 			 * target for higher power states.
 			 */
-			transition_ms = total_latency_us + 19;
-			do_div(transition_ms, 20);
+			if (state == first_idle_state) {
+				/*
+				 * Intel RSTe supposedly uses 60ms for the
+				 * first idle state regardless of its latency.
+				 * This seems overly aggressive -- if the
+				 * first idle state is slow, we could spend
+				 * an excessive fraction of the time
+				 * transitioning back and forth.
+				 *
+				 * Instead, use 50 * (enlat + exlat) so
+				 * we spend at most 2% of the time
+				 * transitioning between power states.
+				 */
+				transition_ms = total_latency_us + 19;
+				do_div(transition_ms, 20);
+
+				/* Don't be more aggressive than Intel RSTe. */
+				if (transition_ms < 60)
+					transition_ms = 60;
+			} else {
+				/*
+				 * Intel RSTe supposedly sets the
+				 * transition time to states after the
+				 * first to 1000 * exlat.  This seems quite
+				 * conservative, but it also seems that vendors
+				 * don't test their hardware with more
+				 * aggressive settings.  (The factor of 1000
+				 * is implicit: exlat is in microsections.)
+				 */
+				transition_ms =
+					le32_to_cpu(ctrl->psd[state].exit_lat);
+			}
+
+			/* Clamp to the maximum time allowed by the spec. */
 			if (transition_ms > (1 << 24) - 1)
 				transition_ms = (1 << 24) - 1;
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-12  4:06 [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe Andy Lutomirski
@ 2017-05-12 13:58 ` Mario.Limonciello
  2017-05-12 14:04   ` Andy Lutomirski
  2017-05-13 12:27 ` Andy Lutomirski
  2017-05-19  1:13 ` Andy Lutomirski
  2 siblings, 1 reply; 17+ messages in thread
From: Mario.Limonciello @ 2017-05-12 13:58 UTC (permalink / raw)
  To: luto, axboe
  Cc: linux-kernel, kai.heng.feng, linux-nvme, hch, sagi, keith.busch

> Some testing reports suggest that this will fix the issues we've
> seen on Dell laptops.

It think it also makes sense to revert the quirk that was created based upon the previous aggressiveness of re-entry to PS4 on those machines.  Are you expecting to split that up into a second patch also targeted at 4.11 stable and 4.12, but later after some more testing?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-12 13:58 ` Mario.Limonciello
@ 2017-05-12 14:04   ` Andy Lutomirski
  2017-05-12 14:34     ` Mario.Limonciello
  0 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2017-05-12 14:04 UTC (permalink / raw)
  To: Mario.Limonciello
  Cc: Andrew Lutomirski, Jens Axboe, linux-kernel, Kai-Heng Feng,
	linux-nvme, Christoph Hellwig, Sagi Grimberg, Keith Busch

On Fri, May 12, 2017 at 6:58 AM,  <Mario.Limonciello@dell.com> wrote:
>> Some testing reports suggest that this will fix the issues we've
>> seen on Dell laptops.
>
> It think it also makes sense to revert the quirk that was created based upon the previous aggressiveness of re-entry to PS4 on those machines.  Are you expecting to split that up into a second patch also targeted at 4.11 stable and 4.12, but later after some more testing?

Yes, mostly.  I've written the patch, but I was planning to target it
at 4.12 or 4.13 but not -stable.  It's mostly just a cleanup and has
no real power saving benefit since the RSTe timeouts are so absurdly
conservative that I doubt PS4 will happen in practical usage.  Perhaps
in suspend-to-idle?  (For suspend-to-idle, I suspect we should really
be using D3 instead.  Do we already do that?)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-12 14:04   ` Andy Lutomirski
@ 2017-05-12 14:34     ` Mario.Limonciello
  2017-05-12 23:49       ` Andy Lutomirski
  0 siblings, 1 reply; 17+ messages in thread
From: Mario.Limonciello @ 2017-05-12 14:34 UTC (permalink / raw)
  To: luto
  Cc: axboe, linux-kernel, kai.heng.feng, linux-nvme, hch, sagi, keith.busch

>Yes, mostly.  I've written the patch, but I was planning to target it
>at 4.12 or 4.13 but not -stable.  It's mostly just a cleanup and has
>no real power saving benefit since the RSTe timeouts are so absurdly
>conservative that I doubt PS4 will happen in practical usage.  
OK.

>Perhaps
>in suspend-to-idle?  (For suspend-to-idle, I suspect we should really
>be using D3 instead.  Do we already do that?)

Well I think this will depend upon what the SSD "will" support.  
There isn't good documentation for how Linux "should" handle 
suspend-to-idle with disks yet, so the best you can follow is 
what Microsoft publicly mentions for Modern Standby. [1]

"Akin to AHCI PCIe SSDs, NVMe SSDs need to provide the host with a 
non-operational power state that is comparable to DEVSLP (<5mW draw, 
<100ms exit latency) in order to allow the host to perform appropriate
transitions into Modern Standby. Should the NVMe SSD not expose 
such a non-operational power state, autonomous power state 
transitions (APST) is the only other option to enter Modern 
Standby successfully.

Note that in the absence of DEVSLP or a comparable NVMe 
non-operational power state, the host can make no guarantees 
on the device’s power draw. In this case, if you observe 
non-optimal power consumption by the device/system, you 
will have to work with your device vendor to determine the cause."

Something important to consider though is that Microsoft keeps
more of the OS actually running during Modern Standby but
has deep control of what applications and kernel threads are
allowed to do.  For Linux I think that D3 is most likely going to
provide the best power for these disks.

[1] https://msdn.microsoft.com/en-us/windows/hardware/commercialize/design/device-experiences/part-selection#ssd-storage

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-12 14:34     ` Mario.Limonciello
@ 2017-05-12 23:49       ` Andy Lutomirski
  2017-05-15 15:51         ` Mario.Limonciello
  0 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2017-05-12 23:49 UTC (permalink / raw)
  To: Mario.Limonciello
  Cc: Andrew Lutomirski, Jens Axboe, linux-kernel, Kai-Heng Feng,
	linux-nvme, Christoph Hellwig, Sagi Grimberg, Keith Busch

On Fri, May 12, 2017 at 7:34 AM,  <Mario.Limonciello@dell.com> wrote:
>>Yes, mostly.  I've written the patch, but I was planning to target it
>>at 4.12 or 4.13 but not -stable.  It's mostly just a cleanup and has
>>no real power saving benefit since the RSTe timeouts are so absurdly
>>conservative that I doubt PS4 will happen in practical usage.
> OK.
>
>>Perhaps
>>in suspend-to-idle?  (For suspend-to-idle, I suspect we should really
>>be using D3 instead.  Do we already do that?)
>
> Well I think this will depend upon what the SSD "will" support.

I thought it was basically impossible for the SSD to fail to support
D3.  Afterall, cutting the power due to runtime D3 is more or less the
same thing (from the SSD's POV) as cutting the power during S3 or full
power off.

I've contemplated adding runtime D3 support to the nvme driver, but it
would probably be barely a win and quite slow.

When Linux suspends-to-idle, does it call driver suspend callbacks and
kick devices into D3?  It should work, but I don't know what actually
happens.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-12  4:06 [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe Andy Lutomirski
  2017-05-12 13:58 ` Mario.Limonciello
@ 2017-05-13 12:27 ` Andy Lutomirski
  2017-05-15 16:11   ` Mario.Limonciello
  2017-05-19  1:13 ` Andy Lutomirski
  2 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2017-05-13 12:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jens Axboe, linux-kernel, Kai-Heng Feng, linux-nvme,
	Christoph Hellwig, Sagi Grimberg, Keith Busch, Mario Limonciello

On Thu, May 11, 2017 at 9:06 PM, Andy Lutomirski <luto@kernel.org> wrote:
> It seems like RSTe is much more conservative with transition timing
> that we are.  According to Mario, RSTe programs APST to transition from
> active states to the first idle state after 60ms and, thereafter, to
> 1000 * the exit latency of the target state.

Bad news, folks: this appears to be merely more stable, not all the way stable:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184/comments/65

I maintain my hypothesis that no one ever validated these disks and
that the very conservative parameters set by RSTe merely make it rare
to trigger the bug.  But maybe something else is going on.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-12 23:49       ` Andy Lutomirski
@ 2017-05-15 15:51         ` Mario.Limonciello
  0 siblings, 0 replies; 17+ messages in thread
From: Mario.Limonciello @ 2017-05-15 15:51 UTC (permalink / raw)
  To: luto
  Cc: axboe, linux-kernel, kai.heng.feng, linux-nvme, hch, sagi, keith.busch

> On Fri, May 12, 2017 at 7:34 AM,  <Mario.Limonciello@dell.com> wrote:
> >>Yes, mostly.  I've written the patch, but I was planning to target it
> >>at 4.12 or 4.13 but not -stable.  It's mostly just a cleanup and has
> >>no real power saving benefit since the RSTe timeouts are so absurdly
> >>conservative that I doubt PS4 will happen in practical usage.
> > OK.
> >
> >>Perhaps
> >>in suspend-to-idle?  (For suspend-to-idle, I suspect we should really
> >>be using D3 instead.  Do we already do that?)
> >
> > Well I think this will depend upon what the SSD "will" support.
> 
> I thought it was basically impossible for the SSD to fail to support
> D3.  Afterall, cutting the power due to runtime D3 is more or less the
> same thing (from the SSD's POV) as cutting the power during S3 or full
> power off.
> 
You need to have an extra load switch in the system for this to work with
RTD3.

At least today - no Dell systems put the drive into RTD3 when going into
Modern Standby.  It's all PS4.

> I've contemplated adding runtime D3 support to the nvme driver, but it
> would probably be barely a win and quite slow.
> 
> When Linux suspends-to-idle, does it call driver suspend callbacks and
> kick devices into D3?  It should work, but I don't know what actually
> happens.

There are callback that happen for drivers.  It doesn't need to be a dedicated
callback specifically for freeze (what's invoked with S2I).
For example Qualcomm WLAN driver when the client disconnects from the 
AP will put the card into RTD3.  That will happen while going down to S2I.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-13 12:27 ` Andy Lutomirski
@ 2017-05-15 16:11   ` Mario.Limonciello
  2017-05-19  1:18     ` Andy Lutomirski
  0 siblings, 1 reply; 17+ messages in thread
From: Mario.Limonciello @ 2017-05-15 16:11 UTC (permalink / raw)
  To: luto
  Cc: axboe, linux-kernel, kai.heng.feng, linux-nvme, hch, sagi, keith.busch

> -----Original Message-----
> From: Andy Lutomirski [mailto:luto@kernel.org]
> Sent: Saturday, May 13, 2017 7:28 AM
> To: Andy Lutomirski <luto@kernel.org>
> Cc: Jens Axboe <axboe@kernel.dk>; linux-kernel@vger.kernel.org; Kai-Heng Feng
> <kai.heng.feng@canonical.com>; linux-nvme <linux-nvme@lists.infradead.org>;
> Christoph Hellwig <hch@lst.de>; Sagi Grimberg <sagi@grimberg.me>; Keith Busch
> <keith.busch@intel.com>; Limonciello, Mario <Mario_Limonciello@Dell.com>
> Subject: Re: [PATCH] nvme: Change our APST table to be no more aggressive than
> Intel RSTe
> 
> On Thu, May 11, 2017 at 9:06 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > It seems like RSTe is much more conservative with transition timing
> > that we are.  According to Mario, RSTe programs APST to transition from
> > active states to the first idle state after 60ms and, thereafter, to
> > 1000 * the exit latency of the target state.
> 
> Bad news, folks: this appears to be merely more stable, not all the way stable:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184/comments/65
> 
> I maintain my hypothesis that no one ever validated these disks and
> that the very conservative parameters set by RSTe merely make it rare
> to trigger the bug.  But maybe something else is going on.

This is really unfortunate to hear.  I think the conservative parameters set
by Intel are still best though.

I've been talking to folks about this.  There has been mentions to a possible 
signal integrity issue specifically on the quirked Dell systems and how it 
relates to this.  The current (working) theory is that when the drive is in PS4 
and is supposed to transition back that crosstalk causes problems with the 
link negotiation and thus fails.
So there's two possible ways I see to approach solving this (from Linux side):

1) Keep quirking those systems from going into PS4.
This isn't ideal as the jump to PS4 gets you the most power savings, but of course
stable system > power savings

2) Quirk those systems to redo link negotiation a few times if it fails
I don't know if this is actually possible.  Where is link negotiation invoked?

If our partners come up with a way to solve this from drive firmware though 
I'll let this group know.

Thanks,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-12  4:06 [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe Andy Lutomirski
  2017-05-12 13:58 ` Mario.Limonciello
  2017-05-13 12:27 ` Andy Lutomirski
@ 2017-05-19  1:13 ` Andy Lutomirski
  2017-05-19  1:32   ` Mario.Limonciello
  2017-05-19  6:35   ` Christoph Hellwig
  2 siblings, 2 replies; 17+ messages in thread
From: Andy Lutomirski @ 2017-05-19  1:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jens Axboe, linux-kernel, Kai-Heng Feng, linux-nvme,
	Christoph Hellwig, Sagi Grimberg, Keith Busch, Mario Limonciello

On Thu, May 11, 2017 at 9:06 PM, Andy Lutomirski <luto@kernel.org> wrote:
> It seems like RSTe is much more conservative with transition timing
> that we are.  According to Mario, RSTe programs APST to transition from
> active states to the first idle state after 60ms and, thereafter, to
> 1000 * the exit latency of the target state.
>

I pondered this a bit, and I want to NAK my own patch.  This patch
stinks -- there's mounting evidence that what it really does is to
make any problems show up more rarely.  If a system is broken, I want
it to be obviously broken.

Here are two options to move forward:

a) Leave the Dell quirk in place until someone from Dell or Samsung
figures out what's actually going on.  Add a blanket quirk turning off
the deepest sleep state on all Intel devices [1] at least until
someone from Intel figures out what's going on -- Hi, Keith!  Deal
with any other problems as they're reported.

b) Turn off the deepest state across the board and add a whitelist.
Populate the whitelist a bit.  The problem is that I don't even know
what to whitelist.  My system works great, but does that mean that my
particular laptop is fine?  My particular disk is certainly *not* fine
when installed in other laptops.

Ideas?  (a) is a bit simpler to implement, I think, and may be good enough.

[1] There are problems on Intel NUC machines with Intel SSDs, for
crying out loud.  I realize that the team that designs the NUC is
probably totally unrelated to the SSD team, but they're both Intel and
it shouldn't be *that* hard for someone at Intel to get it debugged.
See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1686592

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-15 16:11   ` Mario.Limonciello
@ 2017-05-19  1:18     ` Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: Andy Lutomirski @ 2017-05-19  1:18 UTC (permalink / raw)
  To: Mario.Limonciello, Bjorn Helgaas
  Cc: Andrew Lutomirski, Jens Axboe, linux-kernel, Kai-Heng Feng,
	linux-nvme, Christoph Hellwig, Sagi Grimberg, Keith Busch

On Mon, May 15, 2017 at 9:11 AM,  <Mario.Limonciello@dell.com> wrote:
>> -----Original Message-----
>> From: Andy Lutomirski [mailto:luto@kernel.org]
>> Sent: Saturday, May 13, 2017 7:28 AM
>> To: Andy Lutomirski <luto@kernel.org>
>> Cc: Jens Axboe <axboe@kernel.dk>; linux-kernel@vger.kernel.org; Kai-Heng Feng
>> <kai.heng.feng@canonical.com>; linux-nvme <linux-nvme@lists.infradead.org>;
>> Christoph Hellwig <hch@lst.de>; Sagi Grimberg <sagi@grimberg.me>; Keith Busch
>> <keith.busch@intel.com>; Limonciello, Mario <Mario_Limonciello@Dell.com>
>> Subject: Re: [PATCH] nvme: Change our APST table to be no more aggressive than
>> Intel RSTe
>>
>> On Thu, May 11, 2017 at 9:06 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> > It seems like RSTe is much more conservative with transition timing
>> > that we are.  According to Mario, RSTe programs APST to transition from
>> > active states to the first idle state after 60ms and, thereafter, to
>> > 1000 * the exit latency of the target state.
>>
>> Bad news, folks: this appears to be merely more stable, not all the way stable:
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184/comments/65
>>
>> I maintain my hypothesis that no one ever validated these disks and
>> that the very conservative parameters set by RSTe merely make it rare
>> to trigger the bug.  But maybe something else is going on.
>
> This is really unfortunate to hear.  I think the conservative parameters set
> by Intel are still best though.
>
> I've been talking to folks about this.  There has been mentions to a possible
> signal integrity issue specifically on the quirked Dell systems and how it
> relates to this.  The current (working) theory is that when the drive is in PS4
> and is supposed to transition back that crosstalk causes problems with the
> link negotiation and thus fails.
> So there's two possible ways I see to approach solving this (from Linux side):
>
> 1) Keep quirking those systems from going into PS4.
> This isn't ideal as the jump to PS4 gets you the most power savings, but of course
> stable system > power savings
>
> 2) Quirk those systems to redo link negotiation a few times if it fails
> I don't know if this is actually possible.  Where is link negotiation invoked?

Hi Bjorn-

As I understand it, we have a situation where some NVMe drives are
occasionally (due to signal quality issues or whatever) failing
"recovery".  I think this means that the disk is in an internal
low-power state and the PCIe link is in some L1 state and, when the
host and/or drive tries to wake up, the PCIe link fails to return to
L0.  Maybe I'm misunderstanding.

Is there some way to program the link to try harder?  Any other ideas?
 Are there better people to ask?

--Andy

>
> If our partners come up with a way to solve this from drive firmware though
> I'll let this group know.
>
> Thanks,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-19  1:13 ` Andy Lutomirski
@ 2017-05-19  1:32   ` Mario.Limonciello
  2017-05-19  1:37     ` Andy Lutomirski
  2017-05-19  6:35   ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Mario.Limonciello @ 2017-05-19  1:32 UTC (permalink / raw)
  To: luto
  Cc: axboe, linux-kernel, kai.heng.feng, linux-nvme, hch, sagi, keith.busch

> I pondered this a bit, and I want to NAK my own patch.  This patch
> stinks -- there's mounting evidence that what it really does is to
> make any problems show up more rarely.  If a system is broken, I want
> it to be obviously broken.
>
> Here are two options to move forward:
>
> a) Leave the Dell quirk in place until someone from Dell or Samsung
> figures out what's actually going on.  Add a blanket quirk turning off
> the deepest sleep state on all Intel devices [1] at least until
> someone from Intel figures out what's going on -- Hi, Keith!  Deal
> with any other problems as they're reported.
>
> b) Turn off the deepest state across the board and add a whitelist.
> Populate the whitelist a bit.  The problem is that I don't even know
> what to whitelist.  My system works great, but does that mean that my
> particular laptop is fine?  My particular disk is certainly *not* fine
> when installed in other laptops.
> 
> Ideas?  (a) is a bit simpler to implement, I think, and may be good enough.
>
Until we have a proper solution for XPS 9550/Precision 5510 I agree that quirk should stay in place.
Of those two options I think A is better.  There are lots of machines this patch has helped that haven't been mentioned in this content.  Please make sure that it's not too aggressive on ITPT for PS4 if you're not aligning to RST behavior (should at "least" be 6s).

Kai-Heng and the Canonical team have also been looking at a lot of SSD/machine combinations with these various patches.  I hope they can speak up on what they've been finding.

> [1] There are problems on Intel NUC machines with Intel SSDs, for
> crying out loud.  I realize that the team that designs the NUC is
> probably totally unrelated to the SSD team, but they're both Intel and
> it shouldn't be *that* hard for someone at Intel to get it debugged.
> See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1686592

I would second this and love if Intel could speak up here on what direction they recommend to bring the NVME driver for APST.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-19  1:32   ` Mario.Limonciello
@ 2017-05-19  1:37     ` Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: Andy Lutomirski @ 2017-05-19  1:37 UTC (permalink / raw)
  To: Mario.Limonciello
  Cc: Andrew Lutomirski, Jens Axboe, linux-kernel, Kai-Heng Feng,
	linux-nvme, Christoph Hellwig, Sagi Grimberg, Keith Busch

On Thu, May 18, 2017 at 6:32 PM,  <Mario.Limonciello@dell.com> wrote:
>> I pondered this a bit, and I want to NAK my own patch.  This patch
>> stinks -- there's mounting evidence that what it really does is to
>> make any problems show up more rarely.  If a system is broken, I want
>> it to be obviously broken.
>>
>> Here are two options to move forward:
>>
>> a) Leave the Dell quirk in place until someone from Dell or Samsung
>> figures out what's actually going on.  Add a blanket quirk turning off
>> the deepest sleep state on all Intel devices [1] at least until
>> someone from Intel figures out what's going on -- Hi, Keith!  Deal
>> with any other problems as they're reported.
>>
>> b) Turn off the deepest state across the board and add a whitelist.
>> Populate the whitelist a bit.  The problem is that I don't even know
>> what to whitelist.  My system works great, but does that mean that my
>> particular laptop is fine?  My particular disk is certainly *not* fine
>> when installed in other laptops.
>>
>> Ideas?  (a) is a bit simpler to implement, I think, and may be good enough.
>>
> Until we have a proper solution for XPS 9550/Precision 5510 I agree that quirk should stay in place.
> Of those two options I think A is better.  There are lots of machines this patch has helped that haven't been mentioned in this content.  Please make sure that it's not too aggressive on ITPT for PS4 if you're not aligning to RST behavior (should at "least" be 6s).

I can get on board for 6s minimum for the deepest state.  I don't want
to touch any rule that says "PS4" in the description because PS4 isn't
even guaranteed to be a sleep state, let alone the deepest state.

>
> Kai-Heng and the Canonical team have also been looking at a lot of SSD/machine combinations with these various patches.  I hope they can speak up on what they've been finding.
>
>> [1] There are problems on Intel NUC machines with Intel SSDs, for
>> crying out loud.  I realize that the team that designs the NUC is
>> probably totally unrelated to the SSD team, but they're both Intel and
>> it shouldn't be *that* hard for someone at Intel to get it debugged.
>> See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1686592
>
> I would second this and love if Intel could speak up here on what direction they recommend to bring the NVME driver for APST.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-19  1:13 ` Andy Lutomirski
  2017-05-19  1:32   ` Mario.Limonciello
@ 2017-05-19  6:35   ` Christoph Hellwig
  2017-05-19 14:18     ` Keith Busch
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2017-05-19  6:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jens Axboe, Sagi Grimberg, linux-kernel, linux-nvme, Keith Busch,
	Kai-Heng Feng, Mario Limonciello, Christoph Hellwig

On Thu, May 18, 2017 at 06:13:55PM -0700, Andy Lutomirski wrote:
> a) Leave the Dell quirk in place until someone from Dell or Samsung
> figures out what's actually going on.  Add a blanket quirk turning off
> the deepest sleep state on all Intel devices [1] at least until
> someone from Intel figures out what's going on -- Hi, Keith!  Deal
> with any other problems as they're reported.

I think we should just blacklist the 60p entirely.  It also seems to
corrupt data 100% reliable when used with XFS.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-19 14:18     ` Keith Busch
@ 2017-05-19 14:15       ` Christoph Hellwig
  2017-05-19 18:24       ` Andy Lutomirski
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2017-05-19 14:15 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Andy Lutomirski, Jens Axboe, Sagi Grimberg,
	linux-kernel, linux-nvme, Kai-Heng Feng, Mario Limonciello,
	Christoph Hellwig

On Fri, May 19, 2017 at 10:18:34AM -0400, Keith Busch wrote:
> +
> +	if (ns->ctrl->quirks & NVME_QUIRK_PAGE_ALIGN)
> +		blk_queue_logical_block_size(ns->queue, ns->ctrl->page_size);
> +	else
> +		blk_queue_logical_block_size(ns->queue, bs);

Ugg.  That will invalidate the layout of any existing fs (in case the
device hasn't eaten it yet).  It would also invalidate things like
partitions tables, even if they are only used to share with windows.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-19  6:35   ` Christoph Hellwig
@ 2017-05-19 14:18     ` Keith Busch
  2017-05-19 14:15       ` Christoph Hellwig
  2017-05-19 18:24       ` Andy Lutomirski
  0 siblings, 2 replies; 17+ messages in thread
From: Keith Busch @ 2017-05-19 14:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andy Lutomirski, Jens Axboe, Sagi Grimberg, linux-kernel,
	linux-nvme, Kai-Heng Feng, Mario Limonciello, Christoph Hellwig

On Thu, May 18, 2017 at 11:35:05PM -0700, Christoph Hellwig wrote:
> On Thu, May 18, 2017 at 06:13:55PM -0700, Andy Lutomirski wrote:
> > a) Leave the Dell quirk in place until someone from Dell or Samsung
> > figures out what's actually going on.  Add a blanket quirk turning off
> > the deepest sleep state on all Intel devices [1] at least until
> > someone from Intel figures out what's going on -- Hi, Keith!  Deal
> > with any other problems as they're reported.
> 
> I think we should just blacklist the 60p entirely.  It also seems to
> corrupt data 100% reliable when used with XFS.

I assume you're talking about the 600p/p3100. That family of devices
prefer 4k alignment, and patch below will enforce that, fixing all
access issues. I wasn't planning to post it because my understanding is
an imminent f/w update will make it unnecessary.

I understand there is a different issue specific to the KBL NUC platforms
that exposes some other errata, but I don't know much about that.

For all issues, though, a f/w update fixing them is undergoing validation,
but I don't have insider information on the release date.

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c5e709d..49d8070 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -964,7 +964,11 @@ static void __nvme_revalidate_disk(struct gendisk
*disk, struct nvme_id_ns *id)
 		blk_integrity_unregister(disk);
 
 	ns->pi_type = pi_type;
-	blk_queue_logical_block_size(ns->queue, bs);
+
+	if (ns->ctrl->quirks & NVME_QUIRK_PAGE_ALIGN)
+		blk_queue_logical_block_size(ns->queue, ns->ctrl->page_size);
+	else
+		blk_queue_logical_block_size(ns->queue, bs);
 
 	if (ns->ms && !blk_get_integrity(disk) && !ns->ext)
 		nvme_init_integrity(ns);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index fda6ebb..4b6e21f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -83,6 +83,11 @@ enum nvme_quirks {
 	 * APST should not be used.
 	 */
 	NVME_QUIRK_NO_APST			= (1 << 4),
+
+	/*
+	 * Require 4k aligned IO despite smaller LBA size
+	 */
+	NVME_QUIRK_PAGE_ALIGN			= (1 << 5),
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6a2f0d3..e08d1f7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2144,6 +2144,8 @@ static const struct pci_device_id nvme_id_table[] = {
 	{ PCI_VDEVICE(INTEL, 0x0a54),
 		.driver_data = NVME_QUIRK_STRIPE_SIZE |
 				NVME_QUIRK_DISCARD_ZEROES, },
+	{ PCI_VDEVICE(INTEL, 0xf1a5),
+		.driver_data = NVME_QUIRK_PAGE_ALIGN },
 	{ PCI_VDEVICE(INTEL, 0x5845),	/* Qemu emulated controller */
 		.driver_data = NVME_QUIRK_IDENTIFY_CNS, },
 	{ PCI_DEVICE(0x1c58, 0x0003),	/* HGST adapter */
--

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-19 14:18     ` Keith Busch
  2017-05-19 14:15       ` Christoph Hellwig
@ 2017-05-19 18:24       ` Andy Lutomirski
  2017-05-19 21:42         ` Keith Busch
  1 sibling, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2017-05-19 18:24 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Andy Lutomirski, Jens Axboe, Sagi Grimberg,
	linux-kernel, linux-nvme, Kai-Heng Feng, Mario Limonciello,
	Christoph Hellwig

On Fri, May 19, 2017 at 7:18 AM, Keith Busch <keith.busch@intel.com> wrote:
> On Thu, May 18, 2017 at 11:35:05PM -0700, Christoph Hellwig wrote:
>> On Thu, May 18, 2017 at 06:13:55PM -0700, Andy Lutomirski wrote:
>> > a) Leave the Dell quirk in place until someone from Dell or Samsung
>> > figures out what's actually going on.  Add a blanket quirk turning off
>> > the deepest sleep state on all Intel devices [1] at least until
>> > someone from Intel figures out what's going on -- Hi, Keith!  Deal
>> > with any other problems as they're reported.
>>
>> I think we should just blacklist the 60p entirely.  It also seems to
>> corrupt data 100% reliable when used with XFS.
>
> I assume you're talking about the 600p/p3100. That family of devices
> prefer 4k alignment, and patch below will enforce that, fixing all
> access issues. I wasn't planning to post it because my understanding is
> an imminent f/w update will make it unnecessary.
>
> I understand there is a different issue specific to the KBL NUC platforms
> that exposes some other errata, but I don't know much about that.

We can quirk by firmware version.  The report said:

vid : 0x8086
ssvid : 0x8086
sn : BTPY63850F281P0H
mn : INTEL SSDPEKKW010T7
fr : PSF104C

Any chance you can check what firmware versions have the issue?

--Andy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe
  2017-05-19 18:24       ` Andy Lutomirski
@ 2017-05-19 21:42         ` Keith Busch
  0 siblings, 0 replies; 17+ messages in thread
From: Keith Busch @ 2017-05-19 21:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-kernel,
	linux-nvme, Kai-Heng Feng, Mario Limonciello, Christoph Hellwig

On Fri, May 19, 2017 at 11:24:39AM -0700, Andy Lutomirski wrote:
> On Fri, May 19, 2017 at 7:18 AM, Keith Busch <keith.busch@intel.com> wrote:
> > On Thu, May 18, 2017 at 11:35:05PM -0700, Christoph Hellwig wrote:
> >> On Thu, May 18, 2017 at 06:13:55PM -0700, Andy Lutomirski wrote:
> >> > a) Leave the Dell quirk in place until someone from Dell or Samsung
> >> > figures out what's actually going on.  Add a blanket quirk turning off
> >> > the deepest sleep state on all Intel devices [1] at least until
> >> > someone from Intel figures out what's going on -- Hi, Keith!  Deal
> >> > with any other problems as they're reported.
> >>
> >> I think we should just blacklist the 60p entirely.  It also seems to
> >> corrupt data 100% reliable when used with XFS.
> >
> > I assume you're talking about the 600p/p3100. That family of devices
> > prefer 4k alignment, and patch below will enforce that, fixing all
> > access issues. I wasn't planning to post it because my understanding is
> > an imminent f/w update will make it unnecessary.
> >
> > I understand there is a different issue specific to the KBL NUC platforms
> > that exposes some other errata, but I don't know much about that.
> 
> We can quirk by firmware version.  The report said:
> 
> vid : 0x8086
> ssvid : 0x8086
> sn : BTPY63850F281P0H
> mn : INTEL SSDPEKKW010T7
> fr : PSF104C
> 
> Any chance you can check what firmware versions have the issue?

All publicly available f/w's may experience the reported issues. The
update that is believed to resolve all known isues is currently in the
validation process and will be released with an update tool as soon as
that completes, but I have not been provided a specific release date.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-05-19 21:35 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-12  4:06 [PATCH] nvme: Change our APST table to be no more aggressive than Intel RSTe Andy Lutomirski
2017-05-12 13:58 ` Mario.Limonciello
2017-05-12 14:04   ` Andy Lutomirski
2017-05-12 14:34     ` Mario.Limonciello
2017-05-12 23:49       ` Andy Lutomirski
2017-05-15 15:51         ` Mario.Limonciello
2017-05-13 12:27 ` Andy Lutomirski
2017-05-15 16:11   ` Mario.Limonciello
2017-05-19  1:18     ` Andy Lutomirski
2017-05-19  1:13 ` Andy Lutomirski
2017-05-19  1:32   ` Mario.Limonciello
2017-05-19  1:37     ` Andy Lutomirski
2017-05-19  6:35   ` Christoph Hellwig
2017-05-19 14:18     ` Keith Busch
2017-05-19 14:15       ` Christoph Hellwig
2017-05-19 18:24       ` Andy Lutomirski
2017-05-19 21:42         ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).