From mboxrd@z Thu Jan 1 00:00:00 1970 From: rjw@rjwysocki.net (Rafael J. Wysocki) Date: Thu, 25 Jul 2019 21:48:57 +0200 Subject: [Regression] Commit "nvme/pci: Use host managed power state for suspend" has problems In-Reply-To: <20190725145209.GA6949@localhost.localdomain> References: <2332799.izEFUvJP67@kreacher> <20190725145209.GA6949@localhost.localdomain> Message-ID: <2428826.VBuqOhikiK@kreacher> On Thursday, July 25, 2019 4:52:10 PM CEST Keith Busch wrote: > On Thu, Jul 25, 2019@02:51:41AM -0700, Rafael J. Wysocki wrote: > > Hi Keith, > > > > Unfortunately, > > > > commit d916b1be94b6dc8d293abed2451f3062f6af7551 > > Author: Keith Busch > > Date: Thu May 23 09:27:35 2019 -0600 > > > > nvme-pci: use host managed power state for suspend > > > > doesn't universally improve things. In fact, in some cases it makes things worse. > > > > For example, on the Dell XPS13 9380 I have here it prevents the processor package > > from reaching idle states deeper than PC2 in suspend-to-idle (which, of course, also > > prevents the SoC from reaching any kind of S0ix). > > > > That can be readily explained too. Namely, with the commit above the NVMe device > > stays in D0 over suspend/resume, so the root port it is connected to also has to stay in > > D0 and that "blocks" package C-states deeper than PC2. > > > > In order for the root port to be able to go to D3, the device connected to it also needs > > to go into D3, so it looks like (at least on this particular machine, but maybe in > > general), both D3 and the NVMe-specific PM are needed. > > > > I'm not sure what to do here, because evidently there are systems where that commit > > helps. I was thinking about adding a module option allowing the user to override the > > default behavior which in turn should be compatible with 5.2 and earlier kernels. > > Darn, that's too bad. I don't think we can improve one thing at the > expense of another, so unless we find an acceptable criteria to select > what low power mode to use, I would be inclined to support a revert or > a kernel option to default to the previous behavior. > > One thing we might check before using NVMe power states is if the lowest > PS is non-operational with MP below some threshold. What does your device > report for: > > nvme id-ctrl /dev/nvme0 NVME Identify Controller: vid : 0x1c5c ssvid : 0x1c5c sn : MS92N171312902J0N mn : PC401 NVMe SK hynix 256GB fr : 80007E00 rab : 2 ieee : ace42e cmic : 0 mdts : 5 cntlid : 1 ver : 10200 rtd3r : 7a120 rtd3e : 1e8480 oaes : 0x200 ctratt : 0 oacs : 0x17 acl : 7 aerl : 3 frmw : 0x14 lpa : 0x2 elpe : 255 npss : 4 avscc : 0x1 apsta : 0x1 wctemp : 352 cctemp : 354 mtfa : 0 hmpre : 0 hmmin : 0 tnvmcap : 0 unvmcap : 0 rpmbs : 0 edstt : 10 dsto : 0 fwug : 0 kas : 0 hctma : 0 mntmt : 0 mxtmt : 0 sanicap : 0 hmminds : 0 hmmaxd : 0 nsetidmax : 0 anatt : 0 anacap : 0 anagrpmax : 0 nanagrpid : 0 sqes : 0x66 cqes : 0x44 maxcmd : 0 nn : 1 oncs : 0x1f fuses : 0x1 fna : 0 vwc : 0x1 awun : 7 awupf : 7 nvscc : 1 acwu : 7 sgls : 0 mnan : 0 subnqn : ioccsz : 0 iorcsz : 0 icdoff : 0 ctrattr : 0 msdbd : 0 ps 0 : mp:6.00W operational enlat:5 exlat:5 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:- ps 1 : mp:3.80W operational enlat:30 exlat:30 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:- ps 2 : mp:2.40W operational enlat:100 exlat:100 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:- ps 3 : mp:0.0700W non-operational enlat:1000 exlat:1000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:- ps 4 : mp:0.0070W non-operational enlat:1000 exlat:5000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-