* Possible nvme regression in 6.4.11
@ 2023-08-16 20:39 Genes Lists
2023-08-16 21:04 ` Keith Busch
2023-08-17 3:00 ` Bagas Sanjaya
0 siblings, 2 replies; 21+ messages in thread
From: Genes Lists @ 2023-08-16 20:39 UTC (permalink / raw)
To: linux-kernel; +Cc: kbusch, axboe, sagi, linux-nvme, hch
Also reported to bugzilla [1]
The failure happens on one laptop with a Samsung SSD.
Boot log manually transcribed:
kernel: nvme nvme0: controller is down; will reset: CSTS:0xffffffff,
PCI_STATUS=0xffff
kernel: nvme nvme0: Does your device have a faulty power saving mode
enabled?
kernel: nvme nvme0: try "nvme_core.default_ps_max_latency_us=0
pcie_aspm=off" and report a bug
kernel: nvme 0000:04:00.0: Unable to change power state from D3cold to
D0, device inaccessible
kernel: nvme nvme0: Disabling device after reset failure: -19
mount[353]: mount /sysroot: can't read superblock on /dev/nvme0n1p5.
mount[353]: dmesg(1) may have more information after failed mount
system call.
kernel: nvme0n1: detected capacity change from 2000409264 to 0
kernel: EXT4-fs (nvme0n1p5): unable to read superblock
systemd[1]: sysroot.mount: Mount process exited, code=exited, status=32/n/a
...
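A note on the values in the log above: all-ones register reads (CSTS 0xffffffff, PCI_STATUS 0xffff) are what a host typically sees when a PCIe device has stopped responding, which fits the later "device inaccessible" message. A minimal Python sketch of that check, assuming log lines shaped like the transcription above. The function name is made up for illustration and is not part of any kernel tooling:

```python
import re

# All-ones patterns for the two registers named in the nvme log message.
ALL_ONES = {"CSTS": 0xFFFFFFFF, "PCI_STATUS": 0xFFFF}

def device_looks_unreachable(log_line: str) -> bool:
    """Return True if every CSTS/PCI_STATUS value found in the line is all-ones.

    Accepts both "CSTS:0x..." (as transcribed above) and "CSTS=0x...".
    Lines with neither register return False.
    """
    fields = dict(re.findall(r"(CSTS|PCI_STATUS)[:=](0x[0-9a-fA-F]+)", log_line))
    return bool(fields) and all(int(v, 16) == ALL_ONES[k] for k, v in fields.items())
```

Fed the "controller is down" line from this report, the function returns True; a line carrying ordinary register values returns False.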
All kernels are upstream, untainted and compiled on Arch using:
gcc version 13.2.1
Kernels Tested:
- 6.4.10 - works fine
- 6.4.11 - fails
- 6.5-rc6 - fails
- 6.4.11 + nvme_core.default_ps_max_latency_us=0 pcie_aspm=off - fails
- 6.4.11 with 1 revert below - fails
Revert "nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G
and 512G"
This reverts commit 061fbf64825fb47367bbb6e0a528611f08119473.
Hardware:
model name : Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
stepping : 9
microcode : 0xf4
nvme:
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe
SSD Controller SM961/PM961/SM963
Subsystem: Samsung Electronics Co Ltd SM963 2.5" NVMe PCIe SSD
Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
Memory at edb00000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable+ Count=33 Masked-
Kernel driver in use: nvme
Gene
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217802
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible nvme regression in 6.4.11
2023-08-16 20:39 Possible nvme regression in 6.4.11 Genes Lists
@ 2023-08-16 21:04 ` Keith Busch
2023-08-17 1:30 ` Genes Lists
2023-08-17 3:00 ` Bagas Sanjaya
1 sibling, 1 reply; 21+ messages in thread
From: Keith Busch @ 2023-08-16 21:04 UTC (permalink / raw)
To: Genes Lists; +Cc: linux-kernel, axboe, sagi, linux-nvme, hch
On Wed, Aug 16, 2023 at 04:39:34PM -0400, Genes Lists wrote:
> Also reported to bugzilla [1]
>
> The failure happens on one laptop with a Samsung SSD.
>
> Boot log manually transcribed:
>
> kernel: nvme nvme0: controller is down; will reset: CSTS:0xffffffff,
> PCI_STATUS=0xffff
> kernel: nvme nvme0: Does your device have a faulty power saving mode
> enabled?
> kernel: nvme nvme0: try "nvme_core.default_ps_max_latency_us=0
> pcie_aspm=off" and report a bug
> kernel: nvme 0000:04:00.0: Unable to change power state from D3cold to D0,
> device inaccessible
> kernel: nvme nvme0: Disabling device after reset failure: -19
> mount[353]: mount /sysroot: can't read superblock on /dev/nvme0n1p5.
> mount[353]: dmesg(1) may have more information after failed mount
> system call.
> kernel: nvme0n1: detected capacity change from 2000409264 to 0
> kernel: EXT4-fs (nvme0n1p5): unable to read superblock
> systemd[1]: sysroot.mount: Mount process exited, code=exited, status=32/n/a
> ...
>
> All kernels are upstream, untainted and compiled on Arch using:
>
> gcc version 13.2.1
>
> Kernels Tested:
> - 6.4.10 - works fine
> - 6.4.11 - fails
> - 6.5-rc6 - fails
> - 6.4.11 + nvme_core.default_ps_max_latency_us=0 pcie_aspm=off - fails
> - 6.4.11 with 1 revert below - fails
>
> Revert "nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and
> 512G"
> This reverts commit 061fbf64825fb47367bbb6e0a528611f08119473.
It sounds like you can recreate this. Since .10 worked and .11 doesn't,
could you bisect the git commits? It looks like it will take 7 steps
between those two versions.
I don't think there are any nvme specific patches that could contribute
to what you're seeing, it's more likely some lower level platform patch
if a kernel change really did cause the regression. None of the recent
commits really stood out to me, so bisect is what I'd recommend.
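For context on the step count mentioned above: bisection halves the candidate range each round, so the number of test builds grows with the base-2 logarithm of the commit count. A minimal Python sketch under that assumption. The commit count between the two releases is not stated in this thread (something like `git rev-list --count v6.4.10..v6.4.11` would report it), and git's own "roughly N steps" estimate may differ slightly from this bound:

```python
def estimate_bisect_steps(commit_count: int) -> int:
    """Upper bound on test rounds needed to isolate one bad commit.

    commit_count is the number of candidate commits between the known-good
    and known-bad revisions. Equivalent to ceil(log2(commit_count)).
    """
    if commit_count < 1:
        raise ValueError("commit_count must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    return (commit_count - 1).bit_length()
```

By this bound, any range of 65 to 128 candidate commits takes 7 rounds, which matches the figure quoted above.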
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible nvme regression in 6.4.11
2023-08-16 21:04 ` Keith Busch
@ 2023-08-17 1:30 ` Genes Lists
2023-08-17 9:16 ` Genes Lists
0 siblings, 1 reply; 21+ messages in thread
From: Genes Lists @ 2023-08-17 1:30 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-kernel, axboe, sagi, linux-nvme, hch
On 8/16/23 17:04, Keith Busch wrote:
...
> It sounds like you can recreate this. Since .10 worked and .11 doesn't,
> could you bisect the git commits? It looks like it will take 7 steps
> between those two versions.
>
> I don't think there are any nvme specific patches that could contribute
> to what you're seeing, it's more likely some lower level platform patch
> if a kernel change really did cause the regression. None of the recent
> commits really stood out to me, so bisect is what I'd recommend.
Thank you
Bisect done - this is the result:
----------------------------------------------------------------
69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
Author: Ricky WU <ricky_wu@realtek.com>
Date: Tue Jul 25 09:10:54 2023 +0000
misc: rtsx: judge ASPM Mode to set PETXCFG Reg
commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0
to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG
always set to HIGH during the initialization.
Cc: stable@vger.kernel.org
Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
Link:
https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/misc/cardreader/rts5227.c | 2 +-
drivers/misc/cardreader/rts5228.c | 18 ------------------
drivers/misc/cardreader/rts5249.c | 3 +--
drivers/misc/cardreader/rts5260.c | 18 ------------------
drivers/misc/cardreader/rts5261.c | 18 ------------------
drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
6 files changed, 6 insertions(+), 58 deletions(-)
------------------------------------------------------
And the machine does have this hardware:
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A
PCI Express Card Reader (rev 01)
Subsystem: Dell RTS525A PCI Express Card Reader
Physical Slot: 1
Flags: bus master, fast devsel, latency 0, IRQ 141
Memory at ed100000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [80] Power Management version 3
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [b0] Express Endpoint, MSI 00
Kernel driver in use: rtsx_pci
Kernel modules: rtsx_pci
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible nvme regression in 6.4.11
2023-08-16 20:39 Possible nvme regression in 6.4.11 Genes Lists
2023-08-16 21:04 ` Keith Busch
@ 2023-08-17 3:00 ` Bagas Sanjaya
2023-09-29 11:49 ` Linux regression tracking #update (Thorsten Leemhuis)
1 sibling, 1 reply; 21+ messages in thread
From: Bagas Sanjaya @ 2023-08-17 3:00 UTC (permalink / raw)
To: Genes Lists, linux-kernel, Ricky WU, Arnd Bergmann
Cc: kbusch, axboe, sagi, linux-nvme, hch, Linux Regressions
On Wed, Aug 16, 2023 at 04:39:34PM -0400, Genes Lists wrote:
>
> Also reported to bugzilla [1]
>
> The failure happens on one laptop with a Samsung SSD.
>
> Boot log manually transcribed:
>
> kernel: nvme nvme0: controller is down; will reset: CSTS:0xffffffff,
> PCI_STATUS=0xffff
> kernel: nvme nvme0: Does your device have a faulty power saving mode
> enabled?
> kernel: nvme nvme0: try "nvme_core.default_ps_max_latency_us=0
> pcie_aspm=off" and report a bug
> kernel: nvme 0000:04:00.0: Unable to change power state from D3cold to D0,
> device inaccessible
> kernel: nvme nvme0: Disabling device after reset failure: -19
> mount[353]: mount /sysroot: can't read superblock on /dev/nvme0n1p5.
> mount[353]: dmesg(1) may have more information after failed mount
> system call.
> kernel: nvme0n1: detected capacity change from 2000409264 to 0
> kernel: EXT4-fs (nvme0n1p5): unable to read superblock
> systemd[1]: sysroot.mount: Mount process exited, code=exited, status=32/n/a
> ...
>
> All kernels are upstream, untainted and compiled on Arch using:
>
> gcc version 13.2.1
>
> Kernels Tested:
> - 6.4.10 - works fine
> - 6.4.11 - fails
> - 6.5-rc6 - fails
> - 6.4.11 + nvme_core.default_ps_max_latency_us=0 pcie_aspm=off - fails
> - 6.4.11 with 1 revert below - fails
>
> Revert "nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and
> 512G"
> This reverts commit 061fbf64825fb47367bbb6e0a528611f08119473.
>
> Hardware:
> model name : Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> stepping : 9
> microcode : 0xf4
>
> nvme:
> 04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD
> Controller SM961/PM961/SM963
> Subsystem: Samsung Electronics Co Ltd SM963 2.5" NVMe PCIe SSD
> Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
> Memory at edb00000 (64-bit, non-prefetchable) [size=16K]
> Capabilities: [40] Power Management version 3
> Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
> Capabilities: [70] Express Endpoint, MSI 00
> Capabilities: [b0] MSI-X: Enable+ Count=33 Masked-
> Kernel driver in use: nvme
>
Thanks for the regression report. I'm adding it to regzbot:
#regzbot ^introduced: 101bd907b4244a
#regzbot title: can't change Samsung SSD power state due to ASPM mode checking
#regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=217802
--
An old man doll... just what I always wanted! - Clara
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible nvme regression in 6.4.11
2023-08-17 1:30 ` Genes Lists
@ 2023-08-17 9:16 ` Genes Lists
2023-08-17 17:28 ` Keith Busch
2023-08-23 17:41 ` Keith Busch
0 siblings, 2 replies; 21+ messages in thread
From: Genes Lists @ 2023-08-17 9:16 UTC (permalink / raw)
To: Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh
On 8/16/23 21:30, Genes Lists wrote:
> On 8/16/23 17:04, Keith Busch wrote:
> ...
>> It sounds like you can recreate this. Since .10 worked and .11 doesn't,
>> could you bisect the git commits? It looks like it will take 7 steps
>> between those two versions.
>>
>> I don't think there are any nvme specific patches that could contribute
>> to what you're seeing, it's more likely some lower level platform patch
>> if a kernel change really did cause the regression. None of the recent
>> commits really stood out to me, so bisect is what I'd recommend.
>
> Thank you
>
> Bisect done - this is the result:
>
> ----------------------------------------------------------------
> 69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
> commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
> Author: Ricky WU <ricky_wu@realtek.com>
> Date: Tue Jul 25 09:10:54 2023 +0000
>
> misc: rtsx: judge ASPM Mode to set PETXCFG Reg
>
> commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
>
> ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0
> to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG
> always set to HIGH during the initialization.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
> Link:
> https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> drivers/misc/cardreader/rts5227.c | 2 +-
> drivers/misc/cardreader/rts5228.c | 18 ------------------
> drivers/misc/cardreader/rts5249.c | 3 +--
> drivers/misc/cardreader/rts5260.c | 18 ------------------
> drivers/misc/cardreader/rts5261.c | 18 ------------------
> drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
> 6 files changed, 6 insertions(+), 58 deletions(-)
>
> ------------------------------------------------------
>
> And the machine does have this hardware:
>
> 03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A
> PCI Express Card Reader (rev 01)
> Subsystem: Dell RTS525A PCI Express Card Reader
> Physical Slot: 1
> Flags: bus master, fast devsel, latency 0, IRQ 141
> Memory at ed100000 (32-bit, non-prefetchable) [size=4K]
> Capabilities: [80] Power Management version 3
> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [b0] Express Endpoint, MSI 00
> Kernel driver in use: rtsx_pci
> Kernel modules: rtsx_pci
>
>
>
Adding to CC list since bisect landed on
drivers/misc/cardreader/rtsx_pcr.c
Thread starts here: https://lkml.org/lkml/2023/8/16/1154
Thank you,
gene
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible nvme regression in 6.4.11
2023-08-17 9:16 ` Genes Lists
@ 2023-08-17 17:28 ` Keith Busch
2023-08-17 17:43 ` Genes Lists
2023-08-23 17:41 ` Keith Busch
1 sibling, 1 reply; 21+ messages in thread
From: Keith Busch @ 2023-08-17 17:28 UTC (permalink / raw)
To: Genes Lists
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh
On Thu, Aug 17, 2023 at 05:16:01AM -0400, Genes Lists wrote:
> On 8/16/23 21:30, Genes Lists wrote:
> > On 8/16/23 17:04, Keith Busch wrote:
> > ...
> > > It sounds like you can recreate this. Since .10 worked and .11 doesn't,
> > > could you bisect the git commits? It looks like it will take 7 steps
> > > between those two versions.
> > >
> > > I don't think there are any nvme specific patches that could contribute
> > > to what you're seeing, it's more likely some lower level platform patch
> > > if a kernel change really did cause the regression. None of the recent
> > > commits really stood out to me, so bisect is what I'd recommend.
> >
> > Thank you
> >
> > Bisect done - this is the result:
Sounds like the driver's ASPM suspicion was justified; however, the
recommended workaround doesn't appear to apply to this hardware.
Thanks for running the bisect!
> > ----------------------------------------------------------------
> > 69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
> > commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
> > Author: Ricky WU <ricky_wu@realtek.com>
> > Date: Tue Jul 25 09:10:54 2023 +0000
> >
> > misc: rtsx: judge ASPM Mode to set PETXCFG Reg
> >
> > commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
> >
> > ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0
> > to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG
> > always set to HIGH during the initialization.
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
> > Link:
> > https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >
> > drivers/misc/cardreader/rts5227.c | 2 +-
> > drivers/misc/cardreader/rts5228.c | 18 ------------------
> > drivers/misc/cardreader/rts5249.c | 3 +--
> > drivers/misc/cardreader/rts5260.c | 18 ------------------
> > drivers/misc/cardreader/rts5261.c | 18 ------------------
> > drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
> > 6 files changed, 6 insertions(+), 58 deletions(-)
> >
> > ------------------------------------------------------
> >
> > And the machine does have this hardware:
> >
> > 03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A
> > PCI Express Card Reader (rev 01)
> > Subsystem: Dell RTS525A PCI Express Card Reader
> > Physical Slot: 1
> > Flags: bus master, fast devsel, latency 0, IRQ 141
> > Memory at ed100000 (32-bit, non-prefetchable) [size=4K]
> > Capabilities: [80] Power Management version 3
> > Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > Capabilities: [b0] Express Endpoint, MSI 00
> > Kernel driver in use: rtsx_pci
> > Kernel modules: rtsx_pci
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible nvme regression in 6.4.11
2023-08-17 17:28 ` Keith Busch
@ 2023-08-17 17:43 ` Genes Lists
0 siblings, 0 replies; 21+ messages in thread
From: Genes Lists @ 2023-08-17 17:43 UTC (permalink / raw)
To: Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh
On 8/17/23 13:28, Keith Busch wrote:
...
>>>
>>> Bisect done - this is the result:
>
> Sounds like the driver's ASPM suspicion was justified; however, the
> recommended workaround doesn't appear to apply to this hardware.
> Thanks for running the bisect!
Happy to help :)
gene
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible nvme regression in 6.4.11
2023-08-17 9:16 ` Genes Lists
2023-08-17 17:28 ` Keith Busch
@ 2023-08-23 17:41 ` Keith Busch
2023-08-23 20:25 ` Genes Lists
2023-08-24 11:29 ` Genes Lists
1 sibling, 2 replies; 21+ messages in thread
From: Keith Busch @ 2023-08-23 17:41 UTC (permalink / raw)
To: Genes Lists
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh
On Thu, Aug 17, 2023 at 05:16:01AM -0400, Genes Lists wrote:
> > ----------------------------------------------------------------
> > 69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
> > commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
> > Author: Ricky WU <ricky_wu@realtek.com>
> > Date: Tue Jul 25 09:10:54 2023 +0000
> >
> > misc: rtsx: judge ASPM Mode to set PETXCFG Reg
> >
> > commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
> >
> > ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0
> > to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG
> > always set to HIGH during the initialization.
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
> > Link:
> > https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >
> > drivers/misc/cardreader/rts5227.c | 2 +-
> > drivers/misc/cardreader/rts5228.c | 18 ------------------
> > drivers/misc/cardreader/rts5249.c | 3 +--
> > drivers/misc/cardreader/rts5260.c | 18 ------------------
> > drivers/misc/cardreader/rts5261.c | 18 ------------------
> > drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
> > 6 files changed, 6 insertions(+), 58 deletions(-)
> >
> > ------------------------------------------------------
> >
> > And the machine does have this hardware:
> >
> > 03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A
> > PCI Express Card Reader (rev 01)
> > Subsystem: Dell RTS525A PCI Express Card Reader
> > Physical Slot: 1
> > Flags: bus master, fast devsel, latency 0, IRQ 141
> > Memory at ed100000 (32-bit, non-prefetchable) [size=4K]
> > Capabilities: [80] Power Management version 3
> > Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > Capabilities: [b0] Express Endpoint, MSI 00
> > Kernel driver in use: rtsx_pci
> > Kernel modules: rtsx_pci
> >
> >
> >
>
>
> Adding to CC list since bisect landed on
>
> drivers/misc/cardreader/rtsx_pcr.c
>
> Thread starts here: https://lkml.org/lkml/2023/8/16/1154
I realize you can work around this by blacklisting the rtsx_pci, but
that's not a pleasant solution. With only a few days left in 6.5, should
the commit just be reverted?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible nvme regression in 6.4.11
2023-08-23 17:41 ` Keith Busch
@ 2023-08-23 20:25 ` Genes Lists
2023-08-24 2:44 ` Ricky WU
2023-08-24 11:29 ` Genes Lists
1 sibling, 1 reply; 21+ messages in thread
From: Genes Lists @ 2023-08-23 20:25 UTC (permalink / raw)
To: Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh
On 8/23/23 13:41, Keith Busch wrote:
> On Thu, Aug 17, 2023 at 05:16:01AM -0400, Genes Lists wrote:
>>> ----------------------------------------------------------------
>>> 69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
>>> commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
>>> Author: Ricky WU <ricky_wu@realtek.com>
>>> Date: Tue Jul 25 09:10:54 2023 +0000
>>>
>>> misc: rtsx: judge ASPM Mode to set PETXCFG Reg
>>>
>>> commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
>>>
>>> ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0
>>> to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG
>>> always set to HIGH during the initialization.
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
>>> Link:
>>> https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>
>>> drivers/misc/cardreader/rts5227.c | 2 +-
>>> drivers/misc/cardreader/rts5228.c | 18 ------------------
>>> drivers/misc/cardreader/rts5249.c | 3 +--
>>> drivers/misc/cardreader/rts5260.c | 18 ------------------
>>> drivers/misc/cardreader/rts5261.c | 18 ------------------
>>> drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
>>> 6 files changed, 6 insertions(+), 58 deletions(-)
>>>
>>> ------------------------------------------------------
>>>
>>> And the machine does have this hardware:
>>>
>>> 03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A
>>> PCI Express Card Reader (rev 01)
>>> Subsystem: Dell RTS525A PCI Express Card Reader
>>> Physical Slot: 1
>>> Flags: bus master, fast devsel, latency 0, IRQ 141
>>> Memory at ed100000 (32-bit, non-prefetchable) [size=4K]
>>> Capabilities: [80] Power Management version 3
>>> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>> Capabilities: [b0] Express Endpoint, MSI 00
>>> Kernel driver in use: rtsx_pci
>>> Kernel modules: rtsx_pci
>>>
>>>
>>>
>>
>>
>> Adding to CC list since bisect landed on
>>
>> drivers/misc/cardreader/rtsx_pcr.c
>>
>> Thread starts here: https://lkml.org/lkml/2023/8/16/1154
>
> I realize you can work around this by blacklisting the rtsx_pci, but
> that's not a pleasant solution. With only a few days left in 6.5, should
> the commit just be reverted?
Keith - thanks for the reminder.
The card reader device itself is non-critical and very low priority.
What perhaps is a little more worrisome is that the change in rtsx somehow
prevented nvme from functioning normally, leaving the machine unable to
boot (at least for some combination(s) of hardware).
If there is a simple fix to prevent nvme from being impacted by the rtsx
driver, that would be more than sufficient.
On the other hand 6.4.11 is out, and I'm guessing there isn't a lot of
noise on this either. From what I've seen, 1 other user with the same
problem [1] and 1 with the same card reader not having a problem [2].
And no 'me-too's in the kernel bugzilla [3] either.
Gene
[1] https://bbs.archlinux.org/viewtopic.php?id=288095
[2] https://bugs.archlinux.org/task/79439
[3] https://bugzilla.kernel.org/show_bug.cgi?id=217802
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: Possible nvme regression in 6.4.11
2023-08-23 20:25 ` Genes Lists
@ 2023-08-24 2:44 ` Ricky WU
2023-08-24 9:48 ` Genes Lists
0 siblings, 1 reply; 21+ messages in thread
From: Ricky WU @ 2023-08-24 2:44 UTC (permalink / raw)
To: Genes Lists, Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, gregkh
Hi Gene,
I can't reproduce this issue on my side...
So if you revert only this patch (69304c8d285b77c9a56d68f5ddb2558f27abf406), does it work fine?
All this patch does is pull our clock request to HIGH; if the host needs it, it can also be pulled to LOW. This is done only on our device.
I don't think this will affect other ports...
BR,
Ricky
> -----Original Message-----
> From: Genes Lists <lists@sapience.com>
> Sent: Thursday, August 24, 2023 4:25 AM
> To: Keith Busch <kbusch@kernel.org>
> Cc: linux-kernel@vger.kernel.org; axboe@kernel.dk; sagi@grimberg.me;
> linux-nvme@lists.infradead.org; hch@lst.de; arnd@arndb.de; Ricky WU
> <ricky_wu@realtek.com>; gregkh@linuxfoundation.org
> Subject: Re: Possible nvme regression in 6.4.11
>
> On 8/23/23 13:41, Keith Busch wrote:
> > On Thu, Aug 17, 2023 at 05:16:01AM -0400, Genes Lists wrote:
> >>> ----------------------------------------------------------------
> >>> 69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
> >>> commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
> >>> Author: Ricky WU <ricky_wu@realtek.com>
> >>> Date: Tue Jul 25 09:10:54 2023 +0000
> >>>
> >>> misc: rtsx: judge ASPM Mode to set PETXCFG Reg
> >>>
> >>> commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
> >>>
> >>> ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0
> >>> to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG
> >>> always set to HIGH during the initialization.
> >>>
> >>> Cc: stable@vger.kernel.org
> >>> Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
> >>> Link:
> >>> https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
> >>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >>>
> >>> drivers/misc/cardreader/rts5227.c | 2 +-
> >>> drivers/misc/cardreader/rts5228.c | 18 ------------------
> >>> drivers/misc/cardreader/rts5249.c | 3 +--
> >>> drivers/misc/cardreader/rts5260.c | 18 ------------------
> >>> drivers/misc/cardreader/rts5261.c | 18 ------------------
> >>> drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
> >>> 6 files changed, 6 insertions(+), 58 deletions(-)
> >>>
> >>> ------------------------------------------------------
> >>>
> >>> And the machine does have this hardware:
> >>>
> >>> 03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd.
> >>> RTS525A PCI Express Card Reader (rev 01)
> >>> Subsystem: Dell RTS525A PCI Express Card Reader
> >>> Physical Slot: 1
> >>> Flags: bus master, fast devsel, latency 0, IRQ 141
> >>> Memory at ed100000 (32-bit, non-prefetchable) [size=4K]
> >>> Capabilities: [80] Power Management version 3
> >>> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
> >>> Capabilities: [b0] Express Endpoint, MSI 00
> >>> Kernel driver in use: rtsx_pci
> >>> Kernel modules: rtsx_pci
> >>>
> >>>
> >>>
> >>
> >>
> >> Adding to CC list since bisect landed on
> >>
> >> drivers/misc/cardreader/rtsx_pcr.c
> >>
> >> Thread starts here: https://lkml.org/lkml/2023/8/16/1154
> >
> > I realize you can work around this by blacklisting the rtsx_pci, but
> > that's not a pleasant solution. With only a few days left in 6.5,
> > should the commit just be reverted?
>
> Keith - thanks for reminder.
>
> The card reader device itself is non-critical and very low priority.
>
> What perhaps is a little more worrisome is the change in rtsx somehow
> prevented nvme from functioning normally and the machine then not booting
> (at least for some combination(s) of hardware).
>
> If there is a simple fix to prevent nvme from being impacted by the rtsx driver
> that would be more than sufficient?
>
> On the other hand 6.4.11 is out, and I'm guessing there isn't a lot of noise on
> this either. From what I've seen, 1 other user with same problem [1] and 1 with
> same card reader not having a problem [2].
> And no 'me-too's in the kernel bugzilla [3] either.
>
>
> Gene
>
>
> [1] https://bbs.archlinux.org/viewtopic.php?id=288095
> [2] https://bugs.archlinux.org/task/79439
> [3] https://bugzilla.kernel.org/show_bug.cgi?id=217802
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible nvme regression in 6.4.11
2023-08-24 2:44 ` Ricky WU
@ 2023-08-24 9:48 ` Genes Lists
2023-08-24 10:22 ` Genes Lists
0 siblings, 1 reply; 21+ messages in thread
From: Genes Lists @ 2023-08-24 9:48 UTC (permalink / raw)
To: Ricky WU, Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, gregkh
On 8/23/23 22:44, Ricky WU wrote:
> Hi Gene,
>
> I can't reproduce this issue on my side...
>
> So if you revert only this patch (69304c8d285b77c9a56d68f5ddb2558f27abf406), does it work fine?
> All this patch does is pull our clock request to HIGH; if the host needs it, it can also be pulled to LOW. This is done only on our device.
> I don't think this will affect other ports...
>
> BR,
> Ricky
Thanks Ricky - I will test reverting just that commit and report back. I
won't be able to get to it till later today (sometime after 2pm EDT) but
I will do it today.
FYI, I see one more report of someone experiencing the same problem [1]
gene
[1] https://bugs.archlinux.org/task/79439
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible nvme regression in 6.4.11
2023-08-24 9:48 ` Genes Lists
@ 2023-08-24 10:22 ` Genes Lists
2023-08-25 12:51 ` Ricky WU
0 siblings, 1 reply; 21+ messages in thread
From: Genes Lists @ 2023-08-24 10:22 UTC (permalink / raw)
To: Ricky WU, Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, gregkh
On 8/24/23 05:48, Genes Lists wrote:
> On 8/23/23 22:44, Ricky WU wrote:
>> Hi Gene,
>>
>> I can't reproduce this issue on my side...
>>
>> So if you revert only this patch
>> (69304c8d285b77c9a56d68f5ddb2558f27abf406), does it work fine?
>> All this patch does is pull our clock request to HIGH; if the host needs
>> it, it can also be pulled to LOW. This is done only on our device.
>> I don't think this will affect other ports...
>>
>> BR,
>> Ricky
>
> Thanks Ricky - I will test reverting just that commit and report back. I
> won't be able to get to it till later today (sometime after 2pm EDT) but
> I will do it today.
>
> FYI, I see one more report of someone experiencing the same problem [1]
>
> gene
>
> [1] https://bugs.archlinux.org/task/79439
>
>
That commit was what was reverted in the last step of the git bisect -
and indeed reverting that commit makes the problem go away and machine
then boots fine.
thanks
gene
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible nvme regression in 6.4.11
2023-08-23 17:41 ` Keith Busch
2023-08-23 20:25 ` Genes Lists
@ 2023-08-24 11:29 ` Genes Lists
1 sibling, 0 replies; 21+ messages in thread
From: Genes Lists @ 2023-08-24 11:29 UTC (permalink / raw)
To: Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh
On 8/23/23 13:41, Keith Busch wrote:
> On Thu, Aug 17, 2023 at 05:16:01AM -0400, Genes Lists wrote:
>>> ----------------------------------------------------------------
>>> 69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
>>> commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
>>> Author: Ricky WU <ricky_wu@realtek.com>
>>> Date: Tue Jul 25 09:10:54 2023 +0000
>>>
>>> misc: rtsx: judge ASPM Mode to set PETXCFG Reg
>>>
>>> commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
>>>
>>> ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0
>>> to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG
>>> always set to HIGH during the initialization.
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
>>> Link:
>>> https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>
>>> drivers/misc/cardreader/rts5227.c | 2 +-
>>> drivers/misc/cardreader/rts5228.c | 18 ------------------
>>> drivers/misc/cardreader/rts5249.c | 3 +--
>>> drivers/misc/cardreader/rts5260.c | 18 ------------------
>>> drivers/misc/cardreader/rts5261.c | 18 ------------------
>>> drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
>>> 6 files changed, 6 insertions(+), 58 deletions(-)
>>>
>>> ------------------------------------------------------
>>>
>>> And the machine does have this hardware:
>>>
>>> 03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A
>>> PCI Express Card Reader (rev 01)
>>> Subsystem: Dell RTS525A PCI Express Card Reader
>>> Physical Slot: 1
>>> Flags: bus master, fast devsel, latency 0, IRQ 141
>>> Memory at ed100000 (32-bit, non-prefetchable) [size=4K]
>>> Capabilities: [80] Power Management version 3
>>> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>> Capabilities: [b0] Express Endpoint, MSI 00
>>> Kernel driver in use: rtsx_pci
>>> Kernel modules: rtsx_pci
>>>
>>>
>>>
>>
>>
>> Adding to CC list since bisect landed on
>>
>> drivers/misc/cardreader/rtsx_pcr.c
>>
>> Thread starts here: https://lkml.org/lkml/2023/8/16/1154
>
> I realize you can work around this by blacklisting the rtsx_pci, but
> that's not a pleasant solution. With only a few days left in 6.5, should
> the commit just be reverted?
Looks like there are more people having the same problem than I was aware of
earlier [1].
My recommendation now is to revert this.
thanks
gene
[1] https://bugs.archlinux.org/task/79439#comment221262
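For anyone who needs the blacklisting workaround Keith mentions until a fix lands, here is a minimal sketch. It assumes the card reader is not needed on the affected machine. The file name rtsx-workaround.conf is made up for illustration, and because the failure happens while mounting the root filesystem, the kernel command line variant is likely the more dependable of the two.

```
# Option 1: append to the kernel command line in the bootloader
module_blacklist=rtsx_pci

# Option 2: /etc/modprobe.d/rtsx-workaround.conf (hypothetical file name);
# the initramfs may need to be regenerated for this to apply at early boot
blacklist rtsx_pci
```

Either way this only hides the problem by keeping rtsx_pci from loading; it does not address the underlying regression.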
* RE: Possible nvme regression in 6.4.11
2023-08-24 10:22 ` Genes Lists
@ 2023-08-25 12:51 ` Ricky WU
2023-08-30 21:09 ` Genes Lists
0 siblings, 1 reply; 21+ messages in thread
From: Ricky WU @ 2023-08-25 12:51 UTC (permalink / raw)
To: Genes Lists, Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, gregkh
>On 8/24/23 05:48, Genes Lists wrote:
>> On 8/23/23 22:44, Ricky WU wrote:
>>> Hi Gene,
>>>
>>> I can't reproduce this issue on my side...
>>>
>>> So if you revert only this patch
>>> (69304c8d285b77c9a56d68f5ddb2558f27abf406), does it work fine?
>>> All this patch does is pull our clock request HIGH (the HOST can still
>>> pull it LOW if it needs to), and it does this only on our device.
>>> I don't think this will affect other ports...
>>>
>>> BR,
>>> Ricky
>>
>> Thanks Ricky - I will test reverting just that commit and report back. I
>> won't be able to get to it till later today (sometime after 2pm EDT) but
>> I will do it today.
>>
>> FYI, I see one more report of someone experiencing the same problem [1]
>>
>> gene
>>
>> [1] https://bugs.archlinux.org/task/79439
>>
>>
>
>That commit was what was reverted in the last step of the git bisect -
>and indeed reverting that commit makes the problem go away and machine
>then boots fine.
>
>thanks
>
>gene
I think it may be a system power saving issue.
In the past, if the BIOS (config space) did not set the L1 substates, our driver kept driving CLKREQ# low when the HOST wanted to enter a power saving state, which prevented the whole system from entering that state.
With this patch we release CLKREQ# to the HOST, so the whole system can enter the power saving state when the HOST wants to. However, I don't know why your system cannot wake up from the power saving state on this platform.
Ricky
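To make the behaviour change concrete, here is a minimal C sketch of the two PETXCFG conditions that appear in the rts5227.c and rts5249.c hunks of the patch posted in this thread. It is a model, not driver code: the enum values are placeholders, and the functions clkreq_with_commit(), clkreq_after_revert(), clkreq_conditions_differ() and clkreq_selfcheck() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Names mirror identifiers in the diff; the numeric values are placeholders. */
enum aspm_mode { ASPM_MODE_CFG, ASPM_MODE_REG };
enum clkreq_level { FORCE_CLKREQ_LOW, FORCE_CLKREQ_HIGH };

/* Hypothetical helper: the condition in place while commit 101bd907b424 is applied. */
enum clkreq_level clkreq_with_commit(bool force_clkreq_0, enum aspm_mode mode)
{
	if (force_clkreq_0 && mode == ASPM_MODE_CFG)
		return FORCE_CLKREQ_LOW;
	return FORCE_CLKREQ_HIGH;
}

/* Hypothetical helper: the condition restored by the revert. */
enum clkreq_level clkreq_after_revert(bool force_clkreq_0)
{
	return force_clkreq_0 ? FORCE_CLKREQ_LOW : FORCE_CLKREQ_HIGH;
}

/* True when the two conditions pick different CLKREQ# levels. */
bool clkreq_conditions_differ(bool force_clkreq_0, enum aspm_mode mode)
{
	return clkreq_with_commit(force_clkreq_0, mode) !=
	       clkreq_after_revert(force_clkreq_0);
}

/* Self-check: the only differing input is force_clkreq_0 set with ASPM_MODE_REG. */
void clkreq_selfcheck(void)
{
	assert(clkreq_conditions_differ(true, ASPM_MODE_REG));
	assert(!clkreq_conditions_differ(true, ASPM_MODE_CFG));
	assert(!clkreq_conditions_differ(false, ASPM_MODE_REG));
	assert(!clkreq_conditions_differ(false, ASPM_MODE_CFG));
}
```

In this model the two versions disagree only when force_clkreq_0 is set and the device is in ASPM_MODE_REG: with the commit CLKREQ# is released (HIGH), after the revert it is driven LOW again. Whether the reporter's RTS525A falls into that case is not stated in the thread.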
* Re: Possible nvme regression in 6.4.11
2023-08-25 12:51 ` Ricky WU
@ 2023-08-30 21:09 ` Genes Lists
2023-09-11 8:02 ` Linux regression tracking (Thorsten Leemhuis)
0 siblings, 1 reply; 21+ messages in thread
From: Genes Lists @ 2023-08-30 21:09 UTC (permalink / raw)
To: Ricky WU, Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, gregkh
...
> I think it may be a system power saving issue.
> In the past, if the BIOS (config space) did not set the L1 substates, our driver kept driving CLKREQ# low when the HOST wanted to enter a power saving state, which prevented the whole system from entering that state.
> With this patch we release CLKREQ# to the HOST, so the whole system can enter the power saving state when the HOST wants to. However, I don't know why your system cannot wake up from the power saving state on this platform.
>
> Ricky
>
Hi
Thanks for continuing to look into this. Can you share your thoughts
on best way to proceed going forward - do you plan to revert or
something else?
thanks
gene
* Re: Possible nvme regression in 6.4.11
2023-08-30 21:09 ` Genes Lists
@ 2023-09-11 8:02 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-11 11:38 ` Thorsten Leemhuis
2023-09-11 15:41 ` Possible nvme regression in 6.4.11 Augusto Zanellato
0 siblings, 2 replies; 21+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-09-11 8:02 UTC (permalink / raw)
To: Genes Lists, Ricky WU, Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, gregkh,
Linux kernel regressions list
Hi, Thorsten here, the Linux kernel's regression tracker.
On 30.08.23 23:09, Genes Lists wrote:
> ...
>> I think it may be a system power saving issue.
>> In the past, if the BIOS (config space) did not set the L1 substates,
>> our driver kept driving CLKREQ# low when the HOST wanted to enter a
>> power saving state, which prevented the whole system from entering
>> that state.
>> With this patch we release CLKREQ# to the HOST, so the whole system
>> can enter the power saving state when the HOST wants to. However, I
>> don't know why your system cannot wake up from the power saving state
>> on this platform.
>
> Thanks for continuing to look into this. Can you share your thoughts
> on best way to proceed going forward - do you plan to revert or
> something else?
Hmmm. This looks like it fell through the cracks. Or am I missing something?
Anyway, 6.4.y will likely be EOL in a week or two. Which raises the
question: are 6.5.y and 6.6-rc1 working better for you? From the
bugzilla ticket (https://bugzilla.kernel.org/show_bug.cgi?id=217802) and
comments from others that are affected it sounds like that's not the
case. If that's how it is, I guess it's overdue that 101bd907b4244a
("misc: rtsx: judge ASPM Mode to set PETXCFG Reg") is reverted. Or am I
missing something?
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
#regzbot poke
* Re: Possible nvme regression in 6.4.11
2023-09-11 8:02 ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-09-11 11:38 ` Thorsten Leemhuis
2023-09-18 17:07 ` [Revert] " Jade Lovelace
2023-09-18 17:07 ` [PATCH] Revert "misc: rtsx: judge ASPM Mode to set PETXCFG Reg" Jade Lovelace
2023-09-11 15:41 ` Possible nvme regression in 6.4.11 Augusto Zanellato
1 sibling, 2 replies; 21+ messages in thread
From: Thorsten Leemhuis @ 2023-09-11 11:38 UTC (permalink / raw)
To: Genes Lists, Ricky WU, Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, gregkh,
Linux kernel regressions list
On 11.09.23 10:02, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker.
>
> On 30.08.23 23:09, Genes Lists wrote:
>> ...
>>> I think it may be a system power saving issue.
>>> In the past, if the BIOS (config space) did not set the L1 substates,
>>> our driver kept driving CLKREQ# low when the HOST wanted to enter a
>>> power saving state, which prevented the whole system from entering
>>> that state.
>>> With this patch we release CLKREQ# to the HOST, so the whole system
>>> can enter the power saving state when the HOST wants to. However, I
>>> don't know why your system cannot wake up from the power saving state
>>> on this platform.
>>
>> Thanks for continuing to look into this. Can you share your thoughts
>> on best way to proceed going forward - do you plan to revert or
>> something else?
>
> Hmmm. This looks like it fell through the cracks. Or am I missing something?
>
> Anyway, 6.4.y will likely be EOL in a week or two. Which raises the
> question: are 6.5.y and 6.6-rc1 working better for you? From the
> bugzilla ticket (https://bugzilla.kernel.org/show_bug.cgi?id=217802) and
> comments from others that are affected it sounds like that's not the
> case. If that's how it is, I guess it's overdue that 101bd907b4244a
> ("misc: rtsx: judge ASPM Mode to set PETXCFG Reg") is reverted. Or am I
> missing something?
According to feedback in bugzilla.kernel.org 6.5.y is affected as well.
And openSUSE apparently reverted the culprit about a week ago due to the
problems it causes:
https://bugzilla.suse.com/show_bug.cgi?id=1214428
Guess that means we should do the same for mainline with a CC:
stable@... tag.
Ricky WU, or do you have a better idea? Yes, from earlier in the thread
the root of the problem might not be in the patch you contributed, but
it exposes the problem, hence it should be reverted unless a better
solution can be found quickly. And that hasn't happened in the past two
weeks, hence it's afaics time for a revert.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
* Re: Possible nvme regression in 6.4.11
2023-09-11 8:02 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-11 11:38 ` Thorsten Leemhuis
@ 2023-09-11 15:41 ` Augusto Zanellato
1 sibling, 0 replies; 21+ messages in thread
From: Augusto Zanellato @ 2023-09-11 15:41 UTC (permalink / raw)
To: Linux regressions mailing list, Genes Lists, Ricky WU, Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, gregkh
Hi,
I'm also experiencing the issue described in this thread, just wanted to
chime in with regards to
> Anyway, 6.4.y will likely be EOL in a week or two. Which raises the
> question: are 6.5.y and 6.6-rc1 working better for you?
I can confirm that the issue is still happening and preventing correct
boot on both 6.5.2 (Arch Linux package version 6.5.2.arch1-1) and on
6.6-rc1 (Arch Linux package linux-mainline built from AUR).
For reference: the affected machine is a Dell XPS 15 9560 with the
latest firmware revision (1.31.0) as of time of writing this email, the
NVMe drive is a Sabrent Rocket4 1TB drive, but the issue also happens
with the OEM provided Toshiba XG4.
Thanks,
Augusto
PS: it's my first time here in LKML, feel free to tell me if I did
anything wrong :)
* [Revert] Re: Possible nvme regression in 6.4.11
2023-09-11 11:38 ` Thorsten Leemhuis
@ 2023-09-18 17:07 ` Jade Lovelace
2023-09-18 17:07 ` [PATCH] Revert "misc: rtsx: judge ASPM Mode to set PETXCFG Reg" Jade Lovelace
1 sibling, 0 replies; 21+ messages in thread
From: Jade Lovelace @ 2023-09-18 17:07 UTC (permalink / raw)
To: Gene, Ricky WU, Keith Busch, Thorsten Leemhuis, linux-kernel
Cc: regressions, Alyssa Ross, Michal Suchanek, axboe @ kernel . dk ,
sagi @ grimberg . me , linux-nvme @ lists . infradead . org ,
hch, arnd, gregkh, stable
This regression affects all copies of the Dell XPS 15 9560 and Dell Precision
5520 with any SSD including aftermarket ones.
Per the bugzilla discussion here:
https://bugzilla.kernel.org/show_bug.cgi?id=217802 this regression has
been confirmed to also affect 6.5.2 and 6.6-rc1, and affects several
distros.
Known affected branches: 6.1, 6.4, 6.5, 6.6.
It has already been reverted by openSUSE and is soon to be reverted in
NixOS. A patch follows, hopefully with the right metadata tags.
I have compiled a kernel 6.1 with the revert and confirmed it now boots
again.
p.s. this is my first time pointing git-send-email at this particular
list, so I'm sorry if I got anything wrong.
Jade
* [PATCH] Revert "misc: rtsx: judge ASPM Mode to set PETXCFG Reg"
2023-09-11 11:38 ` Thorsten Leemhuis
2023-09-18 17:07 ` [Revert] " Jade Lovelace
@ 2023-09-18 17:07 ` Jade Lovelace
1 sibling, 0 replies; 21+ messages in thread
From: Jade Lovelace @ 2023-09-18 17:07 UTC (permalink / raw)
To: Gene, Ricky WU, Keith Busch, Thorsten Leemhuis, linux-kernel
Cc: regressions, Alyssa Ross, Michal Suchanek, axboe @ kernel . dk ,
sagi @ grimberg . me , linux-nvme @ lists . infradead . org ,
hch, arnd, gregkh, stable, Gene
This reverts commit 101bd907b4244a726980ee67f95ed9cafab6ff7a.
This commit causes the NVMe controller to not work on the Dell XPS 15
9560, and similar laptop models. It appears to happen with any SSD
model.
This commit is broken on 6.1, 6.4, 6.5, and 6.6-rc1.
OpenSUSE has already reverted, and I have submitted a revert to NixOS.
As far as I can tell, this regression has fallen through the cracks.
Symptom:
kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
kernel: nvme 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
kernel: nvme nvme0: Disabling device after reset failure: -19
systemd-cryptsetup[169]: Device /dev/disk/by-uuid/b80aedf8-ddd4-46fa-8d09-5215d5f286b9 READ lock released.
systemd-cryptsetup[169]: IO error while decrypting keyslot.
systemd-cryptsetup[169]: Keyslot 0 (luks2) open failed with -5.
systemd-cryptsetup[169]: Keyslot open failed.
systemd-cryptsetup[169]: Failed to activate with specified passphrase: Input/output error
There are several downstream bugs, these are the ones I know of:
- https://bugzilla.suse.com/show_bug.cgi?id=1214428
- https://github.com/NixOS/nixpkgs/issues/253418
- https://bugs.archlinux.org/task/79439#comment221866
Upstream revert links:
- https://github.com/openSUSE/kernel-source/commit/1b02b1528a26f4e9b577e215c114d8c5e773ee10
- https://github.com/NixOS/nixpkgs/pull/255824
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217802
Reported-and-bisected-by: Gene <geneslists@sapience.com>
Link: https://lore.kernel.org/lkml/30b69186-5a6e-4f53-b24c-2221926fc3b4@sapience.com/
Signed-off-by: Jade Lovelace <lists@jade.fyi>
---
drivers/misc/cardreader/rts5227.c | 2 +-
drivers/misc/cardreader/rts5228.c | 18 ++++++++++++++++++
drivers/misc/cardreader/rts5249.c | 3 ++-
drivers/misc/cardreader/rts5260.c | 18 ++++++++++++++++++
drivers/misc/cardreader/rts5261.c | 18 ++++++++++++++++++
drivers/misc/cardreader/rtsx_pcr.c | 5 +----
6 files changed, 58 insertions(+), 6 deletions(-)
diff --git a/drivers/misc/cardreader/rts5227.c b/drivers/misc/cardreader/rts5227.c
index 3dae5e3a1697..d676cf63a966 100644
--- a/drivers/misc/cardreader/rts5227.c
+++ b/drivers/misc/cardreader/rts5227.c
@@ -195,7 +195,7 @@ static int rts5227_extra_init_hw(struct rtsx_pcr *pcr)
}
}
- if (option->force_clkreq_0 && pcr->aspm_mode == ASPM_MODE_CFG)
+ if (option->force_clkreq_0)
rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, PETXCFG,
FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW);
else
diff --git a/drivers/misc/cardreader/rts5228.c b/drivers/misc/cardreader/rts5228.c
index f4ab09439da7..cfebad51d1d8 100644
--- a/drivers/misc/cardreader/rts5228.c
+++ b/drivers/misc/cardreader/rts5228.c
@@ -435,10 +435,17 @@ static void rts5228_init_from_cfg(struct rtsx_pcr *pcr)
option->ltr_enabled = false;
}
}
+
+ if (rtsx_check_dev_flag(pcr, ASPM_L1_1_EN | ASPM_L1_2_EN
+ | PM_L1_1_EN | PM_L1_2_EN))
+ option->force_clkreq_0 = false;
+ else
+ option->force_clkreq_0 = true;
}
static int rts5228_extra_init_hw(struct rtsx_pcr *pcr)
{
+ struct rtsx_cr_option *option = &pcr->option;
rtsx_pci_write_register(pcr, RTS5228_AUTOLOAD_CFG1,
CD_RESUME_EN_MASK, CD_RESUME_EN_MASK);
@@ -469,6 +476,17 @@ static int rts5228_extra_init_hw(struct rtsx_pcr *pcr)
else
rtsx_pci_write_register(pcr, PETXCFG, 0x30, 0x00);
+ /*
+ * If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced
+ * to drive low, and we forcibly request clock.
+ */
+ if (option->force_clkreq_0)
+ rtsx_pci_write_register(pcr, PETXCFG,
+ FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW);
+ else
+ rtsx_pci_write_register(pcr, PETXCFG,
+ FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH);
+
rtsx_pci_write_register(pcr, PWD_SUSPEND_EN, 0xFF, 0xFB);
if (pcr->rtd3_en) {
diff --git a/drivers/misc/cardreader/rts5249.c b/drivers/misc/cardreader/rts5249.c
index 47ab72a43256..91d240dd68fa 100644
--- a/drivers/misc/cardreader/rts5249.c
+++ b/drivers/misc/cardreader/rts5249.c
@@ -327,11 +327,12 @@ static int rts5249_extra_init_hw(struct rtsx_pcr *pcr)
}
}
+
/*
* If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced
* to drive low, and we forcibly request clock.
*/
- if (option->force_clkreq_0 && pcr->aspm_mode == ASPM_MODE_CFG)
+ if (option->force_clkreq_0)
rtsx_pci_write_register(pcr, PETXCFG,
FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW);
else
diff --git a/drivers/misc/cardreader/rts5260.c b/drivers/misc/cardreader/rts5260.c
index 79b18f6f73a8..9b42b20a3e5a 100644
--- a/drivers/misc/cardreader/rts5260.c
+++ b/drivers/misc/cardreader/rts5260.c
@@ -517,10 +517,17 @@ static void rts5260_init_from_cfg(struct rtsx_pcr *pcr)
option->ltr_enabled = false;
}
}
+
+ if (rtsx_check_dev_flag(pcr, ASPM_L1_1_EN | ASPM_L1_2_EN
+ | PM_L1_1_EN | PM_L1_2_EN))
+ option->force_clkreq_0 = false;
+ else
+ option->force_clkreq_0 = true;
}
static int rts5260_extra_init_hw(struct rtsx_pcr *pcr)
{
+ struct rtsx_cr_option *option = &pcr->option;
/* Set mcu_cnt to 7 to ensure data can be sampled properly */
rtsx_pci_write_register(pcr, 0xFC03, 0x7F, 0x07);
@@ -539,6 +546,17 @@ static int rts5260_extra_init_hw(struct rtsx_pcr *pcr)
rts5260_init_hw(pcr);
+ /*
+ * If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced
+ * to drive low, and we forcibly request clock.
+ */
+ if (option->force_clkreq_0)
+ rtsx_pci_write_register(pcr, PETXCFG,
+ FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW);
+ else
+ rtsx_pci_write_register(pcr, PETXCFG,
+ FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH);
+
rtsx_pci_write_register(pcr, pcr->reg_pm_ctrl3, 0x10, 0x00);
return 0;
diff --git a/drivers/misc/cardreader/rts5261.c b/drivers/misc/cardreader/rts5261.c
index 94af6bf8a25a..b1e76030cafd 100644
--- a/drivers/misc/cardreader/rts5261.c
+++ b/drivers/misc/cardreader/rts5261.c
@@ -498,10 +498,17 @@ static void rts5261_init_from_cfg(struct rtsx_pcr *pcr)
option->ltr_enabled = false;
}
}
+
+ if (rtsx_check_dev_flag(pcr, ASPM_L1_1_EN | ASPM_L1_2_EN
+ | PM_L1_1_EN | PM_L1_2_EN))
+ option->force_clkreq_0 = false;
+ else
+ option->force_clkreq_0 = true;
}
static int rts5261_extra_init_hw(struct rtsx_pcr *pcr)
{
+ struct rtsx_cr_option *option = &pcr->option;
u32 val;
rtsx_pci_write_register(pcr, RTS5261_AUTOLOAD_CFG1,
@@ -547,6 +554,17 @@ static int rts5261_extra_init_hw(struct rtsx_pcr *pcr)
else
rtsx_pci_write_register(pcr, PETXCFG, 0x30, 0x00);
+ /*
+ * If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced
+ * to drive low, and we forcibly request clock.
+ */
+ if (option->force_clkreq_0)
+ rtsx_pci_write_register(pcr, PETXCFG,
+ FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW);
+ else
+ rtsx_pci_write_register(pcr, PETXCFG,
+ FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH);
+
rtsx_pci_write_register(pcr, PWD_SUSPEND_EN, 0xFF, 0xFB);
if (pcr->rtd3_en) {
diff --git a/drivers/misc/cardreader/rtsx_pcr.c b/drivers/misc/cardreader/rtsx_pcr.c
index a3f4b52bb159..32b7783e9d4f 100644
--- a/drivers/misc/cardreader/rtsx_pcr.c
+++ b/drivers/misc/cardreader/rtsx_pcr.c
@@ -1326,11 +1326,8 @@ static int rtsx_pci_init_hw(struct rtsx_pcr *pcr)
return err;
}
- if (pcr->aspm_mode == ASPM_MODE_REG) {
+ if (pcr->aspm_mode == ASPM_MODE_REG)
rtsx_pci_write_register(pcr, ASPM_FORCE_CTL, 0x30, 0x30);
- rtsx_pci_write_register(pcr, PETXCFG,
- FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH);
- }
/* No CD interrupt if probing driver with card inserted.
* So we need to initialize pcr->card_exist here.
--
2.42.0
* Re: Possible nvme regression in 6.4.11
2023-08-17 3:00 ` Bagas Sanjaya
@ 2023-09-29 11:49 ` Linux regression tracking #update (Thorsten Leemhuis)
0 siblings, 0 replies; 21+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-09-29 11:49 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-nvme, Linux Regressions
On 17.08.23 05:00, Bagas Sanjaya wrote:
> On Wed, Aug 16, 2023 at 04:39:34PM -0400, Genes Lists wrote:
> Thanks for the regression report. I'm adding it to regzbot:
>
> #regzbot ^introduced: 101bd907b4244a
> #regzbot title: can't change Samsung SSD power state due to ASPM mode checking
> #regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=217802
Fix is in Greg's tree and hopefully soon in mainline:
#regzbot fix: 0e4cac557531a4
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
Thread overview: 21+ messages
2023-08-16 20:39 Possible nvme regression in 6.4.11 Genes Lists
2023-08-16 21:04 ` Keith Busch
2023-08-17 1:30 ` Genes Lists
2023-08-17 9:16 ` Genes Lists
2023-08-17 17:28 ` Keith Busch
2023-08-17 17:43 ` Genes Lists
2023-08-23 17:41 ` Keith Busch
2023-08-23 20:25 ` Genes Lists
2023-08-24 2:44 ` Ricky WU
2023-08-24 9:48 ` Genes Lists
2023-08-24 10:22 ` Genes Lists
2023-08-25 12:51 ` Ricky WU
2023-08-30 21:09 ` Genes Lists
2023-09-11 8:02 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-11 11:38 ` Thorsten Leemhuis
2023-09-18 17:07 ` [Revert] " Jade Lovelace
2023-09-18 17:07 ` [PATCH] Revert "misc: rtsx: judge ASPM Mode to set PETXCFG Reg" Jade Lovelace
2023-09-11 15:41 ` Possible nvme regression in 6.4.11 Augusto Zanellato
2023-08-24 11:29 ` Genes Lists
2023-08-17 3:00 ` Bagas Sanjaya
2023-09-29 11:49 ` Linux regression tracking #update (Thorsten Leemhuis)